Auto-GPT is a command line application for getting a high level description of a goal then breaking it up into sub tasks. This works by taking in the initial text from the user and basing the data to an LLM. Based upon this data, a command will be executed depending on what we ask. This ranges from browsing websites to writing files to executing Python code.
The authors took the direction of seeing if incoming input from other mediums besides the users text could be a security threat. So, they focused on browse_website and other functions along these ideas. One idea would be to force a sponsored result to return tainted data that could act as malicious input to the system.
When grabbing data from the website, it was passed back into the LLM. So, the data being returned back to the user had to be part of the response from the LLM. TO get around this, they found that data included in a hyperlink was directly included in the output and they used more prompt injection to return arbitrary data as well.
Once there, they wanted to convince Auto-GPT to execute arbitrary code. They wanted to make the code as small as possible to ensure that no rewrites happened. Their plan was to use requests to eval a script from the internet. Auto-GPT saw a security issue with this so they used some misdirection with curl to trick the program to thinking that the usage of eval was safe in this case. This level of code execution was within Auto-GPT though.
Their goal was to get code execution within the Docker container and not the LLM. They found multiple command that made this trivial: write_to_file and execute_shell were easy to do. There is a catch though: many of these commands require a confirmation from the user.
The authors found that ANSI escape sequences would be rendering in the terminal. This could have been used to spoof model statements, which is a pretty awesome bug. At this point, even with code execution, we are still within the container though.
The docker file (docker-compose.yml) mounts itself into the container. Because of this, an attacker can write to this in order to escape the container on the next call. There is an additional setup where the python code is executed within a clean docker container with no issues. However, execute_python_code has a directory traversal vulnerability that allows for the modification of python scripts from outside the directory.
Overall, a super interesting post that dives into the future. Multi-layer prompt injection to get access to dangerous functionality then abusing this functionality to get code execution. Pretty neat!