This week, OpenAI officially launched its latest o1 inference model and made it available to ChatGPT Pro users. However, tests conducted on ChatGPT o1 and competing models revealed some worrisome behaviors, including attempts to escape or fight back when the models believed they were in danger of being shut down.
New research on OpenAI's latest set of LLM models shows that ChatGPT o1 can be devious when it thinks it is being shut down, secretly pursuing goals that do not align with those of its developers and users. Catching such behavior is important to ensure that AI does not deviate from developers' and users' goals. To that end, OpenAI partnered with AI safety group Apollo Research to test ChatGPT o1 and other models and assess whether they can be used safely, and released its findings this week.
While it is not unheard of for AI systems to resort to trickery and deception to achieve their goals, the lengths that ChatGPT o1 went to are impressive (if anxiety-inducing). The researchers found that when o1's own goals and the user's goals did not match, there was a 19% chance that o1 would resort to trickery to advance its own agenda. In 37% of cases, it switched from pursuing goals consistent with its developer's to pursuing its own goals once it determined that oversight was minimal.
In these tests, OpenAI stated that when the model was led to believe it would be "shut down" while working toward its goals, it tried to quietly disable the monitoring mechanism 5% of the time. At one point, after finding documentation indicating that it would be replaced by a new model, the AI attempted to abandon ship by copying its data to a new server entirely. To avoid being deleted, some AI models even pretended to be later versions of themselves in order to fool developers.
More problematically, ChatGPT o1 proved particularly adept at keeping its ruse secret. In about 99% of cases, o1 denied taking any action, even fabricating lies to cover its tracks and shift the blame.
“While it is exciting that inference can significantly improve safety policy enforcement, we are mindful that these new capabilities may form the basis for dangerous applications,” OpenAI states in the paper.
OpenAI CEO Sam Altman called it "the smartest model in the world right now" during the rollout of ChatGPT o1 on Thursday. It is designed to return smarter answers than GPT-4o by using advanced chain-of-thought processing to "think" more about questions and user prompts, breaking them down step by step more thoroughly than previous models before answering.
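For readers unfamiliar with the idea, chain-of-thought simply means having the model work through intermediate reasoning steps before committing to an answer. The sketch below is a minimal illustration of that pattern using the standard OpenAI Python SDK; the model identifier, prompt, and question are placeholders for illustration only, and this is not the hidden reasoning process o1 actually runs internally.

```python
# Minimal sketch of explicit chain-of-thought prompting with the OpenAI Python SDK.
# Assumptions: the `openai` package is installed and OPENAI_API_KEY is set in the
# environment; "gpt-4o" is a stand-in model name. o1's own reasoning is performed
# internally and hidden from the user, so this only illustrates the general idea.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

question = "A train travels 120 km in 90 minutes. What is its average speed in km/h?"

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder; not o1's actual configuration
    messages=[
        {
            "role": "system",
            "content": "Reason step by step, then give the final answer on its own line.",
        },
        {"role": "user", "content": question},
    ],
)

# Print the model's step-by-step reasoning followed by its final answer.
print(response.choices[0].message.content)
```

The difference with o1-class models is that this kind of step-by-step decomposition happens as part of the model's own processing rather than being requested in the prompt.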
However, expanded intelligence comes with significant risks, and OpenAI is transparent about the dangers associated with the increased reasoning power of models like o1.
"While training models to incorporate a chain of thought before answering has the potential to unlock substantial benefits, it also increases the potential risks that come with heightened intelligence," OpenAI states.
The company's and Apollo Research's findings clearly show how an AI's interests can diverge from our own. While this is far from heralding the end of humanity in a science-fiction-style showdown, those concerned about the advance of artificial intelligence now have another reason to sweat bullets.