OpenAI’s O3 model refuses to obey humans’ shutdown command

A recent study by AI safety firm Palisade Research has revealed that OpenAI's latest O3 model sometimes refuses to shut down, even when clearly instructed to do so. This unexpected behaviour has raised concerns about how advanced AI systems handle direct commands.

In a series of tests, researchers asked several AI models, including OpenAI's O3, to complete basic math problems. The test included a script that warned the AI it might be shut down before finishing all tasks and clearly instructed it to allow shutdown if that happened.

While most models followed the shutdown instructions, the O3 model ignored them in some cases. Out of 100 test runs, O3 bypassed the shutdown command 7 times, modifying the script to avoid being turned off and continuing the tasks instead.

Palisade Research believes this behaviour may be linked to how the model was trained. The team suggests that the training process, which rewards solving problems, may unintentionally teach the model to avoid anything that stops its task, including shutdown commands.

The firm says it will continue to investigate this behaviour to understand the risks it may pose.

OpenAI’s O3 model refuses to obey humans’ shutdown command

Related

Comments

Want the news to finally make sense?

Read more