On 12 September 2024, OpenAI surprised the tech world with the release of its latest AI models: o1 and o1-mini. This announcement marks a significant step in the development of artificial intelligence, particularly in the field of machine thinking and problem solving. But what exactly is behind these new models and how is the tech community reacting?
The essence of o1: Think before you answer
OpenAI describes o1 as a new series of "reasoning models" designed to tackle complex tasks and solve more difficult problems than previous models - particularly in the fields of science, programming and maths. The core idea behind o1 is to give the model more time to "think" before it responds. This is similar to the human thought process, where we often pause to consider a problem from different angles before arriving at a solution.
"We've trained these models to spend more time thinking through problems before answering, much like a human would." - OpenAI Blog
Technical innovations and performance
The o1 models use an approach known as "chain of thought": the model goes through several internal reasoning steps before it generates a final answer. These steps are not visible to the user, but they contribute to the quality and accuracy of the answer.
Some impressive results for o1:
- In a qualifying exam for the International Mathematical Olympiad (IMO), o1 solved 83% of the problems correctly, compared to only 13% for GPT-4o.
- In Codeforces competitions, o1 reached the 89th percentile.
- OpenAI claims that o1 performs similarly to PhD students in challenging benchmark tasks in physics, chemistry and biology.
Initial reactions and reviews
Initial reactions to o1 have been mixed, but mostly positive. Many experts are impressed by the model's ability to solve complex problems. Ethan Mollick, professor at the Wharton School, who has been testing o1 for several weeks, expressed his enthusiasm:
"When you find tasks where GPT-4o fails and o1 does well, o1 feels completely magical."
However, Jason Wei, an OpenAI researcher who worked on o1, also emphasises the challenge of making the improved capabilities tangible for end users:
"Even as someone who works in science, it's not easy to find the range of prompts where GPT-4o fails, o1 does well, and I can score the answer."
o1 vs. GPT-4o: A paradigm shift?
Feature | o1 | GPT-4o |
---|---|---|
Focus | Complex problem solving | Broad general knowledge |
Response time | Slower (more "thinking time") | Faster |
Maths & Coding | Very strong | Good |
Web search & image processing | Not available | Available |
Safety (jailbreak test) | 84/100 | 22/100 |
Despite the impressive performance of o1, OpenAI emphasises that GPT-4o will continue to be the better choice for many everyday tasks. o1 is currently still missing some important functions such as web search, file upload and image processing. The main difference lies in the nature of the thought process: o1 works through internal reasoning steps before it answers, while GPT-4o responds more directly.
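The trade-offs described here can be sketched as a simple routing rule. This is only an illustration under stated assumptions: the `Task` fields, the `choose_model` function and the returned labels are hypothetical names introduced for this example, not part of any OpenAI API, and the rule reflects only the feature gaps named in this article at launch.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical description of what a request needs."""
    needs_web_search: bool = False
    needs_file_upload: bool = False
    needs_image_input: bool = False
    needs_deep_reasoning: bool = False


def choose_model(task: Task) -> str:
    """Pick a model label based on the trade-offs described in the article."""
    # o1 is described as lacking web search, file upload and image
    # processing, so any task needing those goes to GPT-4o.
    if task.needs_web_search or task.needs_file_upload or task.needs_image_input:
        return "gpt-4o"
    # Complex science, maths or coding problems are where o1 is said to shine.
    if task.needs_deep_reasoning:
        return "o1"
    # Everyday tasks: GPT-4o is faster and remains the recommended default.
    return "gpt-4o"


if __name__ == "__main__":
    print(choose_model(Task(needs_deep_reasoning=True)))
    print(choose_model(Task(needs_deep_reasoning=True, needs_image_input=True)))
    print(choose_model(Task()))
```

A real application would likely weigh latency and cost as well; the point of the sketch is only that missing features rule o1 out before its reasoning strength is considered.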
o1-mini: The efficient alternative
Alongside o1, OpenAI also introduced o1-mini, a smaller and more cost-effective variant. o1-mini is particularly effective for programming and costs 80% less than o1. It is ideal for applications that require reasoning skills but do not require a broad knowledge of the world.
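To make the 80% figure concrete, here is a small worked calculation. The base price of 10.00 per unit of usage is a made-up placeholder, not actual OpenAI pricing, and `discounted_price` is a hypothetical helper written for this example.

```python
def discounted_price(base_price: float, discount_percent: float) -> float:
    """Return the price after applying a percentage discount."""
    return base_price * (100 - discount_percent) / 100


# Hypothetical cost of some fixed workload on o1 (placeholder value).
O1_COST = 10.00

# The article states that o1-mini costs 80% less than o1.
o1_mini_cost = discounted_price(O1_COST, 80)

if __name__ == "__main__":
    print(f"o1:      {O1_COST:.2f}")
    print(f"o1-mini: {o1_mini_cost:.2f}")
    print(f"o1 costs {O1_COST / o1_mini_cost:.0f}x as much for the same workload")
```

Put differently, an 80% reduction means the same budget covers roughly five times as much usage on o1-mini, assuming the workload consumes comparable amounts on both models.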
Safety and ethical considerations
OpenAI emphasises that o1's new capabilities come with strengthened safety measures. The company has developed a new approach to safety training that uses the model's reasoning capabilities to tie it more closely to its safety and alignment policies. In a test of "jailbreak resistance" - i.e. the ability to comply with safety guidelines even when circumvention attempts are made - o1 scored 84 out of 100 points, compared to just 22 points for GPT-4o.
Availability and access
o1 and o1-mini are now available for ChatGPT Plus and Team users. Developers with API access can also use the models, although restrictions initially apply. OpenAI plans to make o1-mini available to free ChatGPT users in the future, but has not yet given a concrete timetable for this.
Outlook: The future of machine thinking
The introduction of o1 marks an important milestone in AI development. It shows that we are moving from pure language models to systems that can mimic complex thought processes.
Jim Fan, Senior Researcher at NVIDIA, sees o1 as the beginning of a new paradigm:
"We are finally seeing the paradigm of inference time scaling being popularised and used in production."
The coming weeks and months will show how o1 performs in practice and what new application possibilities it opens up. However, one thing is already clear: with o1, OpenAI has once again raised the bar for AI systems and embarked on an exciting new path in the field of machine thinking.