OpenAI unveils new ChatGPT that can reason through math and science
Online chatbots like OpenAI's ChatGPT and Google's Gemini sometimes struggle with simple math problems. The computer code they produce is often buggy and incomplete. From time to time, they even make things up.
On Thursday, OpenAI unveiled a new version of ChatGPT that may address these flaws. The company said the chatbot, underpinned by a new artificial intelligence technology called OpenAI o1, can “reason” through tasks involving math, coding and science.
“With previous models like ChatGPT, you ask them a question and they immediately start answering,” said Jakub Pachocki, chief scientist at OpenAI. “This model may take its time. It can think through the problem — in English — and try to break it down and look for angles in an effort to come up with the best answer.”
In a demonstration for The New York Times, Dr. Pachocki and Szymon Sidor, a technical fellow at OpenAI, had the chatbot solve an acrostic, a type of word puzzle that is significantly more complex than an ordinary crossword. The chatbot also answered a Ph.D.-level chemistry question and diagnosed an illness based on a detailed report on a patient's symptoms and history.
The new technology is part of a larger effort to build AI that can reason through complex tasks. Companies like Google and Meta are developing similar technologies, while Microsoft and its subsidiary GitHub are working to incorporate OpenAI's new system into their products.
The goal is to create systems that can carefully and logically work through a problem in a series of discrete steps, each building on the last, much as humans reason. The technology could be particularly useful to computer programmers who use AI systems to write code, and it could power automated tutors for math and other subjects.
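To make "a series of discrete steps" concrete, here is a toy sketch in Python. The word problem and the function are invented for illustration and say nothing about how OpenAI o1 works internally; the point is only that each step is computed and recorded before the next begins, with later steps building on earlier results.

```python
# A toy illustration of solving a problem through discrete steps, each
# building on the last. The word problem is invented; this is not how
# OpenAI o1 works internally.

def solve_step_by_step():
    """A train travels 120 miles in 2 hours. How far does it go in 5 hours?"""
    steps = []

    # Step 1: derive the speed from the known distance and time.
    speed = 120 / 2
    steps.append(f"Step 1: speed = 120 miles / 2 hours = {speed} mph")

    # Step 2: reuse the result of step 1 to answer the actual question.
    distance = speed * 5
    steps.append(f"Step 2: distance = {speed} mph x 5 hours = {distance} miles")

    return steps, distance

steps, answer = solve_step_by_step()
print("\n".join(steps))
print(f"Answer: {answer} miles")
```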
OpenAI says its new technology could help physicists generate complicated mathematical formulas and assist health care researchers with their experiments.
With the debut of ChatGPT in late 2022, OpenAI showed that machines can handle requests more like humans, answering questions, writing term papers and even generating computer code. But the responses were sometimes flawed.
ChatGPT learned its skills by analyzing enormous amounts of text from across the internet, including Wikipedia articles, books and chat logs. By identifying patterns in all that text, it learned to generate text on its own.
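As a rough illustration of pattern-based text generation, consider this minimal bigram model in Python. It is enormously cruder than the neural networks behind ChatGPT, which the article does not describe in detail, but it shows the basic idea: tally which word tends to follow which, then generate new text from those tallies.

```python
import random

# A heavily simplified sketch of learning patterns from text: a bigram
# model. Real chatbots use far larger neural networks, but the basic idea,
# predicting the next word from patterns seen in training text, is similar.

corpus = "the cat sat on the mat and the dog sat on the rug".split()

# Tally which words follow each word in the training text.
follows = {}
for current, nxt in zip(corpus, corpus[1:]):
    follows.setdefault(current, []).append(nxt)

# Generate new text by repeatedly sampling a plausible next word.
word = "the"
output = [word]
for _ in range(8):
    candidates = follows.get(word)
    if not candidates:  # no known continuation; stop here
        break
    word = random.choice(candidates)
    output.append(word)

print(" ".join(output))
```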
(The New York Times sued OpenAI and Microsoft in December, accusing them of copyright infringement of news content related to AI systems.)
Because the internet is filled with false information, the technology learned to repeat the same falsehoods. Sometimes, it made things up.
Dr. Pachocki, Mr. Sidor and their colleagues have tried to minimize those errors. They built OpenAI's new system using a technique called reinforcement learning. Through this process, which can extend over weeks or months, a system can learn behavior through extensive trial and error.
By working through various math problems, for example, it can learn which methods lead to correct answers and which do not. If it repeats this process with an enormous number of problems, it can identify patterns. But the system does not necessarily reason like a human. And it can still make mistakes and hallucinate.
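OpenAI has not published the details of this training process, but the trial-and-error idea can be sketched in a few lines of Python. In this invented example, a system must learn, from reward signals alone, which of two candidate methods for averaging two numbers actually works:

```python
import random

# A minimal sketch of reinforcement learning by trial and error (illustrative
# only; OpenAI has not disclosed how o1 was actually trained). The "model"
# must learn which of two candidate methods for averaging two numbers is
# correct, purely from reward signals.

def method_a(x, y):
    return (x + y) / 2  # correct: the average

def method_b(x, y):
    return x + y / 2    # buggy: misplaced parentheses

methods = [method_a, method_b]
scores = [0.0, 0.0]     # learned preference for each method

for trial in range(1000):
    x, y = random.randint(1, 100), random.randint(1, 100)
    # Explore occasionally; otherwise exploit the best-scoring method.
    if random.random() < 0.1:
        choice = random.randrange(2)
    else:
        choice = max(range(2), key=lambda i: scores[i])
    answer = methods[choice](x, y)
    reward = 1.0 if answer == (x + y) / 2 else 0.0
    # Nudge the chosen method's score toward the reward it earned.
    scores[choice] += 0.1 * (reward - scores[choice])

print(f"learned scores: method_a={scores[0]:.2f}, method_b={scores[1]:.2f}")
```

Over many trials, the reward signal steers the system toward the method that keeps producing right answers, a tiny-scale version of what the article describes.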
“It's not going to be perfect,” Mr. Sidor said. “But you can trust it will work harder and be more likely to produce the right answer.”
Access to the new technology began Thursday for consumers and businesses that subscribe to the company's ChatGPT Plus and ChatGPT Team services. The company is also selling the technology to software developers and businesses building their own AI applications.
OpenAI said the new technology performed better than its previous technologies on some standardized tests. Its previous technology scored 13 percent on the qualifying exam for the International Mathematical Olympiad, or IMO, the premier math competition for high school students. OpenAI o1, the company said, scored 83 percent.
Still, standardized tests aren't always a good judge of how technologies will perform in real-world situations, and while the system may be good at math test questions, it may still struggle to teach math.
“There's a difference between problem solving and support,” said Angela Fan, a research scientist at Meta. “The new model may solve the problem. But that's very different from helping someone with their homework.”