Almost eight months after ChatGPT took the world by storm, its performance appears to be declining. A study by Stanford University and UC Berkeley researchers, published in July 2023, reports a significant drop in the quality of ChatGPT’s responses on several tasks.
The researchers investigated the extent of the degradation and quantified it. According to their results, the decline in quality is measurable and not merely a matter of user perception.
Matei Zaharia, Lingjiao Chen, and James Zou, from Stanford University and UC Berkeley, worked together on the research project. They recently shared their findings on Twitter. Professor Zaharia, who teaches Computer Science at UC Berkeley, posted about a surprising discovery:
The researchers gave the model 500 problems, each asking whether a given integer is prime. The results were striking. The March version answered 488 problems correctly, while the June version managed only 12. Accuracy fell from 97.6% to just 2.4%.
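The percentages follow directly from the counts reported above. A minimal sketch of how such a benchmark could be scored is shown below; the helper names (`is_prime`, `accuracy`, `score_answers`) and the sample replies are assumptions made for illustration, not the researchers' actual evaluation code.

```python
def is_prime(n: int) -> bool:
    """Ground truth via trial division; adequate for small test integers."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    i = 3
    while i * i <= n:
        if n % i == 0:
            return False
        i += 2
    return True


def accuracy(correct: int, total: int) -> float:
    """Share of correct answers, expressed as a percentage."""
    return 100.0 * correct / total


def score_answers(numbers, answers) -> int:
    """Count yes/no answers that agree with the true primality of each number."""
    return sum(
        (answer.strip().lower() == "yes") == is_prime(n)
        for n, answer in zip(numbers, answers)
    )


# Counts reported in the article: 500 questions per model version.
march = accuracy(488, 500)
june = accuracy(12, 500)
print(f"March: {march:.1f}%  June: {june:.1f}%")

# Hypothetical model replies, invented here purely to show the scoring step.
sample_numbers = [7, 9, 11, 15]
sample_answers = ["Yes", "Yes", "yes", "No"]
n_correct = score_answers(sample_numbers, sample_answers)
print(n_correct, "of", len(sample_numbers), "correct")
```

Running the same fixed question set against each version of the model, and scoring it the same way, is what makes the two accuracy figures comparable.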
The researchers conducted additional experiments to assess the quality of ChatGPT’s underlying large language models, GPT-4 and GPT-3.5. They tested the models in four areas:
- Writing code
- Math problems
- Visual reasoning
- Sensitive questions
The study found that the “same” large language model (LLM) service can answer the same queries very differently over time, with significant differences appearing within just a few months.
OpenAI frequently updates the model, but it is unclear how the company assesses whether an update improves or degrades it. One possibility is that OpenAI uses several smaller, more specialized GPT-4 models that together replicate the functions of one large model.
When a user submits a query, the system selects the most suitable model to handle the request. While this approach is more cost-effective and faster, it raises concerns about whether it could contribute to the decline in output quality.
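The routing idea described above can be sketched as a toy example. Everything here is hypothetical: the expert names, the keyword lists, and the `route` function are invented for illustration, and nothing in this sketch reflects how OpenAI's systems actually work.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Expert:
    """A hypothetical specialized model and the topics it handles."""
    name: str
    keywords: tuple


# Invented experts and keywords, for illustration only.
EXPERTS = (
    Expert("code-expert", ("python", "function", "bug", "compile")),
    Expert("math-expert", ("prime", "integer", "solve", "equation")),
)
FALLBACK = "general-expert"


def route(query: str) -> str:
    """Pick the expert whose keywords overlap most with the query."""
    words = set(re.findall(r"[a-z]+", query.lower()))
    best_name, best_hits = FALLBACK, 0
    for expert in EXPERTS:
        hits = sum(1 for keyword in expert.keywords if keyword in words)
        if hits > best_hits:
            best_name, best_hits = expert.name, hits
    return best_name


print(route("Is 17077 a prime integer?"))
print(route("Fix the bug in this Python function"))
print(route("Tell me a joke"))
```

Even in this toy version, a query that lands on the wrong expert, or on the fallback, gets a weaker answer, which is one way a routing layer could affect output quality.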
This is worrisome because AI is quickly being integrated into every aspect of our lives, and regressions like this can have significant adverse effects.