What if AI Doesn’t Get Much Better Than This?
In the years since ChatGPT’s launch in late 2022, it’s been hard not to get swept up in feelings of euphoria or dread about the looming impacts of generative AI. This reaction has been fueled, in part, by the confident declarations of tech CEOs, who have veered toward increasingly bombastic rhetoric.
“AI is starting to get better than humans at almost all intellectual tasks,” Anthropic CEO Dario Amodei recently told Anderson Cooper. He added that half of entry-level white collar jobs might be “wiped out” in the next one to five years, creating unemployment levels as high as 20%—a peak last seen during the Great Depression.
Meanwhile, OpenAI’s Sam Altman said that AI can now rival the abilities of a job seeker with a PhD, leading one publication to plaintively ask, “So what’s left for grads?”
Not to be outdone, Mark Zuckerberg claimed that superintelligence is “now in sight.” (His shareholders hope he’s right, as he’s reportedly offering compensation packages worth up to $300 million to lure top AI talent to Meta.)
But then, two weeks ago, OpenAI finally released its long-awaited GPT-5, a large language model that many had hoped would offer leaps in capabilities comparable to the head-turning advancements introduced by previous major releases, such as GPT-3 and GPT-4. The resulting product, however, seemed to be just fine.
GPT-5 was marginally better than previous models in certain use cases, but worse in others. It introduced some nice new usability updates, along with others that some users found annoying. (Within days, more than 4,000 ChatGPT users signed a change.org petition asking OpenAI to make the previous model, GPT-4o, available again, as they preferred it to the new release.) An early YouTube reviewer concluded that GPT-5 was a product that “was hard to complain about,” which is the type of thing you’d say about the iPhone 16, not a generation-defining technology. AI commentator Gary Marcus, who had been predicting this outcome for years, summed up his early impressions succinctly when he called GPT-5 “overdue, overhyped, and underwhelming.”
This all points to a critical question that, until recently, few would have considered: Is it possible that the AI we are currently using is basically as good as it’s going to be for a while?
In my most recent article for The New Yorker, which came out last week, I sought to answer this question. In doing so, I ended up reporting on a technical narrative that’s not widely understood outside of the AI community. The breakthrough performance of the GPT-3 and GPT-4 language models was due to improvements in a process called pretraining, in which a model digests an astonishingly large amount of text, effectively teaching itself to become smarter. Both models’ acclaimed gains came from increases in their size and in the amount of text on which they were pretrained.
At some point after GPT-4’s release, however, the AI companies began to realize that this approach was no longer as effective as it once was. They continued to scale up model size and training intensity, but saw diminishing returns in capability gains.
In response, starting around last fall, these companies turned their attention to post-training techniques, which take a model that has already been pretrained and refine it to do better on specific types of tasks. This allowed AI companies to continue to report progress on their products’ capabilities, but these new improvements were now much more focused than before.
Here’s how I explained this shift in my article:
“A useful metaphor here is a car. Pre-training can be said to produce the vehicle; post-training soups it up. [AI researchers had] predicted that as you expand the pre-training process you increase the power of the cars you produce; if GPT-3 was a sedan, GPT-4 was a sports car. Once this progression faltered, however, the industry turned its attention to helping the cars that they’d already built to perform better.”
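To make the distinction concrete, here is a deliberately toy sketch of the two phases in Python (using PyTorch). Everything in it, from the tiny character-level model to the stand-in corpora named PRETRAIN_TEXT and POSTTRAIN_TEXT, is illustrative rather than how any lab actually trains a frontier model, and real post-training relies on techniques such as supervised fine-tuning and reinforcement learning from human feedback rather than simply more next-token prediction; the point is only to show where the two phases sit relative to each other.

```python
# Toy illustration of the two training phases described above.
# This is a simplified sketch, not how frontier labs actually work.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-ins: a large generic corpus vs. a small task-specific one.
PRETRAIN_TEXT = "the quick brown fox jumps over the lazy dog. " * 50
POSTTRAIN_TEXT = "question: 2+2? answer: 4. " * 20

vocab = sorted(set(PRETRAIN_TEXT + POSTTRAIN_TEXT))
stoi = {ch: i for i, ch in enumerate(vocab)}

def encode(text):
    return torch.tensor([stoi[ch] for ch in text], dtype=torch.long)

class TinyLM(nn.Module):
    """Predicts the next character from the current one."""
    def __init__(self, vocab_size, dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, x):
        return self.head(self.embed(x))

def train(model, text, steps, lr):
    """Next-token prediction over the given corpus."""
    data = encode(text)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        logits = model(data[:-1])      # predict each next character
        loss = loss_fn(logits, data[1:])
        opt.zero_grad()
        loss.backward()
        opt.step()
    return loss.item()

model = TinyLM(len(vocab))

# Phase 1: pre-training on a huge, generic corpus builds the "car."
pretrain_loss = train(model, PRETRAIN_TEXT, steps=200, lr=1e-2)

# Phase 2: post-training refines the same model on narrow, task-specific
# data, "souping up" what pre-training already built.
posttrain_loss = train(model, POSTTRAIN_TEXT, steps=100, lr=1e-3)

print(f"pre-training loss: {pretrain_loss:.3f}, post-training loss: {posttrain_loss:.3f}")
```

The structural point the sketch tries to capture is the one from the article: the expensive, general phase happens once and produces the vehicle, while the cheaper, targeted phase that follows can only soup up what already exists.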
The result was a confusing series of inscrutably named models—o1, o3-mini, o3-mini-high, o4-mini-high—each with bespoke post-training upgrades. These models boasted widely publicized increases on specific benchmarks, but no longer the large leaps in practical capabilities we once expected. “I don’t hear a lot of companies using AI saying that 2025 models are a lot more useful to them than 2024 models, even though the 2025 models perform better on benchmarks,” Gary Marcus told me.
The post-training approach, it seems, can lead to incrementally better products, but not the continued large leaps in ability that would be necessary to fulfill the tech CEOs’ more outlandish predictions.
None of this, of course, implies that generative AI tools are worthless. They can be very cool, especially when used to help with computer programming (though maybe not as much as some thought), or to conduct smart searches, or to power custom tools for making sense of large quantities of text. But this paints a very different picture from one in which AI is “better than humans at almost all intellectual tasks.”
For more details on this narrative, including a concrete prediction for what to actually expect from this technology in the near future, read the full article. But in the meantime, I think it’s safe, at least for now, to turn your attention away from the tech titans’ increasingly hyperbolic claims and focus instead on things that matter more in your life.

It seems increasingly likely that LLMs will keep advancing step by step, but without the kind of sudden, revolutionary 5x leaps. The more plausible trajectory now resembles that of Google search or the iPhone over the last decade: steady improvements, but rarely breakthroughs that feel world-altering.
As for the CEOs, there are clearly a lot of biases at play. The most fundamental is incentives: their rhetoric is shaped by them. The great Charlie Munger once said, “Show me the incentive and I will show you the outcome.” In today’s climate, overspending on AI and producing something underwhelming will probably be forgiven; underspending and missing a genuine breakthrough would leave you in the history books as a fool. Small wonder their public messaging trends toward the maximally bullish outcome.
Layered on top of this is the standard human tendency to overreact to extremes. Robert Shiller made a similar point when contrasting actual corporate earnings with market reactions, illustrating how easily narratives and expectations can run ahead of fundamentals. There is little doubt that there is a “bubble” of AI optimism, at least in the short term.
Again, I appreciated this piece—as well as your books.
All the best,
Martin