Mixed Reactions to OpenAI’s Largest AI Model Yet: Is GPT-4.5 Worth the Hype?

GPT-4.5, OpenAI’s latest and most advanced traditional AI model, has made its debut, but the response has been far from unanimous praise. While it brings slight improvements over its predecessor, GPT-4o, the model is significantly more expensive and slower, raising concerns about diminishing returns in large-scale AI development. With input costs reportedly 30 times higher and output costs 15 times higher than GPT-4o’s, many experts question whether the gains justify the hefty price tag.
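To make the pricing gap concrete, here is a minimal sketch of how those multipliers compound for a single request. The GPT-4o base rates below are assumptions chosen purely for illustration; only the 30x input and 15x output multipliers come from the reporting above.

```python
# Sketch: per-request cost comparison using the article's reported
# 30x input / 15x output price multipliers. The GPT-4o base rates
# below are illustrative assumptions, not official figures.

GPT4O_INPUT_PER_M = 2.50    # assumed USD per million input tokens
GPT4O_OUTPUT_PER_M = 10.00  # assumed USD per million output tokens

GPT45_INPUT_PER_M = GPT4O_INPUT_PER_M * 30    # 30x input cost (per article)
GPT45_OUTPUT_PER_M = GPT4O_OUTPUT_PER_M * 15  # 15x output cost (per article)

def request_cost(input_tokens, output_tokens, in_rate, out_rate):
    """Cost in USD for one request at the given per-million-token rates."""
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# A hypothetical chat turn: 2,000 input tokens, 500 output tokens.
cost_4o = request_cost(2_000, 500, GPT4O_INPUT_PER_M, GPT4O_OUTPUT_PER_M)
cost_45 = request_cost(2_000, 500, GPT45_INPUT_PER_M, GPT45_OUTPUT_PER_M)

print(f"GPT-4o:  ${cost_4o:.4f} per request")
print(f"GPT-4.5: ${cost_45:.4f} per request")
print(f"Effective multiplier: {cost_45 / cost_4o:.1f}x")
```

Because the input and output multipliers differ, the effective per-request multiplier depends on the mix of input and output tokens; for this sample turn it lands between 15x and 30x.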

The theory of scaling laws, the idea that more computing power translates into better AI performance, has been hotly debated. The GPT-4.5 release lends weight to growing doubts that scaling may be reaching its useful limits. One anonymous expert summed up the sentiment by labeling the new model “a lemon!” given its high costs relative to its claimed performance. Meanwhile, longtime OpenAI critic Gary Marcus called the release a “nothing burger” in a blog post, though given his history of skepticism about the company, his response is no surprise.

OpenAI logo (OpenAI LLC, public domain, via Wikimedia Commons)

Andrej Karpathy, a former OpenAI researcher, acknowledged that GPT-4.5 does improve on its predecessor, but in subtle ways that are hard to quantify rather than anything revolutionary. As he put it on X, “Everything is a little bit better and it’s awesome, but also not exactly in ways that are trivial to point to.”

OpenAI appeared to brace for disappointment by releasing GPT-4.5 as a “Research Preview” rather than billing it as a breakthrough. In a press release, the company openly acknowledged the model’s limits, saying, “GPT‑4.5 is an extremely large and computationally intensive model, so it costs more than and isn’t a replacement for GPT‑4o. Due to this, we’re considering whether to continue serving it in the API long-term as we weigh supporting existing capabilities against developing future models.”

Benchmark results shared by OpenAI show that GPT-4.5 struggles in comparison to OpenAI’s own simulated reasoning models, particularly in math and science-based assessments. In tests like the AIME math competition, GPT-4.5 scored just 36.7 percent—drastically lower than the 87.3 percent achieved by the o3-mini model. Notably, GPT-4.5 costs five times more than o1 and a staggering 68 times more than o3-mini for input processing, further fueling criticism over its efficiency.

One of the most notable disappointments concerns GPT-4.5’s coding capabilities. With an October 2023 knowledge cutoff, it has no awareness of more recent updates to development tools, making it a riskier option for programmers. Paul Gauthier, creator of the Aider coding assistant, tested its coding abilities on Aider’s polyglot coding benchmark and found that GPT-4.5 ranked only 10th overall, lagging behind Claude 3.7 Sonnet with extended thinking as well as OpenAI’s o1 and o3 models. The model also performed poorly on cost-effectiveness, making it an unattractive option for developers who need an AI assistant for coding tasks.

Despite these shortcomings, GPT-4.5 does bring certain improvements over GPT-4o. According to OpenAI’s benchmarks, it performed better on the multilingual MMMLU test, a general knowledge assessment across multiple languages, scoring 85.1 percent compared to GPT-4o’s 81.5 percent. The model also reportedly generates fewer hallucinations—incorrect or misleading responses—suggesting increased reliability in knowledge-based tasks.

User experience tests also show progress: human evaluators preferred GPT-4.5’s answers over GPT-4o’s in 57 percent of comparisons. Though this is an improvement, the heavier computational load and considerably higher costs make these benefits a hard sell for most users.

Trying to manage expectations, OpenAI CEO Sam Altman weighed in on the model’s capabilities on X. He said GPT-4.5 feels more like “talking to a thoughtful person” than a standard AI model built to top leaderboards, but cautioned that “this isn’t a reasoning model and won’t crush benchmarks. It’s a different kind of intelligence and there’s a magic to it I haven’t felt before.”

The sheer size and inefficiency of GPT-4.5 have also created logistical problems. Altman admitted OpenAI would have liked to launch the model more widely but is constrained by hardware, noting that the company is “out of GPUs.” He reassured customers that additional hardware is on the way, though the availability timeline remains unclear.

Overall, GPT-4.5 is more of an incremental step than a revolutionary leap in AI research. Although it makes modest gains in knowledge retrieval and response quality, its exorbitant prices and lackluster performance in domains such as coding are a major cause for concern. As AI research continues, this release serves as a reality check: simply scaling up models may not be enough to reach revolutionary breakthroughs. Whether OpenAI can address these shortcomings in subsequent releases remains to be seen.
