
Why Large Language Models Are Hitting a Wall and What That Means for AI, Capital, and Markets
For the past several years, the global technology sector has operated on a single assumption: if you scale large language models far enough, artificial general intelligence will emerge.
That assumption underpins hundreds of billions of dollars in capital spending, trillions in equity valuation, and a historic surge in demand for GPUs. But that assumption is now under serious scrutiny.
In this conversation, cognitive scientist and AI critic Gary Marcus explains why large language models are reaching diminishing returns, why the AI ecosystem is far more fragile than markets assume, and why a funding shock at OpenAI could trigger a cascading effect across the entire sector.
The Core Thesis: AI Bet Everything on One Idea
Since 2012, the AI field has gone all-in on neural networks, and more specifically on large language models.
The idea was simple:
Feed models more data
Add more compute
Scale parameters
Intelligence will emerge
This strategy worked early. GPT-2 was dramatically better than GPT-1. GPT-3 was dramatically better than GPT-2. GPT-4 was another visible leap.
But that curve has flattened.
According to Marcus, the industry made a classic extrapolation error. Early progress was mistaken for an unlimited trajectory.
He compares it to assuming that because a baby doubles in weight after birth, it will eventually weigh a trillion pounds.
What Large Language Models Actually Do
Large language models do not reason. They do not understand. They do not build internal representations of reality.
What they do is predict the next token in a sequence.
In practical terms, they are extremely sophisticated autocomplete systems trained on vast text corpora. This allows them to:
Generate fluent language
Mimic expertise
Recall patterns from the internet
But it does not give them abstraction, causal reasoning, or true understanding.
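As a minimal sketch of that mechanism: the bigram "model" below is invented for illustration, and a real LLM conditions on the whole context with a transformer, but the generation loop is structurally the same. Note that nothing in the loop consults facts.

```python
import numpy as np

# Toy vocabulary and a bigram "model": next-token probabilities conditioned
# only on the previous token. A real LLM conditions on the entire context,
# but the generation loop has the same shape.
VOCAB = ["the", "cat", "sat", "on", "mat", "."]
P_NEXT = {
    "the": [0.0, 0.6, 0.0, 0.0, 0.4, 0.0],
    "cat": [0.0, 0.0, 0.9, 0.0, 0.0, 0.1],
    "sat": [0.0, 0.0, 0.0, 1.0, 0.0, 0.0],
    "on":  [1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    "mat": [0.0, 0.0, 0.0, 0.0, 0.0, 1.0],
    ".":   [1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
}

def generate(prompt, steps=6):
    tokens = list(prompt)
    for _ in range(steps):
        probs = P_NEXT[tokens[-1]]                       # look up a distribution...
        tokens.append(np.random.choice(VOCAB, p=probs))  # ...and sample from it
    # At no point does this loop check facts; it only tracks
    # what tends to follow what.
    return " ".join(tokens)

print(generate(["the"]))  # e.g. "the cat sat on the mat ."
```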
That limitation explains their most dangerous failure mode.
Hallucinations: The Signature Failure Mode
A hallucination occurs when a model confidently states something that is false.
This is not a bug. It is a structural consequence of how LLMs work.
Because models reconstruct answers from probabilistic fragments rather than grounded representations, they can assemble plausible but incorrect outputs. Everything sounds authoritative. Nothing signals uncertainty.
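One hedged way to see this: the only score a language model natively assigns to its output is a probability, which measures how typical the wording is, not whether the claim is true. The numbers below are invented for illustration.

```python
import math

# Tiny bigram probabilities, invented for illustration.
P = {("the", "capital"): 0.2, ("capital", "of"): 0.9,
     ("of", "France"): 0.1, ("France", "is"): 0.8,
     ("of", "Freedonia"): 0.1, ("Freedonia", "is"): 0.8,
     ("is", "Paris"): 0.3}

def fluency_score(tokens):
    """Log-probability of the sequence: the model's only native quality signal."""
    return sum(math.log(P.get(pair, 1e-9)) for pair in zip(tokens, tokens[1:]))

true_claim  = "the capital of France is Paris".split()
false_claim = "the capital of Freedonia is Paris".split()

# Identical scores: the metric rewards typical phrasing, not true claims.
print(fluency_score(true_claim), fluency_score(false_claim))
```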
Marcus gave several examples:
Invented legal citations submitted in court filings
Fabricated biographical facts about real people
News events denied even as they were unfolding live on television
These errors are especially dangerous because they look correct. Grammar, tone, and confidence mask the mistake.
This has led to what researchers now call "workslop": polished outputs that pass superficial review but contain factual or logical errors.
Why Scaling Is No Longer Enough
Each new generation of models is still better than the last, but the gains are incremental, not transformative.
The difference between GPT-5 and GPT-4 is nothing like the difference between GPT-3 and GPT-2.
That is the definition of diminishing returns.
Benchmarks still improve, but only with increasingly expensive compute, data curation, and engineering. At the same time, the fundamental problems remain:
Hallucinations persist
Reasoning remains brittle
Novel situations cause failure
This matters because the entire AI investment thesis assumes exponential improvement, not marginal refinement.
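To make the arithmetic concrete: published scaling laws describe loss falling roughly as a power law in training compute. The constants below are invented, but the shape matches what the literature reports: each additional 10x of compute buys a smaller absolute gain at a larger absolute cost.

```python
# Illustrative only: power-law shape from the scaling-law literature,
# constants invented.
def loss(compute_flops, a=10.0, alpha=0.05):
    """Model loss as a power law in training compute."""
    return a * compute_flops ** -alpha

prev = None
for exp in range(20, 27):                  # 1e20 .. 1e26 FLOPs
    l = loss(10.0 ** exp)
    if prev is not None:
        print(f"10x more compute (to 1e{exp} FLOPs) cuts loss by {prev - l:.3f}")
    prev = l
# Each 10x of compute yields a roughly constant *relative* improvement,
# which means an ever-smaller absolute gain at an ever-larger absolute cost.
```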
Inference Models: A Partial Patch, Not a Breakthrough
Inference-time or "reasoning" models attempt to improve results by running multiple passes instead of generating a single immediate answer.
They simulate step-by-step thinking by iterating over outputs.
These models work best in closed domains:
Mathematics
Programming
Geometry
Why? Because correctness can be verified and training data can be generated synthetically.
They fail in open-ended real-world environments where novelty dominates. Politics, economics, strategy, and human behavior cannot be exhaustively enumerated or simulated.
Inference models are also far more expensive to run, consuming more compute per query. That matters for margins.
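A hedged sketch of the underlying pattern, often described as best-of-N sampling with a verifier. Here `propose_answer` is a hypothetical stand-in for a model call; the structure is what matters: sample several candidates and keep one that an exact checker accepts.

```python
import random

def propose_answer(a: int, b: int) -> int:
    # Stand-in for a stochastic model: usually right, sometimes off by one.
    return a * b + random.choice([0, 0, 0, 1, -1])

def verify(a: int, b: int, answer: int) -> bool:
    # Closed domains like arithmetic admit cheap, exact verifiers.
    # Open-ended questions (politics, strategy) have no such checker,
    # which is why this trick does not transfer.
    return a * b == answer

def best_of_n(a: int, b: int, n: int = 8):
    for _ in range(n):          # up to n model calls => up to n x the compute
        candidate = propose_answer(a, b)
        if verify(a, b, candidate):
            return candidate
    return None                 # no verified candidate: fail explicitly

print(best_of_n(17, 23))        # 391
```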
The Missing Ingredient: World Models
According to Marcus, the fundamental flaw in modern AI is the absence of world models.
A world model is an internal representation of how entities, rules, and causality interact. Humans build them effortlessly. We maintain separate models for reality, fiction, games, and hypothetical scenarios.
LLMs do not.
They imitate language about the world rather than modeling the world itself.
This is why models trained on millions of chess games still make illegal moves. They never internalize the rules. They memorize patterns without abstraction.
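A hedged illustration of the difference, using the python-chess package as an explicit rules engine; the "memorized" move list is invented to stand in for pattern recall.

```python
import chess  # the python-chess package: an explicit rules engine

board = chess.Board()

# A pattern imitator replays moves it has "seen" in similar openings,
# with no notion of whether they are legal from the current position.
memorized = ["e2e4", "e7e5", "e1g1"]  # e1g1 = castling; illegal here (f1/g1 occupied)

for uci in memorized:
    move = chess.Move.from_uci(uci)
    # The rules engine *is* a world model: it tracks board state and derives
    # legality from the rules rather than recalling familiar patterns.
    if move in board.legal_moves:
        print(uci, "legal")
        board.push(move)
    else:
        print(uci, "ILLEGAL from this position")
```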
Without world models:
Truth cannot be grounded
Novelty cannot be handled
Hallucinations cannot be eliminated
This is not a minor fix. It requires foundational research that the industry largely abandoned in favor of scaling.
The Capital Markets Implication
The AI boom has driven unprecedented capital expenditure. Hyperscalers spent roughly half a trillion dollars on AI infrastructure in 2025 alone.
A large portion of that went toward GPUs supplied by NVIDIA.
The assumption behind this spending is that demand will continue indefinitely.
Marcus argues that this demand is speculative, not proven.
If large language models cannot deliver artificial general intelligence, then:
GPU demand eventually saturates
Pricing power erodes
ROI collapses
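A back-of-envelope sketch, not a forecast: every input below except the capex figure is an assumption, but it shows the scale of revenue the spending implies.

```python
# Back-of-envelope only. All parameters except capex are assumptions.
capex        = 500e9   # 2025 hyperscaler AI infrastructure spend (from the text)
useful_life  = 5       # years before the hardware is obsolete (assumed)
hurdle_rate  = 0.10    # required annual return on capital (assumed)
gross_margin = 0.60    # blended margin on AI services (assumed)

annual_cost      = capex / useful_life + capex * hurdle_rate
required_revenue = annual_cost / gross_margin

print(f"AI revenue needed per year: ${required_revenue / 1e9:.0f}B")
# ~$250B per year on these inputs -- and that covers only one year of capex.
```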
And that brings us to the most vulnerable point in the system.
Why OpenAI Is the Weak Link
OpenAI is uniquely exposed:
Massive compute burn
Billions in monthly losses
A commoditizing model market
Strong competitors catching up
Unlike Google, OpenAI lacks:
A diversified revenue base
Proprietary infrastructure
Long-term balance sheet resilience
If capital markets hesitate, OpenAI cannot self-fund. A failed funding round would force consolidation, most likely into Microsoft, or worse.
Marcus likens OpenAI to the WeWork of AI: a company whose valuation reflected narrative more than fundamentals.
A Cascading Risk
If OpenAI falters:
Confidence in AI scaling collapses
GPU orders slow
Capital spending freezes
Valuations reset across the ecosystem
This would not be gradual. It would be fast.
Markets price certainty until they don’t.
What Should Change
Marcus argues the industry needs:
Intellectual diversity
Hybrid symbolic and neural systems
Foundational research, not just scaling
Models that reason, not just predict
Some companies are quietly moving in this direction by adding symbolic tools and classical algorithms. Notably, these improvements rely more on CPUs than GPUs, which has direct implications for hardware demand.
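A minimal sketch of that hybrid pattern; the routing rule and the `llm_answer` placeholder are invented for illustration. Exact sub-problems go to a classical, CPU-bound tool; only the open-ended remainder goes to the neural model.

```python
import re

def symbolic_calculator(expr: str) -> str:
    # Exact, cheap, CPU-bound, and incapable of hallucinating.
    return str(eval(expr, {"__builtins__": {}}))

def llm_answer(prompt: str) -> str:
    return "(fluent but unverified model output)"   # hypothetical placeholder

def hybrid(prompt: str) -> str:
    if re.fullmatch(r"[\d\s+\-*/().]+", prompt):    # looks like pure arithmetic
        return symbolic_calculator(prompt)           # symbolic path (CPU)
    return llm_answer(prompt)                        # neural path (GPU)

print(hybrid("123456789 * 987654321"))   # exact answer from the symbolic path
print(hybrid("Summarize the earnings call"))
```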
Bottom Line
Artificial intelligence will transform the world.
But not on the timeline markets are pricing.
Not through brute-force scaling.
Not without world models.
The danger is not that AI fails.
The danger is that capital assumed inevitability.
If that assumption breaks, the unwind will not be gentle.
Until next time, this is Steve Eisman, and this has been The Real Eisman Playbook.
If you’d like to catch my interviews and market breakdowns, visit The Real Eisman Playbook or subscribe to the Weekly Wrap channel on YouTube.
This post is for informational purposes only and does not constitute investment advice. Please consult a licensed financial adviser before making investment decisions.
