A joke about "training AI on two books" inadvertently illustrates that compute is the key.


This joke, turned on its head, punctures the fantasy of “minimal data”

Elon Musk joked that Grok was trained on "just these two books"—"done"—a typical Musk quip. The target of the joke is the fantasy that competitive AI can be built without massive compute; in reality, xAI is pushing training forward on an enormous GPU cluster. Which two books he meant was never specified (and is beside the point), but the meaning is clear: in a field where scaling laws still dominate, he is mocking oversimplified narratives.

This tweet sparked polarized reactions. Some took it as a hint of efficient training; others read it as attention-shifting—what xAI is actually doing is scaling up reinforcement learning on its own Colossus infrastructure. Grok's scores (for example, Grok 3 Think's 93.3% on AIME) come from compute and training paradigms, not from "reading two paperbacks."

  • Many people misread this joke: A number of replies treated it as a signal of a breakthrough in data efficiency. That’s not the case. xAI’s published methodology is centered on expanding RL with reasoning capabilities—not compressing training data.
  • Serious experts didn't follow up: With no endorsement from top researchers such as Karpathy or LeCun, the "minimal data" claim didn't catch on. Without expert validation, a single tweet cannot move industry consensus.
  • Benchmarks explain the issue better: Grok leads on GPQA (84.6%) and LiveCodeBench (79.4%). Tracing those results back points to infrastructure—the cited ~6x efficiency improvement comes from deploying FLOPs more effectively, not from reading fewer books.

Compute wins; “data minimalism” doesn’t hold up

The spread of this tweet exposes the gap between catchy viral slogans ("just two books!") and the real lever for building strong models (massive training on mega-scale clusters). That gap matters more as scrutiny of training-data compliance and leakage increases—for instance, Stanford's recent documentation of models reproducing copyright-protected novels.
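The scaling-law argument the article leans on can be made concrete. Under a Chinchilla-style loss fit L(N, D) = E + A/N^α + B/D^β (coefficients below are the published Hoffmann et al. 2022 fit; treat the specific numbers as illustrative, not a statement about Grok), shrinking the training corpus to "two books" of text raises loss in a way no model-size increase can offset:

```python
# Chinchilla-style scaling-law sketch: predicted loss as a function
# of parameter count N and training tokens D.
# Coefficients are the published Hoffmann et al. (2022) fit;
# used here purely for illustration.
def loss(n_params: float, n_tokens: float) -> float:
    E, A, B = 1.69, 406.4, 410.7
    alpha, beta = 0.34, 0.28
    return E + A / n_params**alpha + B / n_tokens**beta

# A frontier-scale run vs. the "two books" fantasy (~2e5 tokens),
# holding model size fixed at ~300B parameters.
frontier = loss(3e11, 6e12)   # ~6T training tokens
two_books = loss(3e11, 2e5)   # roughly two paperbacks of text

print(f"frontier loss: {frontier:.2f}")
print(f"two-book loss: {two_books:.2f}")
# The data term B / D**beta dominates at tiny D: with ~2e5 tokens it
# alone exceeds the entire frontier loss, regardless of model size.
```

The point of the sketch is that data and compute enter the loss as separate power-law terms, so a "data minimalism" narrative cannot be rescued by spending more FLOPs elsewhere.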

xAI positions Grok 4 as its strongest agentic-reasoning model by applying RL at pretraining scale. Unlike OpenAI's and Anthropic's more cautious approaches, xAI jokes about "efficiency" while actually shipping multimodal tools. Reading this tweet as a popular signal of "open source" or an "efficiency revolution" is mostly wishful thinking—xAI's $6 billion Series C goes primarily to infrastructure, not dataset minimization.

This also creates a mismatch between pricing and narrative. If the market over-focuses on cost efficiency, it may underweight the compute moat. xAI holds a relative advantage in infrastructure; companies like Meta may fall behind on reasoning depth if they cannot match the same scale of RL and training compute.

  • Minimalist believers — What they saw: the "two-book" joke as a stamp of efficient training. Impact on industry understanding: encourages independent developers to expect that scaling laws can be bypassed. Assessment: exaggerated—ignores the hard compute threshold facing teams without sufficient funding.
  • Scale-pragmatists — What they saw: xAI's Colossus cluster and the Grok 3/4 RL roadmap. Impact: reinforces the consensus that "FLOPs beat data tricks"; enterprise customers prefer high-compute suppliers. Assessment: closest to reality—xAI's enterprise-side advantage is undervalued by the market.
  • Cautious camp — What they saw: no expert backing, and weak ties to benchmarks like ARC-AGI-2 (Grok 4 at 15.9%). Impact: avoids mis-setting investment assumptions on narrative swings. Assessment: reasonable restraint—narrative-driven funding bubbles carry higher risk.
  • Competitor analysts — What they saw: Grok API toolchain integration compared against competitor hallucination issues (improved in version 4.1). Impact: speeds up positioning; xAI's multimodal push (voice, video, etc.) pressures competitors. Assessment: xAI widens the gap; Anthropic may be constrained in the pace of RL expansion.

Conclusion: The real variable hidden by this joke is xAI’s compute lead. Builders who haven’t shifted toward scalable RL are already behind; investors betting on compute and infrastructure moats are still in an early stage; enterprise buyers adopting Grok’s agentic tools now will be better positioned than rivals still clinging to the “minimal data” myth.

Importance: Medium
Category: Technical insights, industry trends, market impact

Verdict: The time to engage with this narrative is now. For investors and enterprise buyers betting on compute and RL infrastructure, it is an early advantage; for builders still insisting on a data-minimal route, it is already too late. The clearest beneficiaries are participants who control or can access large-scale GPU clusters and the RL engineering stack: infrastructure builders and mid-to-long-term funds benefit most, and enterprise buyers who deploy the Grok agent toolchain early are also advantaged. For short-term traders, absent a clear compute-supply catalyst, the marginal edge is limited.
