NVIDIA and Alibaba re-evaluate AI, throwing FLOPS into the trash heap
On March 17, Jensen Huang took the stage at NVIDIA GTC 2026, wearing his signature leather jacket and speaking for over two hours. Afterwards, almost the entire internet was talking about “NVIDIA aiming to be the Token King.”
But if you listen carefully to the speech, you’ll find that Jensen Huang was actually emphasizing not the Token itself, but Tokens per Watt. When showing inference performance charts, he explicitly mentioned this concept and straightforwardly stated: every data center, every AI factory, is fundamentally limited by power. A 1GW factory will never become 2GW—this is dictated by physical laws. Under fixed power, the entity with the highest Tokens per Watt produces the lowest cost, and its revenue curve is the steepest.
This statement is the real core of GTC 2026.
Public discussion has focused on how much stronger Vera Rubin is compared to Blackwell, how Groq LPX can boost inference speed by 35 times, and NVIDIA’s plans to move data centers into space. These are all important, but fundamentally they are different expressions of the same logic: under energy constraints, maximizing intelligent output per Watt.
When Jensen Huang takes “Tokens/W” as the core metric for AI factory output, there’s actually a deeper industry implication behind it: the measurement system for compute power competition is shifting from chips to systems, from peak parameters to end-to-end efficiency, from who has faster chips to who can convert energy into intelligence more efficiently.
Judged by its current products and technology, NVIDIA is itself still constrained by Tokens/W; it and Jensen Huang remain far from being the true Token King.
This is a migration toward an “intelligent measurement language,” and the industry perspective opened by this shift is far more worth discussing than any new chip.
Coincidentally, just one day before GTC officially opened, Alibaba announced the establishment of the Alibaba Token Hub, led personally by Wu Yongming. Alibaba named its AI nerve center not after AI but after the token, elevating the token to the level of Alibaba's AI strategy.
This also reflects that viewing AI from a system perspective is gradually becoming the industry's new consensus. That is the idea this article sets out to emphasize.
01 The most important change at GTC 2026 is not in the chips themselves
At GTC 2026, everyone’s focus remains on new products and terms like Vera Rubin, Rubin POD, LPX, DSX AI Factory. But if you look at these releases collectively, you’ll see that the narrative boundary of compute power competition has shifted from individual chips to the entire compute infrastructure—an AI factory composed of computing, networking, storage, power, cooling, control systems, and software.
Rubin is described as a POD-scale platform, a large, coherent system formed by multiple racks; DSX is defined as a reference design for AI factories, aiming to maximize Tokens per Watt.
This indicates that the industry’s real competition is shifting from how high a single chip’s compute power is to how powerful the entire system is. More specifically, whether the whole system can efficiently organize limited power, cooling, and network resources into stable AI output.
The specific metric is Tokens per Watt (Tokens/W).
This article aims to analyze the significance conveyed by this metric and the opportunities it presents for developing AI infrastructure.
02 As the competition shifts to systems, the measurement system can no longer stay at the chip level
The measurement system of the chip era is well known: peak FLOPS, memory bandwidth, FLOPS/W, TOPS/W, bit/J—these indicators are important because they describe the capability boundaries of components.
Relying on them alone, however, leads to an awkward situation in practice: there is no objective, unified, universal unit for measuring an intelligent computing center as a whole.
Typically, data centers are measured in MW of power, and in China's AI computing centers, compute capacity is often quoted in PFLOPS (at FP16). However, the same nominal compute or power can yield vastly different efficiency depending on chip design, networking, and cooling.
The reason is simple: the older metrics each measure only one dimension. Peak FLOPS describes a chip's theoretical maximum arithmetic; bit/J describes local data transfer efficiency; bandwidth measures the information-carrying capacity of one subsystem. All of these are chip-level metrics.
But a complete AI system ultimately answers: under fixed power budgets, cooling conditions, and data center constraints, how much effective AI output can it produce? This question cannot be answered solely by chip-level metrics.
From NVIDIA’s discourse, we see metrics like token cost, throughput per Watt, token performance per Watt, and Tokens/W.
The measurement language is shifting from component-level to system-level.
Therefore, if the common chip-level metrics are peak FLOPS, bandwidth, and bit/J, then a more appropriate system-level metric should be Tokens/W. The former measures component capability; the latter measures overall output. The former corresponds to local optimization; the latter to system optimization.
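To make the contrast concrete, here is a minimal sketch in Python of how the two kinds of metric are computed. Every figure in it is a hypothetical placeholder chosen for illustration, not a vendor specification, and the function names are mine, not anyone's API.

```python
# Chip-level vs. system-level efficiency metrics: a minimal sketch.
# All figures are hypothetical placeholders, not measured or vendor data.

def flops_per_watt(peak_flops: float, chip_power_w: float) -> float:
    """Chip-level metric: theoretical peak FLOPS per watt of one component."""
    return peak_flops / chip_power_w

def tokens_per_watt(tokens_per_second: float, facility_power_w: float) -> float:
    """System-level metric: delivered token throughput per watt of the whole
    facility, including networking, storage, cooling, and power conversion."""
    return tokens_per_second / facility_power_w

# A hypothetical accelerator: 2e15 peak FLOPS at 1,000 W.
print(flops_per_watt(2e15, 1_000))   # -> 2e12 FLOPS/W (component capability)

# A hypothetical 1 GW AI factory serving 5e8 tokens/s end to end.
print(tokens_per_watt(5e8, 1e9))     # -> 0.5 tokens/s per watt (system output)
```

Note that "Tokens/W" used this way is shorthand for token throughput per watt, which is dimensionally tokens per joule: at a fixed power budget, the system that needs fewer joules per token simply ships more tokens.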
03 Tokens/W connects energy input to intelligent output
In NVIDIA’s GTC 2026 transcript, tokens are called the basic unit of modern AI. This is very accurate. For large language models, inference services, and agent systems, the ultimate measure that users pay for is essentially the system’s ability to generate and process tokens.
From a business perspective, tokens have three advantages: 1) they are directly coupled with the inference process; 2) they are directly coupled with revenue models; 3) they are suitable for new inference workloads like multi-turn dialogue, long context, retrieval augmentation, tool invocation, and reasoning chains.
These new workloads are hard to capture with a single FLOPS figure, but they leave clear traces in token counts, latency, and goodput.
More importantly, today's AI infrastructure constraints increasingly show up directly as energy constraints. The IEA's "Energy and AI" report estimates that by 2030, global data center electricity consumption will reach about 945 TWh per year, a significant increase from current levels. Spread over a year, 945 TWh is a continuous draw of roughly 108 GW, which is more than a hundred of the 1 GW AI factories Jensen Huang described, running around the clock. AI is one of the main drivers, and the US is expected to account for a large share of the growth. In other words, many of AI's future problems are not just chip problems but fundamentally problems of energy, cooling, and infrastructure organization.
The value of the Token/W concept lies in connecting the core chain of the AI industry: power input, computation, networking, storage, scheduling, cooling, and finally token output.
In this sense, Token/W is not just a simple replacement for FLOPS/W or bit/J. It adds a layer of perspective that was previously overlooked:
How much energy is converted into how much intelligent output by AI systems.
I believe the most important discussion point at GTC is here: we can no longer view chips in isolation; we must see chips within systems and systems within industry constraints.
This is also the perspective I advocate. When looking at AI chips, it’s not enough to consider peak compute, memory bandwidth, and interface parameters; we must also consider how they collaborate in networks, how they are deployed in racks, how they draw power in campuses, how they form cost structures for customers, and ultimately how they generate real business output.
In some ways, GTC 2026 has validated this system perspective. Because when NVIDIA itself begins to center its narrative on AI factories, the industry is shifting from AI chip-centric to compute system-centric.
This is crucial. Many industries initially focus on component parameters because they are easiest to measure and promote. But once large-scale deployment begins, the real determinant of success is system organization capability. Today’s AI infrastructure has reached this stage.
04 As Tokens/W advances, the importance of optical interconnects will become more apparent
When the measurement system shifts to the system level, many supporting links previously considered auxiliary will gain importance.
Optical interconnects are a prime example.
Historically, the industry has looked at optics from a module, communications, and device perspective: higher bandwidth, longer reach, lower pJ/bit, better bandwidth density, lower insertion loss. All of these are valid, but all remain at the subsystem level of components and chips. Under the Tokens/W framework, the value of optical interconnects becomes more intuitive: they cut the energy cost of moving data and so raise a large AI system's ability to convert power into tokens.
NVIDIA's optical networking products, such as co-packaged optics (CPO) built on silicon photonics, are claimed to deliver up to 5 times better energy efficiency than pluggable optical modules, while reducing latency and supporting larger AI factories.
The key point is not just more advanced links but larger system scale and higher system efficiency.
From an industry logic perspective, this is understandable. As models grow larger, contexts lengthen, and clusters expand, much of the energy consumption occurs not in arithmetic units but in data movement—cross-chip, cross-board, cross-rack, cross-POD communication.
At this stage, improving Tokens/W requires not only more powerful GPUs but also more efficient interconnects.
Therefore, from the Tokens/W perspective, developing optical interconnects is not just about cutting-edge technology but a necessary energy-saving measure for large-scale AI systems.
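To see how pJ/bit feeds into a system-level token metric, here is a back-of-envelope sketch. The per-token traffic volume and both pJ/bit figures are assumptions invented for illustration, not measured values.

```python
# Back-of-envelope: folding interconnect pJ/bit into joules per token.
# Every number below is an assumption invented for illustration.

BYTES_MOVED_PER_TOKEN = 2e9     # assumed cross-chip/cross-rack traffic per token
PJ_PER_BIT_PLUGGABLE  = 15.0    # assumed energy cost of pluggable optical modules
PJ_PER_BIT_CPO        = 5.0     # assumed energy cost of co-packaged optics

def movement_joules_per_token(bytes_per_token: float, pj_per_bit: float) -> float:
    """Interconnect energy per token: bits moved times energy per bit."""
    return bytes_per_token * 8 * pj_per_bit * 1e-12   # pJ -> J

j_pluggable = movement_joules_per_token(BYTES_MOVED_PER_TOKEN, PJ_PER_BIT_PLUGGABLE)
j_cpo = movement_joules_per_token(BYTES_MOVED_PER_TOKEN, PJ_PER_BIT_CPO)
print(f"pluggable: {j_pluggable:.2f} J/token, CPO: {j_cpo:.2f} J/token")
# At a fixed power budget, every joule removed from data movement is a joule
# freed for token generation, which is exactly what raises system Tokens/W.
```

Under these invented numbers, data movement alone costs 0.24 versus 0.08 J per token. The specific values matter less than the structure: interconnect energy sits directly inside the system's J/token.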
05 Optical computing is further out than optical interconnects, but its logic is also starting to hold together
Optical computing is indeed at an earlier stage than optical interconnects, and we should be honest about that.
Generality, precision, compilers, manufacturing consistency, and system integration are all still maturing. But viewed from a system perspective, its industry significance is now easier to articulate than before.
The reason is that Token/W cares about end-to-end efficiency. Whoever can significantly reduce energy consumption along high-frequency, high-density, reproducible compute paths will have the opportunity to improve token output efficiency at the system level. This logic does not require optical computing to replace entire GPUs or to become a universal computing foundation overnight.
It only requires one thing: in certain key workloads, to lower the J/token of the entire system and increase token output under fixed power budgets.
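The arithmetic behind that single requirement is worth spelling out. Writing P for the fixed power budget and E_token for the system's end-to-end energy per token (a bookkeeping quantity assumed for this sketch, not one that vendors publish per system):

    throughput (tokens/s) = P (J/s) / E_token (J/token)
    Tokens/W = throughput / P = 1 / E_token

So at fixed P, halving E_token on a dominant workload doubles token output, whether the savings come from arithmetic units, interconnects, or cooling. That is how a device that never touches peak FLOPS can still move the system metric.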
This is why the narrative of optical computing needs to shift from single-device efficiency to system-level energy savings. If the industry only looks at TOPS/W or MAC/J, it’s more like a lab story; but if it begins to consider Tokens/W, it can enter infrastructure discussions.
This shift is especially important for optical computing because it finally provides a higher-level language to communicate with customers, campuses, power grids, and capital expenditure.
06 When compute power measurement shifts from chips to systems, optical interconnects and optical computing become main industry themes
When compute power competition remains mainly at the chip level, optical interconnects are more like I/O technology, and optical computing is more like frontier device exploration.
When the competition shifts to large-scale AI system infrastructure, the situation changes. System efficiency increasingly depends on dense compute energy, data movement, context management, cross-node collaboration, power supply, and thermal management—areas where optics have the greatest potential.
From the Tokens/W perspective, optical interconnects address the energy cost of data movement behind each token; optical computing attempts to rewrite some of the computational energy cost per token. Both influence the overall token output efficiency of the system.
This is the fundamental reason they are entering the industry’s main track.
More practically, besides chip capacity and supply, future data centers and AI factories will face constraints including power grid access, cooling, campus energy consumption, cabinet power density, and deployment speed. The IEA’s recent assessment of AI’s energy consumption and NVIDIA’s statements on AI factories point in the same direction: AI infrastructure is becoming a systemic engineering measured by energy.
Looking forward, the issues solved by optical interconnects and optical computing are the increasingly expensive and difficult-to-optimize parts of AI: data movement energy costs and high-density compute unit energy consumption.
Behind this is a more complete systems thinking. That is why GTC 2026 again emphasized co-packaged optics and silicon photonics products: as compute measurement shifts from chips to systems, optics will evolve from an advanced technology option into foundational infrastructure.
From this perspective, CPO and optical computing systems are very promising!
Final thoughts: The main axis of AGI development
In my daily work, I have always advocated establishing objective, measurable compute power standards, and have used the Tokens/W approach to evaluate different chips.
Looking back at technological history, it was when the power-to-weight ratio of the internal combustion engine improved that the automobile was born, aircraft could take off, and rockets could launch.
In the AI era, the analogous ratio is output (now tokens) to energy consumed: as it rises, intelligence gets smarter, and AGI may emerge from it.
What’s truly worth remembering from GTC 2026 is not just NVIDIA’s success or Jensen Huang’s potential to become the “Token King,” but the clear new measurement standards for AI.
Furthermore, NVIDIA, Alibaba, and many other industry giants are beginning to realize that AI development must be viewed from a system perspective.
This aligns with the main trajectory of human civilization: using less energy to collect, transmit, and process more information.
AGI will not be an exception!
Source: Tencent Technology