AI Chip Costs Are Becoming A Memory Story


The Expensive Part Is No Longer Just The GPU

When people talk about the cost of AI infrastructure, they usually talk about GPUs as if the accelerator were one thing. That shorthand is convenient, but it hides the part of the machine that is starting to matter most.

Epoch AI’s latest component-cost work estimates that high-bandwidth memory, or HBM, grew from 52% of AI chip component spending in Q1 2024 to 63% in Q4 2025. That is not a small accounting wrinkle. The memory sitting next to the compute die was already the largest cost bucket inside frontier AI accelerators, and it is now the only major bucket whose share is still growing.

The rest of the bill moved in the opposite direction or stayed roughly stable. Logic dies remained near 13%. Advanced packaging fell from 19% to 15%. Auxiliary components such as substrates, boards, and final assembly dropped from 15% to about 10%. In other words, the accelerator package became less of a “silicon die plus extras” story and more of a “memory stack plus everything needed to feed it” story.
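The share shifts are easy to restate as percentage-point changes and as a ratio of HBM spending to logic spending. The sketch below is a minimal Python illustration that uses only the rounded figures quoted above; `SHARES`, `share_changes`, and `hbm_to_logic_ratio` are hypothetical names, not part of any Epoch dataset or API.

```python
# Component shares of AI chip spending as quoted in the text, in percent.
# Logic is "near 13%" and auxiliary is "about 10%" in Q4 2025, so these are
# rounded figures and need not sum to exactly 100.
SHARES = {
    "HBM":       {"Q1 2024": 52, "Q4 2025": 63},
    "Logic":     {"Q1 2024": 13, "Q4 2025": 13},
    "Packaging": {"Q1 2024": 19, "Q4 2025": 15},
    "Auxiliary": {"Q1 2024": 15, "Q4 2025": 10},
}

def share_changes(shares, start="Q1 2024", end="Q4 2025"):
    """Change in each component's share between two quarters, in percentage points."""
    return {name: q[end] - q[start] for name, q in shares.items()}

def hbm_to_logic_ratio(shares, quarter):
    """Dollars of HBM spending per dollar of logic-die spending in one quarter."""
    return shares["HBM"][quarter] / shares["Logic"][quarter]

if __name__ == "__main__":
    for name, delta in share_changes(SHARES).items():
        print(f"{name:<10} {delta:+d} pp")
    for q in ("Q1 2024", "Q4 2025"):
        print(f"HBM/logic spend ratio in {q}: {hbm_to_logic_ratio(SHARES, q):.1f}x")
```

Because the inputs are rounded, the Q4 2025 shares sum to 101% rather than 100%; the point is the direction of travel, not the last decimal. On these figures, every dollar spent on logic dies was matched by about four dollars of HBM in Q1 2024 and close to five by Q4 2025.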

That changes how to read the AI buildout. The bottleneck is not only fab capacity. It is not only GPU allocation. It is not only data-center power. It is also the number of HBM stacks that can be manufactured, qualified, packaged next to logic dies, and delivered into systems fast enough to meet demand.

What Epoch Is Measuring

The useful detail in Epoch’s work is that it does not simply count GPUs. It breaks accelerator production into constrained inputs:

  • advanced-node logic wafers fabricated on 3 nm- and 5 nm-class processes
  • CoWoS advanced packaging used to combine logic and memory into one accelerator package
  • HBM memory attached to the accelerator
  • auxiliary per-chip components such as substrate, board, and assembly costs

The data covers the largest AI chip designers by supply-chain consumption: Nvidia, AMD, Google, and Amazon. That means the dataset includes GPUs and custom accelerators such as TPUs and Trainium, but it does not cover every designer in the world. Meta, Microsoft, Tesla, Groq, Huawei, Cambricon, and others are outside the tracked set.

That scope matters. The numbers should not be read as a perfect map of every AI chip everywhere. They are a model of the major public cloud and accelerator supply chain. But that is exactly why the signal is important: if the biggest buyers and designers are already being pulled toward HBM-heavy designs, the rest of the market will feel the pressure through prices, lead times, and capacity allocation.

Epoch also attributes component demand to the quarter when inputs are consumed, not merely when finished chips are sold. That avoids a common distortion in hardware analysis. A chip can consume wafers, packaging capacity, and HBM before it shows up as shipped revenue. Inventory and work-in-progress can hide real pressure if you only look at sales.
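The timing adjustment can be sketched as a simple shift: take finished-chip shipments and move each chip's HBM demand back to the quarter when its stacks were packaged. This is a minimal sketch under loud assumptions: a fixed one-quarter lead time and a fixed number of stacks per chip, both hypothetical values chosen for illustration. Epoch's actual methodology is not reproduced here.

```python
from collections import defaultdict

# Hypothetical lead time: HBM is assumed to be consumed one quarter before
# the finished accelerator ships. This is an illustrative value only.
HBM_LEAD_QUARTERS = 1

def consumption_by_quarter(shipments, stacks_per_chip, lead=HBM_LEAD_QUARTERS):
    """Re-attribute HBM demand from the shipment quarter to the consumption quarter.

    shipments: dict mapping quarter index -> chips shipped in that quarter
    stacks_per_chip: HBM stacks per chip (hypothetical design parameter)
    Returns a dict mapping quarter index -> HBM stacks consumed, sorted by quarter.
    """
    consumed = defaultdict(int)
    for quarter, chips in shipments.items():
        consumed[quarter - lead] += chips * stacks_per_chip
    return dict(sorted(consumed.items()))

if __name__ == "__main__":
    shipped = {1: 100, 2: 100, 3: 300}  # hypothetical chip shipments
    print(consumption_by_quarter(shipped, stacks_per_chip=8))
```

With shipments of 100, 100, and 300 chips in quarters 1 to 3 and eight stacks per chip, the jump in quarter 3 sales shows up as a jump in HBM consumption in quarter 2. Anyone watching only sales would see the pressure one quarter late.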

Why HBM Became The Center Of The Package

Modern AI workloads are not starved only for arithmetic. They are starved for fast access to model weights, activations, key-value caches, embeddings, and intermediate tensors. A large model can have enormous compute demand, but the compute units are only useful when data reaches them quickly enough.

That is why HBM is physically placed next to the logic die inside the accelerator package. It provides far more bandwidth than ordinary server memory because it uses stacked DRAM dies and very wide connections. The tradeoff is cost and complexity. HBM is harder to build, harder to package, and supplied by a smaller set of memory manufacturers than commodity DRAM.

The product specs tell the same story. AMD’s MI300X was built around 192 GB of HBM3 and 5.3 TB/s of peak memory bandwidth. Google’s TPU v5p documentation lists 95 GB of HBM2e and 2,765 GB/s of bandwidth per chip. Nvidia’s Blackwell systems push memory capacity and bandwidth even higher. These are not cosmetic numbers on a datasheet. They are central to whether a model fits, whether inference batches efficiently, and whether training runs keep expensive compute units busy.
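The specs above can be turned into a rough bound on serving speed. When token-by-token generation is memory-bound, each decode step has to stream the model weights out of HBM, so peak bandwidth divided by weight size caps the number of steps per second. The sketch below assumes a hypothetical 70-billion-parameter model stored at 2 bytes per parameter; the model size is an illustration, and real kernels reach only part of peak bandwidth.

```python
# Peak HBM capacity and bandwidth quoted in the text, in GB and GB/s.
SPECS = {
    "AMD MI300X": {"hbm_gb": 192, "bw_gb_s": 5300},
    "Google TPU v5p": {"hbm_gb": 95, "bw_gb_s": 2765},
}

def weight_gb(params_billion, bytes_per_param):
    """Model weight footprint in GB (1 GB = 1e9 bytes)."""
    return params_billion * bytes_per_param

def max_decode_steps_per_s(weights_gb, bw_gb_s):
    """Upper bound on memory-bound decode steps per second if each step reads
    every weight from HBM exactly once. Ignores KV-cache reads, compute time,
    and the gap between peak and achieved bandwidth."""
    return bw_gb_s / weights_gb

if __name__ == "__main__":
    w = weight_gb(70, 2)  # hypothetical 70B-parameter model in 16-bit precision
    for name, spec in SPECS.items():
        fits = w <= spec["hbm_gb"]
        bound = max_decode_steps_per_s(w, spec["bw_gb_s"])
        note = "" if fits else " (weights do not fit on one chip)"
        print(f"{name}: {w:.0f} GB of weights, <= {bound:.1f} steps/s{note}")
```

On these assumptions the weights take 140 GB, which fits in one MI300X's 192 GB, where peak bandwidth caps decode at roughly 38 steps per second, but not in one TPU v5p's 95 GB, so the latter would need the model sharded across chips. The bound is per decode step, not per user: every sequence in a batch shares the same weight read, which is why batching efficiency matters so much for cost.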

The result is a straightforward economic shift: if model serving and training keep asking for more memory capacity and bandwidth per accelerator, then the memory subsystem captures more of the accelerator’s cost.

The Memory Share Rose While Logic Did Not

The most striking part of Epoch’s chart is not just that HBM reached 63%. It is that logic did not rise with it. Logic stayed near 13% of component spending from Q1 2024 to Q4 2025.

That does not mean logic is easy. Leading-edge wafers are still scarce, expensive, and strategically important. TSMC’s advanced-node capacity remains one of the most important industrial constraints in the world. But in the bill of materials for these AI accelerators, memory is where the mix shifted.

Packaging also became less dominant as a share of cost, moving from 19% to 15%. That may sound like packaging stopped mattering, but that would be the wrong interpretation. CoWoS remains a binding constraint because it is the process that brings large logic dies, chiplets, and HBM stacks together. A lower share of cost does not mean a lower share of strategic importance.

The better reading is this: several constraints have to clear at once, but HBM is taking a larger fraction of the economic value inside the final accelerator.

Why This Shows Up In Cloud Budgets

If memory is becoming a larger share of the accelerator cost, cloud capital expenditure becomes more sensitive to memory pricing and allocation.

That helps explain why AI infrastructure spending can rise even when chip designers improve compute efficiency. Better FLOPS per watt and better price-performance do not automatically lower total spend if customers are asking for more memory-rich systems, longer context windows, larger serving fleets, and more inference capacity.

For cloud providers, the implication is uncomfortable. They cannot optimize only at the model or scheduler layer. They must also manage memory procurement, packaging slots, inventory timing, rack design, and deployment pacing. A shortage or price spike in HBM can move the economics of an entire AI cluster.

For model companies, the same pressure appears as capacity planning. A model that is elegant on paper but memory-hungry in production can be much more expensive to serve than its parameter count suggests. Context length, KV-cache behavior, batch size, quantization strategy, and mixture-of-experts routing all become hardware-cost questions.
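The link between context length, batch size, and memory is direct enough to write down. For a standard transformer decoder, the KV cache stores one key and one value vector per layer, per KV head, per token, for every sequence in the batch. The configuration in the example (80 layers, 8 KV heads, head dimension 128, 16-bit values) is hypothetical and not taken from any specific model.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, batch, bytes_per_elem=2):
    """Bytes held in the KV cache of a standard transformer decoder.

    The factor of 2 covers keys and values. kv_heads is the number of
    key/value heads, which is smaller than the number of query heads in
    models that use grouped-query attention.
    """
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_elem

if __name__ == "__main__":
    # Hypothetical model configuration, for illustration only.
    cfg = dict(layers=80, kv_heads=8, head_dim=128)
    per_token = kv_cache_bytes(**cfg, seq_len=1, batch=1)
    total = kv_cache_bytes(**cfg, seq_len=32_768, batch=16)
    print(f"KV cache per token: {per_token / 1e6:.2f} MB")
    print(f"32k context x batch 16: {total / 1e9:.1f} GB")
```

With a 32,768-token context and a batch of 16, that configuration needs about 172 GB of KV cache before counting a single weight. Halving the context or the batch halves the figure; storing the cache at 8 bits does the same. That is why context and batch policy end up on the hardware bill.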

For startups buying cloud inference, the cost may arrive indirectly. They do not negotiate HBM contracts, but they do pay the platforms that do.

The Supply Chain Is Narrower Than The Word “Chip” Suggests

“AI chip shortage” is too broad a phrase. Different shortages have different fixes.

If the shortage is leading-edge logic wafers, the answer is more advanced fab capacity and better yield. If the shortage is advanced packaging, the answer is more CoWoS capacity and packaging throughput. If the shortage is HBM, the answer is more memory wafer starts, more stacking capacity, more qualified suppliers, and more package integration.

Those capacity expansions do not happen on the same schedule. Memory makers can shift some production toward HBM, but they cannot instantly create unlimited advanced HBM output. Packaging capacity can expand, but not overnight. Foundry capacity has even longer lead times. The AI supply chain is a queueing system with several narrow doors.
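The “several narrow doors” idea amounts to a minimum over constraints: in a given period, finished accelerators are capped by whichever input runs out first. The sketch below uses hypothetical quarterly capacities and treats every chip as needing a fixed number of logic dies, one CoWoS packaging slot, and a fixed number of HBM stacks. Yields, partial allocations, and inventory are ignored.

```python
def accelerator_output(logic_dies, cowos_slots, hbm_stacks,
                       stacks_per_chip, dies_per_chip=1):
    """Return (chips_completed, binding_constraint) for one period.

    Each chip is assumed to need dies_per_chip logic dies, one CoWoS slot,
    and stacks_per_chip HBM stacks. All capacities are hypothetical.
    """
    limits = {
        "logic": logic_dies // dies_per_chip,
        "packaging": cowos_slots,
        "hbm": hbm_stacks // stacks_per_chip,
    }
    binding = min(limits, key=limits.get)  # the input that runs out first
    return limits[binding], binding

if __name__ == "__main__":
    for stacks in (6, 8):
        chips, door = accelerator_output(1_000, 900, 6_000, stacks_per_chip=stacks)
        print(f"{stacks} stacks/chip -> {chips} chips, bound by {door}")
```

With 1,000 logic dies, 900 packaging slots, and 6,000 HBM stacks, a six-stack design is packaging-bound at 900 chips. Switch the same capacity to an eight-stack design and output falls to 750 chips, now limited by HBM. That is the sense in which counting finished accelerators blurs which door is actually closing.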

Epoch’s framework is useful because it separates those doors. Counting finished accelerators alone blurs the problem. A chip with more HBM stacks consumes the supply chain differently from a chip with less memory. A custom inference accelerator and a training GPU may use different mixes of logic, packaging, and HBM even if both are called “AI chips” in earnings calls.

Export Controls And Policy Get Messier

This also matters for policy. Export controls often talk about chips, compute performance, interconnect bandwidth, or destination markets. But if HBM is the binding component, then the policy question shifts.

Selling or withholding one accelerator is not only a question of delivered FLOPS. It is also a question of how much scarce HBM and packaging capacity that accelerator consumed before it reached a customer. If a policy allows certain chips to ship but requires that exports not reduce capacity available to domestic customers, component-level accounting becomes more relevant.

The same applies to stockpiling. If firms or countries build HBM inventories ahead of restrictions, the market impact can appear before finished accelerator sales reflect it. Epoch’s data model explicitly tries to account for timing like this by looking at component consumption rather than only shipment dates.

What Builders Should Take From This

For engineers and infrastructure teams, the practical lesson is to treat memory as a first-class design constraint.

That starts with model and serving architecture. A design that reduces KV-cache pressure can have real infrastructure value. Quantization can be a capacity strategy, not just a latency trick. Retrieval design, context budgeting, batching, speculative decoding, and model routing all affect how much HBM a workload burns per unit of useful output.
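Quantization as a capacity strategy comes down to arithmetic: fewer bits per weight means fewer accelerators just to hold the model. The sketch below assumes a hypothetical 70-billion-parameter model, the 95 GB of HBM per TPU v5p chip cited earlier, and a hypothetical 30 GB per chip held back for KV cache and activations. It ignores sharding overhead and any quality loss from lower precision.

```python
import math

def chips_to_hold_weights(params_billion, bits_per_param, hbm_gb, reserve_gb):
    """Minimum accelerators needed just to store the model weights.

    reserve_gb is memory per chip set aside for KV cache and activations;
    the remainder is assumed available for weights. 1 GB = 1e9 bytes.
    """
    weights_gb = params_billion * bits_per_param / 8
    usable_gb = hbm_gb - reserve_gb
    if usable_gb <= 0:
        raise ValueError("reserve_gb leaves no room for weights")
    return math.ceil(weights_gb / usable_gb)

if __name__ == "__main__":
    for bits in (16, 8, 4):
        n = chips_to_hold_weights(70, bits, hbm_gb=95, reserve_gb=30)
        print(f"{bits}-bit weights: {n} chip(s)")
```

On those assumptions, 16-bit weights need three chips, 8-bit weights need two, and 4-bit weights fit on one. Whether the lower-precision model is good enough is a separate question, but the hardware side of the trade is this concrete.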

It continues with procurement. Teams evaluating accelerator options should compare memory capacity, bandwidth, software maturity, networking, and availability together. A chip with excellent compute but insufficient memory can force awkward sharding or smaller batches. A chip with abundant memory but weaker software support can create engineering drag. The right answer depends on the workload.

It also affects financial planning. If memory prices rise, an AI budget can miss even if GPU counts look unchanged. If HBM supply tightens, delivery schedules can slip even when the budget is approved and the data-center space is ready.
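The budget risk can be quantified with spending shares. If the quantity of every component is held fixed, a price change in one component moves total component cost by that component's share times its price change. The sketch below uses the rounded shares quoted earlier and a hypothetical 20% HBM price increase; it covers accelerator components only, not power, networking, or buildings.

```python
def component_cost_change(shares, price_changes):
    """Fractional change in total component cost when quantities are fixed.

    shares: component -> share of spend (need not sum exactly to 100 or 1;
        the result is normalized by the total)
    price_changes: component -> fractional price change (0.20 means +20%);
        components not listed are assumed unchanged
    """
    total = sum(shares.values())
    delta = sum(s * price_changes.get(name, 0.0) for name, s in shares.items())
    return delta / total

# Rounded component shares quoted in the text, in percent.
Q1_2024 = {"HBM": 52, "Logic": 13, "Packaging": 19, "Auxiliary": 15}
Q4_2025 = {"HBM": 63, "Logic": 13, "Packaging": 15, "Auxiliary": 10}

if __name__ == "__main__":
    shock = {"HBM": 0.20}  # hypothetical 20% HBM price increase
    for label, mix in (("Q1 2024", Q1_2024), ("Q4 2025", Q4_2025)):
        print(f"{label}: component cost {component_cost_change(mix, shock):+.1%}")
```

On the Q4 2025 mix, a 20% HBM price rise lifts component cost by roughly 12.5%, against about 10.5% on the Q1 2024 mix. The same memory shock now hits harder simply because memory is a bigger slice of the bill.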

What Investors Should Watch

For investors and analysts, the important shift is value capture. The AI boom is not only a GPU vendor story. It is also a memory manufacturer story, an advanced packaging story, a substrate story, and a supply-chain timing story.

HBM suppliers such as SK Hynix, Samsung, and Micron sit much closer to the center of the AI economy than ordinary memory-cycle thinking would suggest. The old mental model says memory is cyclical, commoditized, and mostly interchangeable. HBM is still memory, but the qualification, packaging, power, thermal, and bandwidth requirements make it a more strategic component.

That does not eliminate cyclicality. It does mean the cycle is now tied to AI deployment plans, model scaling, and cloud capex in a way that did not exist at the same intensity a few years ago.

The Bigger Lesson

The headline number is simple: HBM went from 52% to 63% of AI chip component spending in less than two years.

The bigger lesson is that AI hardware economics become harder to read when we keep using one-word labels like “GPU.” The accelerator is an assembly of constrained parts, and its expensive center of gravity is moving.

Logic still matters. Packaging still matters. Power, networking, cooling, and buildings still matter. But inside the accelerator package, memory has become the dominant cost bucket.

That should change how teams talk about AI costs. The question is no longer only “how many GPUs can we get?” It is also:

  • how much HBM do those accelerators consume?
  • how much bandwidth does the workload really need?
  • how much serving efficiency is being lost to memory pressure?
  • which suppliers control the scarce parts?
  • what happens to the budget if memory prices move first?

AI infrastructure is often described as a race for compute. Increasingly, it is also a race for memory.

Sources