
The Memory Challenges Posed by Modern AI Workloads

By Barnaby | September 16, 2025

Imagine your brain trying to juggle not just numbers but thousands of entire books at once; that’s what modern AI workloads are asking of compute systems. The models keep growing larger and more complex: trillions of parameters, expanding context windows, and sparsely activated models like “mixture of experts” that wake up only parts of the network yet still demand fast memory access everywhere.

Even the most powerful GPUs and AI accelerators, loaded with teraflops of compute, can sit idle if the memory subsystem lags. Memory bandwidth and latency have become the real choke points. This bottleneck is often referred to as the “memory wall”: compute capability (FLOPs) has been increasing at a far faster pace than DRAM and interconnect bandwidth.
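A back-of-envelope roofline check makes the memory wall concrete: a kernel is memory-bound whenever its arithmetic intensity (FLOPs per byte moved) falls below the machine’s compute-to-bandwidth ratio. The sketch below uses illustrative hardware figures (1,000 TFLOP/s of compute, 3 TB/s of memory bandwidth), not any specific accelerator’s spec:

```python
# Roofline-style check: a kernel is memory-bound when its arithmetic
# intensity (FLOPs per byte moved) falls below the machine's
# compute-to-bandwidth ratio. All hardware figures are illustrative.

def arithmetic_intensity(m, n, k, bytes_per_elem=2):
    """FLOPs per byte for an (m, k) x (k, n) matmul in fp16."""
    flops = 2 * m * n * k                                  # multiply-adds
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    return flops / bytes_moved

machine_balance = 1000e12 / 3e12   # 1,000 TFLOP/s over 3 TB/s ~ 333 FLOPs/byte

train_ai = arithmetic_intensity(8192, 8192, 8192)   # big training matmul
infer_ai = arithmetic_intensity(1, 8192, 8192)      # skinny inference GEMV

print(f"training matmul: {train_ai:.0f} FLOPs/byte "
      f"({'compute' if train_ai > machine_balance else 'memory'}-bound)")
print(f"inference GEMV:  {infer_ai:.2f} FLOPs/byte "
      f"({'compute' if infer_ai > machine_balance else 'memory'}-bound)")
```

The large training matmul clears the balance point comfortably, while the skinny inference-style GEMV sits near 1 FLOP per byte, which is why token-by-token LLM inference leans so heavily on memory bandwidth.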

And there’s more: power, cooling, and cost. High-performance memory like HBM isn’t just expensive; it’s energy-hungry. The energy required to move data around often exceeds the energy needed for actual computation. Once you add in dataset shuffling, I/O transfers, and caching layers, it becomes obvious that in today’s AI systems, memory isn’t just a supporting character; it’s the main stage.

Key Technologies & Memory Types Meeting the Demand

Think of an AI system as a race car. The GPUs and TPUs are the engines, raw compute is the fuel, and memory is the road. Without wide enough roads (bandwidth) and smooth surfaces (low latency), your high-powered race car sputters, no matter how strong the engine is.

Here are the technologies carrying today’s AI workloads forward.

High-Bandwidth Memory (HBM3 / HBM3E)

HBM (High-Bandwidth Memory) places memory directly beside the compute using 3D stacking. This shortens the travel distance for data, cuts energy losses, and delivers massive throughput.

  • HBM3: delivers ~819 GB/s per stack.
  • HBM3E: pushes beyond 1.2 TB/s per stack, as seen in Micron’s 24 GB, 8-Hi products.

The benefit isn’t only speed; performance per watt improves dramatically compared with traditional DRAM. The tight integration also helps with thermals, although stacking multiple dies does create its own cooling challenges.

Best use case:

GPU and accelerator memory in AI training and inference, where bandwidth is the absolute bottleneck.
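As a quick sanity check on the per-stack figures above, aggregate package bandwidth is simply per-stack bandwidth times stack count. The six-stack configurations below are illustrative assumptions, not a specific product:

```python
# Aggregate package bandwidth = per-stack bandwidth x stack count.
# Per-stack figures follow the text; six stacks is an illustrative count.

def aggregate_tbps(per_stack_gbps, stacks):
    """Total bandwidth in TB/s for a multi-stack HBM package."""
    return per_stack_gbps * stacks / 1000

print(f"6x HBM3:  {aggregate_tbps(819, 6):.1f} TB/s")    # ~4.9 TB/s
print(f"6x HBM3E: {aggregate_tbps(1200, 6):.1f} TB/s")   # ~7.2 TB/s
```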

DRAM Evolution (DDR5, LPDDR, etc.)

Standard DRAM hasn’t stood still. DDR5 modules deliver ~38–51 GB/s each, scaling with transfer rate (DDR5-4800 to DDR5-6400). They’re cheaper per gigabyte than HBM, more abundant, and easier to integrate into general-purpose servers.

DDR5 also supports higher densities, improved channel efficiency, and in some cases, on-die ECC (error correction) for reliability. While DDR5 can’t match HBM bandwidth, it’s the workhorse memory layer that underpins most system architectures.

Best use case:

System memory, model offloading, and intermediate layers of the hierarchy.
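The ~38–51 GB/s range quoted above falls straight out of the interface arithmetic: a 64-bit DDR5 channel moves 8 bytes per transfer, so peak module bandwidth is transfer rate times 8. A minimal sketch:

```python
# DDR5 peak module bandwidth: transfer rate (MT/s) x 8 bytes per
# transfer on a 64-bit channel.

def ddr5_peak_gbps(transfers_mt_per_s):
    """Peak bandwidth in GB/s for one 64-bit DDR5 module."""
    return transfers_mt_per_s * 8 / 1000

print(ddr5_peak_gbps(4800))   # DDR5-4800 -> 38.4 GB/s
print(ddr5_peak_gbps(6400))   # DDR5-6400 -> 51.2 GB/s
```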

Compute Express Link (CXL) & Disaggregated Memory

CXL (Compute Express Link) is a game-changer. It enables memory pooling and disaggregation, breaking free from fixed DIMM slots tied to a single CPU. With CXL, multiple hosts and accelerators can dynamically access and share a common pool of memory.

Vendors have demonstrated CXL-attached DDR5 modules (e.g., Niagara 2.0) that scale both bandwidth and capacity without requiring a full hardware redesign. This is a stepping stone to truly composable infrastructure.

Best use case:

Cloud and data centre environments that need elastic scaling of memory across CPUs, GPUs, and accelerators.

SSD / NVMe / Persistent Memory Layers

When the higher tiers max out, NVMe SSDs step in.

  • PCIe Gen4 NVMe: ~7–7.5 GB/s.
  • PCIe Gen5 NVMe: higher still (above 12 GB/s in some products).

Latency is orders of magnitude higher than DRAM or HBM, but NVMe provides immense capacity and streaming throughput. For datasets and large model weights, NVMe is the fallback layer: slower, but indispensable.

Best use case:

Storing large models, datasets, or batch streaming when DRAM/HBM capacity is insufficient.
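To see why NVMe works as a capacity tier rather than an active one, it helps to estimate cold-load times. The sketch below assumes a 70B-parameter model in fp16 (~140 GB of weights) and the sequential-read figures quoted above; real throughput depends on queue depth, file layout, and filesystem overhead:

```python
# Time to stream model weights from NVMe at sequential-read speed.
# Model size and throughput figures are illustrative.

def load_time_s(params_billion, bytes_per_param, read_gbps):
    """Seconds to read the full weight file once."""
    size_gb = params_billion * bytes_per_param   # 1B params x N bytes = N GB
    return size_gb / read_gbps

# 70B parameters in fp16 = ~140 GB of weights.
print(f"Gen4 (~7 GB/s):  {load_time_s(70, 2, 7):.0f} s")
print(f"Gen5 (~12 GB/s): {load_time_s(70, 2, 12):.1f} s")
```

A twenty-second cold load is fine for checkpointing or batch jobs, but hopeless inside a serving loop, which is exactly the division of labour the hierarchy implies.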

Memory Type Comparison

| Memory Type | Example Bandwidth / Spec | Relative Cost & Energy | Best Use Case |
| --- | --- | --- | --- |
| HBM3 / HBM3E | HBM3: ~819 GB/s per stack; HBM3E: >1.2 TB/s per stack | High cost per GB; highly energy efficient per bit; requires advanced cooling | Accelerator/GPU memory for AI training and high-throughput inference |
| DDR5 (Standard DRAM) | ~38–51 GB/s per module | Lower cost per byte; higher latency vs HBM; widely available | Main system memory, offloading, memory hierarchy balance |
| NVMe SSD / Flash | Gen4: ~7–7.5 GB/s; Gen5: >12 GB/s | Very low cost per TB; relatively high latency; I/O power draw is significant | Storage of model weights/datasets; cache and streaming |

Market & Trend Insights (2024–2025 and Beyond)

The memory market is booming, with AI the biggest driver of demand.

  • Revenue & growth: The global memory market hit ~£129.37 billion in 2024 (DRAM ≈ £73.8 billion, NAND ≈ £51.8 billion). Projections put it close to £152.2 billion in 2025.
  • HBM growth: SK hynix projects ~30% CAGR through 2030 for AI-specialised HBM. Some analyses suggest revenues could reach ~£74.6 billion by 2030.
  • Next-gen persistent/disaggregated memory: Expected to grow from £8.4 billion – £11.4 billion in 2025 to £30.4 billion – £34.3 billion by 2030.
  • Supply and geopolitics: High-performance memory production is constrained by advanced process nodes, packaging complexity, and thermal design. Add export restrictions and localisation pushes, and supply chains stay volatile.
  • Edge and inference trends: Unified memory designs (like Apple’s Mac Studio) are becoming common in AI PCs and workstations. Memory pooling via CXL is moving from labs to adoption, especially in hyperscale datacenters.

Designing Memory Architecture for AI Workloads

Designing an AI system isn’t just about raw GPU count; it’s about ensuring data flows smoothly through the memory hierarchy.

Memory Hierarchy & Tiering

Visualise it as a pyramid:

  • On-chip caches (fastest, smallest).
  • HBM (or future equivalents).
  • DDR5/LPDDR.
  • SSD/NVMe/persistent memory at the base (largest, slowest).

The key is to keep the hottest, most frequently accessed data in the highest tiers.
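One way to picture this policy is as a greedy placement loop: hottest data first, fastest tier with free capacity wins. The tier names, capacities, and tensors below are illustrative assumptions, not a real allocator:

```python
# Greedy tier placement: hottest tensors go to the fastest tier that
# still has room. Tier capacities and tensors are illustrative.

TIERS = [("HBM", 80), ("DDR5", 512), ("NVMe", 4000)]   # capacity in GB

def place(tensors):
    """tensors: (name, size_gb, accesses_per_step) tuples."""
    free = {name: cap for name, cap in TIERS}
    placement = {}
    for name, size, _ in sorted(tensors, key=lambda t: -t[2]):
        for tier, _ in TIERS:                 # try fastest tier first
            if free[tier] >= size:
                free[tier] -= size
                placement[name] = tier
                break
    return placement

demo = [("kv_cache", 60, 1000), ("weights", 140, 100), ("dataset", 900, 1)]
print(place(demo))   # kv_cache -> HBM, weights -> DDR5, dataset -> NVMe
```

Real runtimes add eviction, prefetch, and migration on top, but the ordering principle is the same: access frequency decides the tier.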

Latency vs Throughput Trade-offs

  • Training workloads: Throughput is king - moving massive data batches in parallel.
  • Inference workloads: Latency dominates - responses must be near-instant.
  • Balanced systems: Sometimes sacrificing throughput for latency (e.g., edge AI) makes sense.

Data Movement & Architecture Patterns

  • Caching & prefetching to keep compute pipelines fed.
  • Pooling/disaggregation (CXL) to reduce idle capacity.
  • Compression & quantisation (4- or 8-bit) to cut storage requirements and boost effective throughput.
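The quantisation point is easy to quantify: halving the bits per parameter halves both the footprint and the bytes that must cross the memory link per pass over the weights. A small sketch, assuming a 70B-parameter model and an illustrative 3 TB/s HBM-class link:

```python
# Quantisation shrinks both the weight footprint and the traffic per
# full pass over the weights. The 3 TB/s link figure is illustrative.

LINK_GBPS = 3000   # HBM-class link, in GB/s

def footprint_gb(params_billion, bits):
    """Weight footprint in GB at a given precision."""
    return params_billion * bits / 8

for bits in (16, 8, 4):
    gb = footprint_gb(70, bits)      # 70B-parameter model
    ms = gb / LINK_GBPS * 1000       # time for one full weight read, ms
    print(f"{bits:>2}-bit: {gb:6.1f} GB, {ms:4.0f} ms per full weight read")
```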

Power, Thermal, and Cost Considerations

Memory subsystems can draw a large fraction of a system’s power. HBM stacks run hot; cooling costs mount. Total Cost of Ownership (TCO) must weigh upfront expense against downstream savings in energy and developer productivity.

Flexibility & Future-Proofing

  • Modularity: Expansion-friendly chassis, CXL slots, upgradable DIMMs.
  • Vendor diversity: Don’t bet everything on one supplier.
  • Observability: Monitor bandwidth saturation and latency distributions in real workloads, not just benchmarks.

Real-World Examples

Google’s Ironwood TPU Supercomputer

Unveiled at Hot Chips 2025, Google’s Ironwood TPU (7th generation) packs:

  • 192 GB HBM3E per chip
  • ~7.4 TB/s bandwidth
  • ~1.8 PB pooled memory per pod

This architecture is designed for large-scale inference, such as recommendation systems and serving LLMs at scale. The shared memory pool reduces latency and enables multi-workload concurrency.

Micron HBM3E

Micron’s HBM3E products deliver:

  • 24 GB per stack
  • >1.2 TB/s bandwidth
  • Improved performance per watt

These devices reduce training time significantly, directly lowering operational costs for hyperscalers.

Enfabrica’s EMFASYS

Enfabrica (backed by Nvidia) developed EMFASYS, a smart interconnect fabric that links accelerators with DDR5 memory. This design enables cheaper, higher-capacity DRAM to supplement HBM, balancing cost and performance.

Elastics.cloud CXL Pooling

Elastics.cloud showcased symmetric CXL pooling, where two servers shared both local and remote CXL-attached memory in real time. This demonstrates the future of flexible, disaggregated memory architectures.

Implications & Recommendations for IT Decision-Makers

  1. Plan for supply & cost volatility. HBM is scarce and premium-priced; diversify suppliers.
  2. Balance performance and TCO. HBM may look costly, but it shortens training cycles and lowers energy bills. DDR5 with CXL pooling offers affordable scalability.
  3. Think long-term architecture. Prioritise modular, composable designs over rigid DIMM slots.
  4. Benchmark aggressively. Real workloads rarely behave like datasheet specs.
  5. Consider sustainability. AI’s reputation for energy hunger is real; memory design is key to improving performance per watt.

Frequently Asked Questions (FAQs)

Why is memory so important for AI workloads?
Because AI models move enormous amounts of data. Without fast memory, even the most powerful GPUs stall.

What’s the difference between HBM and DDR5?
HBM is ultra-fast but expensive; DDR5 is slower but far cheaper and more capacious. Most AI systems use both.

Can SSDs or NVMe drives support AI memory needs?
Yes, but mainly for storage and caching. They’re too slow for active computation compared with DRAM or HBM.

How can organisations plan for future memory demands?
By choosing modular systems, mixing premium and standard memory, and benchmarking workloads regularly.

Conclusion

AI may grab headlines with flashy billion-parameter models, but the unsung hero behind it all is memory. From HBM3E stacks delivering terabytes per second, to CXL pooling that extends DDR5, to NVMe tiers providing cost-effective capacity, memory design is what determines whether workloads run smoothly or stall.

For IT leaders, the lesson is simple: treat memory as a strategic pillar, not an afterthought. Costs are high today, but the risks of under-investing are higher: stalled training runs, soaring energy bills, or infrastructure bottlenecks that strangle innovation.

The good news? The ecosystem now offers modular, scalable options. Whether you’re building hyperscale data centre clusters or AI-ready workstations, there’s a memory strategy to fit.
