AI accelerators drawing over a kilowatt each push rack power beyond 60 kW, making power delivery and cooling the real challenges. Traditional air cooling and power rails can’t keep up. Liquid cooling, high-voltage distribution, and advanced materials are essential to sustain performance and reliability. True AI power lies in the infrastructure supporting the chips, not just the chips themselves.
Key Account Manager - Data/Telecom
AI accelerators like NVIDIA’s H100 and GB200 are pushing power consumption to unprecedented levels—700 to 1,200 watts per chip, with the upcoming GB300 reaching 1,400 watts. In dense deployments like the NVLink 72 rack—housing 72 GPUs across 18 compute trays—rack-level power demands can exceed 60 kW.
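For a rough sense of scale, the arithmetic below multiplies the per-GPU figures above across a 72-GPU rack. The 20% allowance for CPUs, NICs, switching, and conversion losses is an illustrative assumption, not a measured figure.

```python
# Back-of-envelope rack power for a 72-GPU rack.
# Per-GPU wattages follow the range cited above; the 20% overhead
# for CPUs, NICs, switches, and conversion losses is an assumption.

GPUS_PER_RACK = 72
OVERHEAD = 1.20  # assumed non-GPU share of rack power

for gpu_watts in (700, 1_200, 1_400):
    rack_kw = GPUS_PER_RACK * gpu_watts * OVERHEAD / 1_000
    print(f"{gpu_watts} W per GPU -> roughly {rack_kw:.0f} kW per rack")
```

Even at the low end of the range, a fully populated rack lands around 60 kW; at the high end it passes 100 kW.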
Legacy air cooling, originally designed for 200–300 W CPUs, is struggling to keep pace, especially in dense 1U and 2U server formats common at the edge and in telecom deployments. Limited airflow and fan capacity cause thermal gradients, hotspots, and throttling, reducing reliability and uptime.
Liquid cooling has evolved from a niche technology to a necessity. Direct Liquid Cooling (DLC) using cold plates reduces thermal resistance and supports higher densities. Immersion cooling is emerging in hyperscale AI clusters, pushing racks beyond 100 kW with power usage effectiveness (PUE) below 1.2. These liquid solutions lower thermal deltas by 10–15°C compared to air cooling, which is critical for preserving hardware performance and longevity.
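PUE is simply total facility power divided by IT power, so lower cooling overhead shows up directly in the ratio. The sketch below uses assumed overhead figures purely to illustrate how liquid cooling can bring PUE under 1.2; they are not measurements of any specific facility.

```python
# Minimal PUE illustration: PUE = total facility power / IT equipment power.
# Cooling and overhead figures below are assumptions for illustration only.

def pue(it_power_kw: float, cooling_kw: float, other_overhead_kw: float) -> float:
    """Power Usage Effectiveness: total facility power divided by IT power."""
    return (it_power_kw + cooling_kw + other_overhead_kw) / it_power_kw

it_load = 100.0  # kW of IT load in a dense AI rack row

# Assumed overheads: air-cooled room vs. direct liquid cooling.
print(f"Air cooled:    PUE ~ {pue(it_load, cooling_kw=45.0, other_overhead_kw=10.0):.2f}")
print(f"Liquid cooled: PUE ~ {pue(it_load, cooling_kw=10.0, other_overhead_kw=8.0):.2f}")
```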
Power delivery is equally challenging. Modern GPUs require over 600 amps at voltages below 1 V, placing extreme demands on onboard voltage regulator modules (VRMs) that must maintain over 90% efficiency under rapid transient loads. Power delivery networks (PDNs) must use low-inductance interconnects and effective decoupling to control voltage droop and ripple.
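A first-order way to see why low inductance matters is the classic droop estimate, ΔV ≈ I·R + L·dI/dt. The parasitic values and load step in this sketch are assumptions chosen for illustration, not measurements of any particular board.

```python
# Rough PDN voltage-droop estimate: delta_V ~= I*R + L*dI/dt.
# Parasitic values and the load step are illustrative assumptions.

R_PDN_OHM = 50e-6      # assumed 50 micro-ohm effective PDN resistance
L_PDN_H = 100e-12      # assumed 100 pH effective loop inductance
I_LOAD_A = 600.0       # steady-state current on the core rail
DI_A = 300.0           # assumed load step in amps
DT_S = 1e-6            # assumed transition time (1 microsecond)

ir_drop_v = I_LOAD_A * R_PDN_OHM
ldi_dt_v = L_PDN_H * (DI_A / DT_S)

print(f"IR drop:      {ir_drop_v * 1_000:.1f} mV")
print(f"L*dI/dt drop: {ldi_dt_v * 1_000:.1f} mV")
print(f"Total droop:  {(ir_drop_v + ldi_dt_v) * 1_000:.1f} mV on a sub-1 V rail")
```

Even with these modest assumptions the droop approaches 10% of the rail voltage, which is why interconnect inductance and decoupling strategy are treated as first-class design parameters.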
At the rack level, many operators outgrow traditional 12V or 48V power rails. Hyperscalers are evaluating ±400 VDC distribution to reduce current, minimize cable bulk, and free internal chassis space. This transition affects PSU design, insulation standards, and EMI management, presenting new design frontiers.
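The motivation is Ohm's law at rack scale: for the same power, current falls in proportion to voltage, and resistive cable loss falls with the square of the current. The rack load and cable resistance below are illustrative assumptions, with 800 V standing in for a ±400 VDC scheme.

```python
# Same rack power at different distribution voltages:
# current and I^2*R cable loss both shrink as voltage rises.
# Rack load and cable resistance are assumed values for illustration.

RACK_POWER_W = 100_000
CABLE_R_OHM = 0.0001  # assumed 0.1 milliohm effective distribution resistance

for volts in (12, 48, 800):  # 800 V represents a +/-400 VDC scheme
    amps = RACK_POWER_W / volts
    loss_w = amps**2 * CABLE_R_OHM
    print(f"{volts:>4} V: {amps:>8.0f} A, ~{loss_w:,.0f} W lost in cabling")
```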
Unlike hyperscale datacenters, telecom and edge deployments face tighter space and power constraints, often requiring compliance with NEBS standards and operation in ambient temperatures exceeding 50°C, all within legacy 19” rack formats. Deploying AI inference or 800G switching in these environments necessitates innovative thermal and electrical strategies.
Co-packaged optics (CPO) illustrate these challenges. By integrating optics alongside high-bandwidth switch ASICs, CPO reduces latency and power per bit, but it also concentrates thermal density and raises EMI and mechanical concerns. Managing heat, vibration, and long-term reliability calls for carefully chosen materials, such as thermal interface materials (gap fillers, for example) and low-modulus adhesives that preserve mechanical integrity. Getting the mechanical stack-up wrong risks early-life failures.
This complex environment means thermal and power systems must be co-designed with compute hardware. Mechanical, electrical, and packaging teams can no longer operate in silos. Every watt saved and degree reduced translates into greater performance, uptime, and deployment flexibility.
As a Key Account Manager collaborating with system architects and hardware teams, I witness firsthand how integrated system design becomes the true differentiator. Success is not just about delivering powerful chips—it’s about enabling those chips to run at full capacity 24/7 in real-world conditions.
In today’s AI-driven landscape, compute power is abundant; the real scarcity lies in delivering and cooling that power efficiently and reliably at scale—especially in telecom and edge contexts. The bottleneck is not the chip but the infrastructure around it.