3. SRAM vs. HBM: A Memory Revolution
- SRAM's low latency and high bandwidth beat HBM's high-capacity but slower access, which matters for inference, where data moves constantly: a 70B-parameter model reads roughly 140GB of weights (at 16-bit precision) for every word it generates.
- Wafer-scale integration solves SRAM’s capacity limits, reducing chip count from thousands to a handful, cutting complexity and power.
- Current AI algorithms waste 93-95% of GPU capacity during inference, signaling vast room for hardware-algorithm synergy.
Quote: Andrew Feldman: "In a GPU, most of the time it's doing inference, it's 5 or 7% utilized. That means it's 95 or 93% wasted."
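The 140GB-per-word figure follows from simple arithmetic, and it shows why inference throughput is bandwidth-bound rather than compute-bound. A minimal sketch, assuming FP16 (2-byte) weights and one full weight read per generated token; the HBM bandwidth figure (~3.35 TB/s, roughly an NVIDIA H100) is illustrative, not from the source:

```python
BYTES_PER_PARAM = 2  # assumption: FP16 weights

def weight_traffic_gb(params_billion: float) -> float:
    """GB of weight data read from memory to generate one token,
    assuming every parameter is read once per token."""
    return params_billion * 1e9 * BYTES_PER_PARAM / 1e9

traffic = weight_traffic_gb(70)  # 70B-parameter model
print(f"Weights read per token: {traffic:.0f} GB")  # 140 GB

# If generation is memory-bandwidth-bound, bandwidth caps tokens/s.
# ~3.35 TB/s is an illustrative HBM3 figure (single accelerator).
hbm_tb_per_s = 3.35
max_tokens_per_s = hbm_tb_per_s * 1e3 / traffic
print(f"Bandwidth-bound ceiling: ~{max_tokens_per_s:.0f} tokens/s")
```

Under these assumptions a single HBM-backed accelerator tops out in the low tens of tokens per second for a 70B model, regardless of how much compute sits idle, which is consistent with the 5-7% utilization Feldman describes.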