New AI method lets models think harder while avoiding costly memory bandwidth
- DeepSeek's Engram decouples memory storage from computation to reduce memory demands in AI models.
- The method replaces some computation with direct lookups of static information, cutting demand for high-speed memory.
- Engram supports asynchronous prefetching across multiple GPUs with minimal overhead (see the prefetch sketch after this list).
- The Engram approach works with existing GPU and system memory architectures, potentially avoiding costly HBM upgrades.
- Early tests on a 27-billion-parameter model reported measurable improvements on standard benchmarks.
- DeepSeek developed and validated Engram in collaboration with Peking University.
- The approach aligns with Compute Express Link (CXL) standards to ease GPU memory bottlenecks.
- Engram could relax memory constraints in AI infrastructure, potentially dampening DRAM price swings.
- The TechRadar Pro article presents Engram as complementary to AI accelerators.
- Engram uses hashed N-grams as keys for deterministic memory lookups (see the lookup sketch after this list).
- The article also notes DRAM pricing pressure, as demand from AI workloads has pushed memory prices up.
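
The hashed N-gram lookup can be pictured with a minimal sketch. Everything here is illustrative rather than taken from the paper: the table size, hash function, and embedding width are assumptions. The point is that an N-gram of token IDs deterministically hashes to a slot in a large, static embedding table, so retrieving it is a memory read (which can happen in ordinary system DRAM) rather than a compute step on the GPU.

```python
import numpy as np

# Hypothetical sizes; the real Engram configuration is not described in the article.
TABLE_SLOTS = 1 << 16   # tiny demo table; real tables would be far larger
EMBED_DIM = 256         # embedding width
N = 3                   # N-gram length

# The table is static, so it can live in cheap system DRAM instead of GPU HBM.
rng = np.random.default_rng(0)
table = rng.standard_normal((TABLE_SLOTS, EMBED_DIM), dtype=np.float32)

def ngram_slot(token_ids: tuple[int, ...]) -> int:
    """Deterministically map an N-gram of token IDs to a table slot.

    A simple polynomial rolling hash keeps lookups reproducible across
    runs (Python's built-in hash() is salted per process).
    """
    h = 0
    for t in token_ids:
        h = (h * 1_000_003 + t) % TABLE_SLOTS
    return h

def lookup(tokens: list[int]) -> np.ndarray:
    """Fetch one embedding per N-gram window; a read, not a matmul."""
    slots = [ngram_slot(tuple(tokens[i:i + N]))
             for i in range(len(tokens) - N + 1)]
    return table[slots]

embeddings = lookup([101, 7, 7, 42, 9])
print(embeddings.shape)  # (3, 256): one vector per 3-gram
```

Because the slot depends only on the input tokens, every query for the same N-gram returns the same vector, which is what makes the retrieval deterministic and prefetchable.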
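The asynchronous prefetching can be sketched the same way. This is a generic overlap pattern, not DeepSeek's implementation: because the lookup slots are a deterministic function of the input tokens, the embeddings a later step will need can be fetched in the background while the current step computes, hiding the latency of slower system memory. The function names and timings below are stand-ins.

```python
import queue
import threading
import time

def fetch_embeddings(step: int) -> str:
    """Stand-in for a host-memory (or CXL-attached) table read."""
    time.sleep(0.01)  # simulated lookup latency
    return f"embeddings-for-step-{step}"

def compute(step: int, embeddings: str) -> None:
    """Stand-in for the GPU forward pass of one step."""
    time.sleep(0.05)  # simulated compute time

NUM_STEPS = 4
prefetched: "queue.Queue[str]" = queue.Queue(maxsize=1)

def prefetcher() -> None:
    # All lookups are known from the input tokens up front,
    # so they can be issued ahead of the compute stream.
    for step in range(NUM_STEPS):
        prefetched.put(fetch_embeddings(step))

threading.Thread(target=prefetcher, daemon=True).start()

for step in range(NUM_STEPS):
    emb = prefetched.get()   # usually ready: fetch overlapped earlier compute
    compute(step, emb)       # lookup latency hidden behind compute
```

In a real multi-GPU deployment the background fetch would use device-side streams and pinned host buffers rather than a Python thread, but the overlap principle is the same.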

