Colossus is xAI’s AI supercomputer cluster in Memphis, Tennessee, built to train advanced models such as Grok. The initial system was assembled in 122 days and came online in 2024; through successive expansions it ranks as the world’s largest AI training system as of January 2026, with an emphasis on massive scale, efficiency, and rapid deployment.
Key Specifications
GPUs: Over 450,000 NVIDIA Hopper-architecture GPUs (H100/H200). The initial phase deployed 100,000 GPUs, doubled to 200,000 in late 2024, with plans to exceed 900,000 by mid-2026.
Networking: NVIDIA Spectrum-X Ethernet delivering 3.6 Tbps of bandwidth per server through 400 Gbps BlueField-3 SuperNICs, sustaining 95% effective data throughput (see the sketch after this list).
Cooling: Liquid cooling throughout, supported by air-cooled chillers rated for up to 200 MW.
Storage: Over 1 exabyte (EB) for training data, alongside 194 petabytes per second of aggregate memory bandwidth across the cluster (see the per-GPU sketch after this list).
Power: More than 100 MW initially, scaling past 1 GW with expansions, with Tesla Megapack batteries buffering the grid supply.
Compute Power: Equivalent to ~1.4 million H100 GPUs in raw performance by 2026.
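As a sanity check on the networking numbers, the quoted 3.6 Tbps per server divides evenly into nine 400 Gbps SuperNICs; the nine-NIC-per-server count is inferred from that arithmetic here, not stated in the source. A minimal sketch:

```python
# Back-of-envelope check on the quoted Spectrum-X networking figures.
# The 9-NIC-per-server count is inferred from 3.6 Tbps / 400 Gbps,
# not taken from an official bill of materials.

NIC_GBPS = 400               # per BlueField-3 SuperNIC (quoted)
SERVER_TBPS = 3.6            # per-server bandwidth (quoted)
EFFECTIVE_THROUGHPUT = 0.95  # Spectrum-X data throughput (quoted)

nics_per_server = SERVER_TBPS * 1000 / NIC_GBPS
print(f"Implied NICs per server: {nics_per_server:.0f}")  # -> 9

effective_tbps = SERVER_TBPS * EFFECTIVE_THROUGHPUT
print(f"Effective per-server bandwidth: {effective_tbps:.2f} Tbps")  # -> 3.42
```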
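Similarly, dividing the cluster-wide aggregates by the initial 100,000-GPU phase gives rough per-GPU shares. This is illustrative arithmetic on the figures quoted above, not an official breakdown, and the per-GPU power figure includes facility overhead rather than GPU draw alone:

```python
# Rough per-GPU shares of the quoted cluster aggregates, using the
# initial 100,000-GPU phase. Illustrative arithmetic only.

GPUS = 100_000
STORAGE_EB = 1.0    # quoted: over 1 exabyte of storage
MEM_BW_PBPS = 194   # quoted: 194 PB/s aggregate memory bandwidth
POWER_MW = 100      # quoted: 100+ MW initial power draw

storage_tb_per_gpu = STORAGE_EB * 1e6 / GPUS    # EB -> TB
mem_bw_tbps_per_gpu = MEM_BW_PBPS * 1e3 / GPUS  # PB/s -> TB/s
power_kw_per_gpu = POWER_MW * 1e3 / GPUS        # MW -> kW

print(f"Storage per GPU:          {storage_tb_per_gpu:.1f} TB")    # ~10 TB
print(f"Memory bandwidth per GPU: {mem_bw_tbps_per_gpu:.2f} TB/s") # ~1.94 TB/s
print(f"Power budget per GPU:     {power_kw_per_gpu:.1f} kW")      # ~1 kW, incl. overhead
```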
Development Timeline
2024: Construction begins; the cluster becomes operational in July with 100,000 GPUs and doubles to 200,000 by year-end.
2025-2026: Expansion to Colossus 2 at gigawatt-plus scale; further growth toward 1 million+ GPUs via new facilities.