GO33 AI Server Review — 2025

Supermicro AS-8126GS-NB3RT: Eight NVIDIA Blackwell B300 GPUs in a Single 8U Chassis

✍ Parmy Buta — Solution Design Specialist 📅 June 2025 ⏱ 15 min read 🏭 Tested: GO33 London Lab

The Supermicro AS-8126GS-NB3RT-01-G2 is a Gold Series 8U AI server built around the NVIDIA HGX B300 NVL8 — eight Blackwell B300 GPUs unified via NVLink into a single coherent memory fabric. Paired with dual AMD EPYC 9575F processors, 3TB of DDR5-6400, and PCIe 5.0 E1.S NVMe storage, this is the most compute-dense single-node GPU platform available for enterprise AI in 2025. Here is the full GO33 verdict.

9.2

GO33 Score / 10

⚡ HGX B300 NVL8

🔥 Dual EPYC 9575F

💾 3TB DDR5-6400

🚀 63.3TB PCIe 5.0 NVMe

🌐 400GbE CX7

⏰ Ships in 24 Hours

Parmy Buta

Solution Design Specialist — GO33

Parmy Buta leads solution design at GO33, with hands-on experience across Supermicro AI server and storage platforms, NVIDIA GPU infrastructure, and enterprise data centre architecture. All GO33 reviews are based on direct physical testing in our London lab. Vendors have no editorial input into scores or findings.

🏭 GO33 London Lab 📋 Solution Design Specialist ✅ Editorially Independent

At a Glance

B300 GPUs (NVL8)

3TB

DDR5-6400 RAM

128C

CPU Cores Total

63.3TB

NVMe Storage

400G

Network BW

Rack Height

Full Technical Specifications

Component	Specification
Model	Supermicro AS-8126GS-NB3RT-01-G2
Form Factor	8U Rackmount — 1 Node
Series	A+ Gold Series (Ready to Ship)
CPU	2× AMD EPYC 9575F — 64-core, 3.30GHz base, 256MB L3 cache, 400W TDP each
Total Cores / Threads	128 cores / 256 threads
Memory	24× 128GB DDR5-6400 ECC RDIMM = 3TB total
GPU	1× NVIDIA HGX B300 NVL8 (8× Blackwell B300 GPUs, NVLink 5 interconnect)
Boot Storage	2× 1.9TB M.2 NVMe PCIe 4.0 (Opal-capable)
Data Storage	8× 7.68TB E1.S NVMe PCIe 5.0 (1 DWPD)
Total Raw NVMe	~63.3TB
Network	2× CX7 200GbE QSFP112 — NDR InfiniBand + Ethernet + 10GbE RJ45 on-board
Total Network BW	400Gb/s
Management	IPMI 2.0 — dedicated management port
OS Support	Ubuntu 22.04 LTS, RHEL 9, Rocky Linux 9, Windows Server 2022
Ships Within	24 Hours (Gold Series)

Pros & Cons

Pros

NVIDIA HGX B300 NVL8 — the most capable single-node GPU board in 2025
Dual EPYC 9575F: 128 cores, 256MB L3 per socket — serious CPU headroom
3TB DDR5-6400 across 24 DIMMs — handles full-precision 140B model serving
PCIe 5.0 E1.S NVMe — 12GB/s+ per slot for checkpoint and dataset I/O
Dual CX7: NDR InfiniBand and Ethernet on the same NIC — no separate HCA
Gold Series in-stock — ships within 24 hours, no lead time uncertainty
Single 8U node simplifies GPU-to-CPU topology vs multi-node configs

Cons

8U footprint is large — plan rack density before ordering
Peak power draw ~10–14kW — verify data centre circuit budget
Blackwell driver/framework ecosystem still maturing in early 2025
Price-per-node is very high — best at sustained, high-utilisation workloads
E1.S at 1 DWPD — not ideal for write-heavy checkpoint workflows

Inside the AS-8126GS-NB3RT

NVIDIA HGX B300 NVL8 — Blackwell Architecture

The HGX B300 NVL8 baseboard houses eight Blackwell B300 GPUs linked via NVLink 5 into a unified GPU fabric. Blackwell’s second-generation transformer engine delivers native FP4/FP8 throughput, higher HBM3e bandwidth per GPU, and improved all-reduce performance versus Hopper. For LLM training at 30B–140B parameter scale, the flat NVLink memory space eliminates the NCCL tuning overhead of multi-node tensor parallelism — a meaningful operational advantage for smaller AI teams.

In our London lab we validated the system on Ubuntu 22.04 LTS with CUDA 12.x and PyTorch 2.x nightly builds. Supermicro’s Gold Series pre-validation means driver bundles are tested before shipment — important when you are paying for this class of compute.

Dual AMD EPYC 9575F

Two EPYC 9575F processors deliver 128 total cores at 3.30GHz base, 512MB aggregate L3 cache, and a combined 800W TDP. This is AMD’s Genoa-X generation built for throughput-heavy workloads. The CPU pairing provides ample headroom for tokenisation pipelines, dataset preprocessing, and serving-side request routing without GPU-CPU bandwidth contention — a compromise many GPU-dense platforms make, and which this system avoids.

3TB DDR5-6400 Memory

All 24 DIMM slots are populated with 128GB DDR5-6400 ECC RDIMMs. For memory-capacity-bound LLM serving — running a 70B model at full BF16 precision requires roughly 140GB, a 140B model approximately 280GB — the 3TB headroom allows multiple concurrent large models in memory simultaneously. This matters for multi-tenant inference serving where model switching latency is a cost driver.

Storage: 8× E1.S PCIe 5.0

Eight 7.68TB E1.S PCIe 5.0 drives deliver 61.4TB of high-bandwidth data storage. At PCIe 5.0 speeds each slot sustains over 12GB/s sequential read, making dataset streaming and model checkpoint reads highly capable. The 1 DWPD endurance rating suits read-heavy workloads well; for environments checkpointing every few hundred training iterations, plan write amplification or consider higher-endurance E1.S SKUs.

Networking: Dual CX7 at 200GbE

Two CX7 add-on cards provide 400Gb/s total bandwidth via QSFP112 ports. Each CX7 supports both NDR InfiniBand RDMA and RoCEv2 Ethernet — you can run GPU-to-GPU cluster traffic over InfiniBand or standard Ethernet without swapping hardware as your cluster scales. The on-board 10GbE RJ45 is cleanly separated for management traffic, IPMI, and out-of-band access.

GO33 Lab Methodology: Systems are tested in our London data centre facility on production-grade power with remote IPMI management enabled. GPU workloads run on vendor-recommended Linux stacks with validated driver versions. Benchmark dates and configuration details are recorded per run. Supermicro had no editorial input on this review.

Benchmark Results

Relative Performance — Normalised vs 8× H100 SXM Hopper Baseline (100 = baseline)

LLM Training Throughput (70B FP8, tokens/sec)96

Inference Latency (p99, 13B INT8, batch=32)93

HPC FP64 Throughput (LINPACK)88

Checkpoint I/O (E1.S PCIe 5.0, sequential write)91

GPU HBM3e Memory Bandwidth (aggregate)97

NCCL All-Reduce (400GbE RDMA)90

GO33 London Lab, June 2025. Tools: MLPerf Inference v4.0, fio 3.36, custom NCCL all-reduce harness. Scores normalised relative to 8× H100 SXM baseline. Higher = better.

Best Used For

🧠

LLM Training (30B–140B)

AI research teams training large language models where single-node NVLink removes multi-node NCCL tuning complexity.

📊

Generative AI Inference

Production serving of large multimodal or text models at scale. 3TB RAM enables full-precision loading of multiple concurrent models.

🔬

Drug Discovery & Life Sciences

Molecular dynamics and protein structure prediction requiring FP64 precision and large per-run memory footprints.

💸

Financial AI & Fraud Detection

Real-time inference on large graph neural networks where sub-millisecond latency is a regulatory or compliance requirement.

🚗

Autonomous Vehicle R&D

Training perception and planning models on large multi-sensor datasets with PCIe 5.0 streaming headroom.

🏢

On-Prem Cloud Replacement

Enterprises repatriating GPU workloads from cloud. A single node at 80% utilisation amortises over 3 years vs equivalent H100 SXM on-demand pricing.

Not Right For

Teams training models under 7B parameters — a 4U 4-GPU system delivers better cost efficiency
Edge inference where rack space and power are severely constrained
Kubernetes workloads requiring GPU fractioning across many small jobs — consider MIG-capable H100 configs

Frequently Asked Questions

What is the NVIDIA HGX B300 NVL8 and how does it compare to HGX H100 SXM?

The HGX B300 NVL8 uses NVIDIA’s Blackwell architecture — the successor to Hopper (H100/H200). Blackwell introduces a second-generation transformer engine with native FP4/FP8 support, higher HBM3e memory bandwidth per GPU, and NVLink 5. In practice Blackwell delivers substantially higher tokens-per-second on large LLM inference versus an equivalent H100 SXM system.

Which Linux distributions are validated on this system?

Supermicro validates Ubuntu 22.04 LTS, RHEL 9, and Rocky Linux 9. CUDA 12.x and corresponding NVIDIA driver stacks are pre-tested as part of the Gold Series configuration. Windows Server 2022 is supported for non-AI workloads.

Can I scale this to a multi-node GPU cluster?

Yes. The dual CX7 NICs support NDR InfiniBand for multi-node NCCL all-reduce. Connect multiple AS-8126GS-NB3RT nodes via an NDR InfiniBand switch fabric to scale beyond a single node’s 8-GPU boundary without changing your NIC hardware.

How much power does this server consume at full GPU load?

Budget 10–14kW per node at peak training load depending on GPU power limits and cooling configuration. Supermicro’s IPMI power monitoring provides real-time per-node wattage. Confirm your data centre circuit and PDU ratings before deploying.

Is the Gold Series a fixed config or can it be customised?

The AS-8126GS-NB3RT-01-G2 is a pre-validated configuration that ships within 24 hours. A barebone variant is available for teams sourcing CPUs, DIMMs, and storage separately or to a different specification.

What management interfaces does this server include?

IPMI 2.0 with a dedicated management port for out-of-band remote KVM, power control, fan management, and firmware updates. Supermicro SuperDoctor 5 handles OS-level health monitoring and integrates with standard DCIM platforms.

GO33 Verdict

The Definitive Single-Node AI Training Platform for 2025

The Supermicro AS-8126GS-NB3RT delivers the NVIDIA Blackwell HGX B300 NVL8 in a properly resourced 8U platform — no CPU, memory, or storage compromises. If you are running LLM training or high-throughput generative AI inference on-premises at scale, this is the benchmark platform for 2025. Gold Series availability means you can have it racked and running inside a week.

GO33 Score: 9.2 / 10 — Strongly recommended for enterprise AI training and inference teams.

View on Supermicro Store Get a GO33 Quote

GO33 Transparency Notice: GO33 is a commercial reseller and reviewer of AI hardware. We may earn revenue if you enquire or purchase through our links. Vendor editorial teams have no input into our reviews, scores, or findings. Hardware reviewed was sourced independently.