NVIDIA

RTX PRO 6000 Blackwell Server Edition 96GB


RTX PRO Blackwell · Datacenter · Blackwell · PCIe 5 · CUDA

96 GB VRAM · 1.6 TB/s bandwidth · 120 TFLOPS FP16 compute · 4,000 TOPS INT8 inference


Specifications

Compute
FP16: 120 TFLOPS
INT8: 4,000 TOPS
Architecture: Blackwell

Memory
VRAM: 96 GB
Bandwidth: 1,597 GB/s

General
Family: RTX PRO Blackwell
Segment: Datacenter
Interconnect: PCIe 5
Compute Platform: CUDA
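Single-stream decode speed is usually limited by memory bandwidth rather than compute, so the 1,597 GB/s figure gives a rough ceiling on tokens per second. The sketch below assumes each generated token reads every active weight byte from VRAM exactly once and ignores KV-cache traffic and kernel overhead. The function name and the example model sizes are hypothetical and purely illustrative.

```python
# Bandwidth-bound ceiling on single-stream decode speed (a rough sketch).
# Assumption: each generated token reads every *active* weight byte from VRAM
# once; KV-cache traffic, kernel overhead and compute limits are ignored,
# so measured speeds will be lower than this ceiling.

def decode_ceiling_tok_s(bandwidth_gb_s: float,
                         active_params_billion: float,
                         bits_per_weight: float) -> float:
    """Upper bound on tokens/s when decode is limited purely by memory bandwidth."""
    bytes_per_token = active_params_billion * 1e9 * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token


BANDWIDTH_GB_S = 1597  # spec-sheet memory bandwidth

# Hypothetical dense models, for illustration only.
for params_b, bits in [(8, 16), (8, 8), (8, 4), (70, 4)]:
    ceiling = decode_ceiling_tok_s(BANDWIDTH_GB_S, params_b, bits)
    print(f"{params_b}B @ {bits}-bit: <= {ceiling:.0f} tok/s")
```

For a mixture-of-experts model, only the parameters active per token belong in `active_params_billion`, which is why such models can decode much faster than their total parameter count would suggest.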

Architecture

Blackwell

Blackwell is the NVIDIA architecture that succeeds Ada Lovelace in the RTX line, built on TSMC's 4NP process. It introduces 5th-generation Tensor Cores with native FP4 support, enabling up to roughly double the inference throughput per watt compared with FP8 on Ada Lovelace. Other key changes include neural rendering features for AI-driven shading and the move to GDDR7 memory.

AI Relevance

FP4 Tensor Cores are designed to maximize tokens per watt for inference. Storing weights in FP4 takes half the memory of FP8 and a quarter of FP16, so the same VRAM can hold roughly twice as many parameters as at FP8, usually with modest quality loss.
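The memory saving from lower precision is simple arithmetic on bytes per parameter. This is a minimal sketch that counts weights only: KV cache, activations and runtime overhead are excluded, and real FP4 formats also store small per-block scale factors, so actual files are slightly larger. The helper names are hypothetical.

```python
# Approximate VRAM needed just to hold model weights at a given precision.
# Assumption: weights only; KV cache, activations, per-block quantization
# scales and runtime overhead are NOT included, so real usage is higher.

BYTES_PER_PARAM = {
    "FP16": 2.0, "BF16": 2.0,
    "FP8": 1.0, "INT8": 1.0,
    "FP4": 0.5, "INT4": 0.5,
}


def weight_gb(params_billion: float, precision: str) -> float:
    """Weight footprint in GB (1 GB = 1e9 bytes)."""
    return params_billion * BYTES_PER_PARAM[precision]


def max_params_billion(vram_gb: float, precision: str) -> float:
    """Largest parameter count (in billions) whose weights alone fit in vram_gb."""
    return vram_gb / BYTES_PER_PARAM[precision]


VRAM_GB = 96
for p in ("FP16", "FP8", "FP4"):
    print(f"{p}: up to ~{max_params_billion(VRAM_GB, p):.0f}B params of weights in {VRAM_GB} GB")
```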

Process: TSMC 4NP
Platform: CUDA
Tensor Cores: Gen 5
Precisions: FP32, FP16, BF16, FP8, FP4, INT8, INT4

Recommendations by Workload

Agentic Coding

Qwen3-Coder-Next

Qwen3-Coder-Next is usable for agentic coding, though it is not the most specialized pick. It comes from a current frontier model family for local AI and fits entirely in VRAM with comfortable headroom. Available via Hugging Face, Ollama, and LM Studio.

Decode 83.3 tok/s · 51K ctx · llama.cpp

60.2 GB / 96.0 GB VRAM

Chat

Qwen3-Coder-Next

Qwen3-Coder-Next is usable for chat, though it is not the most specialized pick. It comes from a current frontier model family for local AI and fits entirely in VRAM with comfortable headroom. Available via Hugging Face, Ollama, and LM Studio.

Decode 83.3 tok/s · 13K ctx · llama.cpp

60.1 GB / 96.0 GB VRAM

Coding

Qwen3-Coder-Next

Qwen3-Coder-Next is a direct match for coding. It comes from a current frontier model family for local AI and fits entirely in VRAM with comfortable headroom. Available via Hugging Face, Ollama, and LM Studio.

Decode 83.3 tok/s · 26K ctx · llama.cpp

60.1 GB / 96.0 GB VRAM

RAG

Qwen3-Coder-Next

Qwen3-Coder-Next is usable for RAG, though it is not the most specialized pick. It comes from a current frontier model family for local AI and fits entirely in VRAM with comfortable headroom. Available via Hugging Face, Ollama, and LM Studio.

Decode 83.3 tok/s · 51K ctx · llama.cpp

60.2 GB / 96.0 GB VRAM

Reasoning

Qwen3-Coder-Next

Qwen3-Coder-Next is a direct match for reasoning. It comes from a current frontier model family for local AI and fits entirely in VRAM with comfortable headroom. Available via Hugging Face, Ollama, and LM Studio.

Decode 83.3 tok/s · 26K ctx · llama.cpp

60.1 GB / 96.0 GB VRAM
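Whether a model "fits with headroom" depends on its weights plus a KV cache that grows with context length. The sketch below uses the standard formula for grouped-query attention (keys and values, per layer, per KV head, per token). Every model dimension and the flat 2 GB runtime overhead are hypothetical placeholders, not the real configuration of Qwen3-Coder-Next. Models that use linear or sliding-window attention in some layers need less KV memory than this formula predicts, which may explain why the listed VRAM figures barely change across context lengths.

```python
from dataclasses import dataclass


@dataclass
class ModelShape:
    """Hypothetical model dimensions; not taken from any real model card."""
    weights_gb: float               # size of the (quantized) weights in GB
    n_layers: int                   # layers that keep a standard KV cache
    n_kv_heads: int                 # key/value heads (grouped-query attention)
    head_dim: int                   # dimension of each attention head
    kv_bytes_per_elem: float = 2.0  # FP16 KV cache


def kv_cache_gb(m: ModelShape, ctx_tokens: int) -> float:
    """KV-cache size in GB: keys + values for every layer, KV head and token."""
    return 2 * m.n_layers * m.n_kv_heads * m.head_dim * ctx_tokens * m.kv_bytes_per_elem / 1e9


def fits(m: ModelShape, ctx_tokens: int,
         vram_gb: float = 96.0, overhead_gb: float = 2.0) -> tuple[bool, float]:
    """Return (fits?, free GB) for weights + KV cache + a flat runtime overhead."""
    used = m.weights_gb + kv_cache_gb(m, ctx_tokens) + overhead_gb
    return used <= vram_gb, vram_gb - used


# Hypothetical example model, for illustration only.
example = ModelShape(weights_gb=40.0, n_layers=48, n_kv_heads=8, head_dim=128)
for ctx in (13_000, 26_000, 51_000):
    ok, free = fits(example, ctx)
    print(f"{ctx:>6} tokens: fits={ok}, free ~{free:.1f} GB")
```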

Full Model Compatibility

Qwen3-Coder-Next (B55)
80B parameters · 60.1 GB VRAM · 83 tok/s · 26K ctx
