Will It Run AI

All estimates are approximations based on mathematical models and public specifications. Actual performance may vary. Do not make purchasing decisions based solely on these estimates.

Data sourced from Hugging Face, Ollama, and official model documentation. Model names and logos are trademarks of their respective owners.

© 2026 Will It Run AI — Fase Consulting Ibiza, S.L. (NIF: B57969656)


Can it run?

Can RTX 5090 32GB run Qwen 2.5 32B?

Grade: C (Usable)

Tight fit, using Q4_K_M in Ollama


Fit status: Tight fit
Decode: 61.5 tok/s
TTFT: 3148 ms
Safe context: 18K
Memory: 28.9 GB / 32.0 GB
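As a rough sanity check on the decode figure, single-stream decoding is often treated as memory-bandwidth bound: each generated token has to stream the full set of weights once. The sketch below applies that rule of thumb. The 1792 GB/s bandwidth is an assumption taken from the RTX 5090 public spec sheet, not from this page, and the formula is a generic ceiling, not necessarily the model this site uses.

```python
# Rough bandwidth-bound ceiling for decode speed (a rule of thumb,
# not the site's own estimation model).

WEIGHTS_GB = 19.5         # Q4_K_M weights, from this page
REPORTED_DECODE = 61.5    # tok/s, from this page
BANDWIDTH_GB_S = 1792.0   # ASSUMPTION: RTX 5090 spec-sheet memory bandwidth


def decode_ceiling(weights_gb: float, bandwidth_gb_s: float) -> float:
    """Upper bound on tokens/s if every token reads all weights exactly once."""
    return bandwidth_gb_s / weights_gb


ceiling = decode_ceiling(WEIGHTS_GB, BANDWIDTH_GB_S)
efficiency = REPORTED_DECODE / ceiling

print(f"bandwidth ceiling: {ceiling:.1f} tok/s")
print(f"reported decode is {efficiency:.0%} of that ceiling")
```

Under these assumptions the reported figure sits well below the ceiling, which is the expected direction: KV cache reads, kernel overhead, and dequantization all cost time that the simple ratio ignores.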

Memory breakdown

Weights: 19.5 GB
KV Cache: 5.0 GB
Runtime: 1.2 GB
Headroom: 3.2 GB
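The four components add up to the 28.9 GB memory figure reported for this configuration, which suggests the headroom is counted as a reserved margin inside the total and not as whatever is left over. A minimal sketch of that arithmetic, using only numbers from this page (the variable names are illustrative):

```python
# Sum the memory breakdown and compare it with the usable VRAM.

USABLE_GB = 32.0  # usable VRAM on the RTX 5090 32GB, from this page

breakdown_gb = {
    "weights": 19.5,
    "kv_cache": 5.0,
    "runtime": 1.2,
    "headroom": 3.2,
}

# Round to one decimal to match the precision the page reports.
total_gb = round(sum(breakdown_gb.values()), 1)
unallocated_gb = round(USABLE_GB - total_gb, 1)

print(f"total: {total_gb} GB of {USABLE_GB} GB")
print(f"unallocated beyond the reserved headroom: {unallocated_gb} GB")
```

Read this way, the "Tight fit" label reflects how little is left once the reserved margin is included.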

Performance by workload

Workload        Grade  Fit                                         Decode      TTFT     Context
Agentic Coding  C      Runs with offload (needs ~1.1 GB host RAM)  58.7 tok/s  4795 ms  30K
Chat            C      Tight fit                                   61.5 tok/s  1717 ms  10K
Coding          C      Tight fit                                   61.5 tok/s  3148 ms  18K
RAG             C      Runs with offload (needs ~1.1 GB host RAM)  58.7 tok/s  5994 ms  30K
Reasoning       C      Tight fit                                   61.5 tok/s  3720 ms  18K
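The workloads differ mainly in context length, and context length is what drives KV cache size. The sketch below shows the standard per-token KV cache formula. The architecture values (64 layers, 8 KV heads, head dimension 128) and the 16-bit cache are assumptions about Qwen 2.5 32B that do not come from this page, and "K" is treated as 1,000 tokens. The result lands in the same range as the 5.0 GB KV cache figure for an 18K context, but it is not expected to reproduce the site's numbers exactly.

```python
# Generic KV cache size estimate for a grouped-query-attention transformer.
# All architecture values are ASSUMPTIONS, not taken from this page.

NUM_LAYERS = 64        # assumed
NUM_KV_HEADS = 8       # assumed (grouped-query attention)
HEAD_DIM = 128         # assumed
BYTES_PER_VALUE = 2    # assumed 16-bit KV cache


def kv_bytes_per_token() -> int:
    """Bytes stored per token: one key and one value vector per layer."""
    return 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE


def kv_cache_gb(context_tokens: int) -> float:
    """KV cache size in GB (1 GB = 1e9 bytes) for a given context length."""
    return context_tokens * kv_bytes_per_token() / 1e9


for label, tokens in [("Chat", 10_000), ("Coding", 18_000), ("RAG", 30_000)]:
    print(f"{label:7s} {tokens:6d} tokens -> {kv_cache_gb(tokens):.2f} GB")
```

The linear growth is the point of the example: moving from an 18K to a 30K context adds several GB of cache under these assumptions, which is consistent with the longer-context workloads being the ones that need offload.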

Quantization options

How Qwen 2.5 32B (32B params) fits at each quantization level on RTX 5090 32GB (32.0 GB usable).

Quant   Bits  VRAM     Quality    Fit
Q2_K    2     12.5 GB  Low        D 38
Q3_K_S  3     15.7 GB  Low        D 40
NVFP4   4     17.9 GB  Medium     C 41
Q4_K_M  4     19.5 GB  Medium     C 42
Q5_K_M  5     23.0 GB  High       C 44  (Best for your GPU)
Q6_K    6     26.2 GB  High       C 44
Q8_0    8     34.2 GB  Very High  F 0
F16     16    65.6 GB  Maximum    F 0
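The nominal bit widths understate what each format actually costs per weight. A short sketch, assuming the VRAM figures above cover weights only (the Q4_K_M value matches the "Weights" line in the memory breakdown), that F16 uses exactly 2 bytes per weight, and that 1 GB is 1e9 bytes, backs out the implied parameter count and the effective bits per weight:

```python
# Back out effective bits per weight from the quantization figures on this page.

USABLE_GB = 32.0

quant_vram_gb = {
    "Q2_K": 12.5, "Q3_K_S": 15.7, "NVFP4": 17.9, "Q4_K_M": 19.5,
    "Q5_K_M": 23.0, "Q6_K": 26.2, "Q8_0": 34.2, "F16": 65.6,
}

# ASSUMPTION: F16 stores exactly 2 bytes per parameter, so the F16 size
# implies the parameter count (in billions).
params_billion = quant_vram_gb["F16"] / 2.0


def effective_bits(vram_gb: float) -> float:
    """Average bits per weight implied by a weights-only size in GB."""
    return vram_gb * 8.0 / params_billion


fits_weights_only = [q for q, gb in quant_vram_gb.items() if gb <= USABLE_GB]

print(f"implied parameters: {params_billion:.1f}B")
for quant, gb in quant_vram_gb.items():
    print(f"{quant:7s} {effective_bits(gb):5.2f} bits/weight")
print("weights fit in usable VRAM:", ", ".join(fits_weights_only))
```

Fitting the weights is necessary but not sufficient: KV cache and runtime overhead still have to fit alongside them, which is why the larger formats that pass this check can still be a tight fit.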

Get started

Ollama:
  ollama run qwen2.5:32b
Hugging Face:
  huggingface-cli download Qwen/Qwen2.5-32B-Instruct
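Once the model is pulled, a local Ollama server can also be called over its REST API. The sketch below builds a request for the `/api/generate` endpoint on the default port and caps the context window near the 18K safe-context estimate. The model tag `qwen2.5:32b` is an assumption about the tag you pulled, so substitute your own, and the helper names `build_request` and `generate` are illustrative.

```python
import json
import urllib.request

# Default address of a locally running Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str,
                  model: str = "qwen2.5:32b",   # ASSUMPTION: tag of the pulled model
                  num_ctx: int = 18_000) -> urllib.request.Request:
    """Build a non-streaming generate request with a capped context window."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,                  # return one JSON object, not a stream
        "options": {"num_ctx": num_ctx},  # context length in tokens
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, **kwargs) -> str:
    """Send the request to the local server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, **kwargs)) as resp:
        return json.loads(resp.read())["response"]
```

Calling `generate("Explain KV caching in one paragraph.")` requires the Ollama server to be running locally. Raising `num_ctx` well beyond the safe-context figure increases KV cache memory and may push the model into offload.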

Upgrade options

Hardware that runs Qwen 2.5 32B well

NVIDIA A100 40GB (Budget pick)
Grade B, 66.9 tok/s decode
~$10,000 MSRP

NVIDIA RTX PRO 5000 Blackwell 48GB (Biggest leap)
Grade C, 57.8 tok/s decode

NVIDIA RTX 6000 Ada 48GB (NVIDIA upgrade)
Grade C, 40.3 tok/s decode
