How much VRAM does Qwen 2.5 14B need?

Qwen 2.5 14B (14B parameters) requires approximately 14.3 GB of memory with Q4_K_M quantization: about 8.5 GB for the weights, with the rest going to KV cache, runtime overhead, and headroom.

What is the best quantization for Qwen 2.5 14B?

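The recommended general-purpose quantization for Qwen 2.5 14B is Q4_K_M, which balances quality and memory efficiency. On the RTX 4090 24GB specifically, the quantization table below marks Q8_0 as the best fit.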

Can it run?

Can RTX 4090 24GB run Qwen 2.5 14B?

Yes, the RTX 4090 24GB can run Qwen 2.5 14B. It earns a C grade (Runs well), with an expected decode speed of 89.7 tok/s.

Grade: C (Usable)

Runs well

Using Q4_K_M in Ollama

Fit status: Runs well
Decode: 89.7 tok/s
TTFT: 2158 ms
Safe context: 27K
Memory: 14.3 GB / 24.0 GB

Memory breakdown

Weights: 8.5 GB
KV Cache: 2.2 GB
Runtime: 1.2 GB
Headroom: 2.4 GB
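
The four components above add up to the 14.3 GB total reported under Memory. A minimal Python sketch of that arithmetic, using only the figures from this page (the function names are illustrative, not part of any tool):

```python
# Figures from the memory breakdown on this page, in GB.
BREAKDOWN_GB = {
    "weights": 8.5,
    "kv_cache": 2.2,
    "runtime": 1.2,
    "headroom": 2.4,
}
GPU_VRAM_GB = 24.0  # usable VRAM reported for the RTX 4090 24GB


def total_memory_gb(breakdown):
    """Sum the components, rounded to one decimal like the page's figures."""
    return round(sum(breakdown.values()), 1)


def fits(breakdown, vram_gb):
    """True if the summed footprint is within the available VRAM."""
    return total_memory_gb(breakdown) <= vram_gb


total = total_memory_gb(BREAKDOWN_GB)
print(f"{total} GB / {GPU_VRAM_GB} GB, fits: {fits(BREAKDOWN_GB, GPU_VRAM_GB)}")
```

Swapping in the weights figure of another quantization gives a rough footprint for that option, assuming the other three components stay the same.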

Performance by workload

Workload	Grade	Fit	Decode	TTFT	Context
Agentic Coding	B	Runs well	89.7 tok/s	3139 ms	47K
Chat	C	Runs well	89.7 tok/s	1177 ms	15K
Coding	C	Runs well	89.7 tok/s	2158 ms	27K
RAG	B	Runs well	89.7 tok/s	3924 ms	47K
Reasoning	C	Runs well	89.7 tok/s	2551 ms	27K
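
To turn the TTFT and decode figures into a rough end-to-end response time, one simple model is TTFT plus the output length divided by the decode rate. The sketch below assumes that model and a hypothetical 500-token reply; neither comes from the page, and real latency also depends on prompt length and server load.

```python
# Decode rate and per-workload TTFT taken from the table above.
DECODE_TOK_PER_S = 89.7
TTFT_MS = {
    "Agentic Coding": 3139,
    "Chat": 1177,
    "Coding": 2158,
    "RAG": 3924,
    "Reasoning": 2551,
}


def response_time_s(ttft_ms, output_tokens, decode_tok_per_s=DECODE_TOK_PER_S):
    """Estimated seconds to a full reply: time to first token plus decode time."""
    return ttft_ms / 1000 + output_tokens / decode_tok_per_s


# Hypothetical reply length, chosen only for illustration.
OUTPUT_TOKENS = 500
for workload, ttft in TTFT_MS.items():
    print(f"{workload}: {response_time_s(ttft, OUTPUT_TOKENS):.1f} s")
```

Under these assumptions decode time dominates: 500 tokens at 89.7 tok/s takes about 5.6 s, so the TTFT differences between workloads matter most for short replies.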

Quantization options

How Qwen 2.5 14B (14B params) fits at each quantization level on RTX 4090 24GB (24.0 GB usable).

Quant	Bits	VRAM	Quality	Fit (grade, score)
Q2_K	2	5.5 GB	Low	D (34)
Q3_K_S	3	6.9 GB	Low	D (35)
NVFP4	4	7.8 GB	Medium	D (36)
Q4_K_M	4	8.5 GB	Medium	D (36)
Q5_K_M	5	10.1 GB	High	D (38)
Q6_K	6	11.5 GB	High	D (39)
Q8_0 (best for your GPU)	8	15.0 GB	Very High	C (42)
F16	16	28.7 GB	Maximum	F (0)
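
The table can be read as a simple selection problem: pick the largest quantization whose weights, plus everything else that has to live in VRAM, still fit. The sketch below is one way to express that, not the method this page uses to compute its grades. It assumes the VRAM column is weights only (the Q4_K_M row matches the 8.5 GB weights figure in the memory breakdown) and reuses that breakdown's KV cache, runtime, and headroom figures as a fixed overhead.

```python
# (name, weights in GB) from the quantization table above.
QUANTS = [
    ("Q2_K", 5.5),
    ("Q3_K_S", 6.9),
    ("NVFP4", 7.8),
    ("Q4_K_M", 8.5),
    ("Q5_K_M", 10.1),
    ("Q6_K", 11.5),
    ("Q8_0", 15.0),
    ("F16", 28.7),
]

# Assumption: KV cache + runtime + headroom from the Q4_K_M breakdown
# apply unchanged to every quantization.
OVERHEAD_GB = 2.2 + 1.2 + 2.4


def largest_fitting_quant(quants, vram_gb, overhead_gb=OVERHEAD_GB):
    """Return the name of the largest quantization that fits, or None."""
    fitting = [(name, gb) for name, gb in quants if gb + overhead_gb <= vram_gb]
    if not fitting:
        return None
    return max(fitting, key=lambda q: q[1])[0]


print(largest_fitting_quant(QUANTS, 24.0))
```

With 24.0 GB of usable VRAM this selects Q8_0, which agrees with the row the table marks as best for this GPU; F16 is excluded because its 28.7 GB of weights alone exceed the card's memory.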

Get started

Ollama

ollama run qwen2.5:14b
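
Once the model has been pulled, it can also be called from a script through Ollama's local HTTP API. The following is a minimal sketch, assuming an Ollama server on its default port (11434) and the model tag `qwen2.5:14b` from the Ollama library; if your local tag differs, pass whatever name `ollama list` shows. The helper names are illustrative.

```python
import json
import urllib.request

# Default address of a locally running Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="qwen2.5:14b"):
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model="qwen2.5:14b"):
    """Send the prompt and return the generated text (requires a running server)."""
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]
```

Calling `generate("Write a haiku about GPUs")` returns the model's reply as a string; nothing is sent until `generate` is called, so importing the module is safe without a server running.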

HuggingFace

huggingface-cli download Qwen/Qwen2.5-14B-Instruct
