How much VRAM does InternLM 20B need?

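InternLM 20B (20B parameters) needs approximately 21.6 GB of VRAM with Q5_K_M quantization at 8K context. That total is 14.4 GB for weights, 3.1 GB for the KV cache, 0.9 GB for the runtime, and 3.2 GB of headroom.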
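As a rough check on the weights figure, weight memory is about parameter count × bits per weight ÷ 8 bytes. The sketch below assumes decimal gigabytes (1 GB = 10^9 bytes) and the nominal 20B parameter count. It suggests that the reported 14.4 GB for Q5_K_M implies an effective rate somewhat above the nominal 5 bits. That is plausible because K-quant formats store per-block scales and typically keep some tensors at higher precision.

```python
# Rough weight-memory estimate for a quantized model.
# Assumptions: decimal GB (1 GB = 1e9 bytes) and a nominal 20B parameter count.

def weight_gb(params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB: params * bits / 8 bytes."""
    return params * bits_per_weight / 8 / 1e9


def effective_bpw(reported_gb: float, params: float) -> float:
    """Back-solve the effective bits per weight from a reported size in GB."""
    return reported_gb * 1e9 * 8 / params


PARAMS = 20e9  # nominal parameter count from this page

print(f"Nominal 5-bit estimate: {weight_gb(PARAMS, 5):.1f} GB")
print(f"Implied Q5_K_M rate:    {effective_bpw(14.4, PARAMS):.2f} bits/weight")
```

Under these assumptions, the nominal 5-bit estimate is 12.5 GB and the implied Q5_K_M rate is about 5.76 bits per weight.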

What is the best quantization for InternLM 20B?

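The recommended quantization for InternLM 20B is Q5_K_M, which balances quality and memory efficiency. On a 32 GB card such as the RTX PRO 4500 Blackwell, the higher-precision Q8_0 also fits and is marked as the best option for that GPU.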

Can it run?

Can RTX PRO 4500 Blackwell 32GB run InternLM 20B?

Yes. The RTX PRO 4500 Blackwell 32GB can run InternLM 20B with a C grade (Usable) and a fit status of Runs well. Expected decode speed: 53.3 tok/s.

Using Q5_K_M in llama.cpp

Fit status: Runs well
Decode: 53.3 tok/s
TTFT (time to first token): 3631 ms
Memory: 21.6 GB / 32.0 GB

Memory breakdown:
Weights: 14.4 GB
KV cache: 3.1 GB
Runtime: 0.9 GB
Headroom: 3.2 GB
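The breakdown above adds up to the 21.6 GB total. The sketch below also estimates how far context could grow before the 32.0 GB budget runs out. It rests on three assumptions: the KV cache grows linearly with context length, "8K" means 8192 tokens, and weights, runtime, and headroom stay fixed.

```python
# Sum the Q5_K_M memory breakdown and estimate a memory-bound context ceiling.
# Assumptions: KV cache scales linearly with context, "8K" = 8192 tokens,
# and weights, runtime, and headroom do not change with context length.

BUDGET_GB = 32.0
BASE_CONTEXT = 8192  # tokens the 3.1 GB KV-cache figure corresponds to
breakdown_gb = {"weights": 14.4, "kv_cache": 3.1, "runtime": 0.9, "headroom": 3.2}


def total_gb(parts: dict) -> float:
    """Total memory in GB across all components."""
    return sum(parts.values())


def max_context_tokens(parts: dict, budget: float, base_ctx: int) -> int:
    """Largest context whose linearly scaled KV cache still fits the budget."""
    fixed = total_gb(parts) - parts["kv_cache"]  # everything except the KV cache
    kv_per_token = parts["kv_cache"] / base_ctx  # GB per token of context
    return int((budget - fixed) / kv_per_token)


print(f"Total: {total_gb(breakdown_gb):.1f} GB of {BUDGET_GB} GB")
print(f"Memory-bound context ceiling: ~{max_context_tokens(breakdown_gb, BUDGET_GB, BASE_CONTEXT)} tokens")
```

This is only a memory bound. The context length the model actually supports may be lower.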

Performance by workload

Workload	Grade	Fit	Decode	TTFT	Context
Agentic Coding	B	Runs well	53.3 tok/s	5282 ms	8K
Chat	C	Runs well	53.3 tok/s	1981 ms	8K
Coding	C	Runs well	53.3 tok/s	3631 ms	8K
RAG	B	Runs well	53.3 tok/s	6603 ms	8K
Reasoning	C	Runs well	53.3 tok/s	4292 ms	8K
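The TTFT and decode columns combine into a rough per-request latency: time to first token plus output length divided by decode speed. The sketch below assumes that decode speed holds steady at 53.3 tok/s for the whole reply. The 500-token reply length is a hypothetical example, not a figure from this page.

```python
# Rough end-to-end latency per request: TTFT plus output tokens at the decode rate.
# Assumptions: decode stays at 53.3 tok/s for the whole reply; the 500-token
# reply length is a hypothetical example.

DECODE_TOK_S = 53.3
ttft_ms = {
    "Agentic Coding": 5282,
    "Chat": 1981,
    "Coding": 3631,
    "RAG": 6603,
    "Reasoning": 4292,
}


def request_seconds(ttft: float, output_tokens: int, decode_tok_s: float = DECODE_TOK_S) -> float:
    """TTFT (ms) converted to seconds, plus generation time for the reply."""
    return ttft / 1000 + output_tokens / decode_tok_s


for workload, ttft in ttft_ms.items():
    print(f"{workload:<15} {request_seconds(ttft, 500):5.1f} s for 500 output tokens")
```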

Quantization options

How InternLM 20B (20B params) fits at each quantization level on RTX PRO 4500 Blackwell 32GB (32.0 GB usable).

Quant	Bits	Weights VRAM	Quality	Fit (grade, score)
Q2_K	2	7.8 GB	Low	D (34)
Q3_K_S	3	9.8 GB	Low	D (36)
NVFP4	4	11.2 GB	Medium	D (37)
Q4_K_M	4	12.2 GB	Medium	D (37)
Q5_K_M	5	14.4 GB	High	D (39)
Q6_K	6	16.4 GB	High	C (40)
Q8_0 (best for your GPU)	8	21.4 GB	Very High	C (44)
F16	16	41.0 GB	Maximum	F (0)
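One way to read the table is to pick the highest-precision level whose total footprint still fits in usable VRAM. The sketch below makes two assumptions. First, the table's VRAM column is weight memory only, since the Q5_K_M row matches the 14.4 GB weights figure. Second, the non-weight overhead from the Q5_K_M breakdown (3.1 GB KV cache at 8K, 0.9 GB runtime, 3.2 GB headroom) applies unchanged at every level.

```python
# Pick the highest-precision quantization whose total footprint fits the GPU.
# Assumptions: table VRAM = weight memory; the Q5_K_M non-weight overhead
# (KV cache at 8K + runtime + headroom) is the same for every quantization.
from typing import List, Optional, Tuple

USABLE_GB = 32.0
OVERHEAD_GB = 3.1 + 0.9 + 3.2  # KV cache + runtime + headroom

# (name, weight GB) in table order, i.e. ascending size
QUANTS: List[Tuple[str, float]] = [
    ("Q2_K", 7.8), ("Q3_K_S", 9.8), ("NVFP4", 11.2), ("Q4_K_M", 12.2),
    ("Q5_K_M", 14.4), ("Q6_K", 16.4), ("Q8_0", 21.4), ("F16", 41.0),
]


def best_fit(quants: List[Tuple[str, float]], usable: float, overhead: float) -> Optional[str]:
    """Return the last (largest) quantization whose weights + overhead fit, or None."""
    fitting = [name for name, weights in quants if weights + overhead <= usable]
    return fitting[-1] if fitting else None


print(best_fit(QUANTS, USABLE_GB, OVERHEAD_GB))
```

Under these assumptions the selection is Q8_0 (21.4 + 7.2 = 28.6 GB). This agrees with the table's marker, while F16 (48.2 GB) does not fit.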

Get started

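Hugging Face

huggingface-cli download internlm/internlm-20b

This fetches the original unquantized checkpoint. A Q5_K_M GGUF for llama.cpp has to be converted and quantized separately.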
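Once a Q5_K_M GGUF is available locally, running it in llama.cpp at the 8K context used on this page could look like the sketch below. It assumes llama.cpp's `llama-cli` binary is on PATH. The GGUF path is a hypothetical placeholder. `-m`, `-c`, `-ngl`, and `-p` are llama.cpp's model, context-size, GPU-layer offload, and prompt options. The script only prints the command if the binary or model file is missing.

```python
# Build (and optionally run) a llama.cpp command for a local Q5_K_M GGUF.
# Assumptions: `llama-cli` is on PATH; MODEL_PATH is a hypothetical filename.
import shutil
import subprocess
from pathlib import Path

MODEL_PATH = Path("models/internlm-20b-Q5_K_M.gguf")  # hypothetical placeholder


def llama_cli_command(model: Path, context: int = 8192, gpu_layers: int = 99,
                      prompt: str = "Hello") -> list:
    """Assemble the llama-cli argument list."""
    return [
        "llama-cli",
        "-m", str(model),         # model file
        "-c", str(context),       # context size (8K, as in the workload table)
        "-ngl", str(gpu_layers),  # layers to offload; a large value offloads all of them
        "-p", prompt,             # prompt text
    ]


if __name__ == "__main__":
    cmd = llama_cli_command(MODEL_PATH)
    if shutil.which("llama-cli") and MODEL_PATH.exists():
        subprocess.run(cmd, check=True)
    else:
        print("Would run:", " ".join(cmd))
```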
