Can it run? Runs well, using Q5_K_M in llama.cpp.

| Fit status | Decode | TTFT | Safe context | Memory |
|---|---|---|---|---|
| Runs well | 72.1 tok/s | 2684 ms | 4K | 13.0 GB / 16.0 GB |
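A minimal llama.cpp invocation matching the summary above might look like the sketch below; the GGUF path and filename are assumptions, so point `-m` at whatever file you actually downloaded.

```bash
# Sketch: serve StableLM 2 12B (Q5_K_M) on a 16 GB GPU with llama.cpp.
# -c 4096 matches the "safe context" above; -ngl 99 offloads all layers to the GPU.
llama-server \
  -m ./models/stablelm-2-12b-Q5_K_M.gguf \
  -c 4096 \
  -ngl 99 \
  --port 8080
```

For one-off prompts, `llama-cli` takes the same `-m`, `-c`, and `-ngl` flags in place of running a server.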
| Workload | Grade | Fit | Decode | TTFT | Context |
|---|---|---|---|---|---|
| Agentic Coding | C | Tight fit | 72.1 tok/s | 3905 ms | 4K |
| Chat | B | Runs well | 72.1 tok/s | 1464 ms | 4K |
| Coding | B | Runs well | 72.1 tok/s | 2684 ms | 4K |
| RAG | C | Tight fit | 72.1 tok/s | 4881 ms | 4K |
| Reasoning | B | Runs well | 72.1 tok/s | 3173 ms | 4K |
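The decode and TTFT (time to first token) figures are per-workload estimates; llama.cpp's bundled `llama-bench` tool reports comparable prompt-processing and generation rates on your own card. The prompt and generation lengths below are arbitrary sketch values, not the settings behind this table.

```bash
# Sketch: benchmark the same GGUF locally.
# -p 512 measures prompt processing (which drives TTFT); -n 128 measures decode tok/s.
llama-bench \
  -m ./models/stablelm-2-12b-Q5_K_M.gguf \
  -p 512 \
  -n 128 \
  -ngl 99
```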
How StableLM 2 12B (12 billion parameters) fits at each quantization level on an RTX 4080 Super 16GB (16.0 GB usable).
| Quant | Bits | VRAM | Quality | Fit |
|---|---|---|---|---|
| Q2_K | 2 | 4.7 GB | Low | D35 |
| Q3_K_S | 3 | 5.9 GB | Low | D36 |
| NVFP4 | 4 | 6.7 GB | Medium | D37 |
| Q4_K_M | 4 | 7.3 GB | Medium | D38 |
| Q5_K_M | 5 | 8.6 GB | High | D40 |
| Q6_K (best for your GPU) | 6 | 9.8 GB | High | C42 |
| Q8_0 | 8 | 12.8 GB | Very High | C43 |
| F16 | 16 | 24.6 GB | Maximum | F0 |
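The VRAM column appears to cover the model weights only: the summary above shows 13.0 GB in use with the 8.6 GB Q5_K_M file, so the difference is KV cache and runtime overhead at 4K context. To check actual headroom on your own card, watch GPU memory while the model is loaded:

```bash
# Check real VRAM use against the 16 GB budget while the model is loaded
# (run in a second terminal alongside llama-server / llama-cli).
nvidia-smi --query-gpu=memory.used,memory.total --format=csv
```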
huggingface-cli download stabilityai/stablelm-2-12b
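That command pulls the original full-precision weights; llama.cpp loads a GGUF file instead. The repo name and filename pattern below are placeholders, since GGUF conversions are published by various community accounts; substitute a GGUF repository you trust.

```bash
# Sketch: fetch only a Q5_K_M GGUF into ./models.
# Replace <user>/stablelm-2-12b-GGUF with a real GGUF repo on the Hub;
# the *Q5_K_M*.gguf filename pattern is also an assumption.
huggingface-cli download <user>/stablelm-2-12b-GGUF \
  --include "*Q5_K_M*.gguf" \
  --local-dir ./models
```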