How much VRAM does Baichuan 13B need?

Baichuan 13B (13B parameters) requires approximately 15.0 GB of memory with Q5_K_M quantization: 9.4 GB for the weights, plus KV cache, runtime overhead, and headroom.

What is the best quantization for Baichuan 13B?

The recommended quantization for Baichuan 13B is Q5_K_M, which balances quality and memory efficiency.

Can it run?

Can RTX 5090 Laptop 24GB run Baichuan 13B?

Yes. The RTX 5090 Laptop 24GB can run Baichuan 13B with a C grade (Runs well). The expected decode speed is 82.0 tok/s.

Grade: C (Usable)

Runs well

Using Q5_K_M in Ollama

Fit status: Runs well

Decode: 82.0 tok/s

TTFT: 2360 ms

Safe context

Memory: 15.0 GB / 24.0 GB

Memory breakdown

Weights: 9.4 GB

KV Cache: 2.0 GB

Runtime: 1.2 GB

Headroom: 2.4 GB
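
The four components above add up to the 15.0 GB total reported for this configuration. A minimal sketch of that arithmetic, using only the figures on this page (the variable names are illustrative, not part of any tool):

```python
# Memory budget for Baichuan 13B at Q5_K_M, using the figures reported above.
breakdown_gb = {
    "weights": 9.4,
    "kv_cache": 2.0,
    "runtime": 1.2,
    "headroom": 2.4,
}
vram_gb = 24.0  # usable VRAM on the RTX 5090 Laptop 24GB

# Round to one decimal place to match the precision of the reported values.
total_gb = round(sum(breakdown_gb.values()), 1)
free_gb = round(vram_gb - total_gb, 1)

print(f"total: {total_gb} GB, free: {free_gb} GB")
```

Note that the reported total already includes the 2.4 GB of headroom, so the remaining free VRAM comes on top of that safety margin.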

Performance by workload

Workload	Grade	Fit	Decode	TTFT	Context
Agentic Coding	B	Runs well	82.0 tok/s	3433 ms	8K
Chat	C	Runs well	82.0 tok/s	1288 ms	8K
Coding	C	Runs well	82.0 tok/s	2360 ms	8K
RAG	B	Runs well	82.0 tok/s	4292 ms	8K
Reasoning	C	Runs well	82.0 tok/s	2790 ms	8K
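
As a rough illustration of what these numbers mean in practice, the time to a complete response can be approximated as TTFT plus the output length divided by the decode speed. The sketch below assumes a constant decode rate and a hypothetical output length; `response_time_s` is an illustrative helper, not part of any benchmark tool.

```python
# Figures taken from the workload table above.
DECODE_TOK_PER_S = 82.0
TTFT_MS = {
    "Agentic Coding": 3433,
    "Chat": 1288,
    "Coding": 2360,
    "RAG": 4292,
    "Reasoning": 2790,
}


def response_time_s(workload: str, output_tokens: int) -> float:
    """Estimate seconds until the last token: TTFT + tokens / decode speed.

    Assumes the decode rate stays constant for the whole response.
    """
    return TTFT_MS[workload] / 1000 + output_tokens / DECODE_TOK_PER_S


# Hypothetical example: a 256-token chat reply.
print(f"{response_time_s('Chat', 256):.2f} s")
```

Under these assumptions a 256-token chat reply takes roughly 4.4 seconds, most of it spent decoding rather than waiting for the first token.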

Quantization options

How Baichuan 13B (13B params) fits at each quantization level on RTX 5090 Laptop 24GB (24.0 GB usable).

Quant	Bits	VRAM	Quality	Fit
Q2_K	2	5.1 GB	Low	D (33)
Q3_K_S	3	6.4 GB	Low	D (34)
NVFP4	4	7.3 GB	Medium	D (35)
Q4_K_M	4	7.9 GB	Medium	D (36)
Q5_K_M	5	9.4 GB	High	D (37)
Q6_K	6	10.7 GB	High	D (38)
Q8_0 (best for your GPU)	8	13.9 GB	Very High	C (41)
F16	16	26.7 GB	Maximum	F (0)
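
One way to read this table is to pick the highest-precision quantization whose weights, plus KV cache and runtime overhead, still fit in VRAM. The sketch below does that under a loud assumption: it reuses the 2.0 GB KV cache and 1.2 GB runtime figures from the Q5_K_M breakdown for every quantization level. The page's own grading may weigh other factors, and `largest_fitting` is a hypothetical helper.

```python
# (quantization, weight size in GB) from the table, ordered by increasing bits.
QUANTS = [
    ("Q2_K", 5.1),
    ("Q3_K_S", 6.4),
    ("NVFP4", 7.3),
    ("Q4_K_M", 7.9),
    ("Q5_K_M", 9.4),
    ("Q6_K", 10.7),
    ("Q8_0", 13.9),
    ("F16", 26.7),
]

# Assumption: overhead measured at Q5_K_M applies to all quantization levels.
KV_CACHE_GB = 2.0
RUNTIME_GB = 1.2
VRAM_GB = 24.0


def largest_fitting(quants, vram_gb, overhead_gb):
    """Return the last (highest-precision) quant that fits, or None."""
    fitting = [name for name, weights_gb in quants
               if weights_gb + overhead_gb <= vram_gb]
    return fitting[-1] if fitting else None


best = largest_fitting(QUANTS, VRAM_GB, KV_CACHE_GB + RUNTIME_GB)
print(best)
```

With these inputs the result agrees with the table: Q8_0 needs about 17.1 GB including overhead and fits, while F16 exceeds 24.0 GB on weights alone.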

Get started

HuggingFace

huggingface-cli download baichuan-13b

