How much VRAM does Ministral 8B need?

Ministral 8B (8B parameters) requires approximately 8.4 GB of memory with Q4_K_M quantization.

What is the best quantization for Ministral 8B?

The recommended quantization for Ministral 8B is Q4_K_M, which balances quality and memory efficiency.
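
As a rough sanity check on that recommendation, the sketch below works out the effective bits per weight implied by this page's own figures (4.9 GB of weights for an 8B-parameter model). It assumes exactly 8e9 parameters and 1 GB = 1e9 bytes, neither of which the page states, and the helper names are made up for illustration. Q4_K_M is nominally 4-bit, but it stores block scales and keeps some tensors at higher precision, so a value somewhat above 4 is expected.

```python
# Rough estimate only. Assumptions (not stated on the page):
# exactly 8e9 parameters, and 1 GB = 1e9 bytes.
PARAMS = 8e9
WEIGHTS_GB = 4.9  # "Weights" figure from this page's memory breakdown


def bits_per_weight(weights_gb: float, params: float) -> float:
    """Effective bits stored per parameter for a given weight footprint."""
    return weights_gb * 1e9 * 8 / params


def weights_size_gb(bits: float, params: float) -> float:
    """Weight footprint in GB if every parameter used exactly `bits` bits."""
    return params * bits / 8 / 1e9


print(f"Implied: {bits_per_weight(WEIGHTS_GB, PARAMS):.2f} bits/weight")
print(f"Flat 4-bit would be: {weights_size_gb(4.0, PARAMS):.1f} GB")
```

The gap between the flat 4-bit figure and the reported weight size is the overhead of the quantization format, not a measurement error.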

Can the RTX 2080 Ti 11GB run Ministral 8B?

Yes, the RTX 2080 Ti 11GB can run Ministral 8B with a B grade (Runs well). Expected decode speed: 90.9 tok/s.

Grade: B (Good). Runs well using Q4_K_M in Ollama.

Fit status: Runs well
Decode: 90.9 tok/s
TTFT: 2130 ms
Safe context: 21K
Memory: 8.4 GB / 11.0 GB

Weights: 4.9 GB
KV Cache: 1.3 GB
Runtime: 1.2 GB
Headroom: 1.1 GB
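
The memory figures can be added up directly. The sketch below sums the four components reported above and compares the result with the 11.0 GB of usable VRAM. The page does not say how its 8.4 GB total is derived; the components as printed sum to 8.5 GB, so the 0.1 GB gap is presumably rounding of the individual figures, which is an assumption here.

```python
# Figures copied from the memory breakdown on this page (GB).
USABLE_GB = 11.0
breakdown_gb = {
    "weights": 4.9,
    "kv_cache": 1.3,
    "runtime": 1.2,
    "headroom": 1.1,
}

# Memory the model actively occupies, excluding the reserved headroom.
allocated = sum(v for k, v in breakdown_gb.items() if k != "headroom")
# Budget including headroom; the page reports 8.4 GB for this total.
budget = sum(breakdown_gb.values())
# VRAM left over after the full budget is reserved.
free = USABLE_GB - budget

print(f"Allocated: {allocated:.1f} GB")
print(f"Budget incl. headroom: {budget:.1f} GB")
print(f"Left on card: {free:.1f} GB ({free / USABLE_GB:.0%} of usable VRAM)")
```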

Workload	Grade	Fit	Decode	TTFT	Context
Agentic Coding	C	Tight fit	82.0 tok/s	3432 ms	36K
Chat	B	Runs well	82.0 tok/s	1287 ms	11K
Coding	B	Runs well	90.9 tok/s	2130 ms	21K
RAG	C	Tight fit	82.0 tok/s	4290 ms	36K
Reasoning	B	Runs well	82.0 tok/s	2789 ms	21K
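
To turn the table into something more tangible, a common approximation is that a full response takes roughly TTFT plus output length divided by decode speed. The sketch below applies that to each workload row. The 500-token output length is an arbitrary assumption for illustration, and the formula ignores the fact that decode speed usually drops as the context fills up.

```python
# (decode tok/s, TTFT ms) per workload, copied from the table above.
WORKLOADS = {
    "Agentic Coding": (82.0, 3432),
    "Chat": (82.0, 1287),
    "Coding": (90.9, 2130),
    "RAG": (82.0, 4290),
    "Reasoning": (82.0, 2789),
}


def response_seconds(decode_tok_s: float, ttft_ms: float, output_tokens: int) -> float:
    """Approximate wall-clock time: time to first token plus decode time."""
    return ttft_ms / 1000 + output_tokens / decode_tok_s


OUTPUT_TOKENS = 500  # hypothetical response length, not from the page

for name, (decode, ttft) in WORKLOADS.items():
    total = response_seconds(decode, ttft, OUTPUT_TOKENS)
    print(f"{name:15s} ~{total:.1f} s for {OUTPUT_TOKENS} tokens")
```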

How Ministral 8B (8B params) fits at each quantization level on RTX 2080 Ti 11GB (11.0 GB usable).

Quant	Bits	VRAM	Quality	Fit
Q2_K	2	3.1 GB	Low	D35
Q3_K_S	3	3.9 GB	Low	D36
NVFP4	4

Ollama

ollama run ministral-8b
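
Once the model is running, the decode speed quoted on this page can be checked against your own hardware. The sketch below is one way to do that through Ollama's local HTTP API, assuming the server is listening on its default port 11434 and that the model tag matches the command above (the tag may differ on your install). The helper names are illustrative, not part of Ollama.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local endpoint
MODEL = "ministral-8b"  # tag as written on this page; adjust if yours differs


def build_payload(prompt: str, model: str = MODEL) -> dict:
    """Request body for a single non-streaming generation."""
    return {"model": model, "prompt": prompt, "stream": False}


def decode_tok_per_s(resp: dict) -> float:
    """Decode speed from Ollama's timing fields (eval_duration is in ns)."""
    return resp["eval_count"] / (resp["eval_duration"] / 1e9)


def generate(prompt: str) -> dict:
    """Send one prompt to the local Ollama server and return the parsed JSON."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as r:
        return json.loads(r.read())
```

Calling `resp = generate("Write a haiku about GPUs")` and then `decode_tok_per_s(resp)` gives a figure to compare with the estimate above; a single short prompt is noisy, so averaging a few runs is more informative.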

HuggingFace

huggingface-cli download mistralai/Ministral-8B-Instruct-2410

Upgrade options

RTX 5070 12GB (Budget pick): grade B, 86.8 tok/s decode, ~$549 MSRP

RTX 3080 12GB (Best value): grade B, 142 tok/s decode, ~$799 MSRP

RTX 3080 Ti 12GB (Biggest leap): grade B, 138.3 tok/s decode, ~$1,199 MSRP