How much VRAM does Magistral Small 2507 need?

Magistral Small 2507 (24B parameters) needs approximately 23.6 GB of VRAM with Q4_K_M quantization: 14.6 GB for the weights, plus KV cache, runtime overhead, and headroom.

What is the best quantization for Magistral Small 2507?

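The recommended quantization for Magistral Small 2507 is Q4_K_M, which balances quality and memory efficiency. On a GPU with more memory, such as the NVIDIA A100 40GB, the higher-quality Q8_0 (25.7 GB) also fits.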

Can it run?

Can NVIDIA A100 40GB run Magistral Small 2507?

Yes. The NVIDIA A100 40GB can run Magistral Small 2507 with an overall grade of C and a fit status of "Runs well". Expected decode speed is 89.2 tok/s.

Grade: C (Usable), using Q4_K_M in Ollama

Fit status: Runs well

Decode: 89.2 tok/s

TTFT: 2170 ms

Safe context: 27K

Memory: 23.6 GB / 40.0 GB

Memory breakdown

Weights: 14.6 GB

KV Cache: 3.8 GB

Runtime: 1.2 GB

Headroom: 4.0 GB
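
As a quick check on the arithmetic, the four components in the breakdown above add up to the 23.6 GB total. The sketch below reproduces that sum and shows how much of the 40.0 GB remains unallocated. The variable names are illustrative only and are not taken from the tool that produced these figures.

```python
# Memory components for Q4_K_M on the A100 40GB, in GB (figures from the breakdown above).
components_gb = {
    "weights": 14.6,
    "kv_cache": 3.8,
    "runtime": 1.2,
    "headroom": 4.0,
}
gpu_total_gb = 40.0

# Round to one decimal place to match the precision of the published figures.
required_gb = round(sum(components_gb.values()), 1)
unallocated_gb = round(gpu_total_gb - required_gb, 1)
utilization = required_gb / gpu_total_gb

print(f"required:    {required_gb} GB")
print(f"unallocated: {unallocated_gb} GB")
print(f"utilization: {utilization:.0%}")
```

The unallocated memory is what makes higher-precision quantizations or longer contexts possible on this GPU.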

Performance by workload

Workload	Grade	Fit	Decode	TTFT	Context
Agentic Coding	B	Runs well	89.2 tok/s	3156 ms	47K
Chat	C	Runs well	89.2 tok/s	1184 ms	15K
Coding	C	Runs well	89.2 tok/s	2170 ms	27K
RAG	B	Runs well	89.2 tok/s	3945 ms	47K
Reasoning	C	Runs well	89.2 tok/s	2564 ms	27K
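
To put the TTFT and decode figures in perspective, a rough end-to-end latency estimate is TTFT plus the output length divided by the decode speed. The sketch below applies that to each workload in the table. The 500-token output length is a hypothetical value chosen for illustration, and the model ignores any slowdown in decode speed as the context grows.

```python
def estimated_response_seconds(ttft_ms: float, decode_tok_s: float, output_tokens: int) -> float:
    """Time to first token plus the time to decode output_tokens at a steady rate."""
    return ttft_ms / 1000.0 + output_tokens / decode_tok_s


# TTFT per workload in milliseconds, copied from the table above.
ttft_ms_by_workload = {
    "Agentic Coding": 3156,
    "Chat": 1184,
    "Coding": 2170,
    "RAG": 3945,
    "Reasoning": 2564,
}
DECODE_TOK_S = 89.2   # same decode speed for every workload in the table
OUTPUT_TOKENS = 500   # hypothetical response length, not from the source

for workload, ttft_ms in ttft_ms_by_workload.items():
    seconds = estimated_response_seconds(ttft_ms, DECODE_TOK_S, OUTPUT_TOKENS)
    print(f"{workload:15s} {seconds:5.1f} s")
```

For responses of this length, decode time dominates; TTFT matters more for short replies and for the long-context workloads.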

Quantization options

How Magistral Small 2507 (24B params) fits at each quantization level on NVIDIA A100 40GB (40.0 GB usable).

Quant	Bits	VRAM	Quality	Fit
Q2_K	2	9.4 GB	Low	D (34)
Q3_K_S	3	11.8 GB	Low	D (36)
NVFP4	4	13.4 GB	Medium	D (36)
Q4_K_M	4	14.6 GB	Medium	D (37)
Q5_K_M	5	17.3 GB	High	D (39)
Q6_K	6	19.7 GB	High	D (40)
Q8_0 (best for your GPU)	8	25.7 GB	Very High	C (43)
F16	16	49.2 GB	Maximum	F (0)
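
The VRAM column can be related to the parameter count with a little arithmetic. The sketch below derives the effective bits per weight for each row and checks which quantizations fit in 40.0 GB. It rests on three assumptions that the source does not state: the VRAM figures are weights only (the Q4_K_M row matches the 14.6 GB weights figure in the memory breakdown), 1 GB means 10^9 bytes, and the KV cache, runtime, and headroom figures from the Q4_K_M breakdown (9.0 GB in total) apply unchanged to every quantization.

```python
PARAMS_BILLION = 24.0
GPU_GB = 40.0
# Assumption: reuse the Q4_K_M KV cache + runtime + headroom for every quantization.
OVERHEAD_GB = 3.8 + 1.2 + 4.0

# Weight sizes in GB, copied from the table above.
weights_gb = {
    "Q2_K": 9.4,
    "Q3_K_S": 11.8,
    "NVFP4": 13.4,
    "Q4_K_M": 14.6,
    "Q5_K_M": 17.3,
    "Q6_K": 19.7,
    "Q8_0": 25.7,
    "F16": 49.2,
}


def bits_per_weight(size_gb: float, params_billion: float) -> float:
    """Effective bits per parameter, taking 1 GB as 10^9 bytes."""
    return size_gb * 8 / params_billion


fits = {quant: size + OVERHEAD_GB <= GPU_GB for quant, size in weights_gb.items()}
largest_fitting = max((q for q in weights_gb if fits[q]), key=lambda q: weights_gb[q])

for quant, size in weights_gb.items():
    bpw = bits_per_weight(size, PARAMS_BILLION)
    print(f"{quant:7s} {bpw:5.2f} bits/weight  fits={fits[quant]}")
print("largest quantization that fits:", largest_fitting)
```

Under these assumptions the effective bits per weight sit slightly above the nominal Bits column, and every quantization except F16 fits on this GPU.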

Get started

Ollama

ollama run magistral-small-2507

HuggingFace

huggingface-cli download mistralai/Magistral-Small-2507
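
Once the model is running in Ollama, the decode speed quoted on this page can be compared with a local measurement. The sketch below is a minimal example that assumes a local Ollama server on its default port (11434) and uses the `/api/generate` endpoint, whose non-streaming response reports `eval_count` (generated tokens) and `eval_duration` (nanoseconds). The helper names `generate` and `decode_tokens_per_second` are hypothetical, and the model tag passed in should be whatever tag is actually installed locally.

```python
import json
import urllib.request


def decode_tokens_per_second(response: dict) -> float:
    """Generated tokens divided by generation time (eval_duration is in nanoseconds)."""
    return response["eval_count"] / (response["eval_duration"] / 1e9)


def generate(model: str, prompt: str, host: str = "http://localhost:11434") -> dict:
    """Send one non-streaming generation request to a local Ollama server."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    request = urllib.request.Request(
        f"{host}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)


# Example usage (requires a running Ollama server with the model pulled):
#   result = generate("magistral-small-2507", "Explain KV caching in two sentences.")
#   print(f"{decode_tokens_per_second(result):.1f} tok/s")
```

A single short prompt gives a noisy number, so averaging several runs is a more reasonable basis for comparison with the 89.2 tok/s estimate.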
