How much VRAM does Devstral Small 1.1 need?

Devstral Small 1.1 (24B parameters) needs about 26.5 GB of memory in total with Q4_K_M quantization: 14.6 GB for the weights, plus KV cache, runtime overhead, and headroom.

What is the best quantization for Devstral Small 1.1?

The recommended quantization for Devstral Small 1.1 is Q4_K_M, which balances quality and memory efficiency.

Can it run?

Can MacBook Pro M4 Max 64GB run Devstral Small 1.1?

Yes, MacBook Pro M4 Max 64GB can run Devstral Small 1.1 with a C grade (Runs well). Expected decode speed: 23.5 tok/s.

C: Usable

Runs well

Using Q4_K_M in Ollama

Fit status: Runs well
Decode: 23.5 tok/s
TTFT: 8240 ms
Safe context: 28K
Memory: 26.5 GB used of 46.1 GB usable

Memory breakdown

Weights: 14.6 GB
KV Cache: 3.8 GB
Runtime: 1.2 GB
Headroom: 6.9 GB
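
The four components above add up to the 26.5 GB total. A minimal sketch of that arithmetic, using only the figures from this page (the variable and function names are illustrative, not part of any tool):

```python
# Memory figures copied from the breakdown above, in GB.
breakdown_gb = {
    "weights": 14.6,
    "kv_cache": 3.8,
    "runtime": 1.2,
    "headroom": 6.9,
}

# Usable memory reported on this page for the MacBook Pro M4 Max 64GB.
USABLE_GB = 46.1


def total_memory_gb(parts):
    """Sum the components, rounded to one decimal like the page's figures."""
    return round(sum(parts.values()), 1)


total = total_memory_gb(breakdown_gb)
# Memory still free after the total, which already includes the headroom.
spare = round(USABLE_GB - total, 1)
print(f"total: {total} GB, spare: {spare} GB")
```

The total already includes the 6.9 GB headroom, so the spare figure is memory left over on top of that safety margin.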

Performance by workload

Workload	Grade	Fit	Decode	TTFT	Context
Agentic Coding	C	Runs well	23.5 tok/s	11985 ms	49K
Chat	C	Runs well	23.5 tok/s	4494 ms	15K
Coding	C	Runs well	23.5 tok/s	8240 ms	28K
RAG	C	Runs well	23.5 tok/s	14981 ms	49K
Reasoning	C	Runs well	23.5 tok/s	9738 ms	28K
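
TTFT and decode speed combine into a rough end-to-end response time: the wait for the first token, plus the time to stream the rest. The sketch below assumes the decode rate stays constant over the whole response and that the TTFT values from the table apply to your prompt; the 500-token output length is a hypothetical example, not a figure from this page.

```python
# TTFT per workload in milliseconds, copied from the table above.
TTFT_MS = {
    "Agentic Coding": 11985,
    "Chat": 4494,
    "Coding": 8240,
    "RAG": 14981,
    "Reasoning": 9738,
}

DECODE_TOK_S = 23.5  # decode speed from the table, identical for every workload


def response_time_s(ttft_ms, decode_tok_s, output_tokens):
    """Estimate total seconds: time to first token plus streaming time."""
    return ttft_ms / 1000 + output_tokens / decode_tok_s


OUTPUT_TOKENS = 500  # hypothetical response length
for workload, ttft in TTFT_MS.items():
    seconds = response_time_s(ttft, DECODE_TOK_S, OUTPUT_TOKENS)
    print(f"{workload}: about {seconds:.1f} s for {OUTPUT_TOKENS} tokens")
```

Because decode speed is the same across workloads here, the differences in total time come entirely from TTFT, which grows with the context each workload uses.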

Quantization options

How Devstral Small 1.1 (24B params) fits at each quantization level on MacBook Pro M4 Max 64GB (46.1 GB usable).

Quant	Bits	VRAM	Quality	Fit
Q2_K	2	9.4 GB	Low	D 34
Q3_K_S	3	11.8 GB	Low	D 35
NVFP4	4	13.4 GB	Medium	D 36
Q4_K_M	4	14.6 GB	Medium	D 36
Q5_K_M	5	17.3 GB	High	D 37
Q6_K	6	19.7 GB	High	D 39
Q8_0 (Best for your GPU)	8	25.7 GB	Very High	C 41
F16	16	49.2 GB	Maximum	F 0
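
One way to read this table is to pick the largest quantization whose weights, plus overhead, still fit in usable memory. The sketch below is an illustration under a loud assumption: it reuses the KV cache, runtime, and headroom figures from the Q4_K_M breakdown as a fixed overhead for every quantization level, which the page does not state. It is not the scoring method behind the Fit column.

```python
# (quantization, weight size in GB) pairs copied from the table above.
QUANT_WEIGHTS_GB = [
    ("Q2_K", 9.4),
    ("Q3_K_S", 11.8),
    ("NVFP4", 13.4),
    ("Q4_K_M", 14.6),
    ("Q5_K_M", 17.3),
    ("Q6_K", 19.7),
    ("Q8_0", 25.7),
    ("F16", 49.2),
]

USABLE_GB = 46.1  # usable memory on the MacBook Pro M4 Max 64GB

# ASSUMPTION: KV cache + runtime + headroom from the Q4_K_M breakdown,
# treated as constant across quantization levels.
OVERHEAD_GB = 3.8 + 1.2 + 6.9


def largest_fitting_quant(quants, usable_gb, overhead_gb):
    """Return the name of the largest quant that fits, or None if none do."""
    fitting = [(name, gb) for name, gb in quants if gb + overhead_gb <= usable_gb]
    if not fitting:
        return None
    return max(fitting, key=lambda item: item[1])[0]


print(largest_fitting_quant(QUANT_WEIGHTS_GB, USABLE_GB, OVERHEAD_GB))
```

Under these assumptions the result agrees with the table: Q8_0 fits, while the F16 weights alone (49.2 GB) exceed the 46.1 GB of usable memory.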

Get started

Ollama

ollama run devstral-small-2507

HuggingFace

huggingface-cli download mistralai/Devstral-Small-2507
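
Once the model is running in Ollama, it can also be called from code through Ollama's local REST API. The sketch below builds a request for the `/api/generate` endpoint on the default port 11434. The model tag is taken from the `ollama run` command above and may differ in your install (check `ollama list`), and the context size assumes the 28K safe context means 28 × 1024 tokens.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL_TAG = "devstral-small-2507"  # tag from the command above; may differ locally


def build_request(prompt, num_ctx=28 * 1024):
    """Build a non-streaming generate request capped at the safe context size."""
    payload = {
        "model": MODEL_TAG,
        "prompt": prompt,
        "stream": False,  # return one JSON object instead of a token stream
        "options": {"num_ctx": num_ctx},
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt):
    """Send the prompt to a running Ollama server and return the reply text."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.loads(resp.read())["response"]


# Requires a running Ollama server with the model pulled:
# print(generate("Write a Python function that reverses a string."))
```

Keeping `num_ctx` at or below the safe context figure is a conservative choice; larger values grow the KV cache beyond the 3.8 GB shown in the memory breakdown.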

Upgrade options

Hardware that runs Devstral Small 1.1 well

RTX A6000 48GB (Budget pick)

C, 39.9 tok/s decode

~$4,650 MSRP

NVIDIA L40 48GB (Best value)

C, 46 tok/s decode

~$5,500 MSRP

RTX PRO 5000 Blackwell 48GB (Biggest leap)

C, 77.1 tok/s decode

