How much VRAM does Granite Code 20B need?

With Q4_K_M quantization, Granite Code 20B (20B parameters) needs approximately 20.0 GB of memory in total: 12.2 GB for the weights, plus KV cache, runtime overhead, and headroom.

What is the best quantization for Granite Code 20B?

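The recommended default quantization for Granite Code 20B is Q4_K_M, which balances quality and memory efficiency. On MacBook Pro M4 32GB, the quantization table flags Q6_K as the best fit for this hardware.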

Can it run?

Can MacBook Pro M4 32GB run Granite Code 20B?

Yes. The MacBook Pro M4 32GB can run Granite Code 20B at a C grade (tight fit), with an expected decode speed of 7.1 tok/s.

Grade: C (Usable)

Tight fit

Using Q4_K_M in Ollama

Fit status: Tight fit

Decode: 7.1 tok/s

TTFT: 27337 ms

Safe context: 8K

Memory: 20.0 GB of 23.0 GB usable

Memory breakdown

Weights: 12.2 GB

KV Cache: 3.1 GB

Runtime: 1.2 GB

Headroom: 3.5 GB
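
The breakdown above can be checked with a few lines of arithmetic. This is a minimal sketch that only sums the listed components and compares the total with the 23.0 GB usable figure; the function names are illustrative and are not part of any tool referenced on this page.

```python
# Memory components for Granite Code 20B at Q4_K_M, in GB (from the breakdown above).
breakdown_gb = {
    "weights": 12.2,
    "kv_cache": 3.1,
    "runtime": 1.2,
    "headroom": 3.5,
}
USABLE_GB = 23.0  # usable memory reported for MacBook Pro M4 32GB


def total_gb(parts):
    """Sum the components, rounded to one decimal to hide float noise."""
    return round(sum(parts.values()), 1)


def fits(parts, usable_gb):
    """True when the summed requirement stays within usable memory."""
    return total_gb(parts) <= usable_gb


required = total_gb(breakdown_gb)
spare = round(USABLE_GB - required, 1)
print(f"required {required} GB, usable {USABLE_GB} GB, "
      f"spare {spare} GB, fits: {fits(breakdown_gb, USABLE_GB)}")
```

The components add up to the 20.0 GB shown in the Memory line, so the headroom is already counted inside that figure.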

Performance by workload

Workload	Grade	Fit	Decode	TTFT	Context
Agentic Coding	C	Runs with offload (needs ~0 GB host RAM)	7.1 tok/s	39797 ms	8K
Chat	C	Runs well	7.1 tok/s	14911 ms	8K
Coding	C	Tight fit	7.1 tok/s	27337 ms	8K
RAG	C	Runs with offload (needs ~0 GB host RAM)	7.1 tok/s	49746 ms	8K
Reasoning	C	Tight fit	7.1 tok/s	32307 ms	8K
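
To turn the table into an end-to-end wait time, TTFT and decode speed can be combined. The sketch below assumes decode runs at a constant 7.1 tok/s for the whole response, which is a simplification; the 256-token output length is a hypothetical example value, not a figure from this page.

```python
# Decode speed and time to first token (TTFT) per workload, from the table above.
DECODE_TOK_PER_S = 7.1
TTFT_MS = {
    "Agentic Coding": 39797,
    "Chat": 14911,
    "Coding": 27337,
    "RAG": 49746,
    "Reasoning": 32307,
}


def response_seconds(workload, output_tokens):
    """TTFT plus the time to decode output_tokens at a constant rate (assumed)."""
    return TTFT_MS[workload] / 1000 + output_tokens / DECODE_TOK_PER_S


# 256 output tokens is an illustrative length only.
for name in TTFT_MS:
    print(f"{name}: {response_seconds(name, 256):.1f} s for 256 output tokens")
```

At these rates TTFT is a large share of the total wait for short replies, which is worth keeping in mind when comparing workloads that share the same decode speed.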

Quantization options

How Granite Code 20B (20B params) fits at each quantization level on MacBook Pro M4 32GB (23.0 GB usable).

Quant	Bits	VRAM	Quality	Grade (score)
Q2_K	2	7.8 GB	Low	D (36)
Q3_K_S	3	9.8 GB	Low	D (38)
NVFP4	4	11.2 GB	Medium	D (39)
Q4_K_M	4	12.2 GB	Medium	C (40)
Q5_K_M	5	14.4 GB	High	C (42)
Q6_K (best for your GPU)	6	16.4 GB	High	C (44)
Q8_0	8	21.4 GB	Very High	C (44)
F16	16	41.0 GB	Maximum	F (0)
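
One way to read the table is to pick the highest-bit quantization whose weights still leave room for the KV cache and runtime overhead. The sketch below assumes the VRAM column is weights only (the Q4_K_M row matches the 12.2 GB weights figure) and that the 3.1 GB KV cache and 1.2 GB runtime stay the same at every quantization level. Both are assumptions, and this is not necessarily how the page computes its grades.

```python
# (name, bits, weights_gb) rows from the quantization table above.
QUANTS = [
    ("Q2_K", 2, 7.8),
    ("Q3_K_S", 3, 9.8),
    ("NVFP4", 4, 11.2),
    ("Q4_K_M", 4, 12.2),
    ("Q5_K_M", 5, 14.4),
    ("Q6_K", 6, 16.4),
    ("Q8_0", 8, 21.4),
    ("F16", 16, 41.0),
]
USABLE_GB = 23.0
KV_CACHE_GB = 3.1  # assumed constant across quants (taken from the Q4_K_M breakdown)
RUNTIME_GB = 1.2   # assumed constant across quants


def best_quant(quants, usable_gb, overhead_gb):
    """Return the highest-bit quant whose weights fit after reserving overhead."""
    budget = usable_gb - overhead_gb
    fitting = [q for q in quants if q[2] <= budget]
    if not fitting:
        return None
    # Prefer more bits; break ties by the larger file.
    return max(fitting, key=lambda q: (q[1], q[2]))


choice = best_quant(QUANTS, USABLE_GB, KV_CACHE_GB + RUNTIME_GB)
print(choice)
```

Under these assumptions the weight budget is 18.7 GB, so Q8_0 at 21.4 GB is excluded even though it is below the 23.0 GB usable total.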

Get started

Ollama

ollama run granite-code:20b

HuggingFace

huggingface-cli download ibm-granite/granite-20b-code-instruct-8k
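
To compare the 7.1 tok/s decode estimate with a real run, the local Ollama HTTP API can be queried once the model is pulled. This is a rough sketch that assumes Ollama is listening on its default port 11434 and that the model tag is granite-code:20b (use whatever name `ollama list` shows). The prompt and the helper names are illustrative.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def decode_tok_per_s(response):
    """Tokens per second from eval_count and eval_duration (nanoseconds)."""
    return response["eval_count"] / (response["eval_duration"] / 1e9)


def generate(model, prompt):
    """Send one non-streaming generate request and return the parsed JSON reply."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def main():
    # The model tag is an assumption; adjust it to match your local install.
    result = generate("granite-code:20b", "Write a function that reverses a string.")
    print(f"{decode_tok_per_s(result):.1f} tok/s")
```

Calling main() with Ollama running prints the measured decode rate for that single request; results vary with prompt length and whatever else is using memory at the time.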

Upgrade options

Hardware that runs Granite Code 20B well

RX 7900 XTX 24GB (Budget pick)

Grade B, 56.7 tok/s decode

~$999 MSRP

RTX 3090 24GB (Best value)

Grade C, 53.7 tok/s decode

~$1,499 MSRP

RTX 4090 24GB (Biggest leap)

Grade B, 62.8 tok/s decode

~$1,599 MSRP

MacBook Pro M4 Max 36GB (Apple upgrade)

Grade C, 21.2 tok/s decode

~$2,499 MSRP
