How much VRAM does Falcon 40B Instruct need?

Falcon 40B Instruct (40B parameters) needs approximately 42.7 GB of memory in total at Q5_K_M quantization: 28.8 GB for the weights, plus KV cache, runtime overhead, and headroom.

What is the best quantization for Falcon 40B Instruct?

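Q5_K_M is the recommended quantization for Falcon 40B Instruct because it balances output quality against memory use. On a card with more room, such as the AMD Instinct MI210 64GB, the quantization table marks Q8_0 as the best fit.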

Can it run?

Can AMD Instinct MI210 64GB run Falcon 40B Instruct?

Yes. The AMD Instinct MI210 64GB can run Falcon 40B Instruct at Q5_K_M with a C grade ("Runs well"), at an expected decode speed of 39.4 tok/s.

Grade: C (Usable)

Runs well

Using Q5_K_M in Ollama

Fit status: Runs well

Decode: 39.4 tok/s

TTFT (time to first token): 4908 ms

Memory: 42.7 GB / 64.0 GB

Memory breakdown (42.7 GB total)

Weights: 28.8 GB

KV cache: 6.3 GB

Runtime: 1.2 GB

Headroom: 6.4 GB
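The four components of the breakdown add up to the 42.7 GB total shown for Memory. Below is a minimal Python sketch of that bookkeeping. It assumes decimal gigabytes and that headroom counts toward the total, which is what the numbers imply. The names `BREAKDOWN_GB`, `total_gb`, and `spare_gb` are hypothetical.

```python
# Memory components listed above for Falcon 40B Instruct at Q5_K_M (decimal GB).
BREAKDOWN_GB = {
    "weights": 28.8,
    "kv_cache": 6.3,
    "runtime": 1.2,
    "headroom": 6.4,  # assumed to be part of the 42.7 GB total, as the sum implies
}
USABLE_GB = 64.0  # AMD Instinct MI210 64GB, usable memory as listed on this page


def total_gb(parts: dict) -> float:
    """Sum all memory components, rounded to 0.1 GB like the page."""
    return round(sum(parts.values()), 1)


def spare_gb(parts: dict, usable: float) -> float:
    """Memory still free on the card after every component, headroom included."""
    return round(usable - total_gb(parts), 1)


if __name__ == "__main__":
    total = total_gb(BREAKDOWN_GB)
    print(f"total {total} GB of {USABLE_GB} GB ({total / USABLE_GB:.0%} used)")
    print(f"spare {spare_gb(BREAKDOWN_GB, USABLE_GB)} GB")
```

The sketch shows that roughly 21.3 GB of the MI210's 64.0 GB stays unused at Q5_K_M. That is consistent with larger quantizations such as Q8_0 still fitting on this card.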

Performance by workload

Workload	Grade	Fit	Decode	TTFT	Context
Agentic Coding	C	Runs well	39.4 tok/s	7138 ms	8K
Chat	C	Runs well	39.4 tok/s	2677 ms	8K
Coding	C	Runs well	39.4 tok/s	4908 ms	8K
RAG	C	Runs well	39.4 tok/s	8923 ms	8K
Reasoning	C	Runs well	39.4 tok/s	5800 ms	8K
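Every workload decodes at the same 39.4 tok/s. The workloads differ only in TTFT, which grows with prompt size (RAG is the slowest to start). Below is a rough way to turn the table into wall-clock time per response. It assumes a simple model: total time equals TTFT plus output tokens divided by decode speed. This is an assumption, not the page's own method. The 500-token response length and the function name are hypothetical.

```python
# Assumed latency model: total time = time to first token + tokens / decode rate.

def response_time_s(ttft_ms: float, decode_tok_s: float, output_tokens: int) -> float:
    """Estimate wall-clock seconds to produce `output_tokens` tokens."""
    if decode_tok_s <= 0:
        raise ValueError("decode rate must be positive")
    return ttft_ms / 1000.0 + output_tokens / decode_tok_s


# TTFT per workload from the table above (ms); decode is 39.4 tok/s for all of them.
WORKLOAD_TTFT_MS = {
    "Agentic Coding": 7138,
    "Chat": 2677,
    "Coding": 4908,
    "RAG": 8923,
    "Reasoning": 5800,
}

if __name__ == "__main__":
    for name, ttft in WORKLOAD_TTFT_MS.items():
        # 500 output tokens is an arbitrary example length.
        print(f"{name}: ~{response_time_s(ttft, 39.4, 500):.1f} s for 500 tokens")
```

For a 500-token chat reply, this gives about 2.7 s of TTFT plus about 12.7 s of decoding. That is roughly 15.4 s in total, so decode speed dominates for long answers.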

Quantization options

How Falcon 40B Instruct (40B params) fits at each quantization level on AMD Instinct MI210 64GB (64.0 GB usable).

Quant	Bits	VRAM (weights)	Quality	Fit
Q2_K	2	15.6 GB	Low	D (34)
Q3_K_S	3	19.6 GB	Low	D (35)
NVFP4	4	22.4 GB	Medium	D (36)
Q4_K_M	4	24.4 GB	Medium	D (37)
Q5_K_M	5	28.8 GB	High	D (38)
Q6_K	6	32.8 GB	High	D (40)
Q8_0 (best for your GPU)	8	42.8 GB	Very High	C (43)
F16	16	82.0 GB	Maximum	F (0, does not fit in 64.0 GB)
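The VRAM column lists weight size only: the Q5_K_M row matches the 28.8 GB weights figure in the memory breakdown. Dividing each size by the parameter count gives the effective bits stored per weight. This comes out above the nominal Bits column. Part of the gap comes from block-quantized formats also storing per-block scale factors. Part may come from the true parameter count differing slightly from the rounded 40B. Below is a sketch assuming 40 × 10⁹ parameters and decimal gigabytes. The helper name is hypothetical.

```python
PARAMS = 40e9  # nominal parameter count used on this page (assumption: exactly 40B)

# Weight sizes from the quantization table above (decimal GB).
WEIGHTS_GB = {
    "Q2_K": 15.6,
    "Q3_K_S": 19.6,
    "NVFP4": 22.4,
    "Q4_K_M": 24.4,
    "Q5_K_M": 28.8,
    "Q6_K": 32.8,
    "Q8_0": 42.8,
    "F16": 82.0,
}
USABLE_GB = 64.0  # AMD Instinct MI210 64GB


def effective_bits_per_weight(weights_gb: float, params: float = PARAMS) -> float:
    """Bits actually stored per parameter: total bits divided by parameter count."""
    return weights_gb * 1e9 * 8 / params


if __name__ == "__main__":
    for quant, gb in WEIGHTS_GB.items():
        # Weights alone; KV cache and runtime memory come on top of this.
        status = "weights fit" if gb <= USABLE_GB else "weights exceed usable memory"
        print(f"{quant:7s} {effective_bits_per_weight(gb):5.2f} bits/weight  {status}")
```

Q5_K_M works out to about 5.76 bits per weight. Only F16 fails on weight size alone, which matches its F grade.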

Get started

Ollama

ollama run falcon-40b-instruct

HuggingFace

huggingface-cli download tiiuae/falcon-40b-instruct
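Once the model is running under Ollama, you can check the 39.4 tok/s decode estimate against your own hardware. Ollama's local HTTP API reports `eval_count` (generated tokens) and `eval_duration` (in nanoseconds) in each reply to `/api/generate`. Below is a minimal sketch using only the Python standard library. It assumes a server on Ollama's default port 11434 and that the model tag from the `ollama run` command works on your install. The helper names are hypothetical.

```python
import json
import urllib.request

# Ollama's default local endpoint; adjust if your server listens elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str, url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str) -> dict:
    """Send the request and return Ollama's JSON reply (needs a running server)."""
    with urllib.request.urlopen(build_request(model, prompt)) as resp:
        return json.load(resp)


def decode_tok_per_s(reply: dict) -> float:
    """Decode speed from Ollama's timing fields; eval_duration is in nanoseconds."""
    return reply["eval_count"] / reply["eval_duration"] * 1e9
```

With the server running, call `reply = generate("falcon-40b-instruct", "Explain KV caching briefly.")` and then `decode_tok_per_s(reply)` to get a measured decode rate. Short prompts and replies give noisy numbers, so average over several runs.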

Upgrade options

Hardware that runs Falcon 40B Instruct well

NVIDIA H100 80GB (Budget pick)

Grade C, 99.7 tok/s decode

~$40,000 MSRP

NVIDIA H800 80GB (Biggest leap)

Grade C, 86.1 tok/s decode
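To put the upgrade figures in perspective, divide each card's decode speed by the MI210's 39.4 tok/s. This sketch assumes the numbers were measured under comparable settings; the page does not say which quantization the upgrade figures use. The ratio covers decode speed only, not TTFT or price. Names are hypothetical.

```python
MI210_DECODE = 39.4  # tok/s, Falcon 40B Instruct at Q5_K_M on AMD Instinct MI210 64GB

# Decode speeds listed under the upgrade options (tok/s).
UPGRADES = {
    "NVIDIA H100 80GB": 99.7,
    "NVIDIA H800 80GB": 86.1,
}


def speedup(candidate_tok_s: float, baseline_tok_s: float = MI210_DECODE) -> float:
    """Decode-speed ratio of a candidate GPU over the baseline GPU."""
    return candidate_tok_s / baseline_tok_s


if __name__ == "__main__":
    for gpu, tok_s in UPGRADES.items():
        print(f"{gpu}: {speedup(tok_s):.2f}x the MI210's decode speed")
```

This works out to about 2.53x for the H100 and 2.19x for the H800.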
