How much VRAM does Aya Expanse 8B need?

Aya Expanse 8B (8B parameters) requires approximately 8.2 GB of VRAM with Q4_K_M quantization. Of that, 4.9 GB is model weights; the rest covers the KV cache, runtime overhead, and headroom.

What is the best quantization for Aya Expanse 8B?

The recommended quantization for Aya Expanse 8B is Q4_K_M, which balances quality and memory efficiency. On the RTX 5070 12GB, Q8_0 (8.6 GB) also fits and is flagged as the best option for that GPU.

Can it run?

Can RTX 5070 12GB run Aya Expanse 8B?

Yes. The RTX 5070 12GB runs Aya Expanse 8B at Q4_K_M in llama.cpp with a B grade ("Runs well") and an expected decode speed of 86.8 tok/s.

Grade: B (Good), runs well using Q4_K_M in llama.cpp

Fit status: Runs well

Decode: 86.8 tok/s

TTFT: 2232 ms

Safe context: 8K (the context listed for every workload in the performance table)

Memory: 8.2 GB / 12.0 GB

Memory breakdown

Weights: 4.9 GB

KV Cache: 1.3 GB

Runtime: 0.9 GB

Headroom: 1.2 GB
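
As a quick sanity check, the breakdown can be summed and compared with the reported total. The sketch below assumes that all four rows, headroom included, are meant to add up to the 8.2 GB figure; the variable names are illustrative, and the 0.1 GB gap is most likely rounding in the displayed values.

```python
# Memory figures from the breakdown above, in GB.
breakdown_gb = {
    "weights": 4.9,
    "kv_cache": 1.3,
    "runtime": 0.9,
    "headroom": 1.2,
}
reported_total_gb = 8.2  # "Memory" figure reported for Q4_K_M
gpu_vram_gb = 12.0       # usable VRAM on the RTX 5070 12GB

# Round to one decimal to match the precision of the reported values.
summed_gb = round(sum(breakdown_gb.values()), 1)
unused_gb = round(gpu_vram_gb - reported_total_gb, 1)

print(f"Sum of rows: {summed_gb} GB")
print(f"Reported:    {reported_total_gb} GB")
print(f"Left on GPU: {unused_gb} GB")
```

Under this reading, roughly 3.8 GB of the card stays free beyond the reported requirement.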

Performance by workload

Workload	Grade	Fit	Decode	TTFT	Context
Agentic Coding	B	Runs well	86.8 tok/s	3246 ms	8K
Chat	B	Runs well	86.8 tok/s	1217 ms	8K
Coding	B	Runs well	86.8 tok/s	2232 ms	8K
RAG	B	Runs well	86.8 tok/s	4057 ms	8K
Reasoning	B	Runs well	86.8 tok/s	2637 ms	8K
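
To put these numbers in practical terms, total response time can be approximated as TTFT plus the output length divided by the decode speed. This is a simplified model, not the site's own method: it assumes the decode rate stays constant for the whole response, and the function name and the 500-token output length are illustrative choices.

```python
# Decode speed and per-workload TTFT taken from the table above.
DECODE_TOK_PER_S = 86.8

TTFT_MS = {
    "Agentic Coding": 3246,
    "Chat": 1217,
    "Coding": 2232,
    "RAG": 4057,
    "Reasoning": 2637,
}


def estimated_response_seconds(workload: str, output_tokens: int) -> float:
    """Return TTFT plus the time to decode output_tokens at a constant rate."""
    ttft_s = TTFT_MS[workload] / 1000
    return ttft_s + output_tokens / DECODE_TOK_PER_S


# Estimate a 500-token answer for each workload.
for name in TTFT_MS:
    print(f"{name:15s} {estimated_response_seconds(name, 500):5.1f} s")
```

Because decode speed is identical across workloads, the differences in the estimate come entirely from TTFT.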

Quantization options

How Aya Expanse 8B (8B params) fits at each quantization level on RTX 5070 12GB (12.0 GB usable).

Quant	Bits	VRAM	Quality	Fit (grade, score)
Q2_K	2	3.1 GB	Low	D (35)
Q3_K_S	3	3.9 GB	Low	D (36)
NVFP4	4	4.5 GB	Medium	D (37)
Q4_K_M	4	4.9 GB	Medium	D (38)
Q5_K_M	5	5.8 GB	High	D (40)
Q6_K	6	6.6 GB	High	C (41)
Q8_0 (best for your GPU)	8	8.6 GB	Very High	C (44)
F16	16	16.4 GB	Maximum	F (0)
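
One plausible way to arrive at a "best for your GPU" pick is to take the largest quantization whose weights, plus a fixed overhead, still fit in usable VRAM. The sketch below is an assumption about the selection logic, not the site's documented method. It also assumes that the KV cache (1.3 GB) and runtime (0.9 GB) figures from the Q4_K_M breakdown apply unchanged to every quantization. The function name is hypothetical.

```python
# (name, weights in GB) from the quantization table.
QUANTS = [
    ("Q2_K", 3.1), ("Q3_K_S", 3.9), ("NVFP4", 4.5), ("Q4_K_M", 4.9),
    ("Q5_K_M", 5.8), ("Q6_K", 6.6), ("Q8_0", 8.6), ("F16", 16.4),
]
USABLE_VRAM_GB = 12.0
# Assumption: KV cache + runtime overhead is the same for every quantization.
OVERHEAD_GB = 1.3 + 0.9


def largest_fitting_quant(quants, usable_gb, overhead_gb):
    """Return the name of the largest quant that fits, or None if none do."""
    fitting = [(name, gb) for name, gb in quants if gb + overhead_gb <= usable_gb]
    if not fitting:
        return None
    return max(fitting, key=lambda q: q[1])[0]


print(largest_fitting_quant(QUANTS, USABLE_VRAM_GB, OVERHEAD_GB))
```

With these inputs the rule selects Q8_0 (8.6 GB of weights plus 2.2 GB of overhead) and rejects F16, which matches the table's flag and its F grade for F16.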

Get started

HuggingFace

huggingface-cli download CohereForAI/aya-expanse-8b
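
The download can also be scripted from Python with the `huggingface_hub` package, fetching only the files for one quantization. This is a minimal sketch under stated assumptions. The helper names are hypothetical. The repository ID is left as a placeholder because the source does not name a GGUF repository. The filename pattern assumes the common convention of embedding the quant tag in the GGUF filename, which not every repository follows.

```python
def gguf_pattern(quant: str) -> str:
    """Glob matching GGUF files whose name contains the quant tag (assumed convention)."""
    return f"*{quant}*.gguf"


def download_quant(repo_id: str, quant: str = "Q4_K_M", local_dir: str = "models") -> str:
    """Download only the files for one quantization and return the local folder path."""
    # Imported lazily so gguf_pattern() works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id,
        allow_patterns=[gguf_pattern(quant)],
        local_dir=local_dir,
    )


# Example (needs network access and `pip install huggingface_hub`);
# replace the placeholder with a real GGUF repository ID:
# path = download_quant("<namespace>/<gguf-repo>", quant="Q4_K_M")
```

Restricting the download with `allow_patterns` avoids pulling every quantization in a repository when only one is needed.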
