How much VRAM does Yi 1.5 6B need?

Yi 1.5 6B (6B parameters) requires approximately 6.6 GB of memory with Q4_K_M quantization. About 3.7 GB of that is model weights; the rest is KV cache, runtime overhead, and headroom.

What is the best quantization for Yi 1.5 6B?

The recommended quantization for Yi 1.5 6B is Q4_K_M, which balances quality and memory efficiency.

Can it run?

Can RTX 3070 Ti 8GB run Yi 1.5 6B?

Yes. The RTX 3070 Ti 8GB can run Yi 1.5 6B at Q4_K_M with a C grade (tight fit), using 6.6 GB of its 8.0 GB. Expected decode speed is 119.6 tok/s.

Grade: C (Usable)
Using Q4_K_M in Ollama

Fit status: Tight fit
Decode: 119.6 tok/s
TTFT: 1619 ms
Memory: 6.6 GB / 8.0 GB

Memory breakdown

Weights: 3.7 GB
KV Cache: 0.9 GB
Runtime: 1.2 GB
Headroom: 0.8 GB
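
The breakdown is plain addition, sketched below in Python. The four figures are the ones listed on this page; the variable names are illustrative. Counting headroom as part of the 6.6 GB total is an inference from the fact that the four numbers sum to it, not something the page states.

```python
# Memory components for Yi 1.5 6B at Q4_K_M, in GB (figures from the breakdown).
components_gb = {
    "weights": 3.7,
    "kv_cache": 0.9,
    "runtime": 1.2,
    "headroom": 0.8,  # assumption: headroom is counted inside the reported total
}
usable_gb = 8.0  # RTX 3070 Ti 8GB

# Round to one decimal place to match the precision used on the page.
total_gb = round(sum(components_gb.values()), 1)
free_gb = round(usable_gb - total_gb, 1)

print(f"Total: {total_gb} GB of {usable_gb} GB")
print(f"Left over: {free_gb} GB")
```

Longer contexts grow the KV cache share, so that remaining margin is what shrinks first.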

Performance by workload

Workload	Grade	Fit	Decode	TTFT	Context
Agentic Coding	C	Tight fit	119.6 tok/s	2354 ms	4K
Chat	B	Runs well	119.6 tok/s	883 ms	4K
Coding	C	Tight fit	119.6 tok/s	1619 ms	4K
RAG	C	Tight fit	119.6 tok/s	2943 ms	4K
Reasoning	C	Tight fit	119.6 tok/s	1913 ms	4K
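
To see how TTFT and decode speed combine, the sketch below estimates wall-clock time for a full response. It assumes a constant decode rate and a hypothetical 300-token reply; the function name and the reply length are illustrative, and only the TTFT and decode figures come from the table.

```python
def estimated_response_seconds(ttft_ms: float, decode_tok_s: float, output_tokens: int) -> float:
    """Time to first token plus time to decode the reply at a constant rate."""
    return ttft_ms / 1000.0 + output_tokens / decode_tok_s


DECODE_TOK_S = 119.6   # decode speed reported for every workload
OUTPUT_TOKENS = 300    # hypothetical reply length, not from the page

# TTFT per workload in milliseconds, copied from the table.
ttft_ms_by_workload = {
    "Agentic Coding": 2354,
    "Chat": 883,
    "Coding": 1619,
    "RAG": 2943,
    "Reasoning": 1913,
}

for workload, ttft_ms in ttft_ms_by_workload.items():
    seconds = estimated_response_seconds(ttft_ms, DECODE_TOK_S, OUTPUT_TOKENS)
    print(f"{workload}: ~{seconds:.1f} s for {OUTPUT_TOKENS} tokens")
```

Because decode speed is the same in every row, the differences between workloads come entirely from TTFT.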

Quantization options

How Yi 1.5 6B (6B params) fits at each quantization level on RTX 3070 Ti 8GB (8.0 GB usable).

Quant	Bits	VRAM	Quality	Fit
Q2_K	2	2.3 GB	Low	D (35)
Q3_K_S	3	2.9 GB	Low	D (36)
NVFP4	4	3.4 GB	Medium	D (37)
Q4_K_M	4	3.7 GB	Medium	D (38)
Q5_K_M	5	4.3 GB	High	D (40)
Q6_K (best for your GPU)	6	4.9 GB	High	C (42)
Q8_0	8	6.4 GB	Very High	C (43)
F16	16	12.3 GB	Maximum	F (0)
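
One way to read the table is to pick the largest quantization whose weights, plus a fixed overhead, still fit in usable VRAM. The sketch below does that under two assumptions that the page does not confirm: that the VRAM column is weights only (the Q4_K_M figure matches the 3.7 GB weights entry), and that the 0.9 GB KV cache and 1.2 GB runtime overhead stay the same at every quantization level. The function name is illustrative.

```python
# (quant name, VRAM in GB) copied from the table.
QUANTS = [
    ("Q2_K", 2.3), ("Q3_K_S", 2.9), ("NVFP4", 3.4), ("Q4_K_M", 3.7),
    ("Q5_K_M", 4.3), ("Q6_K", 4.9), ("Q8_0", 6.4), ("F16", 12.3),
]


def largest_quant_that_fits(usable_gb: float, overhead_gb: float):
    """Return the name of the biggest quant whose weights plus overhead fit, or None."""
    fitting = [(name, gb) for name, gb in QUANTS if gb + overhead_gb <= usable_gb]
    if not fitting:
        return None
    return max(fitting, key=lambda item: item[1])[0]


# Assumption: KV cache (0.9 GB) and runtime (1.2 GB) do not change with the quant.
OVERHEAD_GB = 0.9 + 1.2

print(largest_quant_that_fits(8.0, OVERHEAD_GB))
```

Under these assumptions the result on an 8.0 GB card is Q6_K, which agrees with the quantization the table marks as best for this GPU.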

Get started

Ollama

ollama run yi:6b

HuggingFace

huggingface-cli download 01-ai/Yi-1.5-6B
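
Once the model is pulled, it can also be queried from a script through Ollama's local HTTP API. This is a minimal sketch, assuming Ollama is running on its default port 11434; the helper names are illustrative, and the model tag passed in should be whatever `ollama list` shows on your machine.

```python
import json
import urllib.request

# Default address of a locally running Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout_s: float = 120.0) -> str:
    """Send the prompt and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(model, prompt), timeout=timeout_s) as resp:
        return json.loads(resp.read())["response"]


# Example call, with the tag replaced by the one `ollama list` reports:
# print(generate("<model tag>", "Summarize what a KV cache is in one sentence."))
```

Setting "stream" to False returns one JSON object instead of a stream of chunks, which keeps the parsing to a single call.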

Upgrade options

Hardware that runs Yi 1.5 6B well

GPU	Pick	Grade	Decode	MSRP
RTX 5070 12GB	Budget pick	C	115.7 tok/s	~$549
RTX 4070 12GB	Best value	C	103.3 tok/s	~$599
RTX 3080 10GB	Biggest leap	B	157.8 tok/s	~$699
RTX 2080 Ti 11GB	NVIDIA upgrade	C	109.4 tok/s	~$999
