How much VRAM does Nous Hermes 1.0 need?

Nous Hermes 1.0 (9B parameters) requires approximately 9.3 GB of memory with Q4_K_M quantization. Of that, 5.5 GB is model weights; the rest is KV cache, runtime overhead, and headroom.

What is the best quantization for Nous Hermes 1.0?

The recommended quantization for Nous Hermes 1.0 is Q4_K_M, which balances quality and memory efficiency.

Can it run?

Can RTX 3080 Ti 12GB run Nous Hermes 1.0?

Yes, the RTX 3080 Ti 12GB can run Nous Hermes 1.0 with a B grade (Runs well). Expected decode speed: 122.9 tok/s.

Grade: B (Good). Runs well using Q4_K_M in Ollama.

Fit status: Runs well
Decode: 122.9 tok/s
TTFT: 1575 ms
Safe context: 16K
Memory: 9.3 GB / 12.0 GB

Memory breakdown

Weights: 5.5 GB
KV Cache: 1.4 GB
Runtime: 1.2 GB
Headroom: 1.2 GB
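The four components above add up to the 9.3 GB total, which suggests the reported figure already includes the headroom. A minimal sketch of that arithmetic, with the values copied from the breakdown (the variable and function names are illustrative, not part of any tool):

```python
# Memory components in GB, copied from the breakdown above.
BREAKDOWN_GB = {
    "weights": 5.5,
    "kv_cache": 1.4,
    "runtime": 1.2,
    "headroom": 1.2,
}
GPU_VRAM_GB = 12.0  # usable VRAM on the RTX 3080 Ti 12GB


def total_memory_gb(breakdown):
    """Sum all components, rounded to one decimal like the page's figures."""
    return round(sum(breakdown.values()), 1)


total = total_memory_gb(BREAKDOWN_GB)
unallocated = round(GPU_VRAM_GB - total, 1)
print(f"{total} GB of {GPU_VRAM_GB} GB used, {unallocated} GB unallocated")
```

Under these numbers, 2.7 GB of the card remains unallocated beyond the stated headroom.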

Performance by workload

Workload	Grade	Fit	Decode	TTFT	Context
Agentic Coding	C	Tight fit	122.9 tok/s	2291 ms	16K
Chat	B	Runs well	122.9 tok/s	859 ms	11K
Coding	B	Runs well	122.9 tok/s	1575 ms	16K
RAG	C	Tight fit	122.9 tok/s	2863 ms	16K
Reasoning	B	Runs well	122.9 tok/s	1861 ms	16K
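To turn the table into a rough end-to-end response time, one common approximation is TTFT plus output length divided by decode speed. The sketch below applies it to the figures above; the 256-token output length is an assumed example value, and real latency also depends on prompt length and system load.

```python
# TTFT in milliseconds per workload, copied from the table above.
TTFT_MS = {
    "Agentic Coding": 2291,
    "Chat": 859,
    "Coding": 1575,
    "RAG": 2863,
    "Reasoning": 1861,
}
DECODE_TOK_S = 122.9  # decode speed is the same for every workload


def response_time_s(ttft_ms, decode_tok_s, output_tokens):
    """Approximate total time: time to first token plus token generation time."""
    return ttft_ms / 1000 + output_tokens / decode_tok_s


OUTPUT_TOKENS = 256  # assumed response length, for illustration only
for workload, ttft in TTFT_MS.items():
    seconds = response_time_s(ttft, DECODE_TOK_S, OUTPUT_TOKENS)
    print(f"{workload}: about {seconds:.2f} s for {OUTPUT_TOKENS} tokens")
```

Because decode speed is constant across workloads, the differences between rows come entirely from TTFT.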

Quantization options

How Nous Hermes 1.0 (9B params) fits at each quantization level on RTX 3080 Ti 12GB (12.0 GB usable).

Quant	Bits	VRAM	Quality	Fit
Q2_K	2	3.5 GB	Low	D (35)
Q3_K_S	3	4.4 GB	Low	D (37)
NVFP4	4	5.0 GB	Medium	D (38)
Q4_K_M	4	5.5 GB	Medium	D (39)
Q5_K_M	5	6.5 GB	High	C (40)
Q6_K (best for your GPU)	6	7.4 GB	High	C (42)
Q8_0	8	9.6 GB	Very High	C (44)
F16	16	18.5 GB	Maximum	F (0)
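One way to read the table is to pick the largest quantization whose weights still fit alongside the other memory components. The sketch below assumes the KV cache, runtime, and headroom figures from the Q4_K_M breakdown (1.4 GB, 1.2 GB, 1.2 GB) stay the same at every quantization level; that is a simplifying assumption, and not necessarily how the page computes its grades or its recommendation.

```python
# Weight sizes in GB per quantization, copied from the table above.
WEIGHTS_GB = {
    "Q2_K": 3.5,
    "Q3_K_S": 4.4,
    "NVFP4": 5.0,
    "Q4_K_M": 5.5,
    "Q5_K_M": 6.5,
    "Q6_K": 7.4,
    "Q8_0": 9.6,
    "F16": 18.5,
}
GPU_VRAM_GB = 12.0
# Assumed fixed overhead: KV cache + runtime + headroom from the Q4_K_M breakdown.
OVERHEAD_GB = 1.4 + 1.2 + 1.2


def largest_fitting_quant(weights_gb, vram_gb, overhead_gb):
    """Return the quant with the largest weights that fits with overhead, or None."""
    fitting = {q: gb for q, gb in weights_gb.items() if gb + overhead_gb <= vram_gb}
    return max(fitting, key=fitting.get) if fitting else None


best = largest_fitting_quant(WEIGHTS_GB, GPU_VRAM_GB, OVERHEAD_GB)
print(best)
```

With these assumptions the result is Q6_K (7.4 GB + 3.8 GB = 11.2 GB), while Q8_0 would need 13.4 GB and exceeds the 12.0 GB budget.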

Get started

Ollama

ollama run nous-hermes-1.0
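Once the model is running, it can also be queried from a script through Ollama's local REST API. The following is a minimal sketch that assumes an Ollama server on its default address (localhost:11434) and reuses the model name from the command above; the helper names `build_request` and `generate` are illustrative.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL = "nous-hermes-1.0"  # model name as given in the command above


def build_request(prompt, model=MODEL):
    """Build a non-streaming generate request for the Ollama API."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt):
    """Send the prompt to a running Ollama server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.loads(resp.read())["response"]
```

Calling `generate("Explain KV cache in one sentence.")` returns the model's reply as a string, provided the server is running and the model has been pulled.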

HuggingFace

huggingface-cli download nous-hermes-1.0

