NVIDIA

Nemotron Nano 8B

Name: Nemotron Nano 8B
Author: NVIDIA

HuggingFace

306.6KDownloads219LikesMar 2025Released131K tokensContextNVIDIA Open ModelLicense4 EntryQuality

Get started

— copy & paste to run locally

Ollama

ollama run nemotron-nano-8b

HuggingFace

huggingface-cli download nemotron-nano-8b

Quick specs

Parameters8B

Architecturedense

Context131K tokens

Modalitytext

Min RAM3.1 GB

Rec. RAM4.9 GB (Q4_K_M)

LicenseNVIDIA Open Model

FamilyNemotron

✓ Chat✓ Reasoning

About this model

Nemotron Nano 8B is NVIDIA's reasoning model derived from Llama 3.1 8B Instruct, post-trained for switchable reasoning with on/off modes. Achieves 95.4% on MATH-500 and 54.1% on GPQA Diamond with reasoning enabled. Fits on a single RTX GPU for local deployment.

•Switchable reasoning: toggle detailed thinking on/off via system prompt
•95.4% on MATH-500 with reasoning on, 36.6% with reasoning off
•Derived from Llama 3.1 8B with multi-phase post-training
•Fits on a single RTX GPU for local inference

Related models

Quick picks

Best budgetC

Intel Arc B580 12GB~$249 — 45 tok/s

Best overallB

RTX 3080 10GB~$699 — 118 tok/s

Best hardware

Top picks for Nemotron Nano 8B

Quantization options

VRAM estimates by quant level

No hardware detected — fit column shows raw VRAM estimates

Quant	Bits	VRAM	Quality	Fit
Q2_K	2	3.1 GB	Low	—
Q3_K_S	3	3.9 GB	Low	—
NVFP4	4	4.5 GB	Medium	—
Q4_K_M	4	4.9 GB	Medium	—
Q5_K_M	5	5.8 GB	High	—
Q6_K	6	6.6 GB	High	—
Q8_0	8	8.6 GB	Very High	—
F16	16	16.4 GB	Maximum	—

Hardware compatibility

Fit estimates across all hardware

Open calculator

Computing compatibility...

Memory breakdown

Reference: NVIDIA A10 24GB

Weights4.9 GB

KV Cache1.3 GB

Runtime0.9 GB

Headroom2.4 GB

NVIDIA

Nemotron Nano 8B

HuggingFace

306.6KDownloads219LikesMar 2025Released131K tokensContextNVIDIA Open ModelLicense4 EntryQuality

Get started

— copy & paste to run locally

Ollama

ollama run nemotron-nano-8b

HuggingFace

huggingface-cli download nemotron-nano-8b

Quick specs

Parameters8B

Architecturedense

Context131K tokens

Modalitytext

Min RAM3.1 GB

Rec. RAM4.9 GB (Q4_K_M)

LicenseNVIDIA Open Model

FamilyNemotron

✓ Chat✓ Reasoning

About this model

•Switchable reasoning: toggle detailed thinking on/off via system prompt
•95.4% on MATH-500 with reasoning on, 36.6% with reasoning off
•Derived from Llama 3.1 8B with multi-phase post-training
•Fits on a single RTX GPU for local inference

Related models

Quick picks

Best budgetC

Intel Arc B580 12GB~$249 — 45 tok/s

Best overallB

RTX 3080 10GB~$699 — 118 tok/s

Best hardware

Top picks for Nemotron Nano 8B

Quantization options

VRAM estimates by quant level

No hardware detected — fit column shows raw VRAM estimates

Quant	Bits	VRAM	Quality	Fit
Q2_K	2	3.1 GB	Low	—
Q3_K_S	3	3.9 GB	Low	—
NVFP4	4	4.5 GB	Medium	—
Q4_K_M	4	4.9 GB	Medium	—
Q5_K_M	5	5.8 GB	High	—
Q6_K	6	6.6 GB	High	—
Q8_0	8	8.6 GB	Very High	—
F16	16	16.4 GB	Maximum	—

Hardware compatibility

Fit estimates across all hardware

Open calculator

Computing compatibility...

Memory breakdown

Reference: NVIDIA A10 24GB

Weights4.9 GB

KV Cache1.3 GB

Runtime0.9 GB

Headroom2.4 GB