NVIDIA

Nemotron 70B

Name: Nemotron 70B
Author: NVIDIA

Downloads: 32
Likes: 568
Released: Oct 2024
Context: 131K tokens
License: NVIDIA Open Model
Entry quality: 5

Get started

Copy and paste to run locally

Ollama

ollama run nemotron:70b

HuggingFace

huggingface-cli download nvidia/Llama-3.1-Nemotron-70B-Instruct-HF

Quick specs

Parameters: 70B
Architecture: dense
Context: 131K tokens
Modality: text
Min RAM: 27.3 GB
Rec. RAM: 42.7 GB (Q4_K_M)
License: NVIDIA Open Model
Family: Nemotron
Capabilities: ✓ Chat, ✓ Reasoning

About this model

Llama-3.1-Nemotron-70B-Instruct is a large language model customized by NVIDIA to improve the helpfulness of LLM generated responses to user queries.

• Please sign up to get free and immediate access to the NVIDIA NeMo Framework container. If you don’t have an NVIDIA NGC account, you will be...
• If you don’t have an NVIDIA NGC API key, sign in to NVIDIA NGC, select organization/team ea-bignlp/ga-participants, and click Generate API key....
• On your machine, docker login to nvcr.io using

Related models

Quick picks

Best budget (C): MacBook Pro M3 Max 128GB, ~$2,499, 6 tok/s

Best overall (B): NVIDIA H100 80GB, ~$40,000, 66 tok/s

Best hardware

Top picks for Nemotron 70B

Quantization options

VRAM estimates by quant level

Quant	Bits	VRAM	Quality	Fit
Q2_K	2	27.3 GB	Low	—
Q3_K_S	3	34.3 GB	Low	—
NVFP4	4	39.2 GB	Medium	—
Q4_K_M	4	42.7 GB	Medium	—
Q5_K_M	5	50.4 GB	High	—
Q6_K	6	57.4 GB	High	—
Q8_0	8	74.9 GB	Very High	—
F16	16	143.5 GB	Maximum	—
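
As a rough illustration of how the table can be used, the sketch below picks the highest-precision quant whose estimated VRAM fits a given memory budget. The figures are copied from the table; the function name `best_quant_for` and the `reserve_gb` parameter are hypothetical, and the sketch assumes the VRAM column is the only constraint (whether it already covers KV cache and runtime overhead is not stated on this page).

```python
from typing import Optional

# (quant name, bits per weight, estimated VRAM in GB), copied from the table above.
QUANT_VRAM_GB = [
    ("Q2_K", 2, 27.3),
    ("Q3_K_S", 3, 34.3),
    ("NVFP4", 4, 39.2),
    ("Q4_K_M", 4, 42.7),
    ("Q5_K_M", 5, 50.4),
    ("Q6_K", 6, 57.4),
    ("Q8_0", 8, 74.9),
    ("F16", 16, 143.5),
]


def best_quant_for(budget_gb: float, reserve_gb: float = 0.0) -> Optional[str]:
    """Return the quant with the largest VRAM estimate that still fits.

    budget_gb is the total memory available; reserve_gb is an optional
    amount held back for anything the table estimate may not cover
    (an assumption of this sketch, not something the page specifies).
    """
    usable = budget_gb - reserve_gb
    fitting = [(vram, name) for name, _bits, vram in QUANT_VRAM_GB if vram <= usable]
    if not fitting:
        return None  # nothing in the table fits this budget
    # The largest estimate that fits corresponds to the highest-precision option.
    return max(fitting)[1]


if __name__ == "__main__":
    for budget in (24, 48, 80):
        print(budget, "GB ->", best_quant_for(budget))
```

With these numbers, a 48 GB budget selects Q4_K_M, an 80 GB budget selects Q8_0, and a 24 GB budget fits none of the listed quants.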

Memory breakdown

Reference: NVIDIA A10 24GB

Weights: 42.7 GB
KV Cache: 10.9 GB
Runtime: 0.9 GB
Headroom: 2.4 GB
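
The breakdown above can be totalled and compared against a device's memory. The following is a minimal sketch: the `MemoryBreakdown` class is hypothetical, and treating headroom as an extra reserve added on top of the other components is an assumption, since the page does not say how headroom is computed.

```python
from dataclasses import dataclass


@dataclass
class MemoryBreakdown:
    """Memory components in GB, as listed in the breakdown above."""
    weights_gb: float
    kv_cache_gb: float
    runtime_gb: float
    headroom_gb: float = 0.0  # assumed to be a reserve added on top

    def required_gb(self) -> float:
        # Sum of all components, including the assumed headroom reserve.
        return self.weights_gb + self.kv_cache_gb + self.runtime_gb + self.headroom_gb

    def fits(self, device_gb: float) -> bool:
        # True when the device has at least the required memory.
        return self.required_gb() <= device_gb


# Q4_K_M figures from the breakdown above.
q4 = MemoryBreakdown(weights_gb=42.7, kv_cache_gb=10.9, runtime_gb=0.9, headroom_gb=2.4)

if __name__ == "__main__":
    print(f"required: {q4.required_gb():.1f} GB")
    print("fits 24 GB device:", q4.fits(24.0))
    print("fits 80 GB device:", q4.fits(80.0))
```

Under these assumptions the Q4_K_M configuration needs about 56.9 GB, so it does not fit on the 24 GB reference card alone but does fit within 80 GB.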
