OpenAI

GPT-OSS 20B

Name: GPT-OSS 20B
Author: OpenAI

Frontier

HuggingFace

Ollama

7.5MDownloads4.5KLikesAug 2025Released128K tokensContextApache 2.0License4 EntryQuality

Get started

— copy & paste to run locally

Ollama

ollama run gpt-oss-20b

HuggingFace

huggingface-cli download gpt-oss-20b

Quick specs

Parameters21B (3.6B active)

Architecturemoe (MoE)

Context128K tokens

Modalitytext

Min RAM8.2 GB

Rec. RAM12.8 GB (Q4_K_M)

LicenseApache 2.0

FamilyGPT-OSS

✓ Chat✓ Reasoning

About this model

GPT-OSS 20B is OpenAI's first open-weight model, a 21B-parameter mixture-of-experts model with 3.6B active parameters per token. Features configurable reasoning effort (low/medium/high), full chain-of-thought visibility, and agentic capabilities including function calling. Runs on devices with 16GB of memory using MXFP4 quantization.

•OpenAI's first open-weight model under Apache 2.0 license
•MoE architecture: 24 layers, 32 experts, top-4 routing per token
•Configurable reasoning effort: low, medium, and high modes
•Fits in 16GB VRAM with MXFP4 quantization

Quick picks

Best budgetC

MacBook Pro M4 32GB~$799 — 15 tok/s

Best overallB

RTX A5000 24GB~$2,500 — 96 tok/s

Best hardware

Top picks for GPT-OSS 20B

Quantization options

VRAM estimates by quant level

No hardware detected — fit column shows raw VRAM estimates

Quant	Bits	VRAM	Quality	Fit
Q2_K	2	8.2 GB	Low	—
Q3_K_S	3	10.3 GB	Low	—
NVFP4	4	11.8 GB	Medium	—
Q4_K_M	4	12.8 GB	Medium	—
Q5_K_M	5	15.1 GB	High	—
Q6_K	6	17.2 GB	High	—
Q8_0	8	22.5 GB	Very High	—
F16	16	43.1 GB	Maximum	—

Hardware compatibility

Fit estimates across all hardware

Open calculator

Computing compatibility...

Memory breakdown

Reference: NVIDIA A10 24GB

Weights12.8 GB

KV Cache0.8 GB

Runtime0.9 GB

Headroom2.4 GB

OpenAI

GPT-OSS 20B

Frontier

HuggingFace

Ollama

7.5MDownloads4.5KLikesAug 2025Released128K tokensContextApache 2.0License4 EntryQuality

Get started

— copy & paste to run locally

Ollama

ollama run gpt-oss-20b

HuggingFace

huggingface-cli download gpt-oss-20b

Quick specs

Parameters21B (3.6B active)

Architecturemoe (MoE)

Context128K tokens

Modalitytext

Min RAM8.2 GB

Rec. RAM12.8 GB (Q4_K_M)

LicenseApache 2.0

FamilyGPT-OSS

✓ Chat✓ Reasoning

About this model

•OpenAI's first open-weight model under Apache 2.0 license
•MoE architecture: 24 layers, 32 experts, top-4 routing per token
•Configurable reasoning effort: low, medium, and high modes
•Fits in 16GB VRAM with MXFP4 quantization

Quick picks

Best budgetC

MacBook Pro M4 32GB~$799 — 15 tok/s

Best overallB

RTX A5000 24GB~$2,500 — 96 tok/s

Best hardware

Top picks for GPT-OSS 20B

Quantization options

VRAM estimates by quant level

No hardware detected — fit column shows raw VRAM estimates

Quant	Bits	VRAM	Quality	Fit
Q2_K	2	8.2 GB	Low	—
Q3_K_S	3	10.3 GB	Low	—
NVFP4	4	11.8 GB	Medium	—
Q4_K_M	4	12.8 GB	Medium	—
Q5_K_M	5	15.1 GB	High	—
Q6_K	6	17.2 GB	High	—
Q8_0	8	22.5 GB	Very High	—
F16	16	43.1 GB	Maximum	—

Hardware compatibility

Fit estimates across all hardware

Open calculator

Computing compatibility...

Memory breakdown

Reference: NVIDIA A10 24GB

Weights12.8 GB

KV Cache0.8 GB

Runtime0.9 GB

Headroom2.4 GB