Will It Run AI

All estimates are approximations based on mathematical models and public specifications. Actual performance may vary. Do not make purchasing decisions based solely on these estimates.

Data sourced from Hugging Face, Ollama, and official model documentation. Model names and logos are trademarks of their respective owners.

© 2026 Will It Run AI — Fase Consulting Ibiza, S.L. (NIF: B57969656)


TII

Falcon 40B Instruct

Legacy
HuggingFace
Downloads: 44.3K
Likes: 1.2K
Released: May 2023
Context: 8K tokens
License: Apache 2.0
Quality: 3 (Entry)

Get started

— copy & paste to run locally
Ollama
ollama run falcon:40b-instruct
HuggingFace
huggingface-cli download tiiuae/falcon-40b-instruct

Quick specs

Parameters: 40B
Architecture: dense
Context: 8K tokens
Modality: text
Min RAM: 15.6 GB
Rec. RAM: 28.8 GB (Q5_K_M)
License: Apache 2.0
Family: Falcon
Capabilities: Chat, Reasoning

About this model

Falcon-40B-Instruct is a 40B-parameter causal decoder-only model built by TII, based on Falcon-40B and finetuned on a mixture of Baize. It is made available under the Apache 2.0 license.

  • You are looking for a ready-to-use chat/instruct model based on Falcon-40B.
  • Falcon-40B is the best open-source model available: it outperforms LLaMA, StableLM, RedPajama, MPT, etc. See the OpenLLM Leaderboard.
  • It features an architecture optimized for inference, with FlashAttention (Dao et al., 2022) and multiquery (Shazeer et al., 2019).


Quick picks

Best budget (Apple): Mac mini M4 64GB, ~$1,099, ~3 tok/s (grade C)
Best overall (NVIDIA): NVIDIA H100 80GB, ~$40,000, ~100 tok/s (grade C)
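Throughput figures like these are roughly consistent with a memory-bandwidth ceiling: single-stream decoding reads every weight once per generated token, so tokens/sec is bounded by memory bandwidth divided by model size. A minimal sketch of that arithmetic (the ~3,350 GB/s H100-class bandwidth figure is an assumption, and sustained throughput is lower than this ceiling because of KV-cache reads and kernel overhead):

```python
def decode_ceiling_tok_s(mem_bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper bound on single-stream decode speed: each generated token
    streams the full set of weights from memory once."""
    return mem_bandwidth_gb_s / model_size_gb

# Assumed peak bandwidth (GB/s) against the Q5_K_M weight size above.
print(decode_ceiling_tok_s(3350, 28.8))  # ~116 tok/s ceiling
```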

Best hardware

Top picks for Falcon 40B Instruct

  • NVIDIA H100 80GB (grade C), 80 GB
  • NVIDIA H800 80GB (grade C), 80 GB
  • AMD Instinct MI210 64GB (grade C), 64 GB
  • NVIDIA A100 80GB (grade C), 80 GB
  • NVIDIA H100 PCIe 80GB (grade C), 80 GB

Quantization options

VRAM estimates by quant level


Quant    Bits  VRAM     Quality
Q2_K     2     15.6 GB  Low
Q3_K_S   3     19.6 GB  Low
NVFP4    4     22.4 GB  Medium
Q4_K_M   4     24.4 GB  Medium
Q5_K_M   5     28.8 GB  High
Q6_K     6     32.8 GB  High
Q8_0     8     42.8 GB  Very High
F16      16    82.0 GB  Maximum
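The VRAM column tracks a simple rule: raw weight size is parameters × bits-per-weight / 8. The listed figures run higher than this baseline because k-quants mix bit widths (so effective bits-per-weight exceed the nominal label) and quantization scales and metadata add overhead; treating those extras as a site-specific adjustment we don't model, the baseline alone is a sketch:

```python
def raw_weights_gb(params_billions: float, bits_per_weight: float) -> float:
    """Raw weight footprint in GB: params (billions) * bits / 8 bits per byte."""
    return params_billions * bits_per_weight / 8

# 40B parameters at common nominal quant widths (weights only)
for bits in (2, 4, 5, 8, 16):
    print(f"{bits:>2}-bit: {raw_weights_gb(40, bits):5.1f} GB")
```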


Memory breakdown

Reference: NVIDIA A10 24GB

Weights: 28.8 GB
KV Cache: 6.3 GB
Runtime: 1.2 GB
Headroom: 2.4 GB
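The KV-cache line grows with context length rather than model size: for each token in context, every layer stores one key and one value vector per KV head. A generic sketch of that arithmetic (the layer and head counts below are illustrative placeholders, not Falcon-40B's published config, and the 6.3 GB figure above may include batch or allocator overhead this does not model):

```python
def kv_cache_bytes(n_layers: int, ctx_tokens: int, n_kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    """KV cache size: 2 (K and V) * layers * tokens * kv_heads * head_dim * element size."""
    return 2 * n_layers * ctx_tokens * n_kv_heads * head_dim * bytes_per_elem

# Illustrative config (assumed values) with an fp16 cache at full 8K context
size = kv_cache_bytes(n_layers=60, ctx_tokens=8192, n_kv_heads=8, head_dim=64)
print(f"{size / 1e9:.2f} GB")  # ~1.01 GB for this assumed config
```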