
Browse AI Models

17 models available

DeepSeek V3 671B (DeepSeek)
671B (37B active) · 131K ctx · 375.8 GB · frontier
MoE · Legacy

We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2. Furthermore, DeepSeek-V3 pioneers an auxiliary-loss-free strategy for load balancing and sets a multi-token prediction training objective for stronger performance.
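The 375.8 GB figure above is essentially the quantized weight footprint, and it can be sanity-checked from the parameter count alone. Note that the 37B active parameters reduce per-token compute, not memory: all 671B parameters must stay resident. A minimal sketch of the arithmetic, assuming roughly 4.5 bits per weight for a 4-bit quantization with metadata overhead (the exact bits-per-weight value is an assumption, not site data):

def quantized_weight_size_gb(n_params: float, bits_per_weight: float) -> float:
    # Weights only; KV cache and activations come on top of this.
    return n_params * bits_per_weight / 8 / 1e9

print(quantized_weight_size_gb(671e9, 4.48))  # ~375.8 GB, matching the listing

The same formula reproduces the other sizes on this page once you plug in each card's parameter count and an appropriate bits-per-weight estimate.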

Command A 111B (Cohere)
111B · 262K ctx · 62.2 GB · frontier
Dense · Legacy

Command A is Cohere's latest flagship model with 111B parameters, designed for agentic enterprise applications. It features advanced tool use, multi-step reasoning, and retrieval-augmented generation.

DeepSeek Coder V2 236B (DeepSeek)
236B (21B active) · 131K ctx · 132.2 GB · current
MoE · Legacy

We present DeepSeek-Coder-V2, an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT-4 Turbo in code-specific tasks. Specifically, DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2 with an additional 6 trillion tokens. Through this continued pre-training, DeepSeek-Coder-V2 substantially enhances the coding and mathematical reasoning capabilities of DeepSeek-V2, while maintaining comparable performance in general language tasks.

DeepSeek V2.5 236B (DeepSeek)
236B (21B active) · 131K ctx · 132.2 GB · current
MoE · Legacy

DeepSeek-V2.5 is an upgraded version that combines DeepSeek-V2-Chat and DeepSeek-Coder-V2-Instruct, integrating the general and coding abilities of the two previous versions. For model details, see the DeepSeek-V2 page.

Codestral 22B (Mistral AI)
22B · 33K ctx · 12.3 GB · current
Dense · Legacy

Codestral is Mistral AI's 22B-parameter code generation model, trained on more than 80 programming languages and supporting both instruction following and fill-in-the-middle completion.
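The upstream model card demonstrates building and tokenizing a chat request with Mistral's mistral_common library. A minimal sketch of that flow, assuming the current mistral_common API (the prompt text is illustrative):

from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest

# Build a chat request and tokenize it with Mistral's reference tokenizer.
tokenizer = MistralTokenizer.v3()
request = ChatCompletionRequest(
    messages=[UserMessage(content="Write a Python function that reverses a string.")]
)
tokenized = tokenizer.encode_chat_completion(request)
print(len(tokenized.tokens))  # number of prompt tokens sent to the model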

GPT-OSS 20B (OpenAI)
21B (3.6B active) · 128K ctx · 11.8 GB · frontier
MoE · Legacy

GPT-OSS 20B is OpenAI's first open-weight model since GPT-2: a 21B-parameter mixture-of-experts model with 3.6B active parameters per token. It features configurable reasoning effort (low/medium/high), full chain-of-thought visibility, and agentic capabilities including function calling, and it runs on devices with 16 GB of memory using MXFP4 quantization.

DeepSeek Coder V2 16B (DeepSeek)
16B (2.4B active) · 131K ctx · 9 GB · current
MoE · Legacy

We present DeepSeek-Coder-V2, an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT-4 Turbo in code-specific tasks. Specifically, DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2 with an additional 6 trillion tokens. Through this continued pre-training, DeepSeek-Coder-V2 substantially enhances the coding and mathematical reasoning capabilities of DeepSeek-V2, while maintaining comparable performance in general language tasks.

DeepSeek R1 Distill 7B (DeepSeek)
7B · 33K ctx · 3.9 GB · active
Dense · Legacy

DeepSeek R1 Distill Qwen 7B is a 7B-parameter reasoning model distilled from the larger DeepSeek-R1. Based on Qwen2.5-Math-7B and fine-tuned on 800K samples from DeepSeek-R1, it delivers strong reasoning, scoring 92.8% on MATH-500 and 49.1% on GPQA Diamond, while being far more efficient than the full 671B model.

Granite Code 20B (IBM)
20B · 8K ctx · 11.2 GB · current
Dense · Legacy

Granite-20B-Code-Instruct-8K is a 20B-parameter model fine-tuned from Granite-20B-Code-Base-8K on a combination of permissively licensed instruction data to enhance instruction-following capabilities, including logical reasoning and problem-solving skills.

Nemotron Nano 8B (NVIDIA)
8B · 131K ctx · 4.5 GB · active
Dense · Legacy

Nemotron Nano 8B is NVIDIA's reasoning model derived from Llama 3.1 8B Instruct, post-trained for switchable reasoning with on/off modes. It achieves 95.4% on MATH-500 and 54.1% on GPQA Diamond with reasoning enabled, and it fits on a single RTX GPU for local deployment.
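The on/off reasoning switch is driven by the system prompt. A minimal sketch, assuming the toggle string and Hugging Face repo id published on NVIDIA's model card (both are assumptions, not data from this listing):

from transformers import AutoTokenizer, AutoModelForCausalLM

# Repo id and toggle string are assumptions based on NVIDIA's model card.
model_id = "nvidia/Llama-3.1-Nemotron-Nano-8B-v1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [
    {"role": "system", "content": "detailed thinking on"},  # "detailed thinking off" disables traces
    {"role": "user", "content": "What is 17 * 23?"},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
print(tokenizer.decode(model.generate(inputs, max_new_tokens=512)[0]))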

SQLCoder 7B (Defog)
7B · 8K ctx · 3.9 GB · current
Dense · Legacy

SQLCoder 7B is Defog's text-to-SQL model, built to translate natural-language questions into SQL queries over a given database schema. The model weights were last updated on Feb 7, 2024; the new weights make the model much more performant, particularly for joins.
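In practice the model is prompted with the database schema followed by the question. A minimal sketch, assuming the defog/sqlcoder-7b-2 repo id; the prompt layout below is illustrative rather than Defog's exact template:

from transformers import AutoTokenizer, AutoModelForCausalLM

# Repo id assumed; see Defog's README for the recommended prompt template.
model_id = "defog/sqlcoder-7b-2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = """### Task
Generate a SQL query to answer the question below.

### Database Schema
CREATE TABLE orders (id INT, customer_id INT, total DECIMAL, created_at DATE);
CREATE TABLE customers (id INT, name TEXT);

### Question
What is the total order value per customer name?

### SQL
"""
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(out[0], skip_special_tokens=True))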

CodeGeeX 4 9B (Tsinghua/Zhipu)
9B · 131K ctx · 5 GB · current
Dense · Legacy

We introduce CodeGeeX4-ALL-9B, the open-source version of the latest CodeGeeX4 model series. It is a multilingual code generation model continually trained on GLM-4-9B, significantly enhancing its code generation capabilities. A single CodeGeeX4-ALL-9B model supports comprehensive functions such as code completion and generation, a code interpreter, web search, function calling, and repository-level code Q&A, covering a wide range of software development scenarios. CodeGeeX4-ALL-9B has achieved highly competitive performance on public benchmarks such as BigCodeBench and NaturalCodeBench.

Yi Coder 9B (01.AI)
9B · 131K ctx · 5 GB · current
Dense · Legacy

Yi Coder 9B is 01.AI's open-source code language model, delivering strong code generation and long-context code understanding across dozens of programming languages.

Codestral Mamba 7B (Mistral AI)
7B · 262K ctx · 3.9 GB · current
Dense · Legacy

Codestral Mamba is an open code model based on the Mamba2 architecture. It performs on par with state-of-the-art Transformer-based code models. You can read more in the official blog post.
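The 262K context above is where the Mamba2 architecture pays off: a Transformer's KV cache grows linearly with sequence length, while a state-space model keeps a fixed-size recurrent state. A rough sketch of the Transformer-side cost, using illustrative dimensions for a 7B-class model with grouped-query attention (all dimensions are assumptions):

def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                seq_len: int, bytes_per_elem: int = 2) -> float:
    # Keys + values (factor of 2) across all layers, fp16 by default, decimal GB.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem / 1e9

# Illustrative 7B-class Transformer at the full 262K context.
print(kv_cache_gb(n_layers=32, n_kv_heads=8, head_dim=128, seq_len=262_144))  # ~34.4 GB

A Mamba2 layer carries the same constant-size state whether the context is 1K or 262K tokens, which is why very long contexts stay cheap here.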

Granite Code 8B (IBM)
8B · 8K ctx · 4.5 GB · current
Dense · Legacy

Granite-8B-Code-Instruct-4K is an 8B-parameter model fine-tuned from Granite-8B-Code-Base-4K on a combination of permissively licensed instruction data to enhance instruction-following capabilities, including logical reasoning and problem-solving skills.

DeepSeek R1 1.5B (DeepSeek)
1.5B · 33K ctx · 0.8 GB · active
Dense · Legacy

DeepSeek R1 Distill Qwen 1.5B is a compact reasoning model distilled from DeepSeek-R1, based on Qwen2.5-Math-1.5B. Fine-tuned on 800K curated samples, it achieves 83.9% on MATH-500 and supports chain-of-thought reasoning on resource-constrained devices.

Qwen 2.5 Coder 1.5B (Alibaba)
1.5B · 33K ctx · 0.8 GB · active
Dense · Legacy

Qwen 2.5 Coder 1.5B is Alibaba's compact code-specific language model from the Qwen2.5 Coder series. It was trained on 5.5T tokens including source code, text-code grounding data, and synthetic data, and brings improvements in code generation, code reasoning, and code fixing while maintaining general-language and math capabilities.
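Code models in this series are commonly used for fill-in-the-middle (FIM) completion, where the model fills in a gap between a known prefix and suffix. A minimal sketch using the FIM special tokens documented for the Qwen2.5-Coder series; the repo id and snippet are illustrative:

from transformers import AutoTokenizer, AutoModelForCausalLM

# Repo id assumed; FIM token names follow the Qwen2.5-Coder documentation.
model_id = "Qwen/Qwen2.5-Coder-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Prefix and suffix bracket the gap; the model generates the middle.
prompt = (
    "<|fim_prefix|>def fibonacci(n):\n    "
    "<|fim_suffix|>\n    return a<|fim_middle|>"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))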