Browse AI Models

17 models available

/

Status:

Sort:

Filtered by:

DeepSeek DeepSeek V3 671B

671B (37B active)131K ctx375.8 GBfrontier

moeLegacy

We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2. Furthermore, DeepSeek-V3 pioneers an auxiliary-loss-free strategy for load balancing and sets a multi-token prediction training objective for stronger performance.

Cohere Command A 111B

111B262K ctx62.2 GBfrontier

denseLegacy

Command A is Cohere's latest flagship model with 111B parameters, designed for agentic enterprise applications. Features advanced tool use, multi-step reasoning, and retrieval-augmented generation.

DeepSeek DeepSeek Coder V2 236B

236B (21B active)131K ctx132.2 GBcurrent

moeLegacy

We present DeepSeek-Coder-V2, an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT4-Turbo in code-specific tasks. Specifically, DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2 with additional 6 trillion tokens. Through this continued pre-training, DeepSeek-Coder-V2 substantially enhances the coding and mathematical reasoning capabilities of DeepSeek-V2, while maintaining comparable performance in general language tasks.

DeepSeek DeepSeek V2.5 236B

236B (21B active)131K ctx132.2 GBcurrent

moeLegacy

DeepSeek-V2.5 is an upgraded version that combines DeepSeek-V2-Chat and DeepSeek-Coder-V2-Instruct. The new model integrates the general and coding abilities of the two previous versions. For model details, please visit DeepSeek-V2 page for more information.

Mistral AI Codestral 22B

22B33K ctx12.3 GBcurrent

denseLegacy

from mistral_common.tokens.tokenizers.mistral import MistralTokenizer from mistral_common.protocol.instruct.messages import UserMessage from mistral_common.protocol.instruct.request import ChatCompletionRequest

OpenAI GPT-OSS 20B

21B (3.6B active)128K ctx11.8 GBfrontier

moeLegacy

GPT-OSS 20B is OpenAI's first open-weight model, a 21B-parameter mixture-of-experts model with 3.6B active parameters per token. Features configurable reasoning effort (low/medium/high), full chain-of-thought visibility, and agentic capabilities including function calling. Runs on devices with 16GB of memory using MXFP4 quantization.

DeepSeek DeepSeek Coder V2 16B

16B (2.4B active)131K ctx9 GBcurrent

moeLegacy

DeepSeek DeepSeek R1 Distill 7B

7B33K ctx3.9 GBactive

denseLegacy

DeepSeek R1 Distill Qwen 7B is a 7B-parameter reasoning model distilled from the larger DeepSeek-R1. Based on Qwen2.5-Math-7B and fine-tuned on 800K samples from DeepSeek-R1, it delivers strong reasoning with 92.8% on MATH-500 and 49.1 on GPQA Diamond while being far more efficient than the full 671B model.

IBM Granite Code 20B

20B8K ctx11.2 GBcurrent

denseLegacy

Granite-20B-Code-Instruct-8K is a 20B parameter model fine tuned from *Granite-20B-Code-Base-8K* on a combination of permissively licensed instruction data to enhance instruction following capabilities including logical reasoning and problem-solving skills.

NVIDIA Nemotron Nano 8B

8B131K ctx4.5 GBactive

denseLegacy

Nemotron Nano 8B is NVIDIA's reasoning model derived from Llama 3.1 8B Instruct, post-trained for switchable reasoning with on/off modes. Achieves 95.4% on MATH-500 and 54.1% on GPQA Diamond with reasoning enabled. Fits on a single RTX GPU for local deployment.

Defog SQLCoder 7B

7B8K ctx3.9 GBcurrent

denseLegacy

The model weights were updated at 7 AM UTC on Feb 7, 2024. The new model weights lead to a much more performant model – particularly for joins.

Tsinghua/Zhipu CodeGeeX 4 9B

9B131K ctx5 GBcurrent

denseLegacy

We introduce CodeGeeX4-ALL-9B, the open-source version of the latest CodeGeeX4 model series. It is a multilingual code generation model continually trained on the GLM-4-9B, significantly enhancing its code generation capabilities. Using a single CodeGeeX4-ALL-9B model, it can support comprehensive functions such as code completion and generation, code interpreter, web search, function call, repository-level code Q&A, covering various scenarios of software development. CodeGeeX4-ALL-9B has achieved highly competitive performance on public benchmarks, such as BigCodeBench and NaturalCodeBench.

01.AI Yi Coder 9B

9B131K ctx5 GBcurrent

denseLegacy

🐙 GitHub • 👾 Discord • 🐤 Twitter • 💬 WeChat

Mistral AI Codestral Mamba 7B

7B262K ctx3.9 GBcurrent

denseLegacy

Codestral Mamba is an open code model based on the Mamba2 architecture. It performs on par with state-of-the-art Transformer-based code models. \ You can read more in the official blog post.

IBM Granite Code 8B

8B8K ctx4.5 GBcurrent

denseLegacy

Granite-8B-Code-Instruct-4K is a 8B parameter model fine tuned from *Granite-8B-Code-Base-4K* on a combination of permissively licensed instruction data to enhance instruction following capabilities including logical reasoning and problem-solving skills.

DeepSeek DeepSeek R1 1.5B

1.5B33K ctx0.8 GBactive

denseLegacy

DeepSeek R1 Distill Qwen 1.5B is a compact reasoning model distilled from DeepSeek-R1, based on Qwen2.5-Math-1.5B. Fine-tuned on 800K curated samples, it achieves 83.9% on MATH-500 and supports chain-of-thought reasoning on resource-constrained devices.

Alibaba Qwen 2.5 Coder 1.5B

1.5B33K ctx0.8 GBactive

denseLegacy

Qwen 2.5 Coder 1.5B is Alibaba's compact code-specific language model from the Qwen2.5 Coder series. Trained on 5.5T tokens including source code, text-code grounding, and synthetic data. Features improvements in code generation, reasoning, and fixing while maintaining general and math capabilities.

Browse AI Models

17 models available

/

Status:

Sort:

Filtered by:

DeepSeek DeepSeek V3 671B

671B (37B active)131K ctx375.8 GBfrontier

moeLegacy

Cohere Command A 111B

111B262K ctx62.2 GBfrontier

denseLegacy

Command A is Cohere's latest flagship model with 111B parameters, designed for agentic enterprise applications. Features advanced tool use, multi-step reasoning, and retrieval-augmented generation.

DeepSeek DeepSeek Coder V2 236B

236B (21B active)131K ctx132.2 GBcurrent

moeLegacy

DeepSeek DeepSeek V2.5 236B

236B (21B active)131K ctx132.2 GBcurrent

moeLegacy

Mistral AI Codestral 22B

22B33K ctx12.3 GBcurrent

denseLegacy

OpenAI GPT-OSS 20B

21B (3.6B active)128K ctx11.8 GBfrontier

moeLegacy

DeepSeek DeepSeek Coder V2 16B

16B (2.4B active)131K ctx9 GBcurrent

moeLegacy

DeepSeek DeepSeek R1 Distill 7B

7B33K ctx3.9 GBactive

denseLegacy

IBM Granite Code 20B

20B8K ctx11.2 GBcurrent

denseLegacy

NVIDIA Nemotron Nano 8B

8B131K ctx4.5 GBactive

denseLegacy

Defog SQLCoder 7B

7B8K ctx3.9 GBcurrent

denseLegacy

The model weights were updated at 7 AM UTC on Feb 7, 2024. The new model weights lead to a much more performant model – particularly for joins.

Tsinghua/Zhipu CodeGeeX 4 9B

9B131K ctx5 GBcurrent

denseLegacy

01.AI Yi Coder 9B

9B131K ctx5 GBcurrent

denseLegacy

🐙 GitHub • 👾 Discord • 🐤 Twitter • 💬 WeChat

Mistral AI Codestral Mamba 7B

7B262K ctx3.9 GBcurrent

denseLegacy

Codestral Mamba is an open code model based on the Mamba2 architecture. It performs on par with state-of-the-art Transformer-based code models. \ You can read more in the official blog post.

IBM Granite Code 8B

8B8K ctx4.5 GBcurrent

denseLegacy

DeepSeek DeepSeek R1 1.5B

1.5B33K ctx0.8 GBactive

denseLegacy

Alibaba Qwen 2.5 Coder 1.5B

1.5B33K ctx0.8 GBactive

denseLegacy