Browse AI Models

328 models available


Mistral Codestral 2 25.08

22B · 256K ctx · 12.3 GB · frontier

dense · Legacy

Codestral 2 is Mistral AI's latest code-focused model with enhanced performance on code generation, refactoring, and documentation across dozens of programming languages.
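Across the listing, the size column tracks parameter count closely: every entry works out to roughly 0.56 GB per billion parameters (about 4.5 bits per weight, in the range of common 4-bit quantized formats). The sketch below reproduces several listed sizes from that ratio. The 0.56 constant is inferred from the catalog's own numbers, not documented by it, so treat the formula as an assumption. For MoE entries the estimate uses total rather than active parameters, since every expert has to be stored.

```python
# Assumption: the catalog's "GB" column appears to equal parameter count x 0.56
# bytes (about 4.48 bits per weight). The constant is inferred from the listed
# numbers, not taken from any documentation.
GB_PER_BILLION_PARAMS = 0.56


def estimated_size_gb(params_billion: float) -> float:
    """Estimate the model file size in GB (10**9 bytes), rounded to one decimal."""
    return round(params_billion * GB_PER_BILLION_PARAMS, 1)


def bits_per_weight(gb_per_billion: float = GB_PER_BILLION_PARAMS) -> float:
    """Convert bytes per parameter into bits per parameter."""
    return gb_per_billion * 8


# (total parameters in billions, size in GB as listed in the catalog)
listed = {
    "Codestral 2": (22, 12.3),
    "Devstral Small 1.1": (24, 13.4),
    "Llama 3.1 70B": (70, 39.2),
    "Pixtral Large": (124, 69.4),
    "DeepSeek Coder V2": (236, 132.2),
    "Llama 4 Scout": (109, 61.0),
    "Qwen 3 30B A3B": (30.5, 17.1),
}

for name, (params, size) in listed.items():
    print(f"{name}: estimated {estimated_size_gb(params)} GB, listed {size} GB")
print(f"implied precision: {bits_per_weight():.2f} bits per weight")
```

If the ratio holds, it suggests all entries are sized for the same quantization level, so the size column is best read as a rough memory floor for the weights alone.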

Mistral Devstral Small 1.1

24B · 131K ctx · 13.4 GB · current

dense · Legacy

Devstral is an agentic LLM for software engineering tasks, built through a collaboration between Mistral AI and All Hands AI 🙌. Devstral excels at using tools to explore codebases, editing multiple files, and powering software engineering agents. The model achieves remarkable performance on SWE-bench, which positions it as the #1 open-source model on this benchmark.

Meta Llama 3.1 70B

70B · 128K ctx · 39.2 GB · legacy

dense · Legacy

Llama 3.1 70B is Meta's high-capability open model with a 128K context window. It excels at complex reasoning, multilingual tasks, code generation, and tool use, with quality competitive with leading proprietary models.
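The 39.2 GB figure covers the weights only. Serving a long context also needs a key/value (KV) cache that grows linearly with the number of tokens. A minimal estimate, assuming Llama 3.1 70B's published configuration (80 layers, 8 key/value heads under grouped-query attention, head dimension 128) and a 16-bit cache:

```python
# Assumed architecture for Llama 3.1 70B, taken from its published model config:
# 80 transformer layers, 8 key/value heads (grouped-query attention), head dim 128.
NUM_LAYERS = 80
NUM_KV_HEADS = 8
HEAD_DIM = 128


def kv_cache_bytes(tokens: int, bytes_per_value: int = 2) -> int:
    """Bytes needed to cache keys and values for `tokens` tokens of one sequence.

    The leading factor 2 counts one key tensor and one value tensor per layer;
    bytes_per_value=2 assumes an FP16/BF16 cache.
    """
    per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * bytes_per_value
    return tokens * per_token


GIB = 1024 ** 3
print(kv_cache_bytes(1) // 1024, "KiB per token")                 # 320 KiB per token
print(kv_cache_bytes(128 * 1024) / GIB, "GiB for 128K tokens")    # 40.0 GiB for 128K tokens
```

Under these assumptions a single full-length sequence needs about 40 GiB of cache on top of the weights; an 8-bit cache (`bytes_per_value=1`) would halve that.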

NVIDIA Nemotron 70B

70B · 131K ctx · 39.2 GB · current

dense · Legacy

Llama-3.1-Nemotron-70B-Instruct is a large language model customized by NVIDIA to improve the helpfulness of LLM-generated responses to user queries.
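Context sizes in the listing appear to mix two conventions. Llama 3.1 70B is shown as 128K while NVIDIA Nemotron 70B, which is built on it, is shown as 131K; both figures are consistent with a 131,072-token window (128 × 1,024) written once with binary and once with decimal thousands. The same reading would explain 33K (32 × 1,024) for DeepSeek R1 Distill 32B and 10.5M for Llama 4 Scout, whose description says 10M. This is an interpretation of the numbers, not something the catalog states:

```python
def label_decimal(tokens: int) -> str:
    """Format a token count with decimal suffixes (K = 1,000, M = 1,000,000)."""
    if tokens >= 1_000_000:
        return f"{tokens / 1_000_000:.1f}M"
    return f"{round(tokens / 1_000)}K"


def label_binary(tokens: int) -> str:
    """Format a token count with binary suffixes (K = 1,024, M = 1,048,576)."""
    if tokens >= 1024 ** 2:
        return f"{tokens / 1024 ** 2:g}M"
    return f"{tokens / 1024:g}K"


# Window sizes that are powers-of-two multiples, as is common for context lengths.
for window in (4 * 1024, 32 * 1024, 128 * 1024, 10 * 1024 ** 2):
    print(window, label_binary(window), label_decimal(window))
```

Read this way, "128K" and "131K" in the listing would describe the same window size rather than two different ones.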

Mistral AI Pixtral Large 124B

124B · 131K ctx · 69.4 GB · frontier

dense · Legacy

Pixtral-Large-Instruct-2411 is a 124B multimodal model built on top of Mistral Large 2, i.e., Mistral-Large-Instruct-2407. Pixtral Large is the second model in our multimodal family and demonstrates frontier-level image understanding. In particular, the model is able to understand documents, charts, and natural images, while maintaining the leading text-only understanding of Mistral Large 2.

Alibaba Qwen 2.5 Math 72B

72B · 4K ctx · 40.3 GB · frontier

dense · Legacy

Warning: 🚨 Qwen2.5-Math mainly supports solving English and Chinese math problems through chain-of-thought (CoT) and tool-integrated reasoning (TIR). We do not recommend using this series of models for other tasks.

Unsloth DeepSeek R1 Distill Llama 8B

8B · 0K ctx · 4.5 GB

dense · Legacy

Dphn Dolphin3.0 Llama3.1 8B

8B · 0K ctx · 4.5 GB

dense · Legacy

MaziyarPanahi Llama 3 8B Instruct 32k v0.1

8B · 0K ctx · 4.5 GB

dense · Legacy

Unsloth DeepSeek R1 Distill Qwen 1.5B

1.5B · 0K ctx · 0.8 GB

dense · Legacy

MaziyarPanahi Meta Llama 3.1 8B Instruct

8B · 0K ctx · 4.5 GB

dense · Legacy

Cohere Command R+ 104B

104B · 131K ctx · 58.2 GB · current

dense · Legacy

Command R+ is Cohere's most capable open-weight model for enterprise RAG workloads. It is built for long-context reasoning, multi-step tool use, and grounded generation with citations across 10 languages.

DeepSeek DeepSeek Coder V2 236B

236B (21B active) · 131K ctx · 132.2 GB · current

moe · Legacy

We present DeepSeek-Coder-V2, an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT-4 Turbo in code-specific tasks. Specifically, DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2 with an additional 6 trillion tokens. Through this continued pre-training, DeepSeek-Coder-V2 substantially enhances the coding and mathematical reasoning capabilities of DeepSeek-V2, while maintaining comparable performance in general language tasks.
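For Mixture-of-Experts entries such as DeepSeek Coder V2, the two parameter figures matter for different resources: memory is driven by the 236B total, because every expert has to be resident, while per-token compute is driven by the 21B active. The sketch compares it with the dense Llama 3.1 70B using two rules of thumb: roughly 2 FLOPs per active parameter per token (attention over the context is ignored), and a 0.56 GB-per-billion-parameters ratio that is inferred from this catalog's size column rather than documented.

```python
def footprint(total_b: float, active_b: float, gb_per_b: float = 0.56) -> tuple[float, float]:
    """Rough weight memory (GB) and forward-pass compute (GFLOPs per token).

    - Weight memory scales with *total* parameters: all experts must be loaded,
      even though only some run for a given token.
    - Compute scales with *active* parameters: about 2 FLOPs (one multiply,
      one add) per active parameter per token, ignoring attention over context.
    gb_per_b is an assumed bytes-per-parameter ratio, inferred from the catalog.
    """
    weights_gb = round(total_b * gb_per_b, 1)
    gflops_per_token = 2 * active_b  # billions of parameters -> GFLOPs
    return weights_gb, gflops_per_token


# (total parameters, active parameters), both in billions, as listed in the catalog
models = {
    "DeepSeek Coder V2 (MoE)": (236, 21),
    "Qwen 3 30B A3B (MoE)": (30.5, 3.3),
    "Llama 3.1 70B (dense)": (70, 70),
}

for name, (total, active) in models.items():
    weights_gb, gflops = footprint(total, active)
    print(f"{name}: ~{weights_gb} GB of weights, ~{gflops:.1f} GFLOPs per token")
```

On these rough numbers DeepSeek Coder V2 needs more than three times the weight memory of Llama 3.1 70B but well under half of its per-token compute, which is the usual MoE trade-off.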

DeepSeek DeepSeek R1 Distill 32B

32B · 33K ctx · 17.9 GB · frontier

dense · Legacy

We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning. Through RL, numerous powerful and interesting reasoning behaviors emerged naturally in DeepSeek-R1-Zero. However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing.

Meta Llama 4 Scout 17B 16E

109B (17B active) · 10.5M ctx · 61 GB · frontier

moe · Legacy

Llama 4 Scout is Meta's efficient Mixture-of-Experts model with 17B active parameters across 16 experts. It supports a 10M-token context window and natively handles text, images, and video inputs.
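Llama 4 Scout's 17B active out of 109B total comes from routing: a small router scores the experts for each token and only the selected ones run. The sketch below shows generic top-k softmax gating over 16 experts in plain Python. It is not Scout's actual router; the value of k, the vector size, the random router weights, and the toy experts are all hypothetical choices for illustration.

```python
import math
import random


def softmax(xs: list[float]) -> list[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route(token, router_weights, k):
    """Return (expert_index, gate) pairs for the k highest-scoring experts.

    Each expert has one router weight vector; its logit is a dot product with
    the token. Gates are a softmax over the selected logits only, so they sum to 1.
    """
    logits = [sum(w * x for w, x in zip(row, token)) for row in router_weights]
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    gates = softmax([logits[i] for i in top])
    return list(zip(top, gates))


def moe_layer(token, router_weights, experts, k=1):
    """Gate-weighted sum of the selected experts' outputs; other experts never run."""
    out = [0.0] * len(token)
    for idx, gate in route(token, router_weights, k):
        expert_out = experts[idx](token)
        out = [o + gate * y for o, y in zip(out, expert_out)]
    return out


# Toy setup (hypothetical): 16 experts, 4-dimensional tokens.
NUM_EXPERTS, DIM = 16, 4
random.seed(0)
router_weights = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]
# Toy expert s simply scales its input by s, which makes the routing easy to inspect.
experts = [lambda x, s=s: [s * v for v in x] for s in range(NUM_EXPERTS)]
token = [0.5, -1.0, 2.0, 0.25]

print(route(token, router_weights, k=2))
print(moe_layer(token, router_weights, experts, k=1))
```

With k=1 the single gate is 1.0, so the layer output is exactly the chosen expert's output; a larger k blends several experts at proportionally higher compute per token.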

Alibaba Qwen 3 30B A3B

30.5B (3.3B active) · 131K ctx · 17.1 GB · frontier

moe · Legacy

We introduce the updated version of the Qwen3-30B-A3B non-thinking mode, named Qwen3-30B-A3B-Instruct-2507, featuring the following key enhancements:

BigCode StarCoder 15B

15B · 8K ctx · 8.4 GB · legacy

dense · Legacy

StarCoder 15B is BigCode's flagship code generation model, trained on 1 trillion tokens from The Stack. It supports 80+ programming languages with an 8K context window and strong code completion capabilities.

Lmg-anon vntl llama3 8b v2

8B · 0K ctx · 4.5 GB

dense · Legacy

Unsloth Mistral Small 3.2 24B Instruct 2506

24B · 0K ctx · 13.4 GB

dense · Legacy

Lmstudio-community DeepSeek R1 0528 Qwen3 8B

8B · 0K ctx · 4.5 GB

dense · Legacy

Unsloth DeepSeek R1 Distill Qwen 14B

14B · 0K ctx · 7.8 GB

dense · Legacy

MaziyarPanahi gemma 3 4b it

4B · 0K ctx · 2.2 GB

dense · Legacy

MaziyarPanahi Llama 3.3 70B Instruct

70B · 0K ctx · 39.2 GB

dense · Legacy

DeepSeek DeepSeek V2.5 236B

236B (21B active) · 131K ctx · 132.2 GB · current

moe · Legacy

DeepSeek-V2.5 is an upgraded version that combines DeepSeek-V2-Chat and DeepSeek-Coder-V2-Instruct, integrating the general and coding abilities of the two previous versions. For model details, see the DeepSeek-V2 page.
