Browse AI Models

84 models available

/

Status:

Sort:

Filtered by:

Alibaba Qwen 2.5 Math 72B

72B4K ctx40.3 GBfrontier

denseLegacy

> [!Warning] > > > 🚨 Qwen2.5-Math mainly supports solving English and Chinese math problems through CoT and TIR. We do not recommend using this series of models for other tasks. > >

Unsloth DeepSeek R1 Distill Llama 8B

8B0K ctx4.5 GB

denseLegacy

Unsloth DeepSeek R1 Distill Qwen 1.5B

1.5B0K ctx0.8 GB

denseLegacy

Cohere Command R+ 104B

104B131K ctx58.2 GBcurrent

denseLegacy

Command R+ is Cohere's most capable open-weight model for enterprise RAG workloads. Offers superior long-context reasoning, multi-step tool use, and grounded generation with citations across 10 languages.

DeepSeek DeepSeek Coder V2 236B

236B (21B active)131K ctx132.2 GBcurrent

moeLegacy

We present DeepSeek-Coder-V2, an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT4-Turbo in code-specific tasks. Specifically, DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2 with additional 6 trillion tokens. Through this continued pre-training, DeepSeek-Coder-V2 substantially enhances the coding and mathematical reasoning capabilities of DeepSeek-V2, while maintaining comparable performance in general language tasks.

DeepSeek DeepSeek R1 Distill 32B

32B33K ctx17.9 GBfrontier

denseLegacy

We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning. With RL, DeepSeek-R1-Zero naturally emerged with numerous powerful and interesting reasoning behaviors. However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing.

Meta Llama 4 Scout 17B 16E

109B (17B active)10.5M ctx61 GBfrontier

moeLegacy

Llama 4 Scout is Meta's efficient Mixture-of-Experts model with 17B active parameters across 16 experts. Supports a 10M token context window and natively handles text, images, and video inputs.

Alibaba Qwen 3 30B A3B

30.5B (3.3B active)131K ctx17.1 GBfrontier

moeLegacy

We introduce the updated version of the Qwen3-30B-A3B non-thinking mode, named Qwen3-30B-A3B-Instruct-2507, featuring the following key enhancements:

BigCode StarCoder 15B

15B8K ctx8.4 GBlegacy

denseLegacy

StarCoder 15B is BigCode's flagship code generation model trained on 1 trillion tokens from The Stack. Supports 80+ programming languages with 8K context and strong code completion capabilities.

Lmstudio-community DeepSeek R1 0528 Qwen3 8B

8B0K ctx4.5 GB

denseLegacy

Unsloth DeepSeek R1 Distill Qwen 14B

14B0K ctx7.8 GB

denseLegacy

DeepSeek DeepSeek V2.5 236B

236B (21B active)131K ctx132.2 GBcurrent

moeLegacy

DeepSeek-V2.5 is an upgraded version that combines DeepSeek-V2-Chat and DeepSeek-Coder-V2-Instruct. The new model integrates the general and coding abilities of the two previous versions. For model details, please visit DeepSeek-V2 page for more information.

Microsoft Phi-4-reasoning-plus 14B

14.7B33K ctx8.2 GBfrontier

denseLegacy

> [!IMPORTANT] > To fully take advantage of the model's capabilities, inference must use `temperature=0.8`, `top_k=50`, `top_p=0.95`, and `do_sample=True`. For more complex queries, set `max_new_tokens=32768` to allow for longer chain-of-thought (CoT).

Alibaba Qwen 3 32B

32B131K ctx17.9 GBfrontier

denseLegacy

Qwen3 is the latest generation of large language models in Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support, with the following key features:

MaziyarPanahi DeepSeek R1 0528 Qwen3 8B

8B0K ctx4.5 GB

denseLegacy

Mistral Magistral Small 2507

24B131K ctx13.4 GBlegacy

denseLegacy

Building upon Mistral Small 3.1 (2503), with added reasoning capabilities, undergoing SFT from Magistral Medium traces and RL on top, it's a small, efficient reasoning model with 24B parameters.

Mistral Mixtral 8x7B

47B (13B active)33K ctx26.3 GBcurrent

moeLegacy

from mistral_common.tokens.tokenizers.mistral import MistralTokenizer from mistral_common.protocol.instruct.messages import UserMessage from mistral_common.protocol.instruct.request import ChatCompletionRequest

OpenAI GPT-OSS 20B

21B (3.6B active)128K ctx11.8 GBfrontier

moeLegacy

GPT-OSS 20B is OpenAI's first open-weight model, a 21B-parameter mixture-of-experts model with 3.6B active parameters per token. Features configurable reasoning effort (low/medium/high), full chain-of-thought visibility, and agentic capabilities including function calling. Runs on devices with 16GB of memory using MXFP4 quantization.

Mistral Mistral Small 3.2 24B

24B131K ctx13.4 GBcurrent

visionLegacy

Mistral-Small-3.2-24B-Instruct-2506 is a minor update of Mistral-Small-3.1-24B-Instruct-2503.

Alibaba Qwen 2.5 32B

32B131K ctx17.9 GBcurrent

denseLegacy

Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters. Qwen2.5 brings the following improvements upon Qwen2:

BigCode StarCoder 7B

7B8K ctx3.9 GBlegacy

denseLegacy

StarCoder 7B is BigCode's code generation model trained on The Stack v1. Supports over 80 programming languages with fill-in-the-middle capability and 8K context window.

Bartowski cognitivecomputations Dolphin3.0 R1 Mistral 24B

24B0K ctx13.4 GB

denseLegacy

Google Gemma 2 27B

27B8K ctx15.1 GBcurrent

denseLegacy

Gemma 2 27B is Google's largest Gemma 2 model, offering state-of-the-art performance among open models of similar size. Built on Gemini technology with strong reasoning, code, and multilingual capabilities.

01.AI Yi 1.5 34B

34B4K ctx19 GBcurrent

denseLegacy

🐙 GitHub • 👾 Discord • 🐤 Twitter • 💬 WeChat

Browse AI Models

84 models available

/

Status:

Sort:

Filtered by:

Alibaba Qwen 2.5 Math 72B

72B4K ctx40.3 GBfrontier

denseLegacy

> [!Warning] > > > 🚨 Qwen2.5-Math mainly supports solving English and Chinese math problems through CoT and TIR. We do not recommend using this series of models for other tasks. > >

Unsloth DeepSeek R1 Distill Llama 8B

8B0K ctx4.5 GB

denseLegacy

Unsloth DeepSeek R1 Distill Qwen 1.5B

1.5B0K ctx0.8 GB

denseLegacy

Cohere Command R+ 104B

104B131K ctx58.2 GBcurrent

denseLegacy

DeepSeek DeepSeek Coder V2 236B

236B (21B active)131K ctx132.2 GBcurrent

moeLegacy

DeepSeek DeepSeek R1 Distill 32B

32B33K ctx17.9 GBfrontier

denseLegacy

Meta Llama 4 Scout 17B 16E

109B (17B active)10.5M ctx61 GBfrontier

moeLegacy

Llama 4 Scout is Meta's efficient Mixture-of-Experts model with 17B active parameters across 16 experts. Supports a 10M token context window and natively handles text, images, and video inputs.

Alibaba Qwen 3 30B A3B

30.5B (3.3B active)131K ctx17.1 GBfrontier

moeLegacy

We introduce the updated version of the Qwen3-30B-A3B non-thinking mode, named Qwen3-30B-A3B-Instruct-2507, featuring the following key enhancements:

BigCode StarCoder 15B

15B8K ctx8.4 GBlegacy

denseLegacy

StarCoder 15B is BigCode's flagship code generation model trained on 1 trillion tokens from The Stack. Supports 80+ programming languages with 8K context and strong code completion capabilities.

Lmstudio-community DeepSeek R1 0528 Qwen3 8B

8B0K ctx4.5 GB

denseLegacy

Unsloth DeepSeek R1 Distill Qwen 14B

14B0K ctx7.8 GB

denseLegacy

DeepSeek DeepSeek V2.5 236B

236B (21B active)131K ctx132.2 GBcurrent

moeLegacy

Microsoft Phi-4-reasoning-plus 14B

14.7B33K ctx8.2 GBfrontier

denseLegacy

Alibaba Qwen 3 32B

32B131K ctx17.9 GBfrontier

denseLegacy

MaziyarPanahi DeepSeek R1 0528 Qwen3 8B

8B0K ctx4.5 GB

denseLegacy

Mistral Magistral Small 2507

24B131K ctx13.4 GBlegacy

denseLegacy

Building upon Mistral Small 3.1 (2503), with added reasoning capabilities, undergoing SFT from Magistral Medium traces and RL on top, it's a small, efficient reasoning model with 24B parameters.

Mistral Mixtral 8x7B

47B (13B active)33K ctx26.3 GBcurrent

moeLegacy

OpenAI GPT-OSS 20B

21B (3.6B active)128K ctx11.8 GBfrontier

moeLegacy

Mistral Mistral Small 3.2 24B

24B131K ctx13.4 GBcurrent

visionLegacy

Mistral-Small-3.2-24B-Instruct-2506 is a minor update of Mistral-Small-3.1-24B-Instruct-2503.

Alibaba Qwen 2.5 32B

32B131K ctx17.9 GBcurrent

denseLegacy

BigCode StarCoder 7B

7B8K ctx3.9 GBlegacy

denseLegacy

StarCoder 7B is BigCode's code generation model trained on The Stack v1. Supports over 80 programming languages with fill-in-the-middle capability and 8K context window.

Bartowski cognitivecomputations Dolphin3.0 R1 Mistral 24B

24B0K ctx13.4 GB

denseLegacy

Google Gemma 2 27B

27B8K ctx15.1 GBcurrent

denseLegacy

01.AI Yi 1.5 34B

34B4K ctx19 GBcurrent

denseLegacy

🐙 GitHub • 👾 Discord • 🐤 Twitter • 💬 WeChat