Browse AI Models

84 models available

DeepSeek · DeepSeek R1 671B

671B (37B active) · 131K ctx · 375.8 GB · frontier

moe · Legacy

We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning. Through RL, numerous powerful and interesting reasoning behaviors emerged naturally in DeepSeek-R1-Zero. However, it encounters challenges such as endless repetition, poor readability, and language mixing. To address these issues and further enhance reasoning performance, DeepSeek-R1 incorporates cold-start data before RL.
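Every storage figure in this listing works out to roughly 0.56 GB per billion parameters (375.8 GB for this 671B model, 39.2 GB for the 70B models, 13.4 GB for the 24B models). That corresponds to about 4.48 bits per weight, comparable to 4-bit quantization formats that store per-block scales. The sketch below reproduces the arithmetic; the helper name and the 4.48 bits-per-weight default are our own assumptions inferred from the listed numbers, not something the catalog documents, and the estimate covers weights only (no KV cache, activations, or runtime overhead).

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float = 4.48) -> float:
    """Estimate weight storage in decimal GB (1 GB = 1e9 bytes).

    Hypothetical helper: bits_per_weight=4.48 is inferred from this
    catalog's figures (about 0.56 GB per billion parameters). The result
    excludes KV cache, activations, and runtime overhead.
    """
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


for name, params in [("DeepSeek R1 671B", 671), ("Llama 3.3 70B", 70), ("Devstral Small 2", 24)]:
    # Formats to one decimal place, the precision the cards use.
    print(f"{name}: {weight_memory_gb(params):.1f} GB")
```

Passing `bits_per_weight=16` gives the unquantized BF16 footprint instead: about 140 GB for a 70B model.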

Mistral · Devstral 2 123B Instruct

123B · 256K ctx · 68.9 GB · frontier

dense · Legacy

Devstral is an agentic LLM for software engineering tasks. Devstral 2 excels at using tools to explore codebases, editing multiple files, and powering software engineering agents. The model achieves strong performance on SWE-bench.

Z.ai · GLM-5

744B (40B active) · 200K ctx · 416.6 GB · frontier

moe · Legacy

GLM-5 API services are available on the Z.ai API Platform.

Moonshot AI · Kimi K2.5

1000B (32B active) · 256K ctx · 560 GB · frontier

moe · Legacy

Kimi K2.5 is Moonshot AI's advanced reasoning model with strong performance in math, coding, and multilingual tasks. It features long-context understanding and agentic capabilities for complex multi-step problem solving.

Mistral · Mistral Large 3

675B (41B active) · 256K ctx · 378 GB · frontier

moe · Legacy

Mistral Large 3 is a sparse Mixture-of-Experts model with 675B total parameters, of which 41B are active per token. It succeeds Mistral-Large-Instruct-2411, a dense 123B-parameter model with state-of-the-art reasoning, knowledge, and coding capabilities that extended Mistral-Large-Instruct-2407 with better long-context handling, function calling, and system-prompt support.

Mistral · Mistral Small 4 119B

119B (6.5B active) · 256K ctx · 66.6 GB · frontier

moe · Legacy

Mistral Small 4 is a hybrid model that can act as both a general instruction model and a reasoning model. It unifies three model families into a single model: Instruct, Reasoning (previously called Magistral), and Devstral.

Alibaba · Qwen3-Coder 30B A3B Instruct

30.5B (3.3B active) · 256K ctx · 17.1 GB · frontier

moe · Legacy

Qwen3-Coder is available in multiple sizes. Today, we're excited to introduce Qwen3-Coder-30B-A3B-Instruct. This streamlined model maintains impressive performance and efficiency, featuring the following key enhancements:

Alibaba · Qwen3-Coder 480B A35B Instruct

480B (35B active) · 256K ctx · 268.8 GB · frontier

moe · Legacy

Today, we're announcing Qwen3-Coder, our most agentic code model to date. Qwen3-Coder is available in multiple sizes, but we're excited to introduce its most powerful variant first: Qwen3-Coder-480B-A35B-Instruct, featuring the following key enhancements:

Alibaba · Qwen3-Coder-Next

80B (3B active) · 256K ctx · 44.8 GB · frontier

moe · Legacy

Today, we're announcing Qwen3-Coder-Next, an open-weight language model designed specifically for coding agents and local development. It features the following key enhancements:

DeepSeek · DeepSeek V3 671B

671B (37B active) · 131K ctx · 375.8 GB · frontier

moe · Legacy

We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2. Furthermore, DeepSeek-V3 pioneers an auxiliary-loss-free strategy for load balancing and sets a multi-token prediction training objective for stronger performance.
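As the DeepSeek-V3 technical report describes it, the auxiliary-loss-free strategy adds a per-expert bias to the routing scores only when choosing the top-k experts; the gating weights still come from the unbiased scores. After each training step, the bias is lowered for overloaded experts and raised for underloaded ones, so balance is steered without an extra loss term. The toy sketch below shows that loop in plain Python. The expert count, `TOP_K`, `GAMMA`, and the random skewed scores are illustrative assumptions (DeepSeek-V3 uses sigmoid affinities over many more experts and operates on batched tensors), and the helper names are our own.

```python
import random

NUM_EXPERTS = 8   # illustrative; DeepSeek-V3 routes over far more experts
TOP_K = 2         # experts selected per token (illustrative)
GAMMA = 0.01      # bias update speed (illustrative)


def route(scores, bias, k=TOP_K):
    """Select top-k experts by score + bias; weight them by the unbiased scores."""
    chosen = sorted(range(len(scores)), key=lambda i: scores[i] + bias[i], reverse=True)[:k]
    total = sum(scores[i] for i in chosen)
    return {i: scores[i] / total for i in chosen}


def update_bias(bias, loads, gamma=GAMMA):
    """Lower the bias of overloaded experts and raise it for underloaded ones."""
    mean_load = sum(loads) / len(loads)
    for i, load in enumerate(loads):
        if load > mean_load:
            bias[i] -= gamma
        elif load < mean_load:
            bias[i] += gamma


random.seed(0)
bias = [0.0] * NUM_EXPERTS
for _ in range(200):          # training steps
    loads = [0] * NUM_EXPERTS
    for _ in range(64):       # tokens in this toy batch
        # Skewed stand-in affinities: higher-numbered experts score higher on average.
        scores = [random.random() * (1 + 0.2 * e) for e in range(NUM_EXPERTS)]
        for expert in route(scores, bias):
            loads[expert] += 1
    update_bias(bias, loads)  # adjust once per step, after counting loads

print("last-step loads:", loads)
print("bias:", [round(b, 2) for b in bias])
```

Because the bias only affects which experts are selected, the output mixture for a chosen expert set is unchanged; the skewed experts simply accumulate negative bias until they stop winning as often.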

Mistral · Mixtral 8x22B

141B (39B active) · 66K ctx · 79 GB · current

moe · Legacy

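from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest

tokenizer = MistralTokenizer.v3()  # v3 tokenizer, as used by Mixtral 8x22B Instruct
request = ChatCompletionRequest(messages=[UserMessage(content="Explain Machine Learning to me in a nutshell.")])
tokens = tokenizer.encode_chat_completion(request).tokens  # token ids to feed the model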

Alibaba · Qwen 2.5 72B

72B · 131K ctx · 40.3 GB · current

dense · Legacy

Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters. Qwen2.5 brings the following improvements upon Qwen2:

Alibaba · Qwen 3 235B A22B

235B (22B active) · 131K ctx · 131.6 GB · frontier

moe · Legacy

We introduce the updated version of the Qwen3-235B-A22B non-thinking mode, named Qwen3-235B-A22B-Instruct-2507, featuring the following key enhancements:

Alibaba · Qwen3-VL 30B A3B Instruct

30B (3B active) · 256K ctx · 16.8 GB · frontier

moe · Legacy

Meet Qwen3-VL — the most powerful vision-language model in the Qwen series to date.
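The "A3B" in Qwen3-VL 30B A3B Instruct follows the MoE naming convention Qwen uses throughout this listing: the first number is the total parameter count and the "A" number is the parameters activated per token, matching this card's "30B (3B active)". A minimal parser for that convention, assuming names follow the `<total>B-A<active>B` pattern seen here; the regular expression and helper name are our own illustration, not part of any Qwen tooling.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern for "<total>B-A<active>B" tags such as "30B-A3B" or "235B-A22B".
MOE_SIZE_TAG = re.compile(r"(\d+(?:\.\d+)?)B-A(\d+(?:\.\d+)?)B")


def parse_moe_sizes(model_name: str) -> Optional[Tuple[float, float]]:
    """Return (total, active) parameter counts in billions, or None if no tag is found."""
    match = MOE_SIZE_TAG.search(model_name)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


for name in (
    "Qwen3-VL-30B-A3B-Instruct",
    "Qwen3-Coder-480B-A35B-Instruct",
    "Qwen3-235B-A22B-Instruct-2507",
):
    total, active = parse_moe_sizes(name)
    # Fraction of all weights that participate in each token's forward pass.
    print(f"{name}: {active:g}B of {total:g}B active ({active / total:.1%} per token)")
```

Names round both counts, so prefer the card figures when precision matters; Qwen3-Coder 30B A3B, for example, is listed as 30.5B total with 3.3B active.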

Mistral · Devstral Small 2 24B Instruct

24B · 256K ctx · 13.4 GB · frontier

dense · Legacy

Devstral is an agentic LLM for software engineering tasks. Devstral Small 2 excels at using tools to explore codebases, editing multiple files, and powering software engineering agents. The model achieves strong performance on SWE-bench.

Meta · Llama 3.3 70B

70B · 128K ctx · 39.2 GB · current

dense · Legacy

Llama 3.3 70B is an instruction-tuned model from Meta that offers improved reasoning and instruction following over Llama 3.1 70B. It supports a 128K-token context window with enhanced multilingual and code capabilities.

Meta · Llama 4 Maverick 17B 128E

400B (17B active) · 1.0M ctx · 224 GB · frontier

moe · Legacy

Llama 4 Maverick is Meta's large MoE model with 17B active parameters and 128 experts (400B total parameters). It delivers frontier-class performance on reasoning and coding while remaining deployable on a single node.
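Meta's Llama 4 announcement describes Maverick's MoE layers as sending each token to a shared expert plus one of the 128 routed experts, which is how 400B total parameters yield only 17B active per token. The toy layer below illustrates that data flow with a top-1 router and a shared expert. The tiny hidden size, random weights, single-matrix "experts", and sigmoid gate are all assumptions for illustration, not Meta's implementation.

```python
import math
import random

random.seed(1)
DIM = 4            # toy hidden size (assumption)
NUM_ROUTED = 128   # routed experts, matching the card's "128E"


def random_matrix(rows, cols):
    return [[random.gauss(0, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vec):
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


# Each toy "expert" is one linear map; real experts are gated MLPs.
shared_expert = random_matrix(DIM, DIM)
routed_experts = [random_matrix(DIM, DIM) for _ in range(NUM_ROUTED)]
router = random_matrix(NUM_ROUTED, DIM)  # one logit per routed expert


def moe_layer(token):
    """Shared-expert output plus the top-1 routed expert, scaled by its gate."""
    logits = matvec(router, token)
    best = max(range(NUM_ROUTED), key=lambda e: logits[e])
    gate = 1 / (1 + math.exp(-logits[best]))  # sigmoid gate (assumption)
    shared_out = matvec(shared_expert, token)
    routed_out = matvec(routed_experts[best], token)
    return [s + gate * r for s, r in zip(shared_out, routed_out)], best


token = [random.gauss(0, 1) for _ in range(DIM)]
output, expert_id = moe_layer(token)
print("routed to expert", expert_id, "->", [round(v, 3) for v in output])
```

Only two of the 129 expert matrices touch each token; the other 127 hold parameters that count toward the total but not toward the active figure.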

Unsloth · DeepSeek R1 0528 Qwen3 8B

8B · context length not listed · 4.5 GB

dense · Legacy

Cohere · Command A 111B

111B · 262K ctx · 62.2 GB · frontier

dense · Legacy

Command A is Cohere's latest flagship model with 111B parameters, designed for agentic enterprise applications. It features advanced tool use, multi-step reasoning, and retrieval-augmented generation.

Alibaba · Qwen 2.5 VL 72B

72B · 33K ctx · 40.3 GB · frontier

dense · Legacy

License: Qwen license (https://huggingface.co/Qwen/Qwen2.5-VL-72B-Instruct/blob/main/LICENSE). Language: English. Task: image-text-to-text (multimodal). Library: transformers.

Mistral · Devstral Small 1.1

24B · 131K ctx · 13.4 GB · current

dense · Legacy

Devstral is an agentic LLM for software engineering tasks, built in a collaboration between Mistral AI and All Hands AI. Devstral excels at using tools to explore codebases, editing multiple files, and powering software engineering agents. The model achieves remarkable performance on SWE-bench, which positions it as the #1 open-source model on this benchmark.

Meta · Llama 3.1 70B

70B · 128K ctx · 39.2 GB · legacy

dense · Legacy

Llama 3.1 70B is Meta's high-capability open model with a 128K context window. It excels at complex reasoning, multilingual tasks, code generation, and tool use, with quality competitive with leading proprietary models.

NVIDIA · Nemotron 70B

70B · 131K ctx · 39.2 GB · current

dense · Legacy

Llama-3.1-Nemotron-70B-Instruct is a large language model customized by NVIDIA to improve the helpfulness of LLM-generated responses to user queries.
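Context sizes in this listing appear to mix two conventions. Llama 3.1 70B and Llama 3.3 70B show "128K", while Nemotron 70B, which NVIDIA customized from Llama 3.1 70B, shows "131K". Both plausibly describe the same 131,072-token window, counted in units of 1,024 and 1,000 tokens respectively; "256K" and "262K" would likewise both describe 262,144 tokens, and Mixtral's "66K" matches 65,536 tokens in decimal units. This mapping is our inference from the numbers, not something the catalog states. A quick check, with a helper name of our own:

```python
from typing import Tuple


def context_labels(tokens: int) -> Tuple[str, str]:
    """Label a context window in binary (1K = 1,024) and decimal (1K = 1,000) units."""
    return f"{tokens // 1024}K", f"{round(tokens / 1000)}K"


for window in (65_536, 131_072, 262_144):
    binary, decimal = context_labels(window)
    print(f"{window:,} tokens -> {binary} (binary) or {decimal} (decimal)")
```

When comparing cards, convert labels back to exact token counts before ranking models by context length.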

Mistral AI · Pixtral Large 124B

124B · 131K ctx · 69.4 GB · frontier

dense · Legacy

Pixtral-Large-Instruct-2411 is a 124B multimodal model built on top of Mistral Large 2 (Mistral-Large-Instruct-2407). Pixtral Large is the second model in our multimodal family and demonstrates frontier-level image understanding. In particular, the model can understand documents, charts, and natural images while maintaining the leading text-only understanding of Mistral Large 2.
