Browse AI Models

283 models available

DeepSeek DeepSeek R1 671B

671B (37B active) · 131K ctx · 375.8 GB · frontier

moe · Legacy

We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning. With RL, DeepSeek-R1-Zero naturally emerged with numerous powerful and interesting reasoning behaviors. However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing.

Z.ai GLM-5

744B (40B active) · 200K ctx · 416.6 GB · frontier

moe · Legacy

📍 Use GLM-5 API services on Z.ai API Platform.

Moonshot AI Kimi K2.5

1000B (32B active) · 256K ctx · 560 GB · frontier

moe · Legacy

Kimi K2.5 is Moonshot AI's advanced reasoning model with strong performance in math, coding, and multilingual tasks. It features long-context understanding and agentic capabilities for complex multi-step problem solving.

Mistral Mistral Large 3

675B (41B active) · 256K ctx · 378 GB · frontier

moe · Legacy

Mistral-Large-Instruct-2411 is an advanced dense Large Language Model (LLM) of 123B parameters with state-of-the-art reasoning, knowledge and coding capabilities extending Mistral-Large-Instruct-2407 with better Long Context, Function Calling and System Prompt.

Mistral Mistral Small 4 119B

119B (6.5B active) · 256K ctx · 66.6 GB · frontier

moe · Legacy

Mistral Small 4 is a powerful hybrid model capable of acting as both a general instruction model and a reasoning model. It unifies the capabilities of three different model families—Instruct, Reasoning (previously called Magistral), and Devstral—into a single, unified model.

Unsloth Qwen3.5 27B

27B · 0K ctx · 15.1 GB

dense · Legacy

Unsloth Qwen3.5 35B A3B

35B · 0K ctx · 19.6 GB

dense · Legacy

Unsloth Qwen3.5 9B

9B · 0K ctx · 5 GB

dense · Legacy

Google gemma 2b

2B · 0K ctx · 1.1 GB

dense · Legacy

HauhauCS Qwen3.5 9B Uncensored HauhauCS Aggressive

9B · 0K ctx · 5 GB

dense · Legacy

Bartowski gemma 2 2b it

2B · 0K ctx · 1.1 GB

dense · Legacy

Unsloth Qwen3.5 122B A10B

122B · 0K ctx · 68.3 GB

dense · Legacy

Bartowski Meta Llama 3.1 8B Instruct

8B · 0K ctx · 4.5 GB

dense · Legacy

DeepSeek DeepSeek V3 671B

671B (37B active) · 131K ctx · 375.8 GB · frontier

moe · Legacy

We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2. Furthermore, DeepSeek-V3 pioneers an auxiliary-loss-free strategy for load balancing and sets a multi-token prediction training objective for stronger performance.

Mistral Mixtral 8x22B

141B (39B active) · 66K ctx · 79 GB · current

moe · Legacy

from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest

Alibaba Qwen 2.5 72B

72B · 131K ctx · 40.3 GB · current

dense · Legacy

Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters. Qwen2.5 brings the following improvements upon Qwen2:

Alibaba Qwen 3 235B A22B

235B (22B active) · 131K ctx · 131.6 GB · frontier

moe · Legacy

We introduce the updated version of the Qwen3-235B-A22B non-thinking mode, named Qwen3-235B-A22B-Instruct-2507, featuring the following key enhancements:

Bartowski Llama 3.2 3B Instruct

3B · 0K ctx · 1.7 GB

dense · Legacy

Unsloth Qwen3.5 4B

4B · 0K ctx · 2.2 GB

dense · Legacy

TheBloke Llama 2 7B Chat

7B · 0K ctx · 3.9 GB

dense · Legacy

Xtuner llava llama 3 8b v1 1

8B · 0K ctx · 4.5 GB

dense · Legacy

Unsloth Qwen3.5 397B A17B

397B · 0K ctx · 222.3 GB

dense · Legacy

Hugging-quants Llama 3.2 1B Instruct Q8 0

1B · 0K ctx · 0.6 GB

dense · Legacy

Meta Llama 3.3 70B

70B · 128K ctx · 39.2 GB · current

dense · Legacy

Llama 3.3 70B is Meta's most capable single-GPU-class model, offering improved reasoning and instruction following over Llama 3.1 70B. Supports 128K context with enhanced multilingual and code capabilities.
