We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrates remarkable reasoning performance. Through RL, DeepSeek-R1-Zero naturally developed numerous powerful and interesting reasoning behaviors. However, it encounters challenges such as endless repetition, poor readability, and language mixing.
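R1-series models emit their chain of thought inside `<think>...</think>` tags before the final answer. A minimal sketch of separating the visible answer from the reasoning trace (the helper name is illustrative, not part of any DeepSeek API):

```python
import re

def strip_reasoning(text: str) -> str:
    """Remove the <think>...</think> reasoning trace that R1-style
    models prepend, keeping only the final answer text."""
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()

sample = "<think>2 + 2 equals 4.</think>The answer is 4."
print(strip_reasoning(sample))  # → The answer is 4.
```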
Introducing DeepSeek LLM, an advanced language model comprising 67 billion parameters, trained from scratch on a vast dataset of 2 trillion tokens in English and Chinese. To foster research, we have made DeepSeek LLM 7B/67B Base and DeepSeek LLM 7B/67B Chat open source for the research community.
Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction following, agent capabilities, and multilingual support.
DeepSeek R1 Distill Qwen 7B is a 7B-parameter reasoning model distilled from the larger DeepSeek-R1. Based on Qwen2.5-Math-7B and fine-tuned on 800K samples from DeepSeek-R1, it delivers strong reasoning, with 92.8% on MATH-500 and 49.1% on GPQA Diamond, while being far more efficient than the full 671B model.
Nemotron Nano 8B is NVIDIA's reasoning model derived from Llama 3.1 8B Instruct, post-trained for switchable reasoning with on/off modes. It achieves 95.4% on MATH-500 and 54.1% on GPQA Diamond with reasoning enabled, and fits on a single RTX GPU for local deployment.
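Nemotron's reasoning modes are switched via the system prompt rather than a sampling flag. A hedged sketch of building that system prompt — the control string shown is our assumption from NVIDIA's published examples, so check the model card for the exact wording:

```python
def nemotron_system_prompt(reasoning: bool) -> str:
    """Return a system prompt that toggles Nemotron's reasoning mode.
    The "detailed thinking on/off" phrasing is illustrative; verify it
    against the official model card before relying on it."""
    return "detailed thinking on" if reasoning else "detailed thinking off"

# Reasoning enabled for a hard math question, disabled for quick chat:
print(nemotron_system_prompt(True))   # → detailed thinking on
print(nemotron_system_prompt(False))  # → detailed thinking off
```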
The Phi-3-Medium-128K-Instruct is a 14B-parameter, lightweight, state-of-the-art open model trained with the Phi-3 datasets, which include both synthetic data and filtered publicly available website data, with a focus on high quality and reasoning-dense properties. The model belongs to the Phi-3 family; the Medium version comes in two variants, 4K and 128K, which is the context length (in tokens) it can support.
Solar 7B is Upstage's efficient language model built on a depth-upscaled architecture. It offers strong instruction following and reasoning performance, optimized for single-GPU inference.
Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base and instruction-tuned language models ranging from 0.5 to 72 billion parameters, bringing a number of improvements over Qwen2.
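Qwen chat models are trained on the ChatML turn layout (`<|im_start|>role ... <|im_end|>`). In practice the tokenizer's `apply_chat_template` handles this for you; the sketch below just makes the format explicit, and the function name is ours:

```python
def build_chatml_prompt(messages: list) -> str:
    """Render (role, content) turns in the ChatML layout used by Qwen
    chat models, ending with an open assistant turn for generation."""
    turns = [f"<|im_start|>{role}\n{content}<|im_end|>"
             for role, content in messages]
    turns.append("<|im_start|>assistant\n")  # model completes from here
    return "\n".join(turns)

prompt = build_chatml_prompt([
    ("system", "You are a helpful assistant."),
    ("user", "What is 2+2?"),
])
print(prompt)
```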
Baichuan-13B-Chat is the aligned (chat) version of the Baichuan-13B series; the pretrained base model is available as Baichuan-13B-Base.
InternLM has open-sourced a 7 billion parameter base model and a chat model tailored for practical scenarios. The model has the following characteristics:
- It leverages trillions of high-quality tokens for training to establish a powerful knowledge base.
- It supports an 8k context window length, enabling longer input sequences and stronger reasoning capabilities.
- It provides a versatile toolset for users to flexibly build their own workflows.
The Mistral-7B-Instruct-v0.3 Large Language Model (LLM) is an instruct fine-tuned version of the Mistral-7B-v0.3.
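Mistral instruct models expect user turns wrapped in `[INST] ... [/INST]` markers after the BOS token. A minimal sketch of that format — in real use, prefer the tokenizer's chat template, which also handles tokenization of the special markers; the helper name here is illustrative:

```python
from typing import Optional

def build_mistral_prompt(user_message: str,
                         system: Optional[str] = None) -> str:
    """Wrap a single user turn in the [INST] ... [/INST] markers that
    Mistral instruct models are fine-tuned on. A system message, if
    given, is commonly prepended inside the same instruction block."""
    content = f"{system}\n\n{user_message}" if system else user_message
    return f"<s>[INST] {content} [/INST]"

print(build_mistral_prompt("Explain quicksort in one sentence."))
```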
Nous Hermes is a fine-tuned model optimized for instruction following and helpful dialogue, trained on curated datasets emphasizing quality responses, reasoning, and user alignment.
> [!Warning]
>
> 🚨 Qwen2.5-Math mainly supports solving English and Chinese math problems through CoT and TIR. We do not recommend using this series of models for other tasks.
SmolLM3 is a fully open 3B-parameter language model with dual-mode reasoning, 128K context via YARN extrapolation, and native support for 6 languages. Pretrained on 11.2T tokens with a staged curriculum of web, code, math, and reasoning data. Post-trained with 140B reasoning tokens and Anchored Preference Optimization.
Falcon-40B-Instruct is a 40B-parameter causal decoder-only model built by TII, based on Falcon-40B and finetuned on a mixture of Baize data. It is made available under the Apache 2.0 license.
InternLM has open-sourced a 7 billion parameter base model tailored for practical scenarios. The model has the following characteristics:
- It leverages trillions of high-quality tokens for training to establish a powerful knowledge base.
- It provides a versatile toolset for users to flexibly build their own workflows.