Browse AI Models

We introduce two new state-of-the-art models for local intelligence, on-device computing, and at-the-edge use cases. We call them les Ministraux: Ministral 3B and Ministral 8B.

BigCode StarCoder2 7B

7B16K ctx3.9 GBcurrent

denseLegacy

- Project Website: bigcode-project.org - Paper: Link - Point of Contact: contact@bigcode-project.org - Languages: 17 Programming languages

Mradermacher BaichuanMed OCR 72B i1

72B0K ctx40.3 GB

denseLegacy

Google Gemma 2 9B

9B8K ctx5 GBcurrent

denseLegacy

Gemma 2 9B is Google's mid-size open model built on Gemini research. Features improved reasoning and safety with a novel architecture optimized for efficient inference on consumer hardware.

IBM Granite 3.1 8B

8B128K ctx4.5 GBcurrent

denseLegacy

Model Summary: Granite-3.1-8B-Instruct is a 8B parameter long-context instruct model finetuned from Granite-3.1-8B-Base using a combination of open source instruction datasets with permissive license and internally collected synthetic datasets tailored for solving long context problems. This model is developed using a diverse set of techniques with a structured chat format, including supervised finetuning, model alignment using reinforcement learning, and model merging.

LLaVA LLaVA 1.5 7B

7B4K ctx3.9 GBlegacy

denseLegacy

Model type: LLaVA is an open-source chatbot trained by fine-tuning LLaMA/Vicuna on GPT-generated multimodal instruction-following data. It is an auto-regressive language model, based on the transformer architecture.

Cognitive Computations Samantha 7B

7B4K ctx3.9 GBlegacy

denseLegacy

Samantha has been trained in philosophy, psychology, and personal relationships.

WizardLM WizardMath 7B

7B4K ctx3.9 GBlegacy

denseLegacy

📃 [WizardLM] • 📃 [WizardCoder] • 📃 [WizardMath]

01.AI Yi 1.5 9B

9B4K ctx5 GBcurrent

denseLegacy

🐙 GitHub • 👾 Discord • 🐤 Twitter • 💬 WeChat

Cerebras Cerebras-GPT 13B

13B131K ctx7.3 GBlegacy

denseLegacy

Check out our Blog Post and arXiv paper!

Mistral Ministral 3 3B

3B262K ctx1.7 GBfrontier

multimodalLegacy

The smallest model in the Ministral 3 family, Ministral 3 3B is a powerful, efficient tiny language model with vision capabilities.

Microsoft Phi 4 Mini 4B

4B128K ctx2.2 GBfrontier

denseLegacy

Phi-4-mini-instruct is a lightweight open model built upon synthetic data and filtered publicly available websites - with a focus on high-quality, reasoning dense data. The model belongs to the Phi-4 model family and supports 128K token context length. The model underwent an enhancement process, incorporating both supervised fine-tuning and direct preference optimization to support precise instruction adherence and robust safety measures.

Alibaba Qwen 2.5 7B

7B131K ctx3.9 GBcurrent

denseLegacy

Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters. Qwen2.5 brings the following improvements upon Qwen2:

Meta CodeLlama 7B Instruct

7B16K ctx3.9 GBlegacy

denseLegacy

Code Llama is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 34 billion parameters. This is the repository for the 7B instruct-tuned version in the Hugging Face Transformers format. This model is designed for general code synthesis and understanding. Links to other models can be found in the index at the bottom.

MosaicML MPT-7B-Instruct

7B8K ctx3.9 GBlegacy

denseLegacy

MPT-7B Instruct is MosaicML's instruction-tuned model with a commercially permissive license. Supports 65K context with ALiBi positional encoding for efficient long-document processing.

Stability AI StableLM 2 12B

12B4K ctx6.7 GBlegacy

denseLegacy

`Stable LM 2 12B Chat` is a 12 billion parameter instruction tuned language model trained on a mix of publicly available datasets and synthetic datasets, utilizing Direct Preference Optimization (DPO).

DeepSeek DeepSeek LLM 7B

7B4K ctx3.9 GBlegacy

denseLegacy

Introducing DeepSeek LLM, an advanced language model comprising 7 billion parameters. It has been trained from scratch on a vast dataset of 2 trillion tokens in both English and Chinese. In order to foster research, we have made DeepSeek LLM 7B/67B Base and DeepSeek LLM 7B/67B Chat open source for the research community.

DeepSeek DeepSeek R1 1.5B

1.5B33K ctx0.8 GBactive

denseLegacy

DeepSeek R1 Distill Qwen 1.5B is a compact reasoning model distilled from DeepSeek-R1, based on Qwen2.5-Math-1.5B. Fine-tuned on 800K curated samples, it achieves 83.9% on MATH-500 and supports chain-of-thought reasoning on resource-constrained devices.

Meta Llama 3.1 8B

8B128K ctx4.5 GBlegacy

denseLegacy

Llama 3.1 8B is Meta's efficient general-purpose model supporting 128K context and multilingual text generation. Optimized for dialogue, summarization, reasoning, and code generation tasks.

Browse AI Models

328 models available

/

Status:

Sort:

Mradermacher Codestral RAG 19B Pruned i1

19B0K ctx10.6 GB

denseLegacy

Mradermacher blossom v3 baichuan2 7b i1

7B0K ctx3.9 GB

denseLegacy

Mradermacher Helply 10.2b chat i1

10.2B0K ctx5.7 GB

denseLegacy

Mradermacher AI21 Jamba2 3B i1

3B0K ctx1.7 GB

denseLegacy

Mradermacher blossom v1 baichuan 7b i1

7B0K ctx3.9 GB

denseLegacy

Mistral Ministral 8B

8B131K ctx4.5 GBcurrent

denseLegacy

We introduce two new state-of-the-art models for local intelligence, on-device computing, and at-the-edge use cases. We call them les Ministraux: Ministral 3B and Ministral 8B.

BigCode StarCoder2 7B

7B16K ctx3.9 GBcurrent

denseLegacy

- Project Website: bigcode-project.org - Paper: Link - Point of Contact: contact@bigcode-project.org - Languages: 17 Programming languages

Mradermacher BaichuanMed OCR 72B i1

72B0K ctx40.3 GB

denseLegacy

Google Gemma 2 9B

9B8K ctx5 GBcurrent

denseLegacy

Gemma 2 9B is Google's mid-size open model built on Gemini research. Features improved reasoning and safety with a novel architecture optimized for efficient inference on consumer hardware.

IBM Granite 3.1 8B

8B128K ctx4.5 GBcurrent

denseLegacy

LLaVA LLaVA 1.5 7B

7B4K ctx3.9 GBlegacy

denseLegacy

Cognitive Computations Samantha 7B

7B4K ctx3.9 GBlegacy

denseLegacy

Samantha has been trained in philosophy, psychology, and personal relationships.

WizardLM WizardMath 7B

7B4K ctx3.9 GBlegacy

denseLegacy

📃 [WizardLM] • 📃 [WizardCoder] • 📃 [WizardMath]

01.AI Yi 1.5 9B

9B4K ctx5 GBcurrent

denseLegacy

🐙 GitHub • 👾 Discord • 🐤 Twitter • 💬 WeChat

Cerebras Cerebras-GPT 13B

13B131K ctx7.3 GBlegacy

denseLegacy

Check out our Blog Post and arXiv paper!

Mistral Ministral 3 3B

3B262K ctx1.7 GBfrontier

multimodalLegacy

The smallest model in the Ministral 3 family, Ministral 3 3B is a powerful, efficient tiny language model with vision capabilities.

Microsoft Phi 4 Mini 4B

4B128K ctx2.2 GBfrontier

denseLegacy

Alibaba Qwen 2.5 7B

7B131K ctx3.9 GBcurrent

denseLegacy

Meta CodeLlama 7B Instruct

7B16K ctx3.9 GBlegacy

denseLegacy

MosaicML MPT-7B-Instruct

7B8K ctx3.9 GBlegacy

denseLegacy

MPT-7B Instruct is MosaicML's instruction-tuned model with a commercially permissive license. Supports 65K context with ALiBi positional encoding for efficient long-document processing.

Stability AI StableLM 2 12B

12B4K ctx6.7 GBlegacy

denseLegacy

DeepSeek DeepSeek LLM 7B

7B4K ctx3.9 GBlegacy

denseLegacy

DeepSeek DeepSeek R1 1.5B

1.5B33K ctx0.8 GBactive

denseLegacy

Meta Llama 3.1 8B

8B128K ctx4.5 GBlegacy

denseLegacy

Llama 3.1 8B is Meta's efficient general-purpose model supporting 128K context and multilingual text generation. Optimized for dialogue, summarization, reasoning, and code generation tasks.