Browse AI Models

328 models available

LMSYS Vicuna 7B

7B · 4K ctx · 3.9 GB · legacy

dense · Legacy

Vicuna is a chat assistant trained by fine-tuning Llama 2 on user-shared conversations collected from ShareGPT.

TII Falcon 7B Instruct

7B · 8K ctx · 3.9 GB · legacy

dense · Legacy

Falcon-7B-Instruct is a 7B-parameter causal decoder-only model built by TII on top of Falcon-7B and fine-tuned on a mixture of chat and instruct datasets. It is released under the Apache 2.0 license.

Google Gemma 3 4B

4B · 128K ctx · 2.2 GB · current

dense · Legacy

Gemma 3 4B is an efficient model from Google's Gemma 3 family that supports both vision and text. It is well suited to on-device applications that need multimodal understanding with fast inference.

Microsoft Phi 3.5 Mini 4B

4B · 128K ctx · 2.2 GB · legacy

dense · Legacy

Phi-3.5-mini is a lightweight, state-of-the-art open model built on the datasets used for Phi-3 (synthetic data and filtered publicly available websites), with a focus on very high-quality, reasoning-dense data. The model belongs to the Phi-3 model family and supports a context length of 128K tokens. It underwent a rigorous enhancement process combining supervised fine-tuning, proximal policy optimization, and direct preference optimization to ensure precise instruction adherence and robust safety measures.

01.AI Yi 1.5 6B

6B · 4K ctx · 3.4 GB · current

dense · Legacy

Microsoft Phi 3 Mini 3.8B

3.8B · 128K ctx · 2.1 GB · current

dense · Legacy

Phi-3-Mini-128K-Instruct is a 3.8-billion-parameter, lightweight, state-of-the-art open model trained on the Phi-3 datasets. These datasets include both synthetic data and filtered publicly available website data, with an emphasis on high-quality, reasoning-dense properties. The model belongs to the Phi-3 family; the Mini version comes in two variants, 4K and 128K, named for the context length (in tokens) each supports.

Alibaba Qwen 2.5 Coder 1.5B

1.5B · 32K ctx · 0.8 GB · active

dense · Legacy

Qwen 2.5 Coder 1.5B is a compact, code-specific language model from Alibaba's Qwen2.5 Coder series. It was trained on 5.5T tokens of source code, text-code grounding data, and synthetic data, and it brings improvements in code generation, code reasoning, and code fixing while retaining general and math capabilities.

NVIDIA Nemotron Mini 4B

4B · 4K ctx · 2.2 GB · current

dense · Legacy

Nemotron-Mini-4B-Instruct is a model for generating responses for roleplaying, retrieval-augmented generation (RAG), and function calling. It is a small language model (SLM) optimized through distillation, pruning, and quantization for speed and on-device deployment. It is a fine-tuned version of nvidia/Minitron-4B-Base, which was pruned and distilled from Nemotron-4 15B using NVIDIA's LLM compression technique. This instruct model is optimized for roleplay, RAG QA, and function calling in English. It supports a context length of 4,096 tokens and is ready for commercial use.

Alibaba Qwen 3 4B

4B · 32K ctx · 2.2 GB · current

dense · Legacy

Qwen3-4B-Instruct-2507 is an updated version of the Qwen3-4B non-thinking mode with several key enhancements.

Meta Llama 3.2 3B

3B · 128K ctx · 1.7 GB · legacy

dense · Legacy

Llama 3.2 3B is Meta's compact multilingual text model, optimized for edge and mobile deployment. It supports summarization, instruction following, and text generation, with strong performance for its size class.

TinyLlama TinyLlama 1.1B

1.1B · 4K ctx · 0.6 GB · legacy

dense · Legacy

The TinyLlama project aims to pretrain a 1.1B Llama model on 3 trillion tokens. With proper optimization, the project targets finishing this within "just" 90 days using 16 A100-40G GPUs 🚀🚀. Training started on 2023-09-01.
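The 90-day target implies a specific sustained throughput per GPU. The sketch below works that out from the three figures in the description (3 trillion tokens, 90 days, 16 GPUs), assuming the GPUs run continuously with no downtime; real runs lose time to restarts and evaluation, so the required rate is a lower bound.

```python
# Back-of-the-envelope throughput implied by TinyLlama's stated plan.
# Assumption: all GPUs train continuously for the full 90 days (no downtime).
TOTAL_TOKENS = 3e12
DAYS = 90
NUM_GPUS = 16
SECONDS_PER_DAY = 24 * 60 * 60


def tokens_per_second_per_gpu(total_tokens: float, days: float, num_gpus: int) -> float:
    """Average training throughput each GPU must sustain to finish on schedule."""
    wall_clock_seconds = days * SECONDS_PER_DAY
    return total_tokens / wall_clock_seconds / num_gpus


if __name__ == "__main__":
    rate = tokens_per_second_per_gpu(TOTAL_TOKENS, DAYS, NUM_GPUS)
    print(f"{rate:,.0f} tokens/s per GPU")
```

Running it gives roughly 24,000 tokens per second per GPU (3e12 tokens over 7,776,000 seconds, split across 16 GPUs).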

Google Gemma 2 2B

2B · 8K ctx · 1.1 GB · current

dense · Legacy

Gemma 2 2B is Google's lightweight model designed for on-device and edge deployment. It delivers strong text generation and reasoning performance at minimal resource cost.

Meta Llama 3.2 1B

1B · 128K ctx · 0.6 GB · legacy

dense · Legacy

Llama 3.2 1B is Meta's smallest text model, designed for on-device inference. It is optimized for multilingual text generation, summarization, and instruction following on resource-constrained hardware.

Alibaba Qwen 3 1.7B

1.7B · 32K ctx · 1 GB · frontier

dense · Legacy

Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built on extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction following, agent capabilities, and multilingual support.

Google Gemma 3 1B

1B · 32K ctx · 0.6 GB · current

dense · Legacy

Gemma 3 1B is Google's ultra-compact model from the Gemma 3 family. It is optimized for mobile and edge inference and offers surprisingly capable text generation for its parameter count.

Alibaba Qwen 3 0.6B

0.6B · 32K ctx · 0.3 GB · frontier

dense · Legacy
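The download sizes in the listing scale almost linearly with parameter count. The page does not say how the model files are stored, so the sketch below only back-computes the implied bits per weight from a few listed entries, assuming 1 GB = 10^9 bytes. Because sizes are rounded to 0.1 GB and parameter counts are nominal, the figures are approximate; they land around 4 to 4.5 bits per weight, which would be consistent with 4-bit quantized weights plus some overhead (an assumption, not something the catalog states). `estimate_size_gb` and its 4.5 bits-per-weight default are hypothetical helpers for illustration.

```python
# Implied storage cost per weight for a few catalog entries.
# Values are (parameters in billions, listed size in GB), copied from the listing.
CATALOG = {
    "LMSYS Vicuna 7B": (7.0, 3.9),
    "Microsoft Phi 3 Mini 3.8B": (3.8, 2.1),
    "Meta Llama 3.2 3B": (3.0, 1.7),
    "TinyLlama TinyLlama 1.1B": (1.1, 0.6),
    "Alibaba Qwen 3 0.6B": (0.6, 0.3),
}


def implied_bits_per_weight(params_billion: float, size_gb: float) -> float:
    """Bits of file size per parameter, assuming 1 GB = 1e9 bytes."""
    return size_gb * 8 / params_billion


def estimate_size_gb(params_billion: float, bits_per_weight: float = 4.5) -> float:
    """Hypothetical helper: rough file size for a given bits-per-weight budget."""
    return params_billion * bits_per_weight / 8


if __name__ == "__main__":
    for name, (params, size) in CATALOG.items():
        print(f"{name}: {implied_bits_per_weight(params, size):.2f} bits/weight")
    print(f"Estimated size of a 7B model: {estimate_size_gb(7.0):.2f} GB")
```

The smallest entries show the most scatter (0.3 GB for 0.6B works out to about 4.0 bits per weight) because rounding to one decimal place matters more at that scale.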
