Browse AI Models

84 models available


MosaicML MPT-30B-Instruct

30B · 8K ctx · 16.8 GB · legacy

dense · Legacy

MPT-30B Instruct is MosaicML's large instruction-tuned model offering strong reasoning and generation quality. Features 8K context with ALiBi encoding and efficient inference optimizations.

Alibaba Qwen 3 8B

8B · 131K ctx · 4.5 GB · frontier

dense · Legacy

Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers advances in reasoning, instruction-following, agent capabilities, and multilingual support.

Baichuan Baichuan 7B

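7B · 8K ctx · 3.9 GB · legacy

dense · Legacy

Baichuan-7B is an open-source large-scale pre-trained model developed by Baichuan Intelligent Technology. Built on the Transformer architecture, it is a 7-billion-parameter model trained on approximately 1.2 trillion tokens. It supports both Chinese and English and has a context window of 4096 tokens. On the standard Chinese and English benchmarks (C-EVAL/MMLU), it achieves the best results among models of its size.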

WizardLM WizardLM 13B

13B · 8K ctx · 7.3 GB · legacy

dense · Legacy

Project Repo: https://github.com/nlpxucan/WizardLM

WizardLM WizardMath 7B

7B · 4K ctx · 3.9 GB · legacy

dense · Legacy


Cerebras Cerebras-GPT 13B

13B · 131K ctx · 7.3 GB · legacy

dense · Legacy

Check out our Blog Post and arXiv paper!

Microsoft Phi 4 Mini 4B

4B · 128K ctx · 2.2 GB · frontier

dense · Legacy

Phi-4-mini-instruct is a lightweight open model built on synthetic data and filtered publicly available websites, with a focus on high-quality, reasoning-dense data. The model belongs to the Phi-4 model family and supports a 128K-token context length. It was further enhanced with both supervised fine-tuning and direct preference optimization to support precise instruction adherence and robust safety measures.

MosaicML MPT-7B-Instruct

7B · 8K ctx · 3.9 GB · legacy

dense · Legacy

MPT-7B Instruct is MosaicML's instruction-tuned model with a commercially permissive license. Supports 65K context with ALiBi positional encoding for efficient long-document processing.

DeepSeek DeepSeek R1 1.5B

1.5B · 33K ctx · 0.8 GB · active

dense · Legacy

DeepSeek R1 Distill Qwen 1.5B is a compact reasoning model distilled from DeepSeek-R1, based on Qwen2.5-Math-1.5B. Fine-tuned on 800K curated samples, it achieves 83.9% on MATH-500 and supports chain-of-thought reasoning on resource-constrained devices.

TII Falcon 7B Instruct

7B · 8K ctx · 3.9 GB · legacy

dense · Legacy

Falcon-7B-Instruct is a 7B-parameter causal decoder-only model built by TII, based on Falcon-7B and fine-tuned on a mixture of chat/instruct datasets. It is made available under the Apache 2.0 license.

Google Gemma 3 4B

4B · 128K ctx · 2.2 GB · current

dense · Legacy

Gemma 3 4B is Google's efficient Gemma 3 model supporting vision and text. Ideal for on-device applications requiring multimodal understanding with fast inference speeds.

Alibaba Qwen 3 4B

4B · 33K ctx · 2.2 GB · current

dense · Legacy

Qwen3-4B-Instruct-2507 is the updated version of the Qwen3-4B non-thinking mode.
