```python
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest
```
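These imports come from Mistral's mistral_common package. A minimal usage sketch follows (repeating the imports so it runs standalone); the v3 tokenizer choice and the prompt content are assumptions, so pick the tokenizer version that matches your model:

```python
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest

# Assumed: the v3 tokenizer, used by recent Mistral instruct models.
tokenizer = MistralTokenizer.v3()

# Render and tokenize a chat request in the exact format the model expects.
request = ChatCompletionRequest(messages=[UserMessage(content="Hello, how are you?")])
tokenized = tokenizer.encode_chat_completion(request)

print(tokenized.text)         # the rendered prompt string
print(len(tokenized.tokens))  # token ids ready to feed to the model
```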
GPT-OSS 20B is one of OpenAI's first open-weight models since GPT-2: a 21B-parameter mixture-of-experts model with 3.6B active parameters per token. It features configurable reasoning effort (low/medium/high), full chain-of-thought visibility, and agentic capabilities including function calling, and it runs on devices with 16GB of memory using MXFP4 quantization.
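The reasoning effort is selected through the system prompt. A minimal sketch using the Transformers chat pipeline, assuming the openai/gpt-oss-20b checkpoint and the "Reasoning: \<level\>" system-prompt convention described in the model card:

```python
from transformers import pipeline

# Sketch: the model id and the "Reasoning: <level>" system-prompt convention
# follow the gpt-oss model card; treat both as assumptions.
pipe = pipeline(
    "text-generation",
    model="openai/gpt-oss-20b",
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "system", "content": "Reasoning: high"},  # low / medium / high
    {"role": "user", "content": "Why does MXFP4 quantization cut memory use?"},
]
outputs = pipe(messages, max_new_tokens=256)
print(outputs[0]["generated_text"][-1])  # the assistant's reply
```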
InternLM2.5 has open-sourced a 20 billion parameter base model and a chat model tailored for practical scenarios. The series is characterized by strong reasoning capability, especially in math, and improved tool use.
Mistral-Small-3.2-24B-Instruct-2506 is a minor update of Mistral-Small-3.1-24B-Instruct-2503.
Qwen2.5 is the latest series of Qwen large language models, released as base and instruction-tuned models ranging from 0.5 to 72 billion parameters. Compared with Qwen2, it offers more knowledge, stronger coding and mathematics capabilities, better instruction following, and improved long-context handling.
StarCoder 7B is BigCode's code generation model trained on The Stack v1. Supports over 80 programming languages with fill-in-the-middle capability and 8K context window.
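Fill-in-the-middle works by wrapping the gap in sentinel tokens so the model generates the missing span. A minimal sketch, assuming the bigcode/starcoderbase-7b checkpoint and the `<fim_prefix>`/`<fim_suffix>`/`<fim_middle>` sentinels used by the StarCoder family:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

checkpoint = "bigcode/starcoderbase-7b"  # assumed 7B checkpoint name
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto")

# PSM format: prefix, then suffix, then the model fills in the middle.
prompt = (
    "<fim_prefix>def mean(xs):\n    total = "
    "<fim_suffix>\n    return total / len(xs)<fim_middle>"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=16)

# Decode only the newly generated middle span.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```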
Aya Expanse 8B is Cohere's multilingual model supporting 23 languages with strong cross-lingual transfer. Designed for global applications requiring high-quality generation across diverse languages.
Gemma 2 27B is Google's largest Gemma 2 model, offering state-of-the-art performance among open models of similar size. Built on Gemini technology with strong reasoning, code, and multilingual capabilities.
The largest model in the Ministral 3 family, Ministral 3 14B offers frontier capabilities and performance comparable to its larger Mistral Small 3.2 24B counterpart. A powerful and efficient language model with vision capabilities.
Mistral Small 3 (2501) sets a new benchmark in the "small" Large Language Models category below 70B, boasting 24B parameters and achieving state-of-the-art capabilities comparable to larger models! This model is an instruction-fine-tuned version of the base model: Mistral-Small-24B-Base-2501.
- Model type: A 7B parameter GPT-like model fine-tuned on a mix of publicly available, synthetic datasets.
- Language(s) (NLP): Primarily English
- License: MIT
- Finetuned from model: mistralai/Mistral-7B-v0.1