We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrates remarkable reasoning performance. Through RL, DeepSeek-R1-Zero naturally developed numerous powerful and interesting reasoning behaviors. However, it encounters challenges such as endless repetition, poor readability, and language mixing.
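R1-series models emit their chain of thought inside `<think>...</think>` tags before the final answer. A minimal sketch of separating the visible answer from the reasoning trace (the helper name is illustrative, not part of any DeepSeek API):

```python
import re

def strip_reasoning(text: str) -> str:
    """Remove the <think>...</think> reasoning trace that R1-style
    models prepend, keeping only the final answer text."""
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()

sample = "<think>2 + 2 equals 4.</think>The answer is 4."
print(strip_reasoning(sample))  # → The answer is 4.
```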
Introducing DeepSeek LLM, an advanced language model comprising 67 billion parameters, trained from scratch on a vast dataset of 2 trillion tokens in English and Chinese. To foster research, we have made DeepSeek LLM 7B/67B Base and DeepSeek LLM 7B/67B Chat open source for the research community.
Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction following, agent capabilities, and multilingual support.
DeepSeek R1 Distill Qwen 7B is a 7B-parameter reasoning model distilled from the larger DeepSeek-R1. Based on Qwen2.5-Math-7B and fine-tuned on 800K samples from DeepSeek-R1, it delivers strong reasoning, with 92.8% on MATH-500 and 49.1% on GPQA Diamond, while being far more efficient than the full 671B model.
Nemotron Nano 8B is NVIDIA's reasoning model derived from Llama 3.1 8B Instruct, post-trained for switchable reasoning with on/off modes. It achieves 95.4% on MATH-500 and 54.1% on GPQA Diamond with reasoning enabled, and fits on a single RTX GPU for local deployment.
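Nemotron's reasoning modes are switched via the system prompt rather than a sampling flag. A hedged sketch of building that system prompt — the control string shown is our assumption from NVIDIA's published examples, so check the model card for the exact wording:

```python
def nemotron_system_prompt(reasoning: bool) -> str:
    """Return a system prompt that toggles Nemotron's reasoning mode.
    The "detailed thinking on/off" phrasing is illustrative; verify it
    against the official model card before relying on it."""
    return "detailed thinking on" if reasoning else "detailed thinking off"

# Reasoning enabled for a hard math question, disabled for quick chat:
print(nemotron_system_prompt(True))   # → detailed thinking on
print(nemotron_system_prompt(False))  # → detailed thinking off
```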
The Phi-3-Medium-128K-Instruct is a 14B-parameter, lightweight, state-of-the-art open model trained with the Phi-3 datasets, which include both synthetic data and filtered publicly available website data, with a focus on high quality and reasoning-dense properties. The model belongs to the Phi-3 family; the Medium version comes in two variants, 4K and 128K, which is the context length (in tokens) it can support.
Solar 7B is Upstage's efficient language model built on a depth-upscaled architecture. It offers strong instruction following and reasoning performance, optimized for single-GPU inference.
Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base and instruction-tuned language models ranging from 0.5 to 72 billion parameters, bringing a number of improvements over Qwen2.
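Qwen chat models are trained on the ChatML turn layout (`<|im_start|>role ... <|im_end|>`). In practice the tokenizer's `apply_chat_template` handles this for you; the sketch below just makes the format explicit, and the function name is ours:

```python
def build_chatml_prompt(messages: list) -> str:
    """Render (role, content) turns in the ChatML layout used by Qwen
    chat models, ending with an open assistant turn for generation."""
    turns = [f"<|im_start|>{role}\n{content}<|im_end|>"
             for role, content in messages]
    turns.append("<|im_start|>assistant\n")  # model completes from here
    return "\n".join(turns)

prompt = build_chatml_prompt([
    ("system", "You are a helpful assistant."),
    ("user", "What is 2+2?"),
])
print(prompt)
```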
Baichuan-13B-Chat is the aligned (chat) version of the Baichuan-13B series; the pretrained base model is available as Baichuan-13B-Base.
InternLM has open-sourced a 7 billion parameter base model and a chat model tailored for practical scenarios. The model has the following characteristics:
- It leverages trillions of high-quality tokens for training to establish a powerful knowledge base.
- It supports an 8k context window length, enabling longer input sequences and stronger reasoning capabilities.
- It provides a versatile toolset for users to flexibly build their own workflows.
The Mistral-7B-Instruct-v0.3 Large Language Model (LLM) is an instruct fine-tuned version of the Mistral-7B-v0.3.
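Mistral instruct models expect user turns wrapped in `[INST] ... [/INST]` markers after the BOS token. A minimal sketch of that format — in real use, prefer the tokenizer's chat template, which also handles tokenization of the special markers; the helper name here is illustrative:

```python
from typing import Optional

def build_mistral_prompt(user_message: str,
                         system: Optional[str] = None) -> str:
    """Wrap a single user turn in the [INST] ... [/INST] markers that
    Mistral instruct models are fine-tuned on. A system message, if
    given, is commonly prepended inside the same instruction block."""
    content = f"{system}\n\n{user_message}" if system else user_message
    return f"<s>[INST] {content} [/INST]"

print(build_mistral_prompt("Explain quicksort in one sentence."))
```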
Nous Hermes is a fine-tuned model optimized for instruction following and helpful dialogue, trained on curated datasets emphasizing quality responses, reasoning, and user alignment.
> [!Warning]
>
> 🚨 Qwen2.5-Math mainly supports solving English and Chinese math problems through CoT and TIR. We do not recommend using this series of models for other tasks.
SmolLM3 is a fully open 3B-parameter language model with dual-mode reasoning, 128K context via YARN extrapolation, and native support for 6 languages. Pretrained on 11.2T tokens with a staged curriculum of web, code, math, and reasoning data. Post-trained with 140B reasoning tokens and Anchored Preference Optimization.
Falcon-40B-Instruct is a 40B-parameter causal decoder-only model built by TII, based on Falcon-40B and finetuned on a mixture of Baize data. It is made available under the Apache 2.0 license.
InternLM has open-sourced a 7 billion parameter base model tailored for practical scenarios. The model has the following characteristics:
- It leverages trillions of high-quality tokens for training to establish a powerful knowledge base.
- It provides a versatile toolset for users to flexibly build their own workflows.