Browse 300+ AI Models for Local Inference

671B (37B active)131K ctx375.8 GBfrontier

moeLegacy

We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning. With RL, DeepSeek-R1-Zero naturally emerged with numerous powerful and interesting reasoning behaviors. However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing.

DeepSeek DeepSeek V3 671B

671B (37B active)131K ctx375.8 GBfrontier

moeLegacy

We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2. Furthermore, DeepSeek-V3 pioneers an auxiliary-loss-free strategy for load balancing and sets a multi-token prediction training objective for stronger performance.

DeepSeek DeepSeek Coder V2 236B

236B (21B active)131K ctx132.2 GBcurrent

moeLegacy

We present DeepSeek-Coder-V2, an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT4-Turbo in code-specific tasks. Specifically, DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2 with additional 6 trillion tokens. Through this continued pre-training, DeepSeek-Coder-V2 substantially enhances the coding and mathematical reasoning capabilities of DeepSeek-V2, while maintaining comparable performance in general language tasks.

DeepSeek DeepSeek R1 Distill 32B

32B33K ctx17.9 GBfrontier

denseLegacy

We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning. With RL, DeepSeek-R1-Zero naturally emerged with numerous powerful and interesting reasoning behaviors. However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing.

DeepSeek DeepSeek V2.5 236B

236B (21B active)131K ctx132.2 GBcurrent

moeLegacy

DeepSeek-V2.5 is an upgraded version that combines DeepSeek-V2-Chat and DeepSeek-Coder-V2-Instruct. The new model integrates the general and coding abilities of the two previous versions. For model details, please visit DeepSeek-V2 page for more information.

DeepSeek DeepSeek R1 Distill 14B

14B33K ctx7.8 GBfrontier

denseLegacy

We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning. With RL, DeepSeek-R1-Zero naturally emerged with numerous powerful and interesting reasoning behaviors. However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing.

DeepSeek DeepSeek Coder V2 16B

16B (2.4B active)131K ctx9 GBcurrent

moeLegacy

We present DeepSeek-Coder-V2, an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT4-Turbo in code-specific tasks. Specifically, DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2 with additional 6 trillion tokens. Through this continued pre-training, DeepSeek-Coder-V2 substantially enhances the coding and mathematical reasoning capabilities of DeepSeek-V2, while maintaining comparable performance in general language tasks.

DeepSeek DeepSeek LLM 67B

67B4K ctx37.5 GBlegacy

denseLegacy

Introducing DeepSeek LLM, an advanced language model comprising 67 billion parameters. It has been trained from scratch on a vast dataset of 2 trillion tokens in both English and Chinese. In order to foster research, we have made DeepSeek LLM 7B/67B Base and DeepSeek LLM 7B/67B Chat open source for the research community.

DeepSeek DeepSeek R1 Distill 7B

7B33K ctx3.9 GBactive

denseLegacy

DeepSeek R1 Distill Qwen 7B is a 7B-parameter reasoning model distilled from the larger DeepSeek-R1. Based on Qwen2.5-Math-7B and fine-tuned on 800K samples from DeepSeek-R1, it delivers strong reasoning with 92.8% on MATH-500 and 49.1 on GPQA Diamond while being far more efficient than the full 671B model.

DeepSeek DeepSeek R1 Distill 8B

8B33K ctx4.5 GBfrontier

denseLegacy

We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning. With RL, DeepSeek-R1-Zero naturally emerged with numerous powerful and interesting reasoning behaviors. However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing.

DeepSeek DeepSeek LLM 7B

7B4K ctx3.9 GBlegacy

denseLegacy

Introducing DeepSeek LLM, an advanced language model comprising 7 billion parameters. It has been trained from scratch on a vast dataset of 2 trillion tokens in both English and Chinese. In order to foster research, we have made DeepSeek LLM 7B/67B Base and DeepSeek LLM 7B/67B Chat open source for the research community.

DeepSeek DeepSeek R1 1.5B

1.5B33K ctx0.8 GBactive

denseLegacy

DeepSeek R1 Distill Qwen 1.5B is a compact reasoning model distilled from DeepSeek-R1, based on Qwen2.5-Math-1.5B. Fine-tuned on 800K curated samples, it achieves 83.9% on MATH-500 and supports chain-of-thought reasoning on resource-constrained devices.

Browse AI Models

Browse AI Models