DeepSeek R1 was released January 20, 2025 and breaks from conventional reasoning model training. Instead of relying on human-written reasoning traces, DeepSeek applied reinforcement learning directly to the base DeepSeek-V3 weights (producing the precursor R1-Zero; R1 itself adds a small cold-start fine-tuning stage before RL). Unconstrained RL let emergent behaviors such as self-verification, self-reflection, and long chain-of-thought generation develop organically.
The architecture is a 671B-parameter Mixture-of-Experts (MoE) model that activates only 37B parameters per forward pass. On AIME 2024, DeepSeek R1 achieves 79.8% Pass@1, on par with OpenAI o1. On MATH-500 it reaches 97.3% Pass@1. The release documentation also highlights strong coding and general reasoning performance.
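The routing idea behind MoE can be illustrated with a toy sketch: a router scores all experts for each input, only the top-k experts actually run, and their outputs are combined with softmax gate weights. The shapes, expert count, and k below are illustrative assumptions, not DeepSeek-V3's actual configuration.

```python
import numpy as np

def moe_forward(x, experts_w, router_w, k=2):
    """Toy top-k MoE layer: run only k of n_experts per input."""
    logits = x @ router_w                    # router score per expert, shape (n_experts,)
    topk = np.argsort(logits)[-k:]           # indices of the k highest-scoring experts
    gates = np.exp(logits[topk])
    gates /= gates.sum()                     # softmax over the selected experts only
    # Only the chosen experts compute; the rest stay inactive this forward pass
    return sum(g * (experts_w[i] @ x) for g, i in zip(gates, topk))

rng = np.random.default_rng(0)
d, n_experts = 8, 16
x = rng.normal(size=d)
experts_w = rng.normal(size=(n_experts, d, d))  # one weight matrix per expert
router_w = rng.normal(size=(d, n_experts))
y = moe_forward(x, experts_w, router_w, k=2)
print(y.shape)  # (8,)
```

This is why the "active parameter" count is far below the total: per token, only the selected experts' weights participate in the computation, analogous to 37B of 671B parameters being active per forward pass.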
The MIT License is permissive, where many proprietary reasoning models impose stricter usage restrictions. Alongside the full model, DeepSeek released six smaller distilled models based on Qwen and Llama. The 32B and 70B distills match OpenAI o1-mini performance, giving teams cost-efficient alternatives to the full 671B model.