DeepSeek: A New Breakthrough and What We Can Learn

DeepSeek challenges the norm with a recent breakthrough — smaller models, bigger impact. Read on to learn what it means for the future of AI.

Greystack Technologies
5 min read · Jan 31, 2025

Just last week, DeepSeek, a Chinese AI startup, launched its groundbreaking AI model, DeepSeek-R1. The event sent shockwaves through the financial markets, triggering a $593 billion drop in Nvidia's market value, the largest single-day loss in U.S. stock market history. But the implications of DeepSeek's innovation extend beyond financial upheaval: they signal a potential paradigm shift in AI development, particularly toward smaller, more efficient models.

What is DeepSeek?

DeepSeek is an advanced large language model (LLM) designed to tackle tasks ranging from natural language processing to software development.

Founded by Liang Wenfeng in Hangzhou, China, DeepSeek embarked on its AI journey with a focus on efficiency and accessibility. The company prioritized open-source development, culminating in the release of DeepSeek-R1 on January 20, 2025. This approach democratized AI, making it more accessible to a broader audience without the burden of heavy infrastructure costs.

Unlike traditional models that activate all parameters for any given task, DeepSeek employs a Mixture-of-Experts (MoE) system. This approach activates only 37 billion of its 671 billion parameters for specific tasks, significantly reducing computational costs while maintaining high performance.
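The routing idea behind Mixture-of-Experts can be sketched in a few lines. This is a toy illustration, not DeepSeek's actual architecture: the expert count, hidden size, and top-k value here are made-up stand-ins for the real 671B/37B split.

```python
# Toy sketch of Mixture-of-Experts (MoE) routing. All sizes are
# illustrative assumptions, not DeepSeek's real configuration.
import numpy as np

rng = np.random.default_rng(0)

N_EXPERTS = 8   # total experts (the real model has far more capacity)
TOP_K = 2       # experts actually activated per token
D_MODEL = 16    # hidden size of this toy model

# Each "expert" is just a small weight matrix here.
experts = [rng.standard_normal((D_MODEL, D_MODEL)) * 0.1 for _ in range(N_EXPERTS)]
router_w = rng.standard_normal((D_MODEL, N_EXPERTS)) * 0.1

def moe_forward(x):
    """Route token x to its top-k experts and mix their outputs."""
    logits = x @ router_w                 # router score for each expert
    top = np.argsort(logits)[-TOP_K:]     # indices of the k best experts
    weights = np.exp(logits[top])
    weights /= weights.sum()              # softmax over the chosen experts
    # Only TOP_K of N_EXPERTS experts ever run — that is the compute saving.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.standard_normal(D_MODEL)
out = moe_forward(token)
print(out.shape)   # (16,)
```

The key point the sketch captures: per token, compute scales with the number of *activated* experts, not the total parameter count.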

Its efficient design challenges the prevailing notion that larger models are inherently superior. By activating only a fraction of its parameters for specific tasks, DeepSeek demonstrates that smaller, well-optimized models can achieve high performance. This development opens avenues for more accessible and cost-effective AI solutions, particularly benefiting smaller enterprises and research institutions.

Since its release, DeepSeek-R1 has quickly become one of the most downloaded models on open-source AI platforms. Its performance on math and coding benchmarks rivals that of leading models like OpenAI's o1, and its ability to deliver that performance with reduced computational resources marks a significant milestone in AI deployment.

[Figure: DeepSeek-R1 benchmark results.]

Where DeepSeek Succeeded

DeepSeek’s breakthrough success stems from its innovative approach to model training. Here’s how it diverges from traditional methods:

Standard Model Training

The standard AI model training process follows a two-step approach that ensures gradual improvement and alignment. The key steps include:

  • Pre-training: During this stage, the model acquires general language patterns by processing a vast amount of data from diverse sources.
  • Supervised Fine-Tuning (SFT): In this step, training experts meticulously select examples to train the model for specific tasks.

As reliable as SFT is, it has known drawbacks. Generating high-quality training data is time-consuming and labor-intensive. The curated examples also cap what the model can do: it learns to imitate them, with little incentive to develop novel strategies of its own.
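The SFT step boils down to ordinary supervised gradient descent on expert-labeled pairs. A minimal sketch, using a toy linear model in place of an LLM and synthetic (input, target) pairs in place of prompt/response text:

```python
# Minimal sketch of supervised fine-tuning (SFT): fit model weights to
# curated labeled examples by gradient descent. Toy linear model, not an LLM.
import numpy as np

rng = np.random.default_rng(1)

# Curated (input, target) pairs — in real SFT these are prompt/response texts.
X = rng.standard_normal((32, 4))
true_w = np.array([1.0, -2.0, 0.5, 3.0])   # the "expert behaviour" to imitate
y = X @ true_w

w = np.zeros(4)          # in real SFT, pre-trained weights would start here
lr = 0.05
for _ in range(500):     # supervised gradient steps on the labeled data
    grad = 2 * X.T @ (X @ w - y) / len(X)
    w -= lr * grad

print(np.round(w, 2))    # converges toward the labeled behaviour
```

Note how the model can only ever approach `true_w`, the behaviour encoded in the labels, which is exactly the ceiling the article describes.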

How DeepSeek Trained R1

DeepSeek started by developing DeepSeek-R1-Zero, a pure reinforcement-learning model.

Initially, they experimented with a radical approach by training a model solely through reinforcement learning (RL) without any supervised pre-training. The model learned reasoning entirely through trial and error, receiving rewards based on its outputs.

This demonstrated that LLMs can develop reasoning skills on their own, suggesting that labeled data is not always essential for the task. However, the model suffered from mixed-language responses and inconsistent output quality.
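The "rewards based on its outputs" loop can be illustrated with a toy policy. This is a deliberately tiny stand-in, a softmax over three candidate answers with a REINFORCE-style update, not DeepSeek's actual RL algorithm (which the R1 work describes as GRPO):

```python
# Toy illustration of reward-only learning (no labeled demonstrations):
# a softmax "policy" over candidate answers is updated purely from a
# correctness reward, in the spirit of R1-Zero's trial-and-error training.
import math
import random

random.seed(0)
question, correct = "2+2", "4"
candidates = ["3", "4", "5"]
logits = [0.0, 0.0, 0.0]           # the policy's preference for each answer

def sample():
    probs = [math.exp(l) for l in logits]
    z = sum(probs)
    probs = [p / z for p in probs]
    i = random.choices(range(len(candidates)), probs)[0]
    return i, probs

lr = 0.5
for _ in range(200):
    i, probs = sample()
    reward = 1.0 if candidates[i] == correct else 0.0   # outcome-based reward
    # REINFORCE-style update: raise the log-prob of rewarded answers.
    for j in range(len(logits)):
        grad = (1.0 if j == i else 0.0) - probs[j]
        logits[j] += lr * reward * grad

best = candidates[max(range(len(logits)), key=lambda j: logits[j])]
print(best)
```

No one ever shows the policy the right answer; it only ever sees a scalar reward, yet its preference still concentrates on the correct output.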

DeepSeek then moved to a multi-stage training process for DeepSeek-R1:

  1. Cold Start Fine-Tuning: To overcome the initial challenges, they created a compact but high-quality dataset featuring long Chain-of-Thought (CoT) examples to stabilize the model’s early development. This step allowed the model to grasp structured reasoning and produce more coherent outputs.
  2. Reasoning-Oriented Reinforcement Learning: After the initial fine-tuning, DeepSeek implemented RL once again, this time focusing on improving performance in areas like mathematical reasoning and programming tasks. They also introduced a consistency reward mechanism to encourage clearer, human-like language and eliminate mixed-language outputs.
  3. Rejection Sampling + Supervised Fine-Tuning: After RL improvements, DeepSeek generated a robust dataset by selecting only high-quality model outputs. This curated dataset was then used for supervised fine-tuning, enabling the model to gain expertise across various domains.
  4. Final RL Optimization: The final stage involved another round of RL training to fine-tune alignment and overall performance. This iterative, multi-stage approach ensured that DeepSeek R1 became a robust and high-performing model.
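The rejection-sampling step (stage 3 above) is simple to sketch: sample several candidate answers per prompt, score them, and keep only the best for the next supervised fine-tuning round. The generator and scorer below are hypothetical stand-ins for the RL-tuned model and the reward signal:

```python
# Sketch of rejection sampling for dataset curation: over-generate,
# score, and keep only top outputs. Generator and scorer are stand-ins
# for the RL-tuned model and the reward model / human evaluators.
import random

random.seed(42)
prompts = ["prove A", "solve B", "explain C"]

def generate(prompt, n=8):
    # Stand-in: would sample n responses from the RL-tuned model.
    return [f"{prompt} / draft {i}" for i in range(n)]

def score(response):
    # Stand-in: would be a quality score from a reward model or evaluator.
    return random.random()

sft_dataset = []
for p in prompts:
    candidates = generate(p)
    best = max(candidates, key=score)   # reject everything but the top answer
    sft_dataset.append((p, best))

print(len(sft_dataset))   # one curated pair per prompt
```

The output of this loop is exactly the "robust dataset" the article mentions: only winning samples survive into the next SFT round.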

By guiding the model through carefully planned stages rather than relying solely on vast data inputs, DeepSeek was able to demonstrate a smarter and more effective approach to developing advanced AI capabilities.

What DeepSeek’s Breakthrough Tells Us

Small Models Are Now Feasible at a Fraction of the Cost

DeepSeek’s success proves that AI development doesn’t require billion-dollar investments. Optimizing training techniques is enough for smaller models to achieve impressive results. This breakthrough democratizes access to high-performance AI, opening doors for startups and independent researchers.

Reinforcement Learning as a Game-Changer

DeepSeek’s innovative use of reinforcement learning highlights its potential to revolutionize AI training. By integrating adaptive learning mechanisms, models can develop more nuanced reasoning capabilities.

High-Quality Datasets Accelerate AI Development

The quality of training data significantly influences model efficiency. DeepSeek’s strategic dataset curation contributed to its rapid progress. High-quality, well-structured datasets allowed the model to learn faster, reducing reliance on excessive computational power.

Human Labeling and Evaluation

R1-Zero was impressive on its own with pure reinforcement learning. When it came to training R1, however, a key factor in DeepSeek's success was its emphasis on human involvement in model evaluation and labeling.

Human evaluators played a crucial role in assessing model outputs, ensuring that the results met high-quality standards. By analyzing responses for logical consistency and usability, they helped fine-tune the system to deliver more accurate and human-like outputs.

Furthermore, rejection sampling was guided by expert evaluators to curate datasets for further fine-tuning, making the entire training process more efficient and targeted.

Looking Ahead

DeepSeek has undoubtedly set a new benchmark for AI development. Its cost-efficient approach challenges industry norms and inspires new possibilities. As AI continues to evolve, we’ll be closely monitoring DeepSeek’s impact on the AI landscape and its contributions to the broader tech ecosystem. The future of AI might be more accessible than ever.

If you’re ready to start ahead now, let’s hop on a call and discover how Adaptive Workstack can fast-track your AI project from concept to execution. Request a demo.


Written by Greystack Technologies

AI, Technology, Business, and Impactful Innovation. | https://greystack.co/
