DeepSeek reportedly did not train its flagship model for just $294,000
In a groundbreaking move, Chinese AI company DeepSeek has published a peer-reviewed paper in the prestigious journal Nature detailing its R1 reasoning model, which is built on the DeepSeek V3 base model. The paper has caused a stir in the AI community, with some claiming the model was substantially cheaper and more efficient to train than its Western counterparts. A closer look at the figures, however, reveals a more nuanced picture.
The headline $294,000 figure, which has been widely publicised, does not account for the end-to-end cost of training the model. It covers only the final reinforcement-learning phase, and it rests on the assumption that H800 GPUs could be rented for $2/hr. In reality, training the underlying V3 base model alone cost roughly 19 times that estimate.
The DeepSeek V3 model, while larger than Llama 4 Maverick, used significantly fewer training tokens at 14.8 trillion. That is an impressive feat considering the Llama 4 models required between 22 trillion (Maverick) and 40 trillion (Scout) tokens, and between 2.38 million and 5 million GPU hours, to train.
In terms of compute, DeepSeek V3 and R1 are roughly comparable to Meta's Llama 4. The V3 model was trained on 2,048 H800 GPUs for approximately two months, plus about 5,000 GPU hours spent generating supervised fine-tuning datasets. In total, the model required 2.79 million GPU hours, which at the assumed $2/hr rental rate works out to an estimated $5.58 million.
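For readers who want to check that arithmetic, here is a minimal back-of-envelope sketch in Python, using the figures above and the assumed $2/hr H800 rental rate (the rate is an assumption from the reporting, not a quoted market price):

```python
# Back-of-envelope check of the reported V3 training figures.
GPU_COUNT = 2_048              # H800 GPUs used for the V3 training run
TOTAL_GPU_HOURS = 2.79e6       # total GPU hours reported for the model
RATE_USD_PER_GPU_HOUR = 2.00   # assumed H800 rental price

cost_usd = TOTAL_GPU_HOURS * RATE_USD_PER_GPU_HOUR
wall_clock_days = TOTAL_GPU_HOURS / GPU_COUNT / 24

print(f"Estimated training cost: ${cost_usd / 1e6:.2f}M")     # -> $5.58M
print(f"Implied wall-clock time: {wall_clock_days:.0f} days")  # -> ~57 days, roughly two months
```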
It's also worth noting that the purchase cost of the 256 eight-way GPU servers (2,048 H800s in total) used to train the models is estimated at more than $51 million. That cost, naturally, is not reflected in the headline $294,000 figure.
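A quick derivation of what that estimate implies per unit of hardware; the per-server and per-GPU prices below are inferred from the $51 million total, not quoted figures:

```python
# Implied unit prices behind the $51 million hardware estimate (derived, not reported).
SERVER_COUNT = 256            # eight-way H800 servers
TOTAL_HARDWARE_COST = 51e6    # USD, as estimated above

per_server = TOTAL_HARDWARE_COST / SERVER_COUNT
per_gpu = TOTAL_HARDWARE_COST / (SERVER_COUNT * 8)

print(f"~${per_server / 1e3:.0f}K per server, ~${per_gpu / 1e3:.0f}K per GPU")
# -> ~$199K per server, ~$25K per GPU
```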
The confusion arose from supplementary information released alongside the peer-reviewed version of DeepSeek's original January paper, which stated that the model's reinforcement-learning run used 64 eight-way H800 boxes, 512 GPUs in total. This led some to believe the entire model cost only $294,000 to train.
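Applying the same assumed $2/hr rate to the $294,000 figure shows how small that reinforcement-learning run is next to the full training effort; the duration below is a derived estimate, not a number from the paper:

```python
# What the $294,000 figure implies for the RL run, assuming $2/hr H800 rental.
RL_COST_USD = 294_000
RL_GPU_COUNT = 512        # 64 eight-way H800 boxes

rl_gpu_hours = RL_COST_USD / 2.00
rl_wall_clock_hours = rl_gpu_hours / RL_GPU_COUNT

print(f"Implied GPU hours: {rl_gpu_hours:,.0f}")  # -> 147,000
print(f"Implied wall-clock time: ~{rl_wall_clock_hours:.0f} hours "
      f"(~{rl_wall_clock_hours / 24:.0f} days)")   # -> ~287 hours, about 12 days
```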
The paper itself focuses on using reinforcement learning to imbue the existing V3 base model with 'reasoning' or 'thinking' capabilities. Reinforcement learning here is a post-training step that typically reinforces stepwise reasoning by rewarding the model for arriving at correct answers. Notably, the researchers had already completed about 95 percent of the work before reaching the reinforcement-learning phase detailed in the paper.
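To illustrate the idea, here is a minimal, purely illustrative sketch of outcome-based reward assignment of the kind described above; the function and variable names are hypothetical, and this is not DeepSeek's actual implementation:

```python
# Hypothetical sketch: reward the model only when its final answer is correct.
def outcome_reward(model_answer: str, reference_answer: str) -> float:
    """Return 1.0 if the model's final answer matches the reference, else 0.0."""
    return 1.0 if model_answer.strip() == reference_answer.strip() else 0.0

# Rewards like this are fed into a policy-gradient-style RL update,
# reinforcing the chains of reasoning that led to correct answers.
rollouts = [("42", "42"), ("41", "42")]
rewards = [outcome_reward(answer, reference) for answer, reference in rollouts]
print(rewards)  # -> [1.0, 0.0]
```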
In conclusion, while DeepSeek's training costs are low by frontier-model standards, the headline $294,000 figure does not tell the whole story. The actual cost of the DeepSeek models is likely higher still once research and development, data acquisition, data cleaning, and inevitable false starts or wrong turns are accounted for. Nevertheless, DeepSeek's achievement is a testament to the progress being made in the AI field, particularly in regions outside the traditional AI powerhouses.