The good folks at Epoch AI have just released their most comprehensive database yet, tracking over 800 state-of-the-art and historically notable AI models. This incredible resource provides key insights into the factors driving machine learning progress.
Since 2010, the training compute used to create AI models has been growing at a staggering rate of 4.1x per year. That means the computational power behind these models is doubling roughly every six months! And it's not just compute that's increasing: the costs are too. Training compute costs for the largest models are doubling every nine months, with the most advanced models now costing hundreds of millions of dollars.
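For anyone who wants to check the conversion between these headline numbers, here is a minimal sketch (plain Python, using only the figures quoted above) that turns an annual growth factor into a doubling time and vice versa:

```python
import math

def doubling_time_months(annual_growth_factor: float) -> float:
    """Months needed to double, given a multiplicative growth factor per year."""
    return 12 * math.log(2) / math.log(annual_growth_factor)

def annual_factor_from_doubling(doubling_months: float) -> float:
    """Multiplicative growth per year, given a doubling time in months."""
    return 2 ** (12 / doubling_months)

# Training compute growing 4.1x per year -> doubling roughly every six months.
print(f"{doubling_time_months(4.1):.1f} months")          # ~5.9

# Training costs doubling every nine months -> roughly 2.5x growth per year.
print(f"{annual_factor_from_doubling(9):.2f}x per year")  # ~2.52
```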
Interestingly, training compute has scaled up faster for language models than for vision models. While the largest vision and language models had similar compute requirements before 2020, language models have since rapidly outpaced vision models, driven by the success of transformer architectures. The size of datasets used to train language models is also doubling approximately every eight months.
Another fascinating trend is that the length of time spent training notable models is growing by about 1.2x per year. While longer training times could ease hardware constraints, there is a tradeoff to consider. For very long runs, waiting for algorithmic and hardware improvements might be more beneficial than simply extending training.
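To make that tradeoff concrete, here is a toy sketch of my own (the improvement rates are hypothetical, and this is not Epoch's analysis) comparing a long run started today against a shorter run started after waiting for better hardware and algorithms:

```python
def effective_compute(train_months: float,
                      wait_months: float = 0.0,
                      improvement_per_year: float = 3.0) -> float:
    """Relative compute delivered by waiting `wait_months` for hardware/algorithmic
    progress (a hypothetical `improvement_per_year` efficiency factor), then
    training for `train_months` at the improved throughput (baseline = 1/month)."""
    efficiency = improvement_per_year ** (wait_months / 12)
    return efficiency * train_months

# Fixed deadline 12 months from now: train the whole year on today's stack,
# or wait 6 months for improvements and train for the remaining 6.
print(effective_compute(train_months=12))                 # 12.0
print(effective_compute(train_months=6, wait_months=6))   # ~10.4, extending wins
print(effective_compute(train_months=6, wait_months=6,
                        improvement_per_year=10.0))       # ~19.0, waiting wins
```

The crossover point depends entirely on how fast hardware and algorithms actually improve, which is why the tradeoff only bites for very long runs.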
If these trends continue, by 2028 we will see clusters priced around 100 billion dollars and drawing 10 GW of power!
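As a rough sanity check on the order of magnitude, here is a simple extrapolation sketch; the starting cluster price and the assumption that cluster prices follow a similar ~9-month doubling are purely illustrative on my part, not figures from the database:

```python
def extrapolate(value_now: float, doubling_months: float, months_ahead: float) -> float:
    """Project a quantity forward assuming it keeps doubling at a fixed cadence."""
    return value_now * 2 ** (months_ahead / doubling_months)

# Purely illustrative inputs: a ~$2.5B cluster today, doubling in price every
# ~9 months, lands around $100B about four years out.
print(f"${extrapolate(2.5e9, 9, 48) / 1e9:.0f}B")  # ~$101B
```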