🔥 Qwen releases their 2.5 family of models: new SOTA for all sizes up to 72B!
The Chinese LLM maker just dropped a flurry of different models, ensuring there will be a Qwen SOTA model for every application out there:
Qwen2.5: 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B
Qwen2.5-Coder: 1.5B, 7B, and 32B on the way
Qwen2.5-Math: 1.5B, 7B, and 72B.
And they didn't slack off: performance is top of its class in every weight category!
Key insights:
📏 All models have 128k token context length
🔍 Models pre-trained on 18T tokens, even more than the 15T of Llama-3
💪 The flagship Qwen2.5-72B is ~competitive with Llama-3.1-405B, and has a 3-5% margin on Llama-3.1-70B on most benchmarks.
🇫🇷 On top of this, it takes the #1 spot on multilingual tasks, so it might become my standard for French.
💻 Qwen2.5-Coder is only 7B but beats competing models up to 33B (DeepSeek-Coder-33B-Instruct). Let's wait for their 32B to come out!
🧮 Qwen2.5-Math sets a new high in the ratio of MATH benchmark score to parameter count. They trained it by "aggregating more high-quality mathematical data, particularly in Chinese, from web sources, books, and codes across multiple recall cycles."
📄 Technical report to be released "very soon"
📜 All models are under the most permissive Apache 2.0 license, except the 72B models, which have a custom license stating "you can use it for free EXCEPT if your product has over 100M users"
🤗 All models are available on the HF Hub! ➡️ Qwen/qwen25-66e81a666513e518adb90d9e