
Minje Jeon

minjejeon



minjejeon's activity

reacted to m-ric's post with ❤️ 11 months ago
[๐๐ž๐ฐ ๐๐š๐ฉ๐ž๐ซ] ๐€๐ฅ๐ฅ ๐ญ๐จ๐ค๐ž๐ง๐ฌ ๐ฌ๐ก๐จ๐ฎ๐ฅ๐ ๐ง๐จ๐ญ ๐ซ๐ž๐ช๐ฎ๐ข๐ซ๐ž ๐ญ๐ก๐ž ๐ฌ๐š๐ฆ๐ž ๐ž๐Ÿ๐Ÿ๐จ๐ซ๐ญ ๐ญ๐จ ๐œ๐จ๐ฆ๐ฉ๐ฎ๐ญ๐ž! โ‡’ ๐Œ๐ข๐ฑ๐ญ๐ฎ๐ซ๐ž ๐จ๐Ÿ ๐๐ž๐ฉ๐ญ๐ก๐ฌ ๐Ÿซง๐Ÿ 

Google Researchers were unhappy with the way current decoding generally works: all tokens go through the same layers, thus requiring exactly the same effort to compute.

Whereas in reality, completing the answer to a difficult math problem for instance should be more computationally intense than completing the text of the Declaration of Independence: not all tokens are created equal!

➡️ They had this genius idea: 💡 having a token go through a block should be optional. The token can go through the block (thus undergoing expensive self-attention computation) or avoid it through a skip connection.
The routing decision is taken at the block level: each block selects from the total sequence the top-k tokens that will go through it, and the other tokens skip it. This allows choosing the exact capacity of a block, i.e. the proportion of tokens that go through it, which directly influences the computational intensity of the forward pass.

This yields Mixture-of-Depths (MoD), with spectacular results.
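The per-block top-k routing described above can be sketched in a few lines of numpy. This is a toy illustration, not the paper's implementation: the router (a single learned vector here), the stand-in block, and all sizes are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

seq_len, d_model, capacity = 8, 4, 0.25   # hypothetical sizes
k = int(seq_len * capacity)               # tokens allowed into the block

x = rng.standard_normal((seq_len, d_model))
w_router = rng.standard_normal(d_model)   # toy stand-in for a learned router

# The router assigns one scalar score per token; the block keeps the top-k.
scores = x @ w_router
top_k = np.argsort(scores)[-k:]

def expensive_block(tokens):
    # Stand-in for the costly self-attention + MLP computation.
    return np.tanh(tokens)

# Selected tokens go through the block (plus residual); all other
# tokens bypass it entirely via the identity/skip connection.
out = x.copy()
out[top_k] = x[top_k] + expensive_block(x[top_k])
```

With capacity 0.25, only 2 of the 8 tokens pay for the block's computation; the rest are passed through unchanged, which is where the inference savings come from.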

✨ Results:
🎚️ Capacity can be tuned all the way down to 12.5% for every second block: thus 87.5% of tokens just skip the block!
🚀 For the same training time and performance, a >60% inference speedup!
🤝 Can be combined with Mixture-of-Experts for further improvements.

📄 Paper here 👉 Mixture-of-Depths: Dynamically allocating compute in transformer-based language models (2404.02258)
📚 I added it to my paper collection 👉 m-ric/spinning-up-in-llms-659e698f9dd5a71bd3f579a7
reacted to akhaliq's post with 👍 12 months ago
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits (2402.17764)

Recent research, such as BitNet, is paving the way for a new era of 1-bit Large Language Models (LLMs). In this work, we introduce a 1-bit LLM variant, namely BitNet b1.58, in which every single parameter (or weight) of the LLM is ternary {-1, 0, 1}. It matches the full-precision (i.e., FP16 or BF16) Transformer LLM with the same model size and training tokens in terms of both perplexity and end-task performance, while being significantly more cost-effective in terms of latency, memory, throughput, and energy consumption. More profoundly, the 1.58-bit LLM defines a new scaling law and recipe for training new generations of LLMs that are both high-performance and cost-effective. Furthermore, it enables a new computation paradigm and opens the door for designing specific hardware optimized for 1-bit LLMs.
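The ternary idea can be illustrated with a minimal numpy sketch of absmean quantization, the rounding scheme the BitNet b1.58 paper uses to map weights to {-1, 0, 1}. This is an assumption-laden toy, not the paper's training recipe: the sizes, data, and epsilon are made up, and real BitNet learns with these constraints during training rather than quantizing afterwards.

```python
import numpy as np

def quantize_ternary(w, eps=1e-6):
    # Absmean quantization: scale by the mean absolute value of the
    # weight matrix, then round each entry to the nearest of {-1, 0, 1}.
    gamma = np.abs(w).mean()
    w_q = np.clip(np.round(w / (gamma + eps)), -1, 1)
    return w_q, gamma

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)
w_q, gamma = quantize_ternary(w)

# Matrix multiplies against w_q need only additions and subtractions
# (no floating-point multiplies); gamma rescales the result once.
x = rng.standard_normal(4).astype(np.float32)
y = (x @ w_q.T) * gamma   # approximates x @ w.T
```

Because every weight is one of three values, each takes log2(3) ≈ 1.58 bits to store, which is where the "1.58-bit" name comes from, and the multiply-free matmul is what enables the latency, memory, and energy gains the abstract describes.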