Google researchers were unhappy with the way decoding currently works: every token goes through the same layers, and thus requires exactly the same compute.
Whereas in reality, completing the answer to a difficult math problem, for instance, should be more computationally intensive than completing the text of the Declaration of Independence: Not all tokens are created equal!
➡️ They had this genius idea: 💡 having a token go through a block should be optional. A token can go through the block (thus undergoing the expensive self-attention computation) or bypass it via a skip connection. The routing decision is made at the block level: each block selects from the whole sequence the top-k tokens that will go through it, and the other tokens skip it. This lets you choose the exact capacity of a block, i.e. the proportion of tokens that go through it, which directly influences the computational intensity of the forward pass.
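To make the routing concrete, here is a minimal PyTorch sketch of per-block top-k routing in the spirit of Mixture-of-Depths. It is an illustration under assumptions, not the paper's implementation: the `MoDBlock` wrapper, the single-linear router, and the sigmoid gating are illustrative choices, and `block` is assumed to return the residual update (attention + MLP output) for the tokens it receives.

```python
import torch
import torch.nn as nn

class MoDBlock(nn.Module):
    """Sketch of a Mixture-of-Depths wrapper (illustrative, not the paper's code).

    `block` is assumed to map (batch, k, dim) -> (batch, k, dim) and to return
    the residual update (self-attention + MLP output) for the tokens it sees.
    `capacity` is the fraction of tokens routed through the block; the rest
    take the skip connection unchanged.
    """

    def __init__(self, block: nn.Module, dim: int, capacity: float = 0.125):
        super().__init__()
        self.block = block
        self.router = nn.Linear(dim, 1)  # one scalar routing score per token
        self.capacity = capacity

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        batch, seq_len, dim = x.shape
        k = max(1, int(self.capacity * seq_len))

        scores = self.router(x).squeeze(-1)        # (batch, seq_len)
        top_idx = scores.topk(k, dim=-1).indices   # (batch, k): tokens that go through

        # Gather only the selected tokens and run the expensive block on them.
        # (The sketch ignores positional/causal handling for the gathered subsequence.)
        gather_idx = top_idx.unsqueeze(-1).expand(-1, -1, dim)
        selected = x.gather(1, gather_idx)         # (batch, k, dim)
        update = self.block(selected)

        # Gate the update by the router score (kept differentiable via sigmoid,
        # a simplification of the paper's weighting), then scatter back;
        # unselected tokens are returned unchanged.
        gate = torch.sigmoid(scores.gather(1, top_idx)).unsqueeze(-1)
        return x.scatter(1, gather_idx, selected + gate * update)
```

In a full model, a wrapper like this would sit around some or all of the transformer blocks, which is what allows the alternating-block, low-capacity setup described below.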
This yields Mixture-of-Depths (MoD), with spectacular results.
✨ Results: Capacity can be tuned all the way down to 12.5% for every second block: thus 87.5% of tokens just skip the block! For the same training time and performance, >60% inference speedup! 🤝 Can be combined with Mixture-of-Experts for further improvements.
Recent research, such as BitNet, is paving the way for a new era of 1-bit Large Language Models (LLMs). In this work, we introduce a 1-bit LLM variant, namely BitNet b1.58, in which every single parameter (or weight) of the LLM is ternary {-1, 0, 1}. It matches the full-precision (i.e., FP16 or BF16) Transformer LLM with the same model size and training tokens in terms of both perplexity and end-task performance, while being significantly more cost-effective in terms of latency, memory, throughput, and energy consumption. More profoundly, the 1.58-bit LLM defines a new scaling law and recipe for training new generations of LLMs that are both high-performance and cost-effective. Furthermore, it enables a new computation paradigm and opens the door for designing specific hardware optimized for 1-bit LLMs.
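As a concrete illustration of what "every weight is ternary" means, here is a small PyTorch sketch of absmean quantization in the style described for BitNet b1.58: scale the weight matrix by its mean absolute value, round to the nearest integer, and clip to {-1, 0, 1}. The function name and the usage snippet are illustrative, not the paper's code.

```python
import torch

def ternary_quantize(w: torch.Tensor, eps: float = 1e-5):
    """Absmean ternary quantization sketch: map weights to {-1, 0, +1}.

    Scales the weight matrix by its mean absolute value, rounds each entry to
    the nearest integer, and clips to [-1, 1]. Returns the ternary weights and
    the scale needed to approximately reconstruct w.
    """
    gamma = w.abs().mean()                              # per-tensor scale (absmean)
    w_ternary = (w / (gamma + eps)).round().clamp_(-1, 1)
    return w_ternary, gamma

# Usage sketch: quantize a random weight matrix and inspect the error.
w = torch.randn(4, 4)
w_q, gamma = ternary_quantize(w)
print(w_q)                               # entries in {-1., 0., 1.}
print((w_q * gamma - w).abs().mean())    # mean quantization error
```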