
Cross

dillfrescott

AI & ML interests

AI, anime, computers

Recent Activity

reacted to m-ric's post with 👍 about 16 hours ago
After 6 years, BERT, the workhorse of encoder models, finally gets a replacement: 𝗪𝗲𝗹𝗰𝗼𝗺𝗲 𝗠𝗼𝗱𝗲𝗿𝗻𝗕𝗘𝗥𝗧! 🤗

We talk a lot about ✨Generative AI✨, meaning "Decoder version of the Transformers architecture", but this is only one of the ways to build LLMs: encoder models, which turn a sentence into a vector, are maybe even more widely used in industry than generative models.

The workhorse for this category has been BERT since its release in 2018 (that's prehistory for LLMs). It's not a fancy 100B-parameter supermodel (just a few hundred million), but it's an excellent workhorse, kind of a Honda Civic for LLMs.

Many applications use BERT-family models - the top models in this category accumulate millions of downloads on the Hub.

➡️ Now a collaboration between Answer.AI and LightOn just introduced BERT's replacement: ModernBERT.

𝗧𝗟;𝗗𝗥:

🏛️ Architecture changes:
⇒ First, standard modernizations:
- Rotary positional embeddings (RoPE)
- Replace GeLU with GeGLU
- Use Flash Attention 2
✨ The team also introduced innovative techniques like alternating attention instead of full attention, and sequence packing to get rid of padding overhead.

🥇 As a result, the model tops the game of encoder models: it beats the previous standard, DeBERTaV3, with 1/5th the memory footprint, and runs 4x faster!

Read the blog post 👉 https://huggingface.co/blog/modernbert
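To make the padding-overhead point in the post concrete, here is a minimal sketch of the general idea behind sequence packing. It is not ModernBERT's actual implementation: the function names (`padded_token_count`, `pack`) and the token ids are hypothetical, chosen only for illustration. It compares how many token slots a padded batch occupies with how many remain once the sequences are concatenated, and records the boundaries an attention kernel would presumably need to keep the sequences separate.

```python
from typing import List, Tuple


def padded_token_count(seqs: List[List[int]]) -> int:
    """Token slots used when every sequence is padded to the longest one."""
    if not seqs:
        return 0
    return len(seqs) * max(len(s) for s in seqs)


def pack(seqs: List[List[int]]) -> Tuple[List[int], List[int]]:
    """Concatenate sequences and record cumulative boundaries.

    Returns (flat_tokens, boundaries), where
    flat_tokens[boundaries[i]:boundaries[i + 1]] is sequence i.
    """
    flat: List[int] = []
    boundaries = [0]
    for s in seqs:
        flat.extend(s)
        boundaries.append(len(flat))
    return flat, boundaries


# Hypothetical token ids for three sentences of different lengths.
batch = [[101, 7, 8, 102], [101, 9, 102], [101, 3, 4, 5, 6, 102]]
flat, boundaries = pack(batch)

print(padded_token_count(batch))  # 3 sequences * 6 slots = 18
print(len(flat))                  # 4 + 3 + 6 = 13
print(boundaries)                 # [0, 4, 7, 13]
```

In this toy batch, padding would spend 18 slots on 13 real tokens; packing keeps only the 13. How the real model batches and masks packed sequences is described in the linked blog post, not here.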
reacted to m-ric's post with 🔥 about 16 hours ago

Organizations

The Waifu Research Department

dillfrescott's activity

New activity in fishaudio/fish-speech-1.5 about 4 hours ago
New activity in nvidia/Hymba-1.5B-Instruct 28 days ago
New activity in marcop/musika_ae about 2 months ago

Question

#1 opened about 2 months ago by dillfrescott
New activity in deepseek-ai/DeepSeek-V2.5 3 months ago

Awesome model

6
#5 opened 3 months ago by dillfrescott
New activity in mattshumer/Reflection-Llama-3.1-70B 3 months ago
New activity in stabilityai/stable-audio-open-1.0 7 months ago

Congrats!

15
#1 opened 7 months ago by aaronday3
New activity in uthree/tinyvc 7 months ago

Question

#1 opened 7 months ago by dillfrescott
New activity in jondurbin/airoboros-70b-3.3 7 months ago

Question

2
#1 opened 8 months ago by dillfrescott
New activity in lllyasviel/IC-Light 8 months ago

custom foreground

1
#3 opened 8 months ago by molo322

Error loading model

2
#1 opened 8 months ago by vikasrij
New activity in bartowski/Llama-3-ChatQA-1.5-8B-GGUF 8 months ago

Question

3
#1 opened 8 months ago by dillfrescott
New activity in alpindale/WizardLM-2-8x22B 8 months ago
New activity in mistral-community/Mistral-7B-v0.2 9 months ago

8x22B?

6
#4 opened 9 months ago by saishf