2 5 15

wenyang

notoookay

AI & ML interests

NLP, RL

Recent Activity

liked a dataset about 12 hours ago

BAAI/CCI3-HQ

replied to singhsidhukuldeep's post about 12 hours ago

Exciting breakthrough in AI: @Meta's new Byte Latent Transformer (BLT) revolutionizes language models by eliminating tokenization! The BLT architecture introduces a groundbreaking approach that processes raw bytes instead of tokens, achieving state-of-the-art performance while being more efficient and robust. Here's what makes it special: >> Key Innovations Dynamic Patching: BLT groups bytes into variable-sized patches based on entropy, allocating more compute power where the data is more complex. This results in up to 50% fewer FLOPs during inference compared to traditional token-based models. Three-Component Architecture: • Lightweight Local Encoder that converts bytes to patch representations • Powerful Global Latent Transformer that processes patches • Local Decoder that converts patches back to bytes >> Technical Advantages • Matches performance of Llama 3 at 8B parameters while being more efficient • Superior handling of non-English languages and rare character sequences • Remarkable 99.9% accuracy on spelling tasks • Better scaling properties than token-based models >> Under the Hood The system uses an entropy model to determine patch boundaries, cross-attention mechanisms for information flow, and hash n-gram embeddings for improved representation. The architecture allows simultaneous scaling of both patch and model size while maintaining fixed inference costs. This is a game-changer for multilingual AI and could reshape how we build future language models. Excited to see how this technology evolves!

liked a Space 3 days ago

data-agents/jupyter-agent

View all activity

Organizations

notoookay's activity

upvoted an article 2 months ago

Article

Accelerate 1.0.0

Sep 13

• 51

upvoted a collection 4 months ago

Tulu V2.5 Suite

Collection

A suite of models trained using DPO and PPO across a wide variety (up to 14) of preference datasets. See https://arxiv.org/abs/2406.09279 for more! • 44 items • Updated 26 days ago • 14

upvoted a collection 7 months ago

[lecture artifacts] aligning open language models

Collection

artifacts referenced in the talk timeline! Slides: https://docs.google.com/presentation/d/1quMyI4BAx4rvcDfk8jjv063bmHg4RxZd9mhQloXpMn0/edit?usp=sharin • 63 items • Updated Apr 17 • 56

upvoted an article 8 months ago

Article

Welcome Llama 3 - Meta's new open LLM

Apr 18

• 281

upvoted a paper 8 months ago

Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and Training Strategies

Paper • 2404.08197 • Published Apr 12 • 27