Mohammed Hamdy

mmhamdy

AI & ML interests

TechBio | AI4Sci | NLP | Reinforcement Learning

Recent Activity

posted an update 7 days ago

What inspired the Transformer architecture in the "Attention Is All You Need" paper? And how were various ideas combined to create this groundbreaking model?

In this lengthy article, I explore the story and the origins of some of the ideas introduced in the paper. We'll explore everything from the fundamental attention mechanism that lies at its heart to the surprisingly simple explanation for its name, Transformer.

💡 Examples of ideas explored in the article:

✅ What was the inspiration for the attention mechanism?
✅ How did we go from attention to self-attention?
✅ Did the team have any other names in mind for the model?

and more...

I aim to tell the story of Transformers as I would have wanted to read it, and hopefully, one that appeals to others interested in the details of this fascinating idea. This narrative draws from video interviews, lectures, articles, tweets/Xs, and some digging into the literature. I have done my best to be accurate, but errors are possible. If you find inaccuracies or have any additions, please do reach out, and I will gladly make the necessary updates.

Read the article: https://huggingface.co/blog/mmhamdy/pandemonium-the-transformers-story

published an article 7 days ago

Pandemonium: The Transformers Story

published an article 11 days ago

Osirian AI: A Call For The Resurrection And Reuse Of Deep Learning Models.

mmhamdy's activity

upvoted a paper 27 days ago

Unified Reward Model for Multimodal Understanding and Generation

Paper • 2503.05236 • Published 30 days ago • 113

upvoted an article about 1 month ago

A Deepdive into Aya Vision: Advancing the Frontier of Multilingual Multimodality

Article • Mar 4 • 72

upvoted 2 collections about 1 month ago

C4AI Aya Vision

Aya Vision is a state-of-the-art family of vision models that brings multimodal capabilities to 23 languages. • 5 items • Updated Mar 4 • 68

CHASE

Generate challenging synthetic data to evaluate LLMs • 5 items • Updated Feb 21 • 4

upvoted a paper about 1 month ago

How to Get Your LLM to Generate Challenging Problems for Evaluation

Paper • 2502.14678 • Published Feb 20 • 17

upvoted a collection about 1 month ago

Reasoning Datasets

38 items • Updated 1 day ago • 3

upvoted 2 papers about 1 month ago

MMTEB: Massive Multilingual Text Embedding Benchmark

Paper • 2502.13595 • Published Feb 19 • 33

From Tools to Teammates: Evaluating LLMs in Multi-Session Coding Interactions

Paper • 2502.13791 • Published Feb 19 • 5

upvoted a paper 2 months ago

SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

Paper • 2501.17161 • Published Jan 28 • 118

upvoted 7 papers 3 months ago

The Lessons of Developing Process Reward Models in Mathematical Reasoning

Paper • 2501.07301 • Published Jan 13 • 98

METAGENE-1: Metagenomic Foundation Model for Pandemic Monitoring

Paper • 2501.02045 • Published Jan 3 • 21

EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation

Paper • 2501.01895 • Published Jan 3 • 55

LiveBench: A Challenging, Contamination-Free LLM Benchmark

Paper • 2406.19314 • Published Jun 27, 2024 • 23

ProcessBench: Identifying Process Errors in Mathematical Reasoning

Paper • 2412.06559 • Published Dec 9, 2024 • 82

PepTune: De Novo Generation of Therapeutic Peptides with Multi-Objective-Guided Discrete Diffusion

Paper • 2412.17780 • Published Dec 23, 2024 • 4

Bridging the Data Provenance Gap Across Text, Speech and Video

Paper • 2412.17847 • Published Dec 19, 2024 • 9

upvoted a collection 4 months ago

Falcon3

Falcon3 family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B parameters. • 40 items • Updated Feb 13 • 84

upvoted a paper 4 months ago

The Future of Open Human Feedback

Paper • 2408.16961 • Published Aug 15, 2024 • 21