Aymeric Roucher

m-ric

AI & ML interests

MLE at Hugging Face šŸ¤— LLMs, Agents, RAG, Multimodal.

Posts

šššš©šžš« š‘šžšÆš¢šžš°: š‘š”šØ-šŸ - šƒšØ š§šØš­ š®š¬šž ššš„š„ š­šØš¤šžš§š¬ šžšŖš®ššš„š„š² š¢š§ š²šØš®š« š­š«ššš¢š§š¢š§š ! āš–ļøā›”ļø

A new paper topping Daily Papers questions a hidden assumption in LLM training:

šŸ¤” š™Žš™š™¤š™Ŗš™”š™™ š™¬š™š š™§š™šš™–š™”š™”š™® š™Ŗš™Øš™š š™–š™”š™” š™©š™¤š™ š™šš™£š™Ø š™šš™¦š™Ŗš™–š™”š™”š™® š™žš™£ š™¤š™Ŗš™§ š™‡š™‡š™ˆ'š™Ø š™©š™§š™–š™žš™£š™žš™£š™œ ?

Some tokens are more relevant than others, and some are mostly noise (just look up the history of š˜šš˜°š˜­š˜Ŗš˜„š˜Žš˜°š˜­š˜„š˜”š˜¢š˜Øš˜Ŗš˜¬š˜¢š˜³š˜±).

So this paper introduces š—¦š—²š—¹š—²š—°š˜š—¶š˜ƒš—² š—Ÿš—®š—»š—“š˜‚š—®š—“š—² š— š—¼š—±š—²š—¹š—¶š—»š—“, which is actually really simple:
āž”ļø A specific metric measures the relevance of each token. Then during training, only the top k% tokens for this relevance metric count in the loss calculation.

The authors test this method by training models on the difficult MATH dataset (competition mathematics problems only).

āž”ļø Their technique seems like a new must-do in LLM training: Training is much faster and reaches an impressive performance!

š‘šžš¬š®š„š­š¬:
ā—† ā±ļø Training is x5 to x10 faster to reach equivalent performance compared to standard language modeling.
ā—† šŸ’Ŗ Their 1B model achieves close to GPT4 Chain-of-Thought performance on MATH!
ā—† šŸš€ Their 7B model match performance of the state-of-the-art DeepSeek for the same size, while trained on only 3% of tokens

š€ššš¢š­š¢šØš§ššš„ š¢š§š¬š¢š š”š­š¬ šŸ’”
ā—† Datasets used for pre-training, even after pre-filtering, still contain a large proportion of noisy tokens šŸ˜–
ā—† The authors show that when you reduce the loss on noisy tokens, you actually reduce accuracy (Figure 7). So Selective Language Modeling seems fundamental! āœ…

Find great reads in @akhaliq's Daily Papers šŸ‘‰ https://huggingface.co/papers
Paper added to my collection šŸ‘‰ m-ric/spinning-up-in-llms-659e698f9dd5a71bd3f579a7
š—”š—²š˜„ š—¦š—½š—®š—°š—²: š˜¼š™„ š™š™§š™–š™«š™šš™” š™„š™”š™–š™£š™£š™šš™§ šŸ—ŗļøšŸ•ļø Plan your next vacation in a few minutes!

I wanted to test whether a powerful LLM like Mixtral-8x7B has geographical reasoning capabilities.
So I built a small Space that prompts the LLM to provide a JSON list of places based on user input.
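
For illustration, here is a rough sketch of how such a Space could query the model (my assumptions: the `huggingface_hub` InferenceClient, a made-up prompt, and a hypothetical `plan_trip` helper; the actual Space's prompt and plumbing differ):

```python
import json
from huggingface_hub import InferenceClient

# Hypothetical prompt: the real Space's prompt is more elaborate.
PROMPT = """You are a travel planner.
Given the trip description below, answer with ONLY a JSON list of places,
each formatted as {{"name": ..., "description": ...}}.

Trip: {trip}
JSON:"""

client = InferenceClient("mistralai/Mixtral-8x7B-Instruct-v0.1")

def plan_trip(trip: str) -> list[dict]:
    # Low temperature keeps the output close to valid, parseable JSON
    output = client.text_generation(
        PROMPT.format(trip=trip), max_new_tokens=500, temperature=0.1
    )
    return json.loads(output)  # assumes the model complied with the format

print(plan_trip("give me 4 European cities that are aligned on the map"))
```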

And the result was impressive! šŸ¤Æ

ā‡’ š—œš˜ š˜€š—²š—²š—ŗš˜€ š—¹š—¶š—øš—² š— š—¶š˜…š˜š—暝—®š—¹ š—µš—®š˜€ š—® š—“š—暝—®š˜€š—½ š—¼š—³ š—“š—²š—¼š—“š—暝—®š—½š—µš—¶š—°š—®š—¹ š—°š—¼š—»š—°š—²š—½š˜š˜€ š—¹š—¶š—øš—² š—”š—¼š—暝˜š—µ - š—¦š—¼š˜‚š˜š—µ, š—¼š—æ š˜€š—½š—®š˜š—¶š—®š—¹ š—®š—¹š—¶š—“š—»š—ŗš—²š—»š˜.šŸ§­ Not just describing these concepts, but really applying them in practice, for instance to successfully answer "give me 4 European cities that are aligned on the map". This is a š—»š—¶š—°š—² š—²š˜…š—®š—ŗš—½š—¹š—² š—¼š—³ š—®š—» š—²š—ŗš—²š—暝—“š—²š—»š˜ š—°š—®š—½š—®š—Æš—¶š—¹š—¶š˜š˜†, since nothing in the LLM's training data should prepare it for this specific task.

Anyway, I added API calls and a nice visualization on top of the LLM, plus streaming output and caching for answers and locations... and ta-da! āœØ I got the š—”š—œ š—§š—暝—®š˜ƒš—²š—¹ š—£š—¹š—®š—»š—»š—²š—æ.
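
As a rough sketch of the visualization side (assuming the gradio_folium component wraps a folium.Map; the geocoding step is skipped and `demo_places` is a hard-coded placeholder, whereas the real Space resolves the LLM's suggestions to coordinates via API calls):

```python
import folium
import gradio as gr
from gradio_folium import Folium  # custom Gradio component by @freddyaboulton

def make_map(places: list[tuple[str, float, float]]) -> folium.Map:
    # Center the map on the first place and add one marker per location
    m = folium.Map(location=[places[0][1], places[0][2]], zoom_start=5)
    for name, lat, lon in places:
        folium.Marker([lat, lon], popup=name).add_to(m)
    return m

# Hard-coded placeholder data standing in for geocoded LLM suggestions
demo_places = [
    ("Paris", 48.8566, 2.3522),
    ("Milan", 45.4642, 9.1900),
    ("Vienna", 48.2082, 16.3738),
    ("Munich", 48.1351, 11.5820),
]

with gr.Blocks() as demo:
    Folium(value=make_map(demo_places), height=400)

demo.launch()
```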

š™”š™¤š™Ŗ š™˜š™–š™£ š™™š™šš™Øš™˜š™§š™žš™—š™š š™žš™© š™®š™¤š™Ŗš™§ š™©š™§š™žš™„, š™–š™£š™™ š™žš™© š™¬š™žš™”š™” š™˜š™¤š™¢š™š š™Ŗš™„ š™¬š™žš™©š™ š™£š™žš™˜š™š š™–š™£š™™ š™˜š™¤š™£š™«š™šš™£š™žš™šš™£š™© š™”š™¤š™˜š™–š™©š™žš™¤š™£š™Ø!

š™š™§š™® š™žš™© š™š™šš™§š™š šŸ‘‰ m-ric/ai-travel-planner

Thank you @freddyaboulton for the šššš›ššŠšššš’šš˜_šššš˜šš•šš’ššžšš– component, and @clem, @pngwn, @abidlabs for your ideas and support!
