attentionmech

attentionmech's activity

liked 2 models 1 day ago

lmstudio-community/Mistral-Small-Instruct-2409-GGUF

Text Generation • Updated Sep 17, 2024 • 155 • 22

google/gemma-3-1b-it

Text Generation • Updated 7 days ago • 97.2k • 185

liked 2 models 2 days ago

mistralai/Mistral-Small-24B-Instruct-2501

Text Generation • Updated Feb 2 • 221k • 880

mlabonne/gemma-3-27b-it-abliterated

Image-Text-to-Text • Updated 1 day ago • 336 • 72

reacted to mlabonne's post with 👍 2 days ago

Post

5365

✂️ Gemma 3 Abliterated

I noticed that Gemma 3 was much more resilient to refusal removal than other models like Qwen 2.5.

I experimented with different recipes and improved the abliteration technique I wrote about last year.

It's still experimental but the refusal rate is super low in my tests. Enjoy!

mlabonne/gemma-3-4b-it-abliterated
mlabonne/gemma-3-12b-it-abliterated
mlabonne/gemma-3-27b-it-abliterated

reacted to joaogante's post with 🤗 2 days ago

Post

3396

New sampling strategy dropped in 🤗 transformers -- Min P sampling 🔥

Are you tired of having top_k arbitrarily discarding high-quality continuations? Or top_p forgetting to exclude low-probability tokens, derailing your generation? Try out the new min_p flag in generate, fresh from a PR merged today! 🥬

Min P is a dynamic token filter. By contrast, Top K keeps the K most likely tokens and Top P keeps the most likely tokens up to a fixed cumulative probability; both are static filters. Min P takes a base probability (set by the min_p flag) and multiplies it by the probability of the most likely next token. All tokens less likely than the resulting value are filtered out. What happens with this strategy?
👉 High probability token present -> aggressive filter (we don't want to miss out on that high-probability case and risk derailing generation)
👉 No high probability token present -> relaxed filter (there are many continuation possibilities that the model finds plausible)

You should set min_p to a low value, between 0.05 and 0.1. It behaves particularly well for creative text generation when paired up with temperature > 1.
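
The filtering rule described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the transformers implementation: the function names `min_p_filter` and `surviving_tokens` are hypothetical, the input is assumed to be a valid probability distribution (it sums to 1), and renormalizing the surviving tokens is an assumption about how the filtered distribution would then be sampled from.

```python
def min_p_filter(probs, min_p=0.05):
    """Zero out tokens below min_p * max(probs), then renormalize the rest.

    Illustrative helper (hypothetical name), not the transformers code.
    """
    if not 0.0 <= min_p <= 1.0:
        raise ValueError("min_p must be between 0 and 1")
    # Dynamic threshold: scales with the most likely token's probability.
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    # Nonzero for any valid distribution, since the top token always passes.
    total = sum(kept)
    return [p / total for p in kept]


def surviving_tokens(probs, min_p=0.05):
    """Count how many tokens remain after filtering."""
    return sum(1 for p in min_p_filter(probs, min_p) if p > 0.0)


# Peaked distribution: threshold = 0.1 * 0.90 = 0.09, so only the top token stays.
peaked = [0.90, 0.05, 0.03, 0.02]
# Flat distribution: threshold = 0.1 * 0.30 = 0.03, so every token stays.
flat = [0.30, 0.25, 0.25, 0.20]

print(surviving_tokens(peaked, min_p=0.1))
print(surviving_tokens(flat, min_p=0.1))
```

The same `min_p` value filters aggressively on the peaked distribution and leaves the flat one untouched, which is the dynamic behavior the post describes. In transformers, per the post, the equivalent is passing `min_p` to `generate` with sampling enabled, for example `model.generate(**inputs, do_sample=True, min_p=0.05)`.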

Kudos to @kalomaze and @menhguin for creating this technique 🔥 Read their discussion in the original issue for benchmarks (https://github.com/huggingface/transformers/issues/27670)

A copy-pasteable version of the example is available here: https://pastebin.com/VqXNtuxd

Have fun experimenting! 😎

liked 2 models 3 days ago

unsloth/Llama-3.2-3B

Text Generation • Updated Jan 23 • 29.7k • 12

meta-llama/Llama-2-7b-hf

Text Generation • Updated Apr 17, 2024 • 893k • 1.98k

liked a model 4 days ago

senstella/csm-1b-mlx

Updated 5 days ago • 11

reacted to mlabonne's post with 🤗 5 days ago

Post

6244

🆕 LLM Course 2025 edition!

I updated the LLM Scientist roadmap and added a ton of new information and references. It covers training, datasets, evaluation, quantization, and new trends like test-time compute scaling.

The LLM Course has been incredibly popular (41.3k stars!) and I've been touched to receive many, many messages about how it helped people in their careers.

I know how difficult this stuff can be, so I'm super proud of the impact it had. I want to keep updating it in 2025, especially with the LLM Engineer roadmap.

Thanks everyone, hope you'll enjoy it!

💻 LLM Course: https://huggingface.co/blog/mlabonne/llm-course