Davide Cifarelli

davide221

AI & ML interests

codeLLM, reasoning, architecture optimization (sparsity/quantization) & interpretability

davide221's activity

liked a model about 1 month ago

microsoft/OmniParser

Image-Text-to-Text • Updated 1 day ago • 10.5k • 1.41k

liked 2 models 3 months ago

mattshumer/Reflection-Llama-3.1-70B

Text Generation • Updated Sep 24 • 1.97k • 1.71k

NousResearch/Hermes-2-Pro-Llama-3-70B

Text Generation • Updated Sep 8 • 262 • 31

liked a dataset 3 months ago

AI-MO/NuminaMath-CoT

Viewer • Updated 9 days ago • 860k • 3.15k • 253

liked 2 models 4 months ago

black-forest-labs/FLUX.1-dev

Text-to-Image • Updated Aug 16 • 1.36M • 6.9k

minimaxir/sdxl-wrong-lora

Text-to-Image • Updated Aug 24, 2023 • 5.19k • 119

updated a dataset 4 months ago

davide221/stealth_jailbreak

Viewer • Updated Jul 25 • 1k • 11

upvoted an article 5 months ago

Article: The Rise of Agentic Data Generation • Jul 15 • 78

reacted to yushun0410's post with 🔥 5 months ago

Post

Hi Huggingfacers!

Thrilled to introduce Adam-mini, an optimizer that matches or outperforms AdamW with a 45% to 50% smaller memory footprint. Adam-mini can also achieve 49.5% higher throughput than AdamW on Llama2-7B pre-training.

The design of Adam-mini is inspired by certain Hessian structures we observed in Transformers.

Feel free to try it out! Switch to Adam-mini using the same hyperparameters as AdamW, and it should work with only half the memory. We hope Adam-mini helps you save time, cost, and energy in your tasks!

Paper: "Adam-mini: Use Fewer Learning Rates To Gain More" https://arxiv.org/abs/2406.16793

Code: https://github.com/zyushun/Adam-mini
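To illustrate the idea in the paper's title ("use fewer learning rates"), here is a simplified sketch, not the code from the linked repository. It assumes a single parameter block held in plain Python lists and a hypothetical helper `adam_mini_style_step`. AdamW keeps a second-moment estimate for every parameter. In this sketch the whole block shares one scalar second moment, tracked as the mean squared gradient. The real optimizer partitions parameters into blocks according to the Hessian structure mentioned in the post, and this sketch does not attempt to reproduce that partitioning.

```python
import math
from typing import List, Tuple


def adam_mini_style_step(
    params: List[float],
    grads: List[float],
    m: List[float],
    v_block: float,
    step: int,
    lr: float = 1e-3,
    beta1: float = 0.9,
    beta2: float = 0.999,
    eps: float = 1e-8,
    weight_decay: float = 0.0,
) -> Tuple[List[float], List[float], float]:
    """One update of a single parameter block (hypothetical helper, not the official code).

    AdamW keeps a per-parameter second moment; here the whole block shares ONE
    scalar second moment, so the block uses a single effective learning-rate scale.
    """
    # First moment stays per-parameter, as in AdamW.
    m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, grads)]
    # Second moment: one scalar per block, tracking the mean squared gradient.
    mean_sq = sum(g * g for g in grads) / len(grads)
    v_block = beta2 * v_block + (1 - beta2) * mean_sq
    # Standard Adam bias corrections (step counts from 1).
    m_correction = 1 - beta1 ** step
    v_hat = v_block / (1 - beta2 ** step)
    denom = math.sqrt(v_hat) + eps
    new_params = []
    for p, mi in zip(params, m):
        p -= lr * weight_decay * p              # decoupled (AdamW-style) weight decay
        p -= lr * (mi / m_correction) / denom   # shared denominator for the whole block
        new_params.append(p)
    return new_params, m, v_block


# Usage: one step on a three-parameter block.
params, m, v = adam_mini_style_step(
    params=[1.0, -2.0, 0.5],
    grads=[0.1, -0.2, 0.3],
    m=[0.0, 0.0, 0.0],
    v_block=0.0,
    step=1,
)
# Optimizer state kept: 3 first-moment floats + 1 shared second-moment float,
# versus 3 + 3 for AdamW on the same block.
```

The memory saving in this sketch comes entirely from replacing the per-parameter second moment with one value per block. With large blocks, the optimizer state shrinks from two floats per parameter to roughly one, which is consistent with the "half memory" framing in the post.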