
Quazimoto PRO

Quazim0t0

AI & ML interests

the hunchback of huggingface 🔙 joined: 1-20-2025 🦥unsloth user 4️⃣ Phi User 🔨 ai hobbyist 📫 On Leaderboards Top 100-200 - Be back when HF decides to bring back features.

Recent Activity

updated a model 15 days ago
Quazim0t0/Qwn-Multi-GGUF
published a model 15 days ago
Quazim0t0/Qwn-Multi-GGUF
updated a model 15 days ago
Quazim0t0/Qwn-Multi

Organizations

Seance Table

Quazim0t0's activity

reacted to lbourdois's post with ❤️ 25 days ago
We introduce FAT5 (Flash Attention T5) ⚡

An implementation of T5 in PyTorch with the UL2 objective, optimized for GPGPU for both training and inference thanks to 13 different optimizations.
The main one is a CUDA kernel we designed that extends Flash Attention by @tridao with RPE biases, and that also supports other positional encodings such as RoPE, ALiBi or FIRE.
The resulting kernel is 2 times faster than an SDPA implementation.
We also use Triton kernels to optimize certain parts of the architecture, such as the cross-entropy loss and the RMSNorm layer.

The various kernels have been carefully built to be compatible with BF16 and torch.compile to go even faster and achieve efficient pretraining.

All the other optimizations are described in a 📝 blog post available on @huggingface 🤗: CATIE-AQ/FAT5-report.

This methodology enabled us to efficiently pretrain, as a proof of concept, a FAT5 with 147M parameters in French in a reasonable time (1,461 hours for 419B tokens), with limited resources (a single A100, i.e. a computational budget of ~€1,900) and a low carbon footprint (13.5 kg CO2 eq).

The model's weights are also available on Hugging Face: CATIE-AQ/FAT5-small.
It's not very useful in practice: it's a PoC, not an instruction-tuned model (that's planned for later).

All the code is available on GitHub if you want to pretrain your own model in your own language or for a specific domain: https://github.com/catie-aq/flashT5
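
As a quick illustration, here's a minimal sketch of loading the released checkpoint (assuming the repo ships custom modeling code on the Hub and requires trust_remote_code; check the model card for the exact usage):

from transformers import AutoModel, AutoTokenizer

# assumption: the FAT5 repo provides its custom architecture via trust_remote_code
tokenizer = AutoTokenizer.from_pretrained("CATIE-AQ/FAT5-small")
model = AutoModel.from_pretrained("CATIE-AQ/FAT5-small", trust_remote_code=True)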

Closing by noting that this was a joint project with @BorisAlbar at hf.co/CATIE-AQ.
reacted to AtAndDev's post with 👍 about 1 month ago
There seem to be multiple paid apps shared here that are based on models hosted on HF, but some people sell their wrappers as "products" and promote them here. For a long time, HF was the best and only platform for open-source model work, but with the recent AI website builders anyone can create a product (really crappy ones, btw) and try to sell it with no contribution back to open source. Please don't do this, or at least try fine-tuning the models you use...
Sorry for filling y'all's feed with this bs, but y'know...
reacted to burtenshaw's post with 🤗 about 1 month ago
everybody and their dog is fine-tuning Gemma 3 today, so I thought I'd do a longer post on the tips and sharp edges I've found. let's go!

1. you have to install everything from main and nightly. this is what I'm working with to get Unsloth and TRL running (full pip commands below)

git+https://github.com/huggingface/transformers@main
git+https://github.com/huggingface/trl.git@main
bitsandbytes
peft


plus these with --no-deps

git+https://github.com/unslothai/unsloth-zoo.git@nightly
git+https://github.com/unslothai/unsloth.git@nightly
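

for reference, the same installs as pip commands (assuming a fresh environment):

pip install git+https://github.com/huggingface/transformers@main git+https://github.com/huggingface/trl.git@main bitsandbytes peft
pip install --no-deps git+https://github.com/unslothai/unsloth-zoo.git@nightly git+https://github.com/unslothai/unsloth.git@nightly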


2. Will Brown's code to turn GSM8k into a reasoning dataset is a nice toy experiment: https://gist.github.com/willccbb/4676755236bb08cab5f4e54a0475d6fb

3. with a learning rate of 5e-6, rewards and loss stayed flat for the first 100 or so steps.

4. so far none of my runs have undermined the outputs after 1 epoch. therefore, I'm mainly experimenting with bigger LoRA adapters (rough sketch after the config below).

from trl import GRPOConfig

training_args = GRPOConfig(
    learning_rate = 5e-6,
    adam_beta1 = 0.9,
    adam_beta2 = 0.99,
    weight_decay = 0.1,
    warmup_ratio = 0.1,
    lr_scheduler_type = "cosine",
    optim = "adamw_8bit",
    logging_steps = 1,
    per_device_train_batch_size = 2,
    gradient_accumulation_steps = 1,
    num_generations = 2,                  # GRPO completions sampled per prompt
    max_prompt_length = 256,
    max_completion_length = 1024 - 256,   # leave room for the prompt in a 1024-token budget
    num_train_epochs = 1,
    max_steps = 250,
    save_steps = 250,
    max_grad_norm = 0.1,
    report_to = "none",
)
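
a rough sketch of what a bigger adapter could look like with peft (the rank and target modules are my assumptions, not settings from this post):

from peft import LoraConfig

# assumption: "bigger" here means a higher rank than the common r=8/16 defaults
lora_config = LoraConfig(
    r = 64,
    lora_alpha = 64,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type = "CAUSAL_LM",
)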


5. vision fine-tuning isn't available in TRL's GRPOTrainer, so stick to text datasets. but there's no need to load the model differently in transformers or Unsloth

from transformers import AutoModelForImageTextToText

# Gemma 3 is multimodal, so it loads through the image-text-to-text class
model = AutoModelForImageTextToText.from_pretrained("google/gemma-3-4b-it")


if you want an introduction to GRPO, check out the reasoning course; it walks you through the algorithm, theory, and implementation in a smooth way.

reasoning-course
posted an update about 1 month ago
Thank you to the Open LLM Leaderboard team for offering it to the community for as long as they did. I only recently joined HF, and it provided a lot of incentive and information to make better models.

Always will remember getting to #112 :D

Anyone have a solid way to test my models privately? Please let me know!

reacted to onekq's post with 👍 about 1 month ago
A bigger and harder pain point for reasoning models is switching modes.

We now have powerful models capable of either System 1 thinking or System 2 thinking, but not both, much less switching between the two. Yet humans can do this quite easily.

ChatGPT and others push the burden onto users to switch between models. I guess this is the best we have for now.
reacted to AdinaY's post with 🔥 about 1 month ago
reacted to thomwolf's post with 🚀 about 1 month ago
We've kept pushing our Open-R1 project, an open initiative to replicate and extend the techniques behind DeepSeek-R1.

And even we were mind-blown by the results we got with this latest model we're releasing: ⚡️OlympicCoder ( open-r1/OlympicCoder-7B and open-r1/OlympicCoder-32B)

It's beating Claude 3.7 on (competitive) programming, a domain where Anthropic has historically been really strong, and it's getting close to o1-mini/R1 on olympiad-level coding with just 7B parameters!

And the best part is that we're open-sourcing it all: the training dataset, the new IOI benchmark, and more in our Open-R1 progress report #3: https://huggingface.co/blog/open-r1/update-3

Datasets we are releasing:
- open-r1/codeforces
- open-r1/codeforces-cots
- open-r1/ioi
- open-r1/ioi-test-cases
- open-r1/ioi-sample-solutions
- open-r1/ioi-cots
- open-r1/ioi-2024-model-solutions
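
For anyone who wants to try it, a minimal usage sketch for the 7B model (the prompt and generation settings here are placeholder assumptions):

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("open-r1/OlympicCoder-7B")
model = AutoModelForCausalLM.from_pretrained("open-r1/OlympicCoder-7B", device_map="auto")

# placeholder competitive-programming prompt
messages = [{"role": "user", "content": "Find the longest increasing subsequence of a list of integers."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))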
reacted to Lunzima's post with 🚀 about 1 month ago
I'm currently experimenting with the SFT dataset Lunzima/alpaca_like_dataset to further boost the performance of NQLSG-Qwen2.5-14B-MegaFusion-v9.x. This includes data sourced from DeepSeek-R1 or other cleaned results (excluding CoTs). Additionally, datasets that could potentially enhance the model's performance in math and programming/code, as well as those dedicated to specific uses like Swahili, are part of the mix.
@sometimesanotion @sthenno @wanlige
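
A minimal sketch of that kind of dataset mixing with the datasets library (the empty extras list is a placeholder for the actual math/code/Swahili sources, which aren't specified here):

from datasets import load_dataset, concatenate_datasets

base = load_dataset("Lunzima/alpaca_like_dataset", split="train")
# placeholder: math/code/Swahili sets with matching columns would be appended here
extras = []
mixed = concatenate_datasets([base] + extras).shuffle(seed=42)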
reacted to awacke1's post with 🚀 about 1 month ago
Introducing an MIT-licensed ML model SFT (supervised fine-tuning) app: "SFT Tiny Titans" 🚀

Demo video with source.

Download, train, SFT, and test your models, easy as 1-2-3!
URL: awacke1/TorchTransformers-NLP-CV-SFT
reacted to BrigitteTousi's post with 🚀 about 1 month ago
reacted to sandhawalia's post with 🔥 about 1 month ago
LeRobot goes to driving school. The world's largest open-source self-driving dataset, ready for end-to-end learning with LeRobot.

3 years, 30 German cities, 60 driving instructors and students. https://huggingface.co/blog/lerobot-goes-to-driving-school

Coming this summer — LeRobot driver.
reacted to fdaudens's post with 🤗 about 1 month ago
Honored to be named among the 12 pioneers and power players in the news industry in the 2025 Tech Trends Report from Future Today Strategy Group.

Incredible group to be part of - each person is doing groundbreaking work at the intersection of AI and journalism. Worth following them all: they're consistently sharing practical insights on building the future of news.

Take the time to read this report, it's packed with insights as always. The news & information section's #1 insight hits hard: "The most substantive economic impact of AI to date has been licensing payouts for a handful of big publishers. The competition will start shifting in the year ahead to separate AI 'haves' that have positioned themselves to grow from the 'have-nots.'"

This AI-driven divide is something I've been really concerned about. Now is the time to build more than ever!

👉 Full report here: https://ftsg.com/wp-content/uploads/2025/03/FTSG_2025_TR_FINAL_LINKED.pdf
replied to their post about 1 month ago
posted an update about 1 month ago
Update to the Imagine side project.
Just uploaded the 16-bit & Q4 versions.

Samples (using a base Microsoft Phi-4 model):
*You may experience bugs with either the model or the Open WebUI function*
Open WebUI function: https://openwebui.com/f/quaz93/imagine_phi
Quazim0t0/Imagine-v0.5-16bit - haven't tested
Quazim0t0/ImagineTest-v0.5-GGUF - tested (pictures)

Dataset: Quazim0t0/Amanita-Imagine
A small dataset of 500+ entries; still working on it here and there when I can.
The pictures use the Open WebUI function I provided.
replied to their post about 1 month ago

I'll try to get that out for you when I get a chance.

reacted to AdinaY's post with 🔥🚀 about 1 month ago
reacted to ZennyKenny's post with 👍 about 2 months ago
I've spent most of my time working with AI on user-facing apps like chatbots and text generation, but today I decided to work on something that I think has a lot of applications for data science teams: ZennyKenny/comment_classification

This Space supports uploading a user CSV and categorizing the fields based on user-defined categories. The applications of AI in production are truly endless. 🚀
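
For a rough idea of how something like this can work under the hood, here's a minimal zero-shot classification sketch (the model, file name, column, and categories are all assumptions, not the Space's actual implementation):

import pandas as pd
from transformers import pipeline

# hypothetical user CSV with a "comment" column, plus user-defined categories
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
df = pd.read_csv("comments.csv")
categories = ["bug report", "feature request", "praise"]
df["category"] = [classifier(text, categories)["labels"][0] for text in df["comment"]]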
posted an update about 2 months ago
Debugging Tags:
Imagine, Associated Thoughts, Dialectical Analysis, Backwards Induction, Metacognition, and Normal Thought Processes such as <think> or <begin_of_thought>

Edit: Uploaded new images with an Open WebUI function to organize the tags.
Open WebUI Function: https://openwebui.com/f/quaz93/imagine_phi

This Phi-4 model is part of a test project that I called Micro-Dose. My goal was to use a small dataset to activate reasoning and other cognitive processes without relying on a large dataset.

I found that this was possible with a tiny dataset of just 90 rows, specifically designed as math problems. In the initial iterations, the dataset only activated reasoning when a math-related question was asked. I then made a few changes to the dataset’s structure, including the order of information and the naming of tags. You can see the sample results in the pictures. Not really anything special, just thought I'd share.

Tweaked the dataset a bit:
Quazim0t0/Imagine-Phi-v0.2-GGUF
Quazim0t0/MicroDoseV0.2
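
For anyone curious what a run like this might look like, here's a minimal sketch with TRL's SFTTrainer (the trainer choice and hyperparameters are my assumptions, not the actual training setup):

from datasets import load_dataset
from trl import SFTTrainer, SFTConfig

# assumed setup: the ~90-row dataset from this post on a Phi-4 base
dataset = load_dataset("Quazim0t0/MicroDoseV0.2", split="train")
trainer = SFTTrainer(
    model="microsoft/phi-4",
    train_dataset=dataset,
    args=SFTConfig(output_dir="phi4-microdose", num_train_epochs=3),
)
trainer.train()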


The first image shows the new tags, the second shows the regular thought process, and the third is the model in combination with web searches.
 
reacted to lingvanex-mt's post with 🔥 about 2 months ago
Dear HF Community!

Our company has open-sourced machine translation models for 12 rare languages under the MIT license.

You can use them freely with the OpenNMT translation framework. Each model is about 110 MB and has excellent performance (about 40,000 characters/s on an Nvidia RTX 3090).
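
For example, with OpenNMT-py's command-line tools, translating a file would look something like this (a sketch; the model and file names are placeholders):

onmt_translate -model rare_language_model.pt -src input.txt -output output.txt -gpu 0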

Download the models here:

@lingvanex

You can test the translation quality here:

https://lingvanex.com/translate/