
Natalie H

xinnn63

AI & ML interests

None yet

Organizations

None yet

xinnn63's activity

reacted to etemiz's post with πŸ‘€ 26 days ago
Started fine-tuning Gemma 3 using an evolutionary approach. It is not the worst model according to the AHA Leaderboard, and it is one of the smartest according to lmarena.ai. My objective is to make it based, anti-woke, wise, beneficial, and then some.

Several GPUs are fine-tuning it at the same time, each on a different dataset with QLoRA, and the successful runs are merged later. Compared to LoRA, this allows faster training and also reduces overfitting, because the merge operation heals overfitting. The catch is that the 4-bit quantization may make the models dumber. But I am not looking for sheer IQ. Too much mind is a problem anyway :)

Has anyone tried parallel QLoRA and merging before?

I also automated the dataset selection, the benchmarking, and the convergence toward objectives (the fitness function, the reward). It is basically trying to get a higher score on the AHA Leaderboard as fast as possible with a diverse set of organisms that "evolve by training".
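
For anyone curious what the merge step could look like in code: below is a minimal sketch using peft's weighted adapter averaging, assuming the parallel runs each saved a QLoRA adapter to disk. The model ID, adapter paths, names, and weights are illustrative, not the actual setup.

```python
# Hypothetical sketch of "parallel QLoRA, then merge" with peft.
# Paths, adapter names, and weights below are made up for illustration.
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3-4b-it",  # assumed Gemma 3 variant
    torch_dtype=torch.bfloat16,
)

# Load the adapters trained in parallel on different datasets.
model = PeftModel.from_pretrained(base, "adapters/run_a", adapter_name="run_a")
model.load_adapter("adapters/run_b", adapter_name="run_b")

# Average the successful adapters into a single merged adapter.
model.add_weighted_adapter(
    adapters=["run_a", "run_b"],
    weights=[0.5, 0.5],
    adapter_name="merged",
    combination_type="linear",
)
model.set_adapter("merged")
```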

I want to release some cool stuff when I have the time:
- how an answer to a single question changes over time, with each training round or day
- a chart to show AHA alignment over training rounds
reacted to fdaudens's post with πŸ”₯ 26 days ago
πŸ”Š Meet Orpheus: A breakthrough open-source TTS model that matches human-level speech with empathy & emotion.
- Available in 4 sizes (150M-3B parameters)
- Delivers ultra-fast streaming
- Zero-shot voice cloning
- Apache 2.0 license

canopylabs/orpheus-tts-67d9ea3f6c05a941c06ad9d2
reacted to MonsterMMORPG's post with πŸš€ 26 days ago
Extending a Wan 2.1 generated video: first 14b 720p text-to-video, then automatically using the last frame to generate a follow-on video with 14b 720p image-to-video, plus RIFE, for a 32 FPS, 10-second 1280x720p video

Our app has this fully automated: https://www.patreon.com/posts/123105403

Here is how it works (image): https://ibb.co/b582z3R6

The workflow is easy:

1. Use your favorite app to generate the initial video.
2. Get the last frame.
3. Give the last frame to the image-to-video model, with matching model and resolution.
4. Generate.
5. Merge the clips.
6. Then use MMAudio to add sound.
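
For the "get the last frame" step, a minimal OpenCV sketch could look like this (file names are placeholders):

```python
# Grab the final frame of the generated clip so it can seed the
# image-to-video pass. File names are placeholders.
import cv2

cap = cv2.VideoCapture("initial_video.mp4")
last_index = int(cap.get(cv2.CAP_PROP_FRAME_COUNT)) - 1
cap.set(cv2.CAP_PROP_POS_FRAMES, last_index)  # jump to the last frame
ok, frame = cap.read()
cap.release()
if ok:
    cv2.imwrite("last_frame.png", frame)
```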

I made it automated in my Wan 2.1 app, but it can easily be done with ComfyUI as well. I can extend as many times as I want :)

Here initial video

Prompt: Close-up shot of a Roman gladiator, wearing a leather loincloth and armored gloves, standing confidently with a determined expression, holding a sword and shield. The lighting highlights his muscular build and the textures of his worn armor.

Negative Prompt: Overexposure, static, blurred details, subtitles, paintings, pictures, still, overall gray, worst quality, low quality, JPEG compression residue, ugly, mutilated, redundant fingers, poorly painted hands, poorly painted faces, deformed, disfigured, deformed limbs, fused fingers, cluttered background, three legs, a lot of people in the background, upside down

Used Model: WAN 2.1 14B Text-to-Video
Number of Inference Steps: 20
CFG Scale: 6
Sigma Shift: 10
Seed: 224866642
Number of Frames: 81
Denoising Strength: N/A
LoRA Model: None
TeaCache Enabled: True
TeaCache L1 Threshold: 0.15
TeaCache Model ID: Wan2.1-T2V-14B
Precision: BF16
Auto Crop: Enabled
Final Resolution: 1280x720
Generation Duration: 770.66 seconds



reacted to ginipick's post with πŸ”₯ 26 days ago
🌈✨ FLUX 'Every Text Imaginator'
Multilingual Text-Driven Image Generation and Editing

Demo: ginigen/Every-Text

πŸ“ What is FLUX Text Imaginator?
FLUX Text Imaginator is an innovative tool that leverages cutting-edge FLUX diffusion models to create and edit images with perfectly integrated multilingual text. Unlike other image generation models, FLUX has an exceptional capability to naturally incorporate text in various languages, including Korean, English, Chinese, Japanese, Russian, French, Spanish, and more, into images!

✨ FLUX's Multilingual Text Processing Strengths

πŸ”€ Superior Multilingual Text Rendering: FLUX renders text with amazing accuracy, including non-English languages and special characters
πŸ‡°πŸ‡· Perfect Korean Language Support: Accurately represents complex Korean combined characters
🈢 Excellent East Asian Language Handling: Naturally expresses complex Chinese characters and Japanese text
πŸ” Sophisticated Text Placement: Precise text positioning using <text1>, <text2>, <text3> placeholders
🎭 Diverse Text Styles: Text representation in various styles including handwriting, neon, signage, billboards, and more
πŸ”„ Automatic Translation Feature: Korean prompts are automatically translated to English for optimal results

πŸš€ How It Works

Text Generation Mode:

Enter your prompt (with optional text placeholders)
Specify your desired text in any language
Generate high-quality images with naturally integrated text using FLUX's powerful multilingual processing capabilities
Get two different versions of your image for each generation


Image Editing Mode:

Upload any image
Add editing instructions
Specify new text to add or replace (multilingual support)
Create naturally edited images with FLUX's sophisticated text processing abilities
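
As a toy illustration of the placeholder mechanism described above (hypothetical, not the Space's actual API), the <text1>/<text2> slots simply pair a layout prompt with the multilingual strings to render:

```python
# Illustrative only: pairing placeholder slots with multilingual text.
prompt = "A neon sign reading <text1> above a cafe door, with <text2> painted on the window"
texts = {"text1": "OPEN LATE", "text2": "안녕하세요"}
for key, value in texts.items():
    prompt = prompt.replace(f"<{key}>", value)
print(prompt)
```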

πŸ’» Technical Details
FLUX's Core Technologies:
- Text-Aware Diffusion Model
- Multilingual Processing Engine
- Korean-English Translation Pipeline
- Optimized Pipeline
reacted to kalkey's post with πŸ‘€ 26 days ago
We are using Hugging Face PRO, but all of a sudden we are getting the error below: "The model meta-llama/Llama-3.2-11B-Vision-Instruct is too large to be loaded automatically (21GB > 10GB)." Need help, please.

reacted to chansung's post with ❀️ 26 days ago
Mistral AI Small 3.1 24B is not only free for commercial use but also the best model for single-GPU deployment.

I packed all the information you need to know into a single picture. Hope this helps! :)
reacted to MohamedRashad's post with πŸ‘€ 26 days ago
reacted to sharpenb's post with πŸ”₯ 26 days ago
We open-sourced the pruna package, which can be easily installed with pip install pruna :) It makes it easy to compress and evaluate AI models, including transformers and diffusers.

- Github repo: https://github.com/PrunaAI/pruna
- Documentation: https://docs.pruna.ai/en/stable/index.html

With open-sourcing, people can now inspect and contribute to the code. Beyond the code, we provide a detailed README, tutorials, benchmarks, and documentation to make compression, evaluation, and saving/loading/serving of AI models transparent.
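
As a quick taste, a compression run boils down to something like the sketch below, based on the README; the exact config keys (e.g. "cacher") are assumptions and may differ between versions:

```python
# Rough pruna usage sketch; config keys are assumptions and may vary.
from diffusers import StableDiffusionPipeline
from pruna import SmashConfig, smash

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

smash_config = SmashConfig()
smash_config["cacher"] = "deepcache"  # assumed key: caching-based speedup

# Compress ("smash") the pipeline, then use it like the original.
smashed_pipe = smash(model=pipe, smash_config=smash_config)
image = smashed_pipe("a photo of an astronaut").images[0]
```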

Happy to share it with you and always interested in collecting your feedback :)
reacted to daavoo's post with πŸ”₯ 26 days ago
πŸ€– πŸ—ΊMapped all(?) the swimming pools ️🏊 around another town with https://github.com/mozilla-ai/osm-ai-helper.

This time, I mapped and contributed more than 100 swimming pools around my wife's hometown to https://www.openstreetmap.org. It only took about 20 min to find them all (plus ~3 min of verification) on a free Colab GPU 🚀

Try it yourself around a single point: mozilla-ai/osm-ai-helper
reacted to Akjava's post with πŸ‘ 26 days ago
I've shared Hugging Face Spaces for CPU-based RAG and T5/Flan-T5 models. The smolagents-rag space sometimes produces high-quality answers, but it can be slow. Qwen2.5-0.5B is fast for a CPU implementation and generates answers of acceptable quality. I've found that Gemma3-4B produces significantly more stable answers than the 1B version.

RAG
Akjava/Gemma3-4B-llamacpp-cpu-rag-smolagents
Akjava/Qwen2.5-0.5B-Rag-Thinking-Flan-T5

T5/Flan-T5
Akjava/llamacpp-flan-t5-large-grammar-synthesis
Akjava/llamacpp-madlad400-3b-mt-2jp
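
For context, the CPU inference in these spaces boils down to something like this llama-cpp-python sketch (the GGUF file name and prompt are placeholders):

```python
# Minimal CPU RAG-style generation with llama-cpp-python.
# The GGUF path is a placeholder; any small quantized model works.
from llama_cpp import Llama

llm = Llama(model_path="qwen2.5-0.5b-instruct-q4_k_m.gguf", n_ctx=2048)
out = llm.create_chat_completion(
    messages=[{"role": "user",
               "content": "Answer using only the retrieved context: ..."}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```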

Hugging Face Free CPU Limitations
When duplicating a space, the build process (llama-cpp-python) can occasionally get stuck, requiring a manual restart to finish.
Spaces may unexpectedly stop functioning or even be deleted, requiring them to be reworked. Refer to the issue for more information.
reacted to lbourdois's post with ❀️ 26 days ago
We introduce FAT5 (Flash Attention T5) ⚑

An implementation of T5 in PyTorch with the UL2 objective, optimized for GPGPU for both training and inference thanks to 13 different optimizations.
The main one is a CUDA kernel we designed to extend @tridao's Flash Attention with RPE biases; it also supports other positional encodings such as RoPE, ALiBi, or FIRE.
The resulting kernel is 2 times faster than an SDPA implementation.
We also use Triton kernels to optimize certain parts of the architecture, such as the cross-entropy and RMSNorm layers.

The various kernels have been carefully built to be compatible with BF16 and torch.compile to go even faster and achieve efficient pretraining.
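
For reference, the SDPA baseline being compared against looks like the sketch below, with the relative-position bias passed as an additive attention mask; roughly speaking, the custom kernel fuses this bias into Flash Attention instead (shapes are illustrative):

```python
# PyTorch SDPA with a T5-style additive relative-position bias.
import torch
import torch.nn.functional as F

batch, heads, seq, dim = 2, 8, 128, 64
q = torch.randn(batch, heads, seq, dim)
k = torch.randn(batch, heads, seq, dim)
v = torch.randn(batch, heads, seq, dim)
# RPE bias added to the attention scores, broadcast over the batch.
rpe_bias = torch.randn(1, heads, seq, seq)

out = F.scaled_dot_product_attention(q, k, v, attn_mask=rpe_bias)
print(out.shape)  # torch.Size([2, 8, 128, 64])
```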

All other optimizations are described in a πŸ“ subsequent blog post available on @huggingface πŸ€—: CATIE-AQ/FAT5-report.

This methodology enabled us to efficiently pretrain, as a proof of concept, a FAT5 with 147M parameters in French in a reasonable time (1,461 hours for 419B tokens), with limited resources (a single A100, i.e. a computational budget of ~€1,900) and a low carbon footprint (13.5 kg CO2 eq).

The model's weights are also available on Hugging Face: CATIE-AQ/FAT5-small.
It is not very useful in practice; it's a PoC, not an instruction-tuned model (that's planned for later).

All the code is available on GitHub if you want to pretrain your own model in your own language or for a specific domain: https://github.com/catie-aq/flashT5 ⭐

Finally, this was a joint project with @BorisAlbar at hf.co/CATIE-AQ.
reacted to clem's post with πŸ‘€ 26 days ago
Should we assemble affordable open-source robots at Hugging Face for the community? Would you buy them? At what price?
reacted to onekq's post with πŸ‘€ 26 days ago
I'd like to benchmark 💵o1-pro💵, but it is way too expensive for me 🤦‍♂️
reacted to prithivMLmods's post with πŸ”₯ 26 days ago
Play with Orpheus TTS, a Llama-based Speech-LLM designed for high-quality, empathetic text-to-speech generation. This model has been fine-tuned to deliver human-level speech synthesis πŸ”₯πŸ—£οΈ

πŸ‘‰GitHub [ Demo ] : https://github.com/PRITHIVSAKTHIUR/Orpheus-TTS-Edge

The demo supports both text-to-speech and text-to-LLM responses delivered as speech.

> voice: tara, dan, emma, josh
> emotion: <laugh>, <chuckle>, <sigh>, <cough>, <sniffle>, <groan>, <yawn>, <gasp>.
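
A usage sketch following the canopyai/Orpheus-TTS README (the exact orpheus_tts API and sample rate are assumptions taken from that README, so double-check against the repo):

```python
# Sketch: stream Orpheus speech to a WAV file, using a voice name and
# an inline emotion tag as listed above. API per the Orpheus-TTS README.
import wave
from orpheus_tts import OrpheusModel

model = OrpheusModel(model_name="canopylabs/orpheus-3b-0.1-ft")
prompt = "I can't believe it actually worked <laugh>"

with wave.open("output.wav", "wb") as wf:
    wf.setnchannels(1)       # mono
    wf.setsampwidth(2)       # 16-bit PCM
    wf.setframerate(24000)   # 24 kHz output
    for chunk in model.generate_speech(prompt=prompt, voice="tara"):
        wf.writeframes(chunk)
```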

πŸ₯ Orpheus-3b-0.1-ft
Model Page: canopylabs/orpheus-3b-0.1-ft

πŸ₯ Orpheus-3b-0.1-ft
Colab Inference Notebook: https://colab.research.google.com/drive/1KhXT56UePPUHhqitJNUxq63k-pQomz3N?usp=sharing

πŸ₯ Finetune [ orpheus-3b-0.1-pretrained ]
Resource: https://github.com/canopyai/Orpheus-TTS/tree/main/finetune

πŸ₯ Model-releases:
https://canopylabs.ai/model-releases
reacted to gtvracer's post with πŸ‘€ 26 days ago
reacted to AdinaY's post with πŸ‘ 26 days ago
reacted to ShihuaHuang's post with πŸ‘€ 26 days ago

Our work, DEIM, is available on HF: https://huggingface.co/papers/2412.04234. It is the SoTA real-time object detector on COCO.

reacted to Jaward's post with πŸ‘€ 26 days ago
Finally, the ground truth, AlexNet's original source code, is available to all.
Context: AlexNet had a historic win in the 2012 ImageNet Large Scale Visual Recognition Challenge (ILSVRC), reducing error rate from 26% (previous best) to 15.3%. It’s a deep CNN with 8 layers (5 convolutional + 3 fully connected), pioneering the use of ReLU activations for faster training, dropout for regularization, and GPU acceleration for large-scale learning. This moment marked the beginning of the deep learning revolution, inspiring architectures like VGG, ResNet, and modern transformers.
Code: https://github.com/computerhistory/AlexNet-Source-Code
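
For reference, the 8-layer structure described above (5 convolutional + 3 fully connected) maps onto torchvision's modern reimplementation; a minimal sketch, not the original 2012 CUDA code:

```python
# AlexNet via torchvision: 5 conv layers + 3 fully connected layers.
import torch
from torchvision.models import alexnet

model = alexnet(weights=None)  # untrained; weights="IMAGENET1K_V1" for pretrained
x = torch.randn(1, 3, 224, 224)  # one 224x224 RGB image
logits = model(x)
print(logits.shape)  # torch.Size([1, 1000]), ImageNet's 1000 classes
```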