
Natalie H

xinnn63

AI & ML interests

None yet

Organizations

None yet

xinnn63's activity

reacted to etemiz's post with πŸ‘€ 26 days ago
Started fine-tuning Gemma 3 using an evolutionary approach. It is not the worst model according to the AHA Leaderboard, and it is one of the smartest according to lmarena.ai. My objective is to make it based, anti-woke, wise, beneficial, and then some.

Several GPUs are fine-tuning it at the same time, each on a different dataset with QLoRA, and the successful runs are merged later. Compared to LoRA, this allows faster training and also reduces overfitting, because the merge operation heals overfitting. The catch is that the 4-bit quantization may make the models dumber. But I am not looking for sheer IQ. Too much mind is a problem anyway :)

Has anyone tried parallel QLoRA and merging before?

I also automated the dataset selection, the benchmarking, and the convergence toward objectives (the fitness function, the reward). It is basically trying to get a higher score on the AHA Leaderboard as fast as possible with a diverse set of organisms that "evolve by training".
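
For anyone curious what the merge step could look like in code: below is a minimal sketch using peft's weighted adapter averaging, assuming the parallel runs each saved a QLoRA adapter to disk. The model ID, adapter paths, names, and weights are illustrative, not the actual setup.

```python
# Hypothetical sketch of "parallel QLoRA, then merge" with peft.
# Paths, adapter names, and weights below are made up for illustration.
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3-4b-it",  # assumed Gemma 3 variant
    torch_dtype=torch.bfloat16,
)

# Load the adapters trained in parallel on different datasets.
model = PeftModel.from_pretrained(base, "adapters/run_a", adapter_name="run_a")
model.load_adapter("adapters/run_b", adapter_name="run_b")

# Average the successful adapters into a single merged adapter.
model.add_weighted_adapter(
    adapters=["run_a", "run_b"],
    weights=[0.5, 0.5],
    adapter_name="merged",
    combination_type="linear",
)
model.set_adapter("merged")
```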

I want to release some cool stuff when I have the time:
- how an answer to a single question changes over time, with each training round or day
- a chart to show AHA alignment over training rounds
reacted to fdaudens's post with πŸ”₯ 26 days ago
πŸ”Š Meet Orpheus: A breakthrough open-source TTS model that matches human-level speech with empathy & emotion.
- Available in 4 sizes (150M-3B parameters)
- Delivers ultra-fast streaming
- Zero-shot voice cloning
- Apache 2.0 license

canopylabs/orpheus-tts-67d9ea3f6c05a941c06ad9d2
reacted to MonsterMMORPG's post with πŸš€ 26 days ago
Extending a Wan 2.1 generated video: first 14b 720p text-to-video, then automatically using the last frame to generate a follow-on video with 14b 720p image-to-video, plus RIFE, for a 32 FPS, 10-second 1280x720p video

Our app has this fully automated: https://www.patreon.com/posts/123105403

Here is how it works (image): https://ibb.co/b582z3R6

The workflow is easy:

1. Use your favorite app to generate the initial video.
2. Get the last frame.
3. Give the last frame to the image-to-video model, with matching model and resolution.
4. Generate.
5. Merge the clips.
6. Then use MMAudio to add sound.
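
For the "get the last frame" step, a minimal OpenCV sketch could look like this (file names are placeholders):

```python
# Grab the final frame of the generated clip so it can seed the
# image-to-video pass. File names are placeholders.
import cv2

cap = cv2.VideoCapture("initial_video.mp4")
last_index = int(cap.get(cv2.CAP_PROP_FRAME_COUNT)) - 1
cap.set(cv2.CAP_PROP_POS_FRAMES, last_index)  # jump to the last frame
ok, frame = cap.read()
cap.release()
if ok:
    cv2.imwrite("last_frame.png", frame)
```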

I made it automated in my Wan 2.1 app, but it can easily be done with ComfyUI as well. I can extend as many times as I want :)

Here initial video

Prompt: Close-up shot of a Roman gladiator, wearing a leather loincloth and armored gloves, standing confidently with a determined expression, holding a sword and shield. The lighting highlights his muscular build and the textures of his worn armor.

Negative Prompt: Overexposure, static, blurred details, subtitles, paintings, pictures, still, overall gray, worst quality, low quality, JPEG compression residue, ugly, mutilated, redundant fingers, poorly painted hands, poorly painted faces, deformed, disfigured, deformed limbs, fused fingers, cluttered background, three legs, a lot of people in the background, upside down

Used Model: WAN 2.1 14B Text-to-Video
Number of Inference Steps: 20
CFG Scale: 6
Sigma Shift: 10
Seed: 224866642
Number of Frames: 81
Denoising Strength: N/A
LoRA Model: None
TeaCache Enabled: True
TeaCache L1 Threshold: 0.15
TeaCache Model ID: Wan2.1-T2V-14B
Precision: BF16
Auto Crop: Enabled
Final Resolution: 1280x720
Generation Duration: 770.66 seconds



reacted to ginipick's post with πŸ”₯ 26 days ago
🌈✨ FLUX 'Every Text Imaginator'
Multilingual Text-Driven Image Generation and Editing

Demo: ginigen/Every-Text

πŸ“ What is FLUX Text Imaginator?
FLUX Text Imaginator is an innovative tool that leverages cutting-edge FLUX diffusion models to create and edit images with perfectly integrated multilingual text. Unlike other image generation models, FLUX has an exceptional capability to naturally incorporate text in various languages, including Korean, English, Chinese, Japanese, Russian, French, Spanish, and more, into images!

✨ FLUX's Multilingual Text Processing Strengths

πŸ”€ Superior Multilingual Text Rendering: FLUX renders text with amazing accuracy, including non-English languages and special characters
πŸ‡°πŸ‡· Perfect Korean Language Support: Accurately represents complex Korean combined characters
🈢 Excellent East Asian Language Handling: Naturally expresses complex Chinese characters and Japanese text
πŸ” Sophisticated Text Placement: Precise text positioning using <text1>, <text2>, <text3> placeholders
🎭 Diverse Text Styles: Text representation in various styles including handwriting, neon, signage, billboards, and more
πŸ”„ Automatic Translation Feature: Korean prompts are automatically translated to English for optimal results

πŸš€ How It Works

Text Generation Mode:

Enter your prompt (with optional text placeholders)
Specify your desired text in any language
Generate high-quality images with naturally integrated text using FLUX's powerful multilingual processing capabilities
Get two different versions of your image for each generation


Image Editing Mode:

Upload any image
Add editing instructions
Specify new text to add or replace (multilingual support)
Create naturally edited images with FLUX's sophisticated text processing abilities
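
As a toy illustration of the placeholder mechanism described above (hypothetical, not the Space's actual API), the <text1>/<text2> slots simply pair a layout prompt with the multilingual strings to render:

```python
# Illustrative only: pairing placeholder slots with multilingual text.
prompt = "A neon sign reading <text1> above a cafe door, with <text2> painted on the window"
texts = {"text1": "OPEN LATE", "text2": "안녕하세요"}
for key, value in texts.items():
    prompt = prompt.replace(f"<{key}>", value)
print(prompt)
```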

πŸ’» Technical Details
FLUX's Core Technologies:
- Text-Aware Diffusion Model
- Multilingual Processing Engine
- Korean-English Translation Pipeline
- Optimized Pipeline
reacted to kalkey's post with πŸ‘€ 26 days ago
We are using Hugging Face PRO, but all of a sudden we are getting the error below: "The model meta-llama/Llama-3.2-11B-Vision-Instruct is too large to be loaded automatically (21GB > 10GB)." Need help, please.

reacted to chansung's post with ❀️ 26 days ago
Mistral AI Small 3.1 24B is not only free for commercial use but also the best model for single-GPU deployment.

I packed all the information you need to know into a single picture. Hope this helps! :)
reacted to MohamedRashad's post with πŸ‘€ 26 days ago
reacted to sharpenb's post with πŸ”₯ 26 days ago
We open-sourced the pruna package, which can be easily installed with pip install pruna :) It makes it easy to compress and evaluate AI models, including transformers and diffusers.

- Github repo: https://github.com/PrunaAI/pruna
- Documentation: https://docs.pruna.ai/en/stable/index.html

With open-sourcing, people can now inspect and contribute to the code. Beyond the code, we provide a detailed README, tutorials, benchmarks, and documentation to make compression, evaluation, and saving/loading/serving of AI models transparent.
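
As a quick taste, a compression run boils down to something like the sketch below, based on the README; the exact config keys (e.g. "cacher") are assumptions and may differ between versions:

```python
# Rough pruna usage sketch; config keys are assumptions and may vary.
from diffusers import StableDiffusionPipeline
from pruna import SmashConfig, smash

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

smash_config = SmashConfig()
smash_config["cacher"] = "deepcache"  # assumed key: caching-based speedup

# Compress ("smash") the pipeline, then use it like the original.
smashed_pipe = smash(model=pipe, smash_config=smash_config)
image = smashed_pipe("a photo of an astronaut").images[0]
```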

Happy to share it with you and always interested in collecting your feedback :)
reacted to daavoo's post with πŸ”₯ 26 days ago
πŸ€– πŸ—ΊMapped all(?) the swimming pools ️🏊 around another town with https://github.com/mozilla-ai/osm-ai-helper.

This time, I mapped and contributed more than 100 swimming pools around my wife's hometown to https://www.openstreetmap.org. It only took about 20 min to find them all (plus ~3 min of verification) on a free Colab GPU 🚀

Try it yourself around a single point: mozilla-ai/osm-ai-helper
reacted to Akjava's post with πŸ‘ 26 days ago
I've shared Hugging Face Spaces for CPU-based RAG and T5/Flan-T5 models. The smolagents-rag space sometimes produces high-quality answers, but it can be slow. Qwen2.5-0.5B is fast for a CPU implementation and generates answers of acceptable quality. I've found that Gemma3-4B produces significantly more stable answers than the 1B version.

RAG
Akjava/Gemma3-4B-llamacpp-cpu-rag-smolagents
Akjava/Qwen2.5-0.5B-Rag-Thinking-Flan-T5

T5/Flan-T5
Akjava/llamacpp-flan-t5-large-grammar-synthesis
Akjava/llamacpp-madlad400-3b-mt-2jp
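
For context, the CPU inference in these spaces boils down to something like this llama-cpp-python sketch (the GGUF file name and prompt are placeholders):

```python
# Minimal CPU RAG-style generation with llama-cpp-python.
# The GGUF path is a placeholder; any small quantized model works.
from llama_cpp import Llama

llm = Llama(model_path="qwen2.5-0.5b-instruct-q4_k_m.gguf", n_ctx=2048)
out = llm.create_chat_completion(
    messages=[{"role": "user",
               "content": "Answer using only the retrieved context: ..."}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```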

Hugging Face Free CPU Limitations
When duplicating a space, the build process (llama-cpp-python) can occasionally get stuck, requiring a manual restart to finish.
Spaces may unexpectedly stop functioning or even be deleted, requiring them to be reworked. Refer to the issue for more information.
reacted to lbourdois's post with ❀️ 26 days ago
We introduce FAT5 (Flash Attention T5) ⚑

An implementation of T5 in PyTorch with the UL2 objective, optimized for GPGPU for both training and inference thanks to 13 different optimizations.
The main one is a CUDA kernel we designed to extend @tridao's Flash Attention with RPE biases; it also supports other positional encodings such as RoPE, ALiBi, or FIRE.
The resulting kernel is 2 times faster than an SDPA implementation.
We also use Triton kernels to optimize certain parts of the architecture, such as the cross-entropy and RMSNorm layers.

The various kernels have been carefully built to be compatible with BF16 and torch.compile to go even faster and achieve efficient pretraining.
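
For reference, the SDPA baseline being compared against looks like the sketch below, with the relative-position bias passed as an additive attention mask; roughly speaking, the custom kernel fuses this bias into Flash Attention instead (shapes are illustrative):

```python
# PyTorch SDPA with a T5-style additive relative-position bias.
import torch
import torch.nn.functional as F

batch, heads, seq, dim = 2, 8, 128, 64
q = torch.randn(batch, heads, seq, dim)
k = torch.randn(batch, heads, seq, dim)
v = torch.randn(batch, heads, seq, dim)
# RPE bias added to the attention scores, broadcast over the batch.
rpe_bias = torch.randn(1, heads, seq, seq)

out = F.scaled_dot_product_attention(q, k, v, attn_mask=rpe_bias)
print(out.shape)  # torch.Size([2, 8, 128, 64])
```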

All other optimizations are described in a πŸ“ subsequent blog post available on @huggingface πŸ€—: CATIE-AQ/FAT5-report.

This methodology enabled us to efficiently pretrain, as a proof of concept, a FAT5 with 147M parameters in French in a reasonable time (1,461 hours for 419B tokens), with limited resources (a single A100, i.e. a computational budget of ~€1,900) and a low carbon footprint (13.5 kg CO2 eq).

The model's weights are also available on Hugging Face: CATIE-AQ/FAT5-small.
It is not very useful in practice; it's a PoC, not an instruction-tuned model (that's planned for later).

All the code is available on GitHub if you want to pretrain your own model in your own language or for a specific domain: https://github.com/catie-aq/flashT5 ⭐

Finally, this was a joint project with @BorisAlbar at hf.co/CATIE-AQ.
reacted to clem's post with πŸ‘€ 26 days ago
Should we assemble affordable open-source robots at Hugging Face for the community? Would you buy them? At what price?
reacted to onekq's post with πŸ‘€ 26 days ago
I'd like to benchmark 💵o1-pro💵, but it is way too expensive for me 🤦‍♂️
reacted to prithivMLmods's post with πŸ”₯ 26 days ago
Play with Orpheus TTS, a Llama-based Speech-LLM designed for high-quality, empathetic text-to-speech generation. This model has been fine-tuned to deliver human-level speech synthesis πŸ”₯πŸ—£οΈ

πŸ‘‰GitHub [ Demo ] : https://github.com/PRITHIVSAKTHIUR/Orpheus-TTS-Edge

The demo supports both text-to-speech and text-to-LLM responses delivered as speech.

> voice: tara, dan, emma, josh
> emotion: <laugh>, <chuckle>, <sigh>, <cough>, <sniffle>, <groan>, <yawn>, <gasp>.
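
A usage sketch following the canopyai/Orpheus-TTS README (the exact orpheus_tts API and sample rate are assumptions taken from that README, so double-check against the repo):

```python
# Sketch: stream Orpheus speech to a WAV file, using a voice name and
# an inline emotion tag as listed above. API per the Orpheus-TTS README.
import wave
from orpheus_tts import OrpheusModel

model = OrpheusModel(model_name="canopylabs/orpheus-3b-0.1-ft")
prompt = "I can't believe it actually worked <laugh>"

with wave.open("output.wav", "wb") as wf:
    wf.setnchannels(1)       # mono
    wf.setsampwidth(2)       # 16-bit PCM
    wf.setframerate(24000)   # 24 kHz output
    for chunk in model.generate_speech(prompt=prompt, voice="tara"):
        wf.writeframes(chunk)
```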

πŸ₯ Orpheus-3b-0.1-ft
Model Page: canopylabs/orpheus-3b-0.1-ft

πŸ₯ Orpheus-3b-0.1-ft
Colab Inference Notebook: https://colab.research.google.com/drive/1KhXT56UePPUHhqitJNUxq63k-pQomz3N?usp=sharing

πŸ₯ Finetune [ orpheus-3b-0.1-pretrained ]
Resource: https://github.com/canopyai/Orpheus-TTS/tree/main/finetune

πŸ₯ Model-releases:
https://canopylabs.ai/model-releases
reacted to gtvracer's post with πŸ‘€ 26 days ago
reacted to AdinaY's post with πŸ‘ 26 days ago
reacted to ShihuaHuang's post with πŸ‘€ 26 days ago

Our work, DEIM, is available on HF: https://huggingface.co/papers/2412.04234. It is the SoTA real-time object detector on COCO.

reacted to Jaward's post with πŸ‘€ 26 days ago
Finally, the ground truth, AlexNet's original source code, is available to all.
Context: AlexNet had a historic win in the 2012 ImageNet Large Scale Visual Recognition Challenge (ILSVRC), reducing error rate from 26% (previous best) to 15.3%. It’s a deep CNN with 8 layers (5 convolutional + 3 fully connected), pioneering the use of ReLU activations for faster training, dropout for regularization, and GPU acceleration for large-scale learning. This moment marked the beginning of the deep learning revolution, inspiring architectures like VGG, ResNet, and modern transformers.
Code: https://github.com/computerhistory/AlexNet-Source-Code
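
For reference, the 8-layer structure described above (5 convolutional + 3 fully connected) maps onto torchvision's modern reimplementation; a minimal sketch, not the original 2012 CUDA code:

```python
# AlexNet via torchvision: 5 conv layers + 3 fully connected layers.
import torch
from torchvision.models import alexnet

model = alexnet(weights=None)  # untrained; weights="IMAGENET1K_V1" for pretrained
x = torch.randn(1, 3, 224, 224)  # one 224x224 RGB image
logits = model(x)
print(logits.shape)  # torch.Size([1, 1000]), ImageNet's 1000 classes
```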