All HF Hub posts

ginipick 
posted an update 2 days ago
🌈✨ FLUX 'Every Text Imaginator'
Multilingual Text-Driven Image Generation and Editing

Demo: ginigen/Every-Text

📝 What is FLUX Text Imaginator?
FLUX Text Imaginator is an innovative tool that leverages cutting-edge FLUX diffusion models to create and edit images with perfectly integrated multilingual text. Unlike other image generation models, FLUX has an exceptional ability to naturally incorporate text in various languages, including Korean, English, Chinese, Japanese, Russian, French, Spanish and more, into images!

✨ FLUX's Multilingual Text Processing Strengths

🔤 Superior Multilingual Text Rendering: FLUX renders text with amazing accuracy, including non-English languages and special characters
🇰🇷 Perfect Korean Language Support: Accurately represents complex Korean combined characters
🈶 Excellent East Asian Language Handling: Naturally expresses complex Chinese characters and Japanese text
🔍 Sophisticated Text Placement: Precise text positioning using <text1>, <text2>, <text3> placeholders
🎭 Diverse Text Styles: Text representation in various styles including handwriting, neon, signage, billboards, and more
🔄 Automatic Translation Feature: Korean prompts are automatically translated to English for optimal results

🚀 How It Works

Text Generation Mode:

1. Enter your prompt (with optional text placeholders)
2. Specify your desired text in any language
3. Generate high-quality images with naturally integrated text using FLUX's powerful multilingual processing capabilities
4. Get two different versions of your image for each generation
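
The placeholder step above can be sketched in a few lines of Python. This is only an illustration: the demo's actual substitution logic is not shown in the post, so `fill_placeholders`, the quoting of the inserted text, and the example prompt are all assumptions.

```python
import re

# Hypothetical helper: matches <text1>, <text2>, <text3>, ...
PLACEHOLDER = re.compile(r"<text(\d+)>")


def fill_placeholders(prompt: str, texts: list[str]) -> str:
    """Replace <textN> in the prompt with the N-th entry of `texts`."""

    def substitute(match: re.Match) -> str:
        index = int(match.group(1)) - 1  # <text1> maps to texts[0]
        if 0 <= index < len(texts) and texts[index]:
            # Quoting the text is an assumption, not documented behavior.
            return f'"{texts[index]}"'
        return match.group(0)  # leave unmatched placeholders untouched

    return PLACEHOLDER.sub(substitute, prompt)


prompt = "A neon sign that reads <text1> above a cafe door labeled <text2>"
filled = fill_placeholders(prompt, ["안녕하세요", "OPEN"])
print(filled)
```

Leaving unknown placeholders untouched, rather than raising an error, keeps the prompt usable when fewer texts are supplied than placeholders.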


Image Editing Mode:

1. Upload any image
2. Add editing instructions
3. Specify new text to add or replace (multilingual support)
4. Create naturally edited images with FLUX's sophisticated text processing abilities

💻 Technical Details
FLUX's Core Technologies:
- Text-Aware Diffusion Model
- Multilingual Processing Engine
- Korean-English Translation Pipeline
- Optimized Pipeline
·
clem 
posted an update 1 day ago
Should we assemble affordable open-source robots at Hugging Face for the community? Would you buy them? At what price?
·
sharpenb 
posted an update 2 days ago
We open-sourced the pruna package, which can be easily installed with pip install pruna :) It makes it easy to compress and evaluate AI models, including transformers and diffusers.

- GitHub repo: https://github.com/PrunaAI/pruna
- Documentation: https://docs.pruna.ai/en/stable/index.html

With open-sourcing, people can now inspect and contribute to the code. Beyond the code, we provide a detailed README, tutorials, benchmarks, and documentation to make compression, evaluation, and saving/loading/serving of AI models transparent.

Happy to share it with you and always interested in collecting your feedback :)
·
burtenshaw 
posted an update about 23 hours ago
The Hugging Face Agents Course now includes three major agent frameworks!

🔗 https://huggingface.co/agents-course

This includes LlamaIndex, LangChain, and our very own smolagents. We've worked to integrate the three frameworks in distinctive ways so that learners can reflect on when and where to use each.

This also means that you can follow the course if you're already familiar with one of these frameworks, and soak up some of the fundamental knowledge in earlier units.

Hopefully, this makes the agents course open to as many people as possible.
·
onekq 
posted an update about 13 hours ago
Folks, let's get ready. 🥳 We will be busy soon. 😅🤗 https://github.com/huggingface/transformers/pull/36878
·
lbourdois 
posted an update 2 days ago
We introduce FAT5 (Flash Attention T5) ⚡

An implementation of T5 in PyTorch with the UL2 objective, optimized for GPGPU for both training and inference thanks to 13 different optimizations.
The main one is a CUDA kernel we designed to extend Flash Attention by @tridao with RPE biases; it also supports other positional encodings such as RoPE, ALiBi or FIRE.
The resulting kernel is 2 times faster than an SDPA implementation.
We also use Triton kernels to optimize certain parts of the architecture, such as the cross-entropy and RMSNorm layer.
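
For readers unfamiliar with position biases, here is a plain-Python reference for what "attention with an additive bias" computes: softmax(QK^T / sqrt(d) + B) V. It is a sketch of the math only, not the FAT5 CUDA kernel. The symmetric ALiBi-style bias and the function names are illustrative assumptions; T5's RPE uses learned, bucketed biases instead.

```python
import math


def softmax(row):
    # Subtract the max for numerical stability before exponentiating.
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def alibi_bias(seq_len, slope):
    # Symmetric ALiBi-style bias (an assumption for this sketch):
    # each logit is penalized in proportion to the query-key distance.
    return [[-slope * abs(i - j) for j in range(seq_len)] for i in range(seq_len)]


def attention_with_bias(q, k, v, bias):
    """Compute softmax(Q K^T / sqrt(d) + bias) V on small list-of-lists inputs."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out, weights = [], []
    for i, q_i in enumerate(q):
        logits = [
            scale * sum(a * b for a, b in zip(q_i, k_j)) + bias[i][j]
            for j, k_j in enumerate(k)
        ]
        w = softmax(logits)
        weights.append(w)
        out.append([sum(w_j * v_j[c] for w_j, v_j in zip(w, v)) for c in range(len(v[0]))])
    return out, weights


q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
out, weights = attention_with_bias(q, k, v, alibi_bias(3, slope=0.5))
```

A fused kernel avoids materializing the full bias and score matrices in memory, which is where the speedup over a naive implementation like this one comes from.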

The various kernels have been carefully built to be compatible with BF16 and torch.compile to go even faster and achieve efficient pretraining.

All other optimizations are described in a 📝 subsequent blog post available on @huggingface 🤗: CATIE-AQ/FAT5-report.

This methodology enabled us to efficiently pretrain, as a proof of concept, a FAT5 with 147M parameters in French in a reasonable time (1,461 h for 419B tokens), with limited resources (1 A100, i.e. a computational budget of ~€1,900) and a low carbon footprint (13.5 kg CO2 eq).
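
As a quick sanity check on these figures, the arithmetic below derives the implied throughput and unit costs from the numbers quoted above. Only the inputs come from the post; the derived quantities are back-of-the-envelope estimates (roughly 80k tokens per second and about €1.30 per GPU-hour).

```python
# Figures quoted in the post.
hours = 1461          # total pretraining time on 1 A100
tokens = 419e9        # tokens seen during pretraining
budget_eur = 1900     # approximate computational budget

# Derived estimates (not stated in the post).
tokens_per_second = tokens / (hours * 3600)
cost_per_gpu_hour = budget_eur / hours
cost_per_billion_tokens = budget_eur / (tokens / 1e9)

print(f"{tokens_per_second:,.0f} tokens/s")
print(f"{cost_per_gpu_hour:.2f} EUR per GPU-hour")
print(f"{cost_per_billion_tokens:.2f} EUR per billion tokens")
```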

The model's weights are also available on Hugging Face: CATIE-AQ/FAT5-small.
It is not very useful in practice: it's a PoC and not an instruction-tuned model (that is planned for later).

All the code is available on GitHub if you want to pretrain your own model in your own language or for a specific domain: https://github.com/catie-aq/flashT5

Finally, this was a joint project with @BorisAlbar at hf.co/CATIE-AQ.
·
etemiz 
posted an update 2 days ago
Started fine-tuning Gemma 3 using an evolutionary approach. It is not the worst model according to the AHA leaderboard and it is one of the smarter ones according to lmarena.ai. My objective is to make it based, anti-woke, wise, beneficial and then some.

Several GPUs fine-tune it at the same time, each using a different dataset and QLoRA, and the successful ones are merged later. Compared to LoRA, this allows faster training and also reduces overfitting, because the merge operation heals overfitting. One problem could be that 4-bit quantization may make models dumber. But I am not looking for sheer IQ. Too much mind is a problem anyway :)

Has anyone tried parallel QLoRA and merge before?

I also automated dataset selection, benchmarking, and convergence toward objectives (the fitness function, the reward). It basically tries to get a higher score on the AHA Leaderboard as fast as possible with a diverse set of organisms that "evolve by training".
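
A toy sketch of that train-select-merge loop, for anyone who wants to picture it. Everything here is a stand-in: parameters are short lists of floats, `train_on` replaces a real QLoRA run, the fitness is a simple distance instead of a leaderboard score, and merging by plain averaging is an assumption (the actual merge rule is not described above).

```python
def fitness(params, objective):
    # Higher is better: negative squared distance to the objective vector.
    return -sum((p - o) ** 2 for p, o in zip(params, objective))


def train_on(params, dataset_target, lr=0.5):
    # Stand-in for one fine-tuning run: nudge parameters toward
    # whatever this particular dataset teaches.
    return [p + lr * (t - p) for p, t in zip(params, dataset_target)]


def merge(candidates):
    # Assumed merge rule: element-wise average of the surviving candidates.
    n = len(candidates)
    return [sum(values) / n for values in zip(*candidates)]


def evolve(base, datasets, objective, rounds=5):
    current = base
    for _ in range(rounds):
        trained = [train_on(current, d) for d in datasets]  # one run per GPU
        baseline = fitness(current, objective)
        # Keep only the runs that beat the current model.
        survivors = [c for c in trained if fitness(c, objective) > baseline]
        if not survivors:
            break  # no dataset improves the model any further
        current = merge(survivors)
    return current


objective = [1.0, 1.0]
# The last dataset pulls away from the objective and should be filtered out.
datasets = [[1.2, 0.8], [0.9, 1.1], [-3.0, -3.0]]
base = [0.0, 0.0]
final = evolve(base, datasets, objective)
print(final, fitness(final, objective))
```

In this toy setting the score never gets worse from round to round, because only candidates that beat the current model are merged and the fitness is concave; real models give no such guarantee.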

I want to release some cool stuff when I have the time:
- how an answer to a single question changes over time, with each training round or day
- a chart to show AHA alignment over training rounds
·
prithivMLmods 
posted an update 1 day ago
Play with Orpheus TTS, a Llama-based Speech-LLM designed for high-quality, empathetic text-to-speech generation. This model has been fine-tuned to deliver human-level speech synthesis 🔥🗣️

👉Demo: prithivMLmods/Orpheus-Edge

The demo supports both text-to-speech and text-to-LLM responses in speech.

> voice: tara, dan, emma, josh
> emotion: <laugh>, <chuckle>, <sigh>, <cough>, <sniffle>, <groan>, <yawn>, <gasp>.
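
A small sketch of how a request using these voices and emotion tags could be validated before it is sent to the model. The `build_prompt` helper is hypothetical, and the `voice: text` layout is an assumption made for illustration; check the model page for the exact prompt format.

```python
import re

# Voices and emotion tags as listed above.
VOICES = {"tara", "dan", "emma", "josh"}
EMOTION_TAGS = {
    "<laugh>", "<chuckle>", "<sigh>", "<cough>",
    "<sniffle>", "<groan>", "<yawn>", "<gasp>",
}


def build_prompt(voice: str, text: str) -> str:
    """Validate the voice and any <tag> in the text, then format a prompt."""
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice}")
    unknown = [tag for tag in re.findall(r"<[^<>]+>", text) if tag not in EMOTION_TAGS]
    if unknown:
        raise ValueError(f"unsupported emotion tags: {unknown}")
    # Assumed layout: "<voice>: <text>".
    return f"{voice}: {text}"


prompt = build_prompt("tara", "Oh wow <gasp> that is great news.")
print(prompt)
```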

🥠Orpheus-3b-0.1-ft
Model Page: canopylabs/orpheus-3b-0.1-ft

🥠Orpheus-3b-0.1-ft
Colab Inference Notebook: https://colab.research.google.com/drive/1KhXT56UePPUHhqitJNUxq63k-pQomz3N?usp=sharing

🥠Finetune [ orpheus-3b-0.1-pretrained ]
Resource: https://github.com/canopyai/Orpheus-TTS/tree/main/finetune

🥠Model-releases:
https://canopylabs.ai/model-releases
·
daavoo 
posted an update 2 days ago
AdinaY 
posted an update 1 day ago