Prithiv Sakthi PRO

prithivMLmods

AI & ML interests

multi-modal, realism engine, adapters, computer vision

Recent Activity

updated a Space 2 minutes ago
prithivMLmods/Gen-Vision
updated a Space 14 minutes ago
prithivMLmods/Gen-Vision
updated a Space 14 minutes ago
prithivMLmods/Gen-Vision
View all activity

Articles

Organizations

Stanford AI's profile picture DataScienceEngineering's profile picture AI FILMS's profile picture Samsung Electronics's profile picture MISATO-dataset's profile picture GEM benchmark's profile picture OpenGVLab's profile picture MusicAI's profile picture BigScience Biomedical Datasets's profile picture OpenVINO Toolkit's profile picture LLMs's profile picture ONNXConfig for all's profile picture Gradio-Themes-Party's profile picture scikit-learn's profile picture Open-Source AI Meetup's profile picture AMD's profile picture lora concepts library's profile picture Platzi Community's profile picture Kornia AI's profile picture Tune a video concepts library's profile picture Université Dauphine-PSL's profile picture Keras Dreambooth Event's profile picture Stable Diffusion Dreambooth Concepts Library's profile picture The Waifu Research Department's profile picture Musika's profile picture Blog-explorers's profile picture OpenSky's profile picture AI Tamil Nadu's profile picture OpenLLM France's profile picture huggingPartyParis's profile picture Team Tonic's profile picture That Time I got Reincarnated as a Hugging Face Organization's profile picture LocalLLaMA's profile picture Major TOM's profile picture MLX Community's profile picture C4AI Community's profile picture M4-ai's profile picture Chinese LLMs on Hugging Face's profile picture Dataset Tools's profile picture Nerdy Face's profile picture Stranger Zone's profile picture open/ acc's profile picture Data Is Better Together Contributor's profile picture

prithivMLmods's activity

reacted to their post with 🤗 about 4 hours ago
view post
Post
406
Milestone for Flux.1 Dev 🔥

💢The Flux.1 Dev model has crossed 1️⃣0️⃣,0️⃣0️⃣0️⃣ creative public adapters! 🎈
🔗 https://huggingface.co/models?other=base_model:adapter:black-forest-labs/FLUX.1-dev

💢This includes:
- 266 Finetunes
- 19 Quants
- 4 Merges

💢 Here’s the 10,000th public adapter : 😜
+ strangerzonehf/Flux-3DXL-Partfile-0006

💢 Page :
+ https://huggingface.co/strangerzonehf

💢 Collection :
+ prithivMLmods/flux-lora-collections-66dd5908be2206cfaa8519be
posted an update about 6 hours ago
view post
Post
406
Milestone for Flux.1 Dev 🔥

💢The Flux.1 Dev model has crossed 1️⃣0️⃣,0️⃣0️⃣0️⃣ creative public adapters! 🎈
🔗 https://huggingface.co/models?other=base_model:adapter:black-forest-labs/FLUX.1-dev

💢This includes:
- 266 Finetunes
- 19 Quants
- 4 Merges

💢 Here’s the 10,000th public adapter : 😜
+ strangerzonehf/Flux-3DXL-Partfile-0006

💢 Page :
+ https://huggingface.co/strangerzonehf

💢 Collection :
+ prithivMLmods/flux-lora-collections-66dd5908be2206cfaa8519be
reacted to merve's post with 🤗 1 day ago
view post
Post
1815
Last week we were blessed with open-source models! A recap 💝
merve/nov-29-releases-674ccc255a57baf97b1e2d31

🖼️ Multimodal
> At Hugging Face we released SmolVLM, a performant and efficient smol vision language model 💗
> Show Lab released ShowUI-2B: new vision-language-action model to build GUI/web automation agents 🤖
> Rhymes AI has released the base model of Aria: Aria-Base-64K and Aria-Base-8K with their respective context length
> ViDoRe team released ColSmolVLM: A new ColPali-like retrieval model based on SmolVLM
> Dataset: Llava-CoT-o1-Instruct: new dataset labelled using Llava-CoT multimodal reasoning model📖
> Dataset: LLaVA-CoT-100k dataset used to train Llava-CoT released by creators of Llava-CoT 📕

💬 LLMs
> Qwen team released QwQ-32B-Preview, state-of-the-art open-source reasoning model, broke the internet 🔥
> AliBaba has released Marco-o1, a new open-source reasoning model 💥
> NVIDIA released Hymba 1.5B Base and Instruct, the new state-of-the-art SLMs with hybrid architecture (Mamba + transformer)

⏯️ Image/Video Generation
> Qwen2VL-Flux: new image generation model based on Qwen2VL image encoder, T5 and Flux for generation
> Lightricks released LTX-Video, a new DiT-based video generation model that can generate 24 FPS videos at 768x512 res ⏯️
> Dataset: Image Preferences is a new image generation preference dataset made with DIBT community effort of Argilla 🏷️

Audio
> OuteAI released OuteTTS-0.2-500M new multilingual text-to-speech model based on Qwen-2.5-0.5B trained on 5B audio prompt tokens
reacted to Jaward's post with 🧠 3 days ago
view post
Post
2320
Implements compute-efficient DeepPCR algorithm which parallelizes sequential operations thus speeding up inference and training of neural networks. DeepPCR can significantly reduce the time complexity in operations such as denoising in latent diffusion space from O(L) to O(log2 L).

Code: https://github.com/Jaykef/ai-algorithms/blob/main/deep_pcr.ipynb
reacted to victor's post with 👍 4 days ago
view post
Post
1492
Qwen/QwQ-32B-Preview shows us the future (and it's going to be exciting)...

I tested it against some really challenging reasoning prompts and the results are amazing 🤯.

Check this dataset for the results: victor/qwq-misguided-attention
  • 2 replies
·
posted an update 5 days ago
view post
Post
2471
Fine-Textured [Polygon] Character 3D Design Renders 🙉

Adapters capable of providing better lighting control (Bn+, Bn-) and richer textures compared to previous sets require more contextual prompts for optimal performance.

The ideal settings are achieved at inference steps around 30–35, with the best dimensions being 1280 x 832 [ 3:2 ]. However, it also performs well with the default settings of 1024 x 1024 [ 1:1 ].

💢Models DLC :
+ strangerzonehf/Flux-3DXL-Partfile-0001
+ strangerzonehf/Flux-3DXL-Partfile-0002
+ strangerzonehf/Flux-3DXL-Partfile-0003
+ strangerzonehf/Flux-3DXL-Partfile-0004
+ strangerzonehf/Flux-3DXL-Partfile-C0001

💢Collections :
1] strangerzonehf/flux-3dxl-engine-674833c14a001d5b1fdb5139
2] prithivMLmods/flux-lora-collections-66dd5908be2206cfaa8519be

💢Space :
1] prithivMLmods/FLUX-LoRA-DLC

💢Page :
1] Stranger Zone: https://huggingface.co/strangerzonehf

.
.
.
@prithivMLmods 🤗
reacted to dylanebert's post with 🚀 6 days ago
view post
Post
1458
Generate meshes with AI locally in Blender

📢 New open-source release

meshgen, a local blender integration of LLaMa-Mesh, is open source and available now 🤗

get started here: https://github.com/huggingface/meshgen
reacted to andito's post with 🔥 6 days ago
view post
Post
3056
Let's go! We are releasing SmolVLM, a smol 2B VLM built for on-device inference that outperforms all models at similar GPU RAM usage and tokens throughputs.

- SmolVLM generates tokens 7.5 to 16 times faster than Qwen2-VL! 🤯
- Other models at this size crash a laptop, but SmolVLM comfortably generates 17 tokens/sec on a macbook! 🚀
- SmolVLM can be fine-tuned on a Google collab! Or process millions of documents with a consumer GPU!
- SmolVLM even outperforms larger models in video benchmarks, despite not even being trained on videos!

Check out more!
Demo: HuggingFaceTB/SmolVLM
Blog: https://huggingface.co/blog/smolvlm
Model: HuggingFaceTB/SmolVLM-Instruct
Fine-tuning script: https://github.com/huggingface/smollm/blob/main/finetuning/Smol_VLM_FT.ipynb
reacted to merve's post with 🚀 7 days ago
view post
Post
3692
Small yet mighty! 💫

We are releasing SmolVLM: a new 2B small vision language made for on-device use, fine-tunable on consumer GPU, immensely memory efficient 🤠

We release three checkpoints under Apache 2.0: SmolVLM-Instruct, SmolVLM-Synthetic and SmolVLM-Base HuggingFaceTB/smolvlm-6740bd584b2dcbf51ecb1f39

Learn more from our blog here: huggingface.co/blog/smolvlm
This release comes with a demo, fine-tuning code, MLX integration and TRL integration for DPO 💝
Try the demo: HuggingFaceTB/SmolVLM
Fine-tuning Recipe: https://github.com/huggingface/smollm/blob/main/finetuning/Smol_VLM_FT.ipynb
Also TRL integration for DPO 💗
reacted to vilarin's post with 🔥 7 days ago
view post
Post
1276
A few days ago, Blackforestlabs released FLUX.1 Tools, which has surprised everyone with its quality and effects. Now that diffusers support these features, you can easily deploy and build your own Tools.
Combined with the powerful Gradio and ZeroGPU, you can experience the Tools immediately, which is truly wonderful.
I was impressed by the Flux.1 Fill dev, so here I've built a demo for it, making it easy to use for inpainting and outpainting images.

🏄Model: black-forest-labs/FLUX.1-Fill-dev
🦖Demo: vilarin/Flux.1-Fill-dev
👏diffusers: https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines/flux
reacted to davidberenstein1957's post with 🔥 7 days ago
view post
Post
1661
Let’s make a generation of amazing image-generation models

The best image generation models are trained on human preference datasets, where annotators have selected the best image from a choice of two. Unfortunately, many of these datasets are closed source so the community cannot train open models on them. Let’s change that!

The community can contribute image preferences for an open-source dataset that could be used for building AI models that convert text to image, like the flux or stable diffusion families. The dataset will be open source so everyone can use it to train models that we can all use.

Blog: https://huggingface.co/blog/burtenshaw/image-preferences
reacted to maxiw's post with 🤗 7 days ago
view post
Post
1919
You can now try out computer use models from the hub to automate your local machine with https://github.com/askui/vision-agent. 💻

import time
from askui import VisionAgent

with VisionAgent() as agent:
    agent.tools.webbrowser.open_new("http://www.google.com")
    time.sleep(0.5)
    agent.click("search field in the center of the screen", model_name="Qwen/Qwen2-VL-7B-Instruct")
    agent.type("cats")
    agent.keyboard("enter")
    time.sleep(0.5)
    agent.click("text 'Images'", model_name="AskUI/PTA-1")
    time.sleep(0.5)
    agent.click("second cat image", model_name="OS-Copilot/OS-Atlas-Base-7B")


Currently these models are integrated with Gradio Spaces API. Also planning to add local inference soon!

Currently supported:
- Qwen/Qwen2-VL-7B-Instruct
- Qwen/Qwen2-VL-2B-Instruct
- AskUI/PTA-1
- OS-Copilot/OS-Atlas-Base-7B
·
posted an update 8 days ago
view post
Post
3179
HF Posts Receipts 🏆🚀

[ HF POSTS RECEIPT ] : prithivMLmods/HF-POSTS-RECEIPT

🥠The one thing that needs to be remembered is the 'username'.

🥠And yeah, thank you, @maxiw , for creating the awesome dataset and sharing them here! 🙌

🥠[ Dataset ] : maxiw/hf-posts

.
.
.
@prithivMLmods
reacted to Dref360's post with 🤝 8 days ago
view post
Post
1253
New week, new #cv Gradio app for human understanding.( Dref360/human-interaction-demo) 🥳

This demo highlights when a person touches an object. For instance, it is useful to know if someone is touching a wall, a vase or a door. It works for multiple people too!

Still using nielsr/vitpose-base-simple for pose estimation, very excited to see the PR approved!


reacted to maxiw's post with 🔥 8 days ago
view post
Post
1088
🤖 Controlling Computers with Small Models 🤖

We just released PTA-1, a fine-tuned Florence-2 for localization of GUI text and elements. It runs with ~150ms inference time on a RTX 4080. This means you can now start building fast on-device computer use agents!

Model: AskUI/PTA-1
Demo: AskUI/PTA-1
  • 1 reply
·
reacted to victor's post with 🔥 9 days ago
view post
Post
1979
Perfect example of why Qwen/Qwen2.5-Coder-32B-Instruct is insane?

Introducing: AI Video Composer 🔥
huggingface-projects/ai-video-composer

Drag and drop your assets (images/videos/audios) to create any video you want using natural language!

It works by asking the model to output a valid FFMPEG and this can be quite complex but most of the time Qwen2.5-Coder-32B gets it right (that thing is a beast). It's an update of an old project made with GPT4 and it was almost impossible to make it work with open models back then (~1.5 years ago), but not anymore, let's go open weights 🚀.
reacted to akhaliq's post with ❤️ 9 days ago
view post
Post
2563
anychat

supports chatgpt, gemini, perplexity, claude, meta llama, grok all in one app

try it out there: akhaliq/anychat
posted an update 9 days ago
view post
Post
4077
CRISP 🔥 [ Isometric-3D-Cinematography / Isometric-3D-Obj / 3D-Kawaii / Long Toons ]

[ Flux DLC ] : prithivMLmods/FLUX-LoRA-DLC

[ Stranger Zone ] : https://huggingface.co/strangerzonehf

🎃[ Isometric 3D Cinematography ] : strangerzonehf/Flux-Isometric-3D-Cinematography
🎃[ Isometric 3D ] : strangerzonehf/Flux-Isometric-3D-LoRA
🎃[ Cute 3D Kawaii ] : strangerzonehf/Flux-Cute-3D-Kawaii-LoRA
🌚[ Long Toon 3D ] : prithivMLmods/Flux-Long-Toon-LoRA

[ Stranger Zone Collection ] : https://huggingface.co/collections/prithivMLmods/stranger-zone-collections-6737118adcf2cb40d66d0c7e

[ Flux Collection ] : prithivMLmods/flux-lora-collections-66dd5908be2206cfaa8519be

[ Flux Mix ] : prithivMLmods/Midjourney-Flux

.
.
.
@prithivMLmods
reacted to hexgrad's post with 🔥 10 days ago
reacted to AdinaY's post with 🔥 11 days ago