All HF Hub posts

openfree 
posted an update 1 day ago
✨ DreamO Video: From Customized Images to Videos ✨
Hello, AI creators! Today I'm introducing a truly special project. DreamO Video is an integrated framework that generates customized images based on reference images and transforms them into videos with natural movement. 🎬✨

openfree/DreamO-video

🔍 Key Features

Image Reference (IP): Maintain object appearance while applying to new backgrounds and situations
ID Preservation: Retain facial features across various environments
Style Transfer: Apply unique styles from reference images to other content
🎞️ Video Generation: Create natural 2-second videos from generated images

💡 How to Use

Upload Reference Images: One or two images (people, objects, landscapes, etc.)
Select Task Type: Choose IP (image reference), ID (identity preservation), or Style (style transfer)
Enter Prompt: Describe your desired result (e.g., "a woman playing guitar on a cloud")
Click Generate Image: ✨ Create customized AI images!
Generate Video: Click the 🎬 button on the generated image to create a 2-second natural video

🚀 Usage Examples

👗 Virtual Fitting: Combine clothes and people to visualize outfit appearance
🖼️ Artwork Transformation: Create new images in your favorite styles
📸 Portrait Modification: Create appearances in various environments and situations
🎭 Character Design: Develop new characters based on reference images
🎥 Short Animations: Transform static images into vivid videos

⚠️ Demo Version Notice
In the current demo version, video generation is limited to 2 seconds. The full version supports videos of up to 60 seconds.
📊 Latest Updates

2025.05.13: DreamO Video Integration version released!
2025.05.11: Improved oversaturation and unnatural face issues

Create amazing content with DreamO Video! If you have any questions or feedback, please don't hesitate to contact us. We look forward to seeing your creations! 💫🎨
#AI #ImageGeneration #VideoGeneration #DreamO #HuggingFace
merve 
posted an update 1 day ago
VLMs 2025 UPDATE 🔥

We just shipped a blog post covering the latest in vision language models, including
🤖 GUI agents, agentic VLMs, omni models
📑 multimodal RAG
⏯️ video LMs
🤏🏻 smol models
..and more! https://huggingface.co/blog/vlms-2025
MonsterMMORPG 
posted an update 1 day ago
Transfer Any Clothing Into A New Person & Turn Any Person Into A 3D Figure - ComfyUI Tutorial

ComfyUI can be hard to use, but I have prepared a 1-click way to install and use two amazing workflows. The first workflow generates a person wearing any clothing; the second turns any person's image into a 3D toy-like figure.

Tutorial Link : https://youtu.be/ZzYnhKeaJBs

Video Chapters
0:00:00 Intro: Two One-Click ComfyUI Workflows (Clothing Gen & 3D Figure)
0:00:34 Effort & Goal: Easy Installation & Use of Complex Workflows
0:00:49 Setup Part 1: ComfyUI Prerequisite & Downloading Project Zip File
0:01:06 Setup Part 2: Extracting Zip into ComfyUI Folder (WinRAR 'Extract Here' Tip)
0:01:18 Setup Part 3: Running update_comfyui.bat for Latest ComfyUI Version
0:01:37 Setup Part 4: Running install_clothing_and_3D.bat (Installs Nodes & Requirements)
0:02:03 Model Downloads: Intro to Swarm UI Auto-Installer & Automatic Updater
0:02:28 Using Swarm UI: Launching Windows_start_download_models_app.bat
0:02:51 Model Selection in Swarm UI: Flux Dev GGUF Q8 & Low VRAM Options
0:03:05 Configuring Model Download Path: Pointing to Your ComfyUI/models Folder
0:03:22 Downloading Flux Model: GGUF Quality Levels Explained (Q8, Q6, Q5, Q4, KM, KS)
0:04:10 Downloading Workflow Bundle: 'Clothing Migration Workflow Bundle' for All Models
0:04:38 Starting ComfyUI: Using Windows_run_GPU.bat & Optional .bat File Customization
0:05:16 Workflow 1 (Clothing): Loading via Drag & Drop, Selecting Input Garment Image
0:05:33 Workflow 1 (Clothing) Params: Crafting the Main Generation Prompt & Adding Extra Text
0:06:02 Workflow 1 (Clothing) Params: GPU-Dependent Model Loader (GGUF Q8 vs Full Precision FP16)
0:06:22 Workflow 1 (Clothing) Low VRAM: Block Swapping with FP16 Flux Dev Model (UNET Loader)
.
.
.
ginipick 
posted an update 2 days ago
# 🌟 3D Model to Video: Easy GLB Conversion Tool 🌟

demo link: ginigen/3D-VIDEO

Hello there! Would you like to transform your 3D models into stunning animations? This space can help you! ✨

## 🔍 What Can It Do?

This tool converts your uploaded GLB model into:
1. 🎮 A transformed GLB file
2. 🎬 An animated GIF preview
3. 📋 A metadata JSON file

## ✅ Key Features

* 🖥️ Works in headless server environments (EGL + pyglet-headless → pyrender fallback)
* 🔍 Objects in GIFs appear 3x larger (global scale ×3)
* 🎨 Clean interface with pastel background

## 🎮 Animation Types

* 🔄 Rotate - Object rotates around the Y-axis
* ⬆️ Float - Object moves smoothly up and down
* 💥 Explode - Object moves sideways
* 🧩 Assemble - Object returns to its original position
* 💓 Pulse - Object changes in size
* 🔄 Swing - Object swings around the Z-axis
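
The animation types above boil down to simple per-frame transform math. Here is a minimal sketch of that idea in plain Python (the function name and parameter values are hypothetical illustrations, not the space's actual code):

```python
import math

def animation_params(anim_type, frame, total_frames):
    """Return (rotation_radians, y_offset, scale) for one frame.

    A hypothetical sketch of the per-frame math behind a few of the
    animation types; amplitudes are illustrative choices.
    """
    t = frame / max(total_frames - 1, 1)  # normalized time in [0, 1]
    if anim_type == "rotate":             # one full turn around the Y axis
        return (2 * math.pi * t, 0.0, 1.0)
    if anim_type == "float":              # smooth up-and-down motion
        return (0.0, 0.5 * math.sin(2 * math.pi * t), 1.0)
    if anim_type == "pulse":              # size oscillates around 1.0
        return (0.0, 0.0, 1.0 + 0.2 * math.sin(2 * math.pi * t))
    return (0.0, 0.0, 1.0)                # static fallback
```

Each frame's tuple would then be baked into a 4x4 transform and applied to the mesh before rendering.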

## 🛠️ How to Use

1. Upload your GLB model 📤
2. Select your desired animation type 🎬
3. Adjust the duration and FPS ⏱️
4. Click the "Generate Animation" button ▶️
5. Download your results 📥

## 💻 Technical Details

* Rendering system using trimesh and pyrender
* Automatic fallback method for rendering failures to ensure stability
* GIF generation supporting up to 60 frames
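
The "automatic fallback" and 60-frame cap can be sketched as follows; the renderer callables here are stand-ins for the real pyrender and fallback paths, and `MAX_FRAMES` mirrors the stated GIF limit:

```python
MAX_FRAMES = 60  # the GIF path caps output at 60 frames

def frame_count(duration_s, fps):
    """Clamp the requested frame count to the GIF limit."""
    return min(int(duration_s * fps), MAX_FRAMES)

def render_frame_with_fallback(scene, renderers):
    """Try each renderer in order; return the first frame that succeeds.

    `renderers` is a list of callables taking the scene and returning an
    image; any that raises is skipped. This mirrors the pattern of trying
    full pyrender first and a simpler method on failure.
    """
    last_error = None
    for render in renderers:
        try:
            return render(scene)
        except Exception as err:  # renderer unavailable or failed
            last_error = err
    raise RuntimeError("all renderers failed") from last_error
```

This keeps a headless-server failure (e.g. no EGL) from aborting the whole job.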

Breathe life into your static 3D models with this tool! 🚀 If you have any questions or feedback, please let us know. Happy 3D modeling! ✨
ArturoNereu 
posted an update about 15 hours ago
I’ve been learning AI for several years (coming from the games industry), and along the way, I curated a list of the tools, courses, books, papers, and models that actually helped me understand things.

I turned this into a GitHub repo:
https://github.com/ArturoNereu/AI-Study-Group

If you’re just getting started, I recommend:

📘 Deep Learning – A Visual Approach: https://www.glassner.com/portfolio/deep-learning-a-visual-approach
🎥 Dive into LLMs with Andrej Karpathy: https://youtu.be/7xTGNNLPyMI?si=aUTq_qUzyUx36BsT
🧠 The 🤗 Agents course: https://huggingface.co/learn/agents-course/

The repo has grown with help from the community (Reddit, Discord, etc.) and I’ll keep updating it.

If you have any favorite resources, I’d love to include them.
AdinaY 
posted an update 1 day ago
Matrix Game 🎮, an interactive foundation model for controllable game-world generation, released by Skywork AI.

Skywork/Matrix-Game

✨ 17B parameters, MIT licensed
✨ Diffusion-based image-to-world video generation via keyboard & mouse input
✨ GameWorld Score benchmark for Minecraft world models
✨ Massive Matrix Game Dataset with fine-grained action labels
dhruv3006 
posted an update 1 day ago
The era of local Computer Use AI Agents is here.

Meet UI-TARS-1.5-7B-6bit, now running natively on Apple Silicon via MLX.

The video shows UI-TARS-1.5-7B-6bit completing the prompt "draw a line from the red circle to the green circle, then open reddit in a new tab", running entirely on a MacBook. The video is a replay; during actual usage each turn took between 15 s and 50 s with 720p screenshots (~30 s per turn on average). This was also with many apps open, so the model had to fight for memory at times.

Built using c/ua : https://github.com/trycua/cua

Join us making them here: https://discord.gg/4fuebBsAUj

Kudos to the MLX community here on Hugging Face: mlx-community

VirtualOasis 
posted an update 2 days ago
Automatic Multi-Modal Research Agent
I am thinking of building an Automatic Research Agent that can boost creativity!

Input: Topics or data sources
Processing: Automated deep research
Output: multimodal results (such as reports, videos, audio, diagrams) & multi-platform publishing.

There is a three-stage process.
In the initial stage, text-based content is produced in Markdown format, allowing user review before transformation into other formats such as PDF or HTML.

The second stage transforms the output into other modalities, like audio, video, diagrams, and translations into different languages.

The final stage focuses on publishing multi-modal content across multiple platforms like X, GitHub, Hugging Face, YouTube, and podcasts, etc.
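
The three stages above can be sketched as a simple pipeline. Everything here is a hypothetical stand-in (the dataclass, stage functions, and string outputs are illustrations, not a real implementation; the real stages would call research tools, converters, and platform APIs):

```python
from dataclasses import dataclass, field

@dataclass
class ResearchJob:
    topic: str
    markdown: str = ""                              # stage 1 output
    artifacts: dict = field(default_factory=dict)   # stage 2 outputs
    published: list = field(default_factory=list)   # stage 3 destinations

def stage1_research(job):
    # Stand-in for automated deep research: emit Markdown for user review.
    job.markdown = f"# Report: {job.topic}\n\n(draft for review)"
    return job

def stage2_transform(job, modalities):
    # Stand-in converters; real ones would produce PDF, audio, video, etc.
    for m in modalities:
        job.artifacts[m] = f"{job.topic}.{m}"
    return job

def stage3_publish(job, platforms):
    # Stand-in publishers for X, GitHub, Hugging Face, YouTube, etc.
    job.published = [f"{p}:{a}" for p in platforms for a in job.artifacts]
    return job

job = stage3_publish(
    stage2_transform(stage1_research(ResearchJob("solar sails")),
                     ["pdf", "mp3"]),
    ["github", "huggingface"],
)
```

Keeping stage 1's Markdown as the single reviewed source of truth means every downstream modality derives from content the user has already approved.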
hesamation 
posted an update 2 days ago
this book actually exists for free, “the little book of deep learning”. best to refresh your mind about DL basics:
> foundations of machine learning
> how models train
> common layers (dropout, pooling…)
> basic intro to LLMs
actually optimized for mobile.

Book: https://fleuret.org/public/lbdl.pdf
DawnC 
posted an update about 20 hours ago
🚀 VisionScout Now Speaks More Like Me — Thanks to LLMs!
I'm thrilled to share a major update to VisionScout, my end-to-end vision system.

Beyond robust object detection (YOLOv8) and semantic context (CLIP), VisionScout now features a powerful LLM-based scene narrator (Llama 3.2), improving the clarity, accuracy, and fluidity of scene understanding.

This isn't about replacing the pipeline; it's about giving it a better voice. ✨

⭐️ What the LLM Brings
Fluent, Natural Descriptions:
The LLM transforms structured outputs into human-readable narratives.

Smarter Contextual Flow:
It weaves lighting, objects, zones, and insights into a unified story.

Grounded Expression:
Carefully prompt-engineered to stay factual: it enhances rather than hallucinates.

Helpful Discrepancy Handling:
When YOLO and CLIP diverge, the LLM adds clarity through reasoning.
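
The grounding idea above can be sketched as prompt assembly: hand the LLM only the facts the pipeline produced, and flag any YOLO/CLIP divergence explicitly. This is a hypothetical illustration (`build_scene_prompt` and its inputs are stand-ins, not VisionScout's actual code):

```python
def build_scene_prompt(detections, clip_scene, lighting):
    """Assemble a grounded narration prompt from structured vision outputs."""
    objects = ", ".join(
        f"{d['label']} ({d['conf']:.0%})" for d in detections
    )
    note = ""
    if detections and clip_scene:
        labels = {d["label"] for d in detections}
        if clip_scene not in labels:  # detector and scene classifier diverge
            note = (f"\nNote: the scene classifier suggests '{clip_scene}'; "
                    "reconcile this with the detected objects.")
    return (
        "Describe this scene fluently using ONLY the facts below. "
        "Do not invent objects.\n"
        f"Objects: {objects}\nLighting: {lighting}{note}"
    )

prompt = build_scene_prompt(
    [{"label": "dog", "conf": 0.91}, {"label": "bench", "conf": 0.66}],
    "park", "overcast daylight",
)
```

The explicit "ONLY the facts below" constraint plus the divergence note is what lets the LLM add fluency and reasoning without adding content.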

VisionScout Still Includes:
🖼️ YOLOv8-based detection (Nano / Medium / XLarge)
📊 Real-time stats & confidence insights
🧠 Scene understanding via multimodal fusion
🎬 Video analysis & object tracking

🎯 My Goal
I built VisionScout to bridge the gap between raw vision data and meaningful understanding.
This latest LLM integration helps the system communicate its insights in a way that’s more accurate, more human, and more useful.

Try it out 👉 DawnC/VisionScout

If you find this update valuable, a Like❤️ or comment means a lot!

#LLM #ComputerVision #MachineLearning #TechForLife