All HF Hub posts

MonsterMMORPG 
posted an update 1 day ago
WAN 2.1 FusionX + Self Forcing LoRA are the New Best of Local Video Generation with Only 8 Steps + FLUX Upscaling Guide

Tutorial : https://www.youtube.com/watch?v=Xbn93GRQKsQ

Video Chapters

0:00 Introduction to the New FusionX Video Model & FLUX Upscaling
0:30 One-Click Presets & The SwarmUI Model Downloader Explained
1:07 Achieving Hyper-Realism with the FLUX 2x Latent Upscale Preset
1:58 How to Download & Install the SwarmUI Model Downloader
2:49 Downloading Full Models vs. Downloading Just The LoRAs
3:48 Final Setup: Updating SwarmUI & Importing The New Presets
4:32 Generating a Video: Applying the FusionX Image-to-Video Preset
5:03 Critical Step: Correcting The Model's Native Resolution Metadata
5:55 Finalizing Image-to-Video Settings (Frame Count & RIFE Interpolation)
6:49 Troubleshooting Performance: Identifying Low GPU Usage & Shared VRAM Bug
8:35 The Solution: Disabling Sage Attention for Image-to-Video Models
10:02 Final Result: Showcasing The Amazing HD Quality Animation
10:40 How to Use the FusionX Text-to-Video Model with Presets
11:49 Text-to-Video Result & Quality Comparison
12:08 How to Use the FusionX LoRA with the Base Wan 2.1 Model
13:07 FLUX Tutorial: Downloading The Required Upscaler & Face Models
13:48 Generating a High-Quality Image with The Official FLUX Preset
14:50 Using Automatic Face Segmentation & Inpainting with FLUX
16:05 The Ultimate Upgrade: Applying The FLUX 2x Latent Upscaler Preset
16:32 Final Result: Comparing Standard vs. 2x Upscaled Image Quality
16:50 Outro & Sneak Peek of The New Ultimate Video Processing App
merve 
posted an update 2 days ago
Releases of the past week are here merve/releases-june-13-6852c3c1eaf1e0c24c958860

Here are our picks 🤓
So many interesting models released in open AI this past week! 🤖

🖼️ Computer Vision/VLMs
> nanonets/Nanonets-OCR-s is the new state-of-the-art OCR model that can handle checkboxes, watermarks, tables (OS)
> Meta released facebook/v-jepa-2-6841bad8413014e185b497a6, new sota video embeddings with two new classification models (OS)
> ByteDance-Seed/SeedVR2-3B is a new 3B video restoration model (OS)

Audio
> Stepfun released stepfun-ai/Step-Audio-AQAA, new large (137B 🤯) audio language model that takes in audio and generates audio (OS)

🤖 Robotics
> nvidia released nvidia/GR00T-N1.5-3B, new open foundation vision language action model

3D
> tencent/Hunyuan3D-2.1 is the new version of Hunyuan by Tencent that can generate 3D assets from text and image prompts
prithivMLmods 
posted an update 3 days ago
This space combines two demos: the MonkeyOCR Recognition model, which adopts a Structure-Recognition-Relation (SRR) triplet paradigm, and Nanonets-OCR-s, a powerful, state-of-the-art image-to-markdown OCR model that goes far beyond traditional text extraction, together with other experimental document OCR models.

✦ Try the demo here : prithivMLmods/core-OCR
✦ Try Nanonets-OCR-s demo here : prithivMLmods/Multimodal-OCR

⤷ MonkeyOCR Recognition : echo840/MonkeyOCR
⤷ docscopeOCR-7B-050425-exp : prithivMLmods/docscopeOCR-7B-050425-exp
⤷ coreOCR-7B-050325-preview : prithivMLmods/coreOCR-7B-050325-preview
⤷ Nanonets-OCR-s : nanonets/Nanonets-OCR-s

⤷ Multimodal Implementations : prithivMLmods/multimodal-implementations-67c9982ea04b39f0608badb0

Also included is a sample OCR test using the VisionOCR-3B-061125 model and the Qwen2-VL-OCR-2B-Instruct model.
⤷ Blog : https://huggingface.co/blog/prithivMLmods/visionocr-3b-061125-vs-qwen2-vl-ocr-2b-instruct

To learn more, visit the model card of the respective model.
multimodalart 
posted an update 1 day ago
Self-Forcing, a real-time video model distilled from Wan 2.1 by @adobe, is out, and they open-sourced it 🐐

I've built a live real time demo on Spaces 📹💨

multimodalart/self-forcing
openfree 
posted an update 2 days ago
🎯 Open GAMMA - AI PPT Generator 'GamJa'

🚀 Project Introduction
A revolutionary AI presentation generator presented by the OpenFree AI Community! Create professional-level PPTs with just a few clicks.
🆓 Completely FREE! Create Premium PPTs with Free GAMMA! 🎉

DEMO: openfree/Open-GAMMA

✨ Key Features

🤖 Powered by the 2nd-Ranked LLM on the FACTS Grounding Leaderboard
Base Model: vidraft/gemma-3-R1984-27B
Perfect support for English/Korean/Multi-language
Automatic speaker notes generation

🎨 Premium Visuals
3D style AI image generation
5 design themes (Professional, Modern, Nature, Creative, Minimal)
FLUX style diagram images
Automatic emoji bullet points

📊 Smart Diagrams
Process Flow, Concept Map, WBS, Radial, Synoptic Chart
Content analysis-based automatic diagram generation
Perfect Korean font support

💡 Main Features

📝 Intelligent Content Generation
Auto-generate 3-20 slides just by entering a topic
Latest information through web search
Reference PDF, CSV, TXT files


🖼️ Visual Automation
3D images for cover & conclusion slides
Auto-generate 2 content-based diagrams
Add 2 FLUX style images


🎯 Customizable Design
5 professional themes
3 layout styles
Automatic emoji mapping system

💰 Premium Features for FREE!
Create professional-grade presentations that rival paid PPT generation services with Free GAMMA (Open GAMMA)! 🚀
merve 
posted an update about 24 hours ago
stop using VLMs blindly ✋🏻

compare different VLM outputs on a huge variety of inputs (from reasoning to OCR!) 🔥 visionLMsftw/comparevlms

> has support for multiple VLMs: google/gemma-3-27b-it, Qwen/Qwen2.5-VL-7B-Instruct, Qwen/Qwen2.5-VL-32B-Instruct, meta-llama/Llama-4-Maverick-17B-128E-Instruct, HuggingFaceTB/SmolVLM2-2.2B-Instruct
> recommend us new models or inputs, we'll add 🫡

so far I've figured out:
> for fact-checking, you need a relatively bigger model (7B is OK!)
> Gemma 3 degrades without pan-and-scan (especially for 📑)
> Qwen2.5-VL-32B is very talkative, great for reasoning but not good for simple tasks 🗣️
codelion 
posted an update 1 day ago
DeepThink Plugin: Bringing Gemini 2.5's Parallel Reasoning to Open Models

Just released an open-source plugin that implements Google's "Deep Think" reasoning approach for models like DeepSeek R1, Qwen3, and other open models.

Google's recent Gemini 2.5 report introduced Deep Think - a technique where models generate multiple hypotheses in parallel and critique them before arriving at final answers. It achieves SOTA results on math olympiads and competitive coding benchmarks.

Our implementation works by modifying the inference pipeline to explore multiple solution paths simultaneously, then synthesizing the best approach. Instead of single-pass generation, models run an internal debate before responding.
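The multi-path idea can be sketched in a few lines. Everything below is a hypothetical stand-in, not the plugin's actual API: the toy solver replaces sampled model completions, and the parity check replaces a real critic pass, just to show the generate-critique-synthesize shape.

```python
import random
from collections import Counter

def sample_candidate(problem, rng):
    """Stand-in for one sampled reasoning path; a real implementation
    would call the model with temperature > 0 and get a full chain."""
    noise = rng.choice([0, 0, 0, 1, -1])  # toy solver: usually correct
    return problem["a"] + problem["b"] + noise

def critique(problem, answer):
    """Stand-in critic; a real critic would be another model call
    judging each hypothesis' reasoning. Here: a cheap parity check."""
    return answer % 2 == (problem["a"] + problem["b"]) % 2

def deep_think(problem, n_paths=8, seed=0):
    """Generate several hypotheses, filter through the critic,
    then synthesize a final answer by majority vote."""
    rng = random.Random(seed)
    candidates = [sample_candidate(problem, rng) for _ in range(n_paths)]
    accepted = [c for c in candidates if critique(problem, c)]
    # Fall back to all candidates if the critic rejects everything.
    return Counter(accepted or candidates).most_common(1)[0][0]

print(deep_think({"a": 17, "b": 25}))
```

The point of the sketch is the trade-off the post describes: n_paths extra generations cost inference time, but the critique-then-vote step filters out individual bad reasoning paths.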

Key features:
- Works with any model that supports structured reasoning patterns
- Implements parallel thinking during response generation
- Particularly effective for complex reasoning tasks, math, and coding problems
- Increases inference time but significantly improves answer quality

The plugin won the Cerebras & OpenRouter Qwen 3 Hackathon, validating that this approach works well beyond Google's proprietary implementation.

GitHub: https://github.com/codelion/optillm/tree/main/optillm/plugins/deepthink
Demo: https://www.youtube.com/watch?v=b06kD1oWBA4

The goal is democratizing advanced reasoning capabilities that were previously locked behind APIs. Perfect for researchers and practitioners working with local deployments who want enhanced reasoning without dependency on proprietary services.

Performance notes: Currently about 2-3x slower inference but much better results on complex problems. Working on adaptive triggering to only activate when problems benefit from parallel reasoning.

Would love feedback from the HF community and collaborations on optimizing the approach further. Open to PRs and always interested in making open models more capable.
ghostai1 
posted an update 3 days ago
# Reinforcement Learning's Societal Impact: A Deep Dive

Artificial Intelligence, or AI, is revolutionizing the way we live, work, and interact with our environment. With advancements in Reinforcement Learning (RL), machines are becoming increasingly intelligent and capable of making decisions autonomously. This shift is having a significant impact on society as we know it.

One of the most notable aspects of RL is its ability to learn from experience. By observing and interacting with its surroundings, an AI-driven RL system can adapt to new situations and make decisions based on real-world data. This has huge implications for industries like healthcare, where AI can be used to analyze patient data and provide personalized treatment plans, or finance, where it can help predict market trends and make more informed investment decisions.
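That "learn from experience" loop can be illustrated with a minimal, self-contained toy (a two-action bandit with a Q-learning-style value update, invented here for illustration rather than taken from any particular library):

```python
import random

random.seed(0)
q = {0: 0.0, 1: 0.0}      # the agent's value estimate for each action
alpha, epsilon = 0.1, 0.2  # learning rate, exploration rate

def reward(action):
    # Hidden environment: action 1 pays off more on average.
    return 1.0 if action == 1 else 0.2

for _ in range(500):
    # Explore occasionally; otherwise exploit current knowledge.
    a = random.choice([0, 1]) if random.random() < epsilon else max(q, key=q.get)
    r = reward(a)
    q[a] += alpha * (r - q[a])  # nudge the estimate toward what was observed

print(max(q, key=q.get))  # the higher-paying action wins out
```

Nothing tells the agent which action is better; its preference emerges purely from interaction, which is the same principle behind the healthcare and finance applications above, at vastly larger scale.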

Furthermore, RL is driving innovation in robotics and automation. Autonomous vehicles, for example, rely on RL to navigate complex environments safely and efficiently. Similarly, manufacturing processes are being automated with RL-powered robots that can learn and improve their performance over time.

While these advancements bring countless benefits, they also raise concerns about privacy, security, and job displacement. It's crucial that we continue to develop ethical guidelines for AI usage and invest in reskilling programs to help workers transition into new roles as automation becomes more prevalent.

In conclusion, the societal impact of AI-driven Reinforcement Learning is vast and multifaceted. From healthcare to finance, transportation to manufacturing, RL is transforming industries and shaping our future in ways we've only just begun to comprehend. As we continue to harness the power of this technology, it's important that we also consider its implications and strive to create a world where AI enhances human potential, rather than replaces it.