AUTOHALLUSION: Automatic Generation of Hallucination Benchmarks for Vision-Language Models Paper • 2406.10900 • Published Jun 16, 2024 • 5
OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding Paper • 2406.19389 • Published Jun 27, 2024 • 47
Instruction Pre-Training: Language Models are Supervised Multitask Learners Paper • 2406.14491 • Published Jun 20, 2024 • 75
Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs Paper • 2406.16860 • Published Jun 24, 2024 • 48
VoCo-LLaMA: Towards Vision Compression with Large Language Models Paper • 2406.12275 • Published Jun 18, 2024 • 28
An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels Paper • 2406.09415 • Published Jun 13, 2024 • 47
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs Paper • 2406.07476 • Published Jun 11, 2024 • 30
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions Paper • 2406.04325 • Published Jun 6, 2024 • 69
ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models Paper • 2405.15738 • Published May 24, 2024 • 43
Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation Paper • 2404.19752 • Published Apr 30, 2024 • 20
Better & Faster Large Language Models via Multi-token Prediction Paper • 2404.19737 • Published Apr 30, 2024 • 65
PLLaVA: Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning Paper • 2404.16994 • Published Apr 25, 2024 • 33
Layer Skip: Enabling Early Exit Inference and Self-Speculative Decoding Paper • 2404.16710 • Published Apr 25, 2024 • 56
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework Paper • 2404.14619 • Published Apr 22, 2024 • 124
SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation Paper • 2404.14396 • Published Apr 22, 2024 • 17
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone Paper • 2404.14219 • Published Apr 22, 2024 • 240
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length Paper • 2404.08801 • Published Apr 12, 2024 • 62
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models Paper • 2404.07973 • Published Apr 11, 2024 • 28
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention Paper • 2404.07143 • Published Apr 10, 2024 • 97
RULER: What's the Real Context Size of Your Long-Context Language Models? Paper • 2404.06654 • Published Apr 9, 2024 • 32
BRAVE: Broadening the visual encoding of vision-language models Paper • 2404.07204 • Published Apr 10, 2024 • 14
No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance Paper • 2404.04125 • Published Apr 4, 2024 • 27
MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens Paper • 2404.03413 • Published Apr 4, 2024 • 22
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction Paper • 2404.02905 • Published Apr 3, 2024 • 61
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models Paper • 2404.02258 • Published Apr 2, 2024 • 102
Bigger is not Always Better: Scaling Properties of Latent Diffusion Models Paper • 2404.01367 • Published Apr 1, 2024 • 19
LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images Paper • 2403.11703 • Published Mar 18, 2024 • 13
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training Paper • 2403.09611 • Published Mar 14, 2024 • 123
Synth^2: Boosting Visual-Language Models with Synthetic Captions and Image Embeddings Paper • 2403.07750 • Published Mar 12, 2024 • 19
An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models Paper • 2403.06764 • Published Mar 11, 2024 • 24
DeepSeek-VL: Towards Real-World Vision-Language Understanding Paper • 2403.05525 • Published Mar 8, 2024 • 39
Data Engineering for Scaling Language Models to 128K Context Paper • 2402.10171 • Published Feb 15, 2024 • 18
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models Paper • 2402.17177 • Published Feb 27, 2024 • 88
Sora Reference Papers Collection • A collection of all papers referenced in OpenAI's "Video generation models as world simulators" technical report • openai.com/sora • 30 items • Updated Feb 20, 2024 • 51
Animated Stickers: Bringing Stickers to Life with Video Diffusion Paper • 2402.06088 • Published Feb 8, 2024 • 9
Deconstructing Denoising Diffusion Models for Self-Supervised Learning Paper • 2401.14404 • Published Jan 25, 2024 • 16
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models Paper • 2401.06066 • Published Jan 11, 2024 • 36
Analyzing and Improving the Training Dynamics of Diffusion Models Paper • 2312.02696 • Published Dec 5, 2023 • 31
A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions Paper • 2312.08578 • Published Dec 14, 2023 • 15
PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models Paper • 2312.13964 • Published Dec 21, 2023 • 16
Diffusion Model Alignment Using Direct Preference Optimization Paper • 2311.12908 • Published Nov 21, 2023 • 47
CodeFusion: A Pre-trained Diffusion Model for Code Generation Paper • 2310.17680 • Published Oct 26, 2023 • 68
Octopus: Embodied Vision-Language Programmer from Environmental Feedback Paper • 2310.08588 • Published Oct 12, 2023 • 32
Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion Paper • 2310.03502 • Published Oct 5, 2023 • 75
PhotoVerse: Tuning-Free Image Customization with Text-to-Image Diffusion Models Paper • 2309.05793 • Published Sep 11, 2023 • 50