William Lamkin

phanes

AI & ML interests

None yet

phanes's activity

upvoted a paper about 3 hours ago

4Diffusion: Multi-view Video Diffusion Model for 4D Generation

Paper • 2405.20674 • Published 3 days ago • 6

upvoted a paper 4 days ago

T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback

Paper • 2405.18750 • Published 5 days ago • 16

upvoted a paper 5 days ago

Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning

Paper • 2405.18386 • Published 6 days ago • 13

upvoted 8 papers 6 days ago

Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control

Paper • 2405.17414 • Published 7 days ago • 7

Denoising LM: Pushing the Limits of Error Correction Models for Speech Recognition

Paper • 2405.15216 • Published 10 days ago • 11

Part123: Part-aware 3D Reconstruction from a Single-view Image

Paper • 2405.16888 • Published 7 days ago • 10

Matryoshka Multimodal Models

Paper • 2405.17430 • Published 7 days ago • 29

Human4DiT: Free-view Human Video Generation with 4D Diffusion Transformer

Paper • 2405.17405 • Published 7 days ago • 13

Vidu4D: Single Generated Video to High-Fidelity 4D Reconstruction with Dynamic Gaussian Surfels

Paper • 2405.16822 • Published 7 days ago • 11

I2VEdit: First-Frame-Guided Video Editing via Image-to-Video Diffusion Models

Paper • 2405.16537 • Published 8 days ago • 15

Looking Backward: Streaming Video-to-Video Translation with Feature Banks

Paper • 2405.15757 • Published 10 days ago • 12

upvoted a paper 7 days ago

Look Once to Hear: Target Speech Hearing with Noisy Examples

Paper • 2405.06289 • Published 24 days ago • 3

upvoted 4 papers 10 days ago

Neural Directional Encoding for Efficient and Accurate View-Dependent Appearance Modeling

Paper • 2405.14847 • Published 11 days ago • 6

DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data

Paper • 2405.14333 • Published 11 days ago • 27

Tele-Aloha: A Low-budget and High-authenticity Telepresence System Using Sparse RGB Cameras

Paper • 2405.14866 • Published 11 days ago • 5

ReVideo: Remake a Video with Motion and Content Control

Paper • 2405.13865 • Published 12 days ago • 21

upvoted 5 papers 13 days ago

Slicedit: Zero-Shot Video Editing With Text-to-Image Diffusion Models Using Spatio-Temporal Slices

Paper • 2405.12211 • Published 14 days ago • 1

Octo: An Open-Source Generalist Robot Policy

Paper • 2405.12213 • Published 14 days ago • 22

Dreamer XL: Towards High-Resolution Text-to-3D Generation via Trajectory Score Matching

Paper • 2405.11252 • Published 16 days ago • 11

Imp: Highly Capable Large Multimodal Models for Mobile Devices

Paper • 2405.12107 • Published 14 days ago • 23

FIFO-Diffusion: Generating Infinite Videos from Text without Training

Paper • 2405.11473 • Published 15 days ago • 53

upvoted 2 papers 14 days ago

Grounded 3D-LLM with Referent Tokens

Paper • 2405.10370 • Published 18 days ago • 8

INDUS: Effective and Efficient Language Models for Scientific Applications

Paper • 2405.10725 • Published 17 days ago • 23

upvoted 5 papers 17 days ago

Chameleon: Mixed-Modal Early-Fusion Foundation Models

Paper • 2405.09818 • Published 18 days ago • 96

TRANSIC: Sim-to-Real Policy Transfer by Learning from Online Correction

Paper • 2405.10315 • Published 18 days ago • 9

Dual3D: Efficient and Consistent Text-to-3D Generation with Dual-mode Multi-view Latent Diffusion

Paper • 2405.09874 • Published 18 days ago • 15

Toon3D: Seeing Cartoons from a New Perspective

Paper • 2405.10320 • Published 18 days ago • 19

CAT3D: Create Anything in 3D with Multi-View Diffusion Models

Paper • 2405.10314 • Published 18 days ago • 38

upvoted 2 papers 19 days ago

Compositional Text-to-Image Generation with Dense Blob Representations

Paper • 2405.08246 • Published 21 days ago • 11

Coin3D: Controllable and Interactive 3D Assets Generation with Proxy-Guided Conditioning

Paper • 2405.08054 • Published 21 days ago • 21

upvoted 4 papers 20 days ago

Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots

Paper • 2405.07990 • Published 21 days ago • 15

SUTRA: Scalable Multilingual Language Model Architecture

Paper • 2405.06694 • Published 27 days ago • 34

Large Language Models as Planning Domain Generators

Paper • 2405.06650 • Published Apr 2 • 8

LogoMotion: Visually Grounded Code Generation for Content-Aware Animation

Paper • 2405.07065 • Published 23 days ago • 16

upvoted 13 papers about 1 month ago

StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation

Paper • 2405.01434 • Published May 2 • 47

Invisible Stitch: Generating Smooth 3D Scenes with Depth Inpainting

Paper • 2404.19758 • Published Apr 30 • 10

DressCode: Autoregressively Sewing and Generating Garments from Text Guidance

Paper • 2401.16465 • Published Jan 29 • 10

SAGS: Structure-Aware 3D Gaussian Splatting

Paper • 2404.19149 • Published Apr 29 • 13

InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation

Paper • 2404.19427 • Published Apr 30 • 68

NeRF-XL: Scaling NeRFs with Multiple GPUs

Paper • 2404.16221 • Published Apr 24 • 11

Interactive3D: Create What You Want by Interactive 3D Generation

Paper • 2404.16510 • Published Apr 25 • 18

Align Your Steps: Optimizing Sampling Schedules in Diffusion Models

Paper • 2404.14507 • Published Apr 22 • 21

OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework

Paper • 2404.14619 • Published Apr 22 • 122

Music Consistency Models

Paper • 2404.13358 • Published Apr 20 • 12

Long-form music generation with latent diffusion

Paper • 2404.10301 • Published Apr 16 • 23

SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation

Paper • 2404.14396 • Published Apr 22 • 17

Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis

Paper • 2404.13686 • Published Apr 21 • 26

upvoted 13 papers about 2 months ago

MoA: Mixture-of-Attention for Subject-Context Disentanglement in Personalized Image Generation

Paper • 2404.11565 • Published Apr 17 • 12

MeshLRM: Large Reconstruction Model for High-Quality Mesh

Paper • 2404.12385 • Published Apr 18 • 24

Scaling Instructable Agents Across Many Simulated Worlds

Paper • 2404.10179 • Published Mar 13 • 23

Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization

Paper • 2404.09956 • Published Apr 15 • 11

Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model

Paper • 2404.09967 • Published Apr 15 • 20

Video2Game: Real-time, Interactive, Realistic and Browser-Compatible Environment from a Single Video

Paper • 2404.09833 • Published Apr 15 • 27

TransformerFAM: Feedback attention is working memory

Paper • 2404.09173 • Published Apr 14 • 42

Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and Training Strategies

Paper • 2404.08197 • Published Apr 12 • 26

Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models

Paper • 2404.07973 • Published Apr 11 • 28

OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments

Paper • 2404.07972 • Published Apr 11 • 41

ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback

Paper • 2404.07987 • Published Apr 11 • 46

RealmDreamer: Text-Driven 3D Scene Generation with Inpainting and Depth Diffusion

Paper • 2404.07199 • Published Apr 10 • 22

DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting

Paper • 2404.06903 • Published Apr 10 • 14