ChaangHaan's Collections
Music (updated)
aMUSEd: An Open MUSE Reproduction • Paper • 2401.01808 • Published • 29
From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations • Paper • 2401.01885 • Published • 28
SteinDreamer: Variance Reduction for Text-to-3D Score Distillation via Stein Identity • Paper • 2401.00604 • Published • 6
LARP: Language-Agent Role Play for Open-World Games • Paper • 2312.17653 • Published • 32
Learning Vision from Models Rivals Learning Vision from Data • Paper • 2312.17742 • Published • 16
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones • Paper • 2312.16862 • Published • 31
City-on-Web: Real-time Neural Rendering of Large-scale Scenes on the Web • Paper • 2312.16457 • Published • 14
InsActor: Instruction-driven Physics-based Characters • Paper • 2312.17135 • Published • 10
PanGu-Draw: Advancing Resource-Efficient Text-to-Image Synthesis with Time-Decoupled Training and Reusable Coop-Diffusion • Paper • 2312.16486 • Published • 7
SSR-Encoder: Encoding Selective Subject Representation for Subject-Driven Generation • Paper • 2312.16272 • Published • 7
Prompt Expansion for Adaptive Text-to-Image Generation • Paper • 2312.16720 • Published • 6
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling • Paper • 2312.15166 • Published • 57
Make-A-Character: High Quality Text-to-3D Character Generation within Minutes • Paper • 2312.15430 • Published • 28
UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces • Paper • 2312.15715 • Published • 20
LangSplat: 3D Language Gaussian Splatting • Paper • 2312.16084 • Published • 15
One-dimensional Adapter to Rule Them All: Concepts, Diffusion Models and Erasing Applications • Paper • 2312.16145 • Published • 9
Supervised Knowledge Makes Large Language Models Better In-context Learners • Paper • 2312.15918 • Published • 9
VCoder: Versatile Vision Encoders for Multimodal Large Language Models • Paper • 2312.14233 • Published • 16
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks • Paper • 2312.14238 • Published • 19
Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning • Paper • 2312.14878 • Published • 14
Generative AI Beyond LLMs: System Implications of Multi-Modal Generation • Paper • 2312.14385 • Published • 6
Shai: A large language model for asset management • Paper • 2312.14203 • Published • 5
LLM4VG: Large Language Models Evaluation for Video Grounding • Paper • 2312.14206 • Published • 3
DreamTuner: Single Image is Enough for Subject-Driven Generation • Paper • 2312.13691 • Published • 27
Paint3D: Paint Anything 3D with Lighting-Less Texture Diffusion Models • Paper • 2312.13913 • Published • 23
Time is Encoded in the Weights of Finetuned Language Models • Paper • 2312.13401 • Published • 20
PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models • Paper • 2312.13964 • Published • 19
HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image Inpainting with Diffusion Models • Paper • 2312.14091 • Published • 16
TinySAM: Pushing the Envelope for Efficient Segment Anything Model • Paper • 2312.13789 • Published • 14
Carve3D: Improving Multi-view Reconstruction Consistency for Diffusion Models with RL Finetuning • Paper • 2312.13980 • Published • 14
Neural feels with neural fields: Visuo-tactile perception for in-hand manipulation • Paper • 2312.13469 • Published • 11
Align Your Gaussians: Text-to-4D with Dynamic 3D Gaussians and Composed Diffusion Models • Paper • 2312.13763 • Published • 10
ShowRoom3D: Text to High-Quality 3D Room Generation Using 3D Priors • Paper • 2312.13324 • Published • 10
Unlocking Pre-trained Image Backbones for Semantic Image Synthesis • Paper • 2312.13314 • Published • 8
HeadCraft: Modeling High-Detail Shape Variations for Animated 3DMMs • Paper • 2312.14140 • Published • 7
PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU • Paper • 2312.12456 • Published • 41
Generative Multimodal Models are In-Context Learners • Paper • 2312.13286 • Published • 35
Zero-Shot Metric Depth with a Field-of-View Conditioned Diffusion Model • Paper • 2312.13252 • Published • 27
InstructVideo: Instructing Video Diffusion Models with Human Feedback • Paper • 2312.12490 • Published • 17
Cached Transformers: Improving Transformers with Differentiable Memory Cache • Paper • 2312.12742 • Published • 13
Repaint123: Fast and High-quality One Image to 3D Generation with Progressive Controllable 2D Repainting • Paper • 2312.13271 • Published • 5
LLM in a flash: Efficient Large Language Model Inference with Limited Memory • Paper • 2312.11514 • Published • 259
StarVector: Generating Scalable Vector Graphics Code from Images • Paper • 2312.11556 • Published • 28
3D-LFM: Lifting Foundation Model • Paper • 2312.11894 • Published • 14
HAAR: Text-Conditioned Generative Model of 3D Strand-based Human Hairstyles • Paper • 2312.11666 • Published • 13
Jack of All Tasks, Master of Many: Designing General-purpose Coarse-to-Fine Vision-Language Model • Paper • 2312.12423 • Published • 13
MixRT: Mixed Neural Representations For Real-Time NeRF Rendering • Paper • 2312.11841 • Published • 11
Tracking Any Object Amodally • Paper • 2312.12433 • Published • 12
FastSR-NeRF: Improving NeRF Efficiency on Consumer Devices with A Simple Super-Resolution Pipeline • Paper • 2312.11537 • Published • 7
TIP: Text-Driven Image Processing with Semantic and Restoration Instructions • Paper • 2312.11595 • Published • 6
Text-Conditioned Resampler For Long Form Video Understanding • Paper • 2312.11897 • Published • 6
Topic-VQ-VAE: Leveraging Latent Codebooks for Flexible Topic-Guided Document Generation • Paper • 2312.11532 • Published • 6
Customize-It-3D: High-Quality 3D Creation from A Single Image Using Subject-Specific Knowledge Prior • Paper • 2312.11535 • Published • 7
Towards Accurate Guided Diffusion Sampling through Symplectic Adjoint Method • Paper • 2312.12030 • Published • 5
VecFusion: Vector Font Generation with Diffusion • Paper • 2312.10540 • Published • 21
Rich Human Feedback for Text-to-Image Generation • Paper • 2312.10240 • Published • 19
SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing • Paper • 2312.11392 • Published • 19
G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model • Paper • 2312.11370 • Published • 20
M3DBench: Let's Instruct Large Models with Multi-modal 3D Prompts • Paper • 2312.10763 • Published • 18
GAvatar: Animatable 3D Gaussian Avatars with Implicit Mesh Learning • Paper • 2312.11461 • Published • 18
MagicScroll: Nontypical Aspect-Ratio Image Generation for Visual Storytelling via Multi-Layered Semantic-Aware Denoising • Paper • 2312.10899 • Published • 14
MAG-Edit: Localized Image Editing in Complex Scenarios via Mask-Based Attention-Adjusted Guidance • Paper • 2312.11396 • Published • 10
Cascade Speculative Drafting for Even Faster LLM Inference • Paper • 2312.11462 • Published • 8
Silkie: Preference Distillation for Large Visual Language Models • Paper • 2312.10665 • Published • 11
VidToMe: Video Token Merging for Zero-Shot Video Editing • Paper • 2312.10656 • Published • 10
ProTIP: Progressive Tool Retrieval Improves Planning • Paper • 2312.10332 • Published • 7
Your Student is Better Than Expected: Adaptive Teacher-Student Collaboration for Text-Conditional Diffusion Models • Paper • 2312.10835 • Published • 6
VolumeDiffusion: Flexible Text-to-3D Generation with Efficient Volumetric Encoder • Paper • 2312.11459 • Published • 5
GauFRe: Gaussian Deformation Fields for Real-time Dynamic Novel View Synthesis • Paper • 2312.11458 • Published • 4
Amphion: An Open-Source Audio, Music and Speech Generation Toolkit • Paper • 2312.09911 • Published • 54
ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent • Paper • 2312.10003 • Published • 38
DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models • Paper • 2312.09767 • Published • 25
MobileSAMv2: Faster Segment Anything to Everything • Paper • 2312.09579 • Published • 21
Point Transformer V3: Simpler, Faster, Stronger • Paper • 2312.10035 • Published • 18
Weight subcloning: direct initialization of transformers using larger pretrained ones • Paper • 2312.09299 • Published • 18
Faster Diffusion: Rethinking the Role of UNet Encoder in Diffusion Models • Paper • 2312.09608 • Published • 14
Self-Evaluation Improves Selective Generation in Large Language Models • Paper • 2312.09300 • Published • 15
Stable Score Distillation for High-Quality 3D Generation • Paper • 2312.09305 • Published • 8
Faithful Persona-based Conversational Dataset Generation with Large Language Models • Paper • 2312.10007 • Published • 7
StemGen: A music generation model that listens • Paper • 2312.08723 • Published • 48
TinyGSM: achieving >80% on GSM8k with small language models • Paper • 2312.09241 • Published • 38
CogAgent: A Visual Language Model for GUI Agents • Paper • 2312.08914 • Published • 30
VideoLCM: Video Latent Consistency Model • Paper • 2312.09109 • Published • 22
A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions • Paper • 2312.08578 • Published • 17
Pixel Aligned Language Models • Paper • 2312.09237 • Published • 15
SEEAvatar: Photorealistic Text-to-3D Avatar Generation with Constrained Geometry and Appearance • Paper • 2312.08889 • Published • 12
Vision-Language Models as a Source of Rewards • Paper • 2312.09187 • Published • 12
FineControlNet: Fine-level Text Control for Image Generation with Spatially Aligned Text Control Injection • Paper • 2312.09252 • Published • 10
Holodeck: Language Guided Generation of 3D Embodied AI Environments • Paper • 2312.09067 • Published • 14
LIME: Localized Image Editing via Attention Regularization in Diffusion Models • Paper • 2312.09256 • Published • 9
General Object Foundation Model for Images and Videos at Scale • Paper • 2312.09158 • Published • 9
UniDream: Unifying Diffusion Priors for Relightable Text-to-3D Generation • Paper • 2312.08754 • Published • 7
VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation • Paper • 2312.09251 • Published • 7
SHAP-EDITOR: Instruction-guided Latent 3D Editing in Seconds • Paper • 2312.09246 • Published • 6
SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention • Paper • 2312.07987 • Published • 41
Distributed Inference and Fine-tuning of Large Language Models Over The Internet • Paper • 2312.08361 • Published • 26
CLIP as RNN: Segment Countless Visual Concepts without Training Endeavor • Paper • 2312.07661 • Published • 17
Foundation Models in Robotics: Applications, Challenges, and the Future • Paper • 2312.07843 • Published • 15
Invariant Graph Transformer • Paper • 2312.07859 • Published • 7
FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects • Paper • 2312.08344 • Published • 10
ProNeRF: Learning Efficient Projection-Aware Ray Sampling for Fine-Grained Implicit Neural Radiance Fields • Paper • 2312.08136 • Published • 4
FreeInit: Bridging Initialization Gap in Video Diffusion Models • Paper • 2312.07537 • Published • 26
VILA: On Pre-training for Visual Language Models • Paper • 2312.07533 • Published • 21
FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition • Paper • 2312.07536 • Published • 17
Interfacing Foundation Models' Embeddings • Paper • 2312.07532 • Published • 11
CCM: Adding Conditional Controls to Text-to-Image Consistency Models • Paper • 2312.06971 • Published • 11
Steering Llama 2 via Contrastive Activation Addition • Paper • 2312.06681 • Published • 12
Honeybee: Locality-enhanced Projector for Multimodal LLM • Paper • 2312.06742 • Published • 10
Fast Training of Diffusion Transformer with Extreme Masking for 3D Point Clouds Generation • Paper • 2312.07231 • Published • 7
PEEKABOO: Interactive Video Generation via Masked-Diffusion • Paper • 2312.07509 • Published • 8
"I Want It That Way": Enabling Interactive Decision Support Using Large
Language Models and Constraint Programming
Paper
•
2312.06908
•
Published
•
6
LLM360: Towards Fully Transparent Open-Source LLMs
Paper
•
2312.06550
•
Published
•
58
Sherpa3D: Boosting High-Fidelity Text-to-3D Generation via Coarse 3D
Prior
Paper
•
2312.06655
•
Published
•
24
Photorealistic Video Generation with Diffusion Models
Paper
•
2312.06662
•
Published
•
24
Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models
Paper
•
2312.06109
•
Published
•
21
Context Tuning for Retrieval Augmented Generation
Paper
•
2312.05708
•
Published
•
17
From Text to Motion: Grounding GPT-4 in a Humanoid Robot "Alter3"
Paper
•
2312.06571
•
Published
•
13
Efficient Quantization Strategies for Latent Diffusion Models
Paper
•
2312.05431
•
Published
•
12
Federated Full-Parameter Tuning of Billion-Sized Language Models with
Communication Cost under 18 Kilobytes
Paper
•
2312.06353
•
Published
•
7
Evaluation of Large Language Models for Decision Making in Autonomous
Driving
Paper
•
2312.06351
•
Published
•
6
Using Captum to Explain Generative Language Models
Paper
•
2312.05491
•
Published
•
4
TCNCA: Temporal Convolution Network with Chunked Attention for Scalable
Sequence Processing
Paper
•
2312.05605
•
Published
•
3
DreaMoving: A Human Dance Video Generation Framework based on Diffusion Models • Paper • 2312.05107 • Published • 38
ECLIPSE: A Resource-Efficient Text-to-Image Prior for Image Generations • Paper • 2312.04655 • Published • 21
Text-to-3D Generation with Bidirectional Diffusion using both 2D and 3D priors • Paper • 2312.04963 • Published • 17
Customizing Motion in Text-to-Video Diffusion Models • Paper • 2312.04966 • Published • 11
PathFinder: Guided Search over Multi-Step Reasoning Paths • Paper • 2312.05180 • Published • 10
MVDD: Multi-View Depth Diffusion Models • Paper • 2312.04875 • Published • 10
EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism • Paper • 2312.04916 • Published • 7
Localized Symbolic Knowledge Distillation for Visual Commonsense Models • Paper • 2312.04837 • Published • 3
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want • Paper • 2312.03818 • Published • 32
Chain of Code: Reasoning with a Language Model-Augmented Code Emulator • Paper • 2312.04474 • Published • 31
Controllable Human-Object Interaction Synthesis • Paper • 2312.03913 • Published • 23
AnimateZero: Video Diffusion Models are Zero-Shot Image Animators • Paper • 2312.03793 • Published • 18
PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding • Paper • 2312.04461 • Published • 62
Pearl: A Production-ready Reinforcement Learning Agent • Paper • 2312.03814 • Published • 15
Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models • Paper • 2312.04410 • Published • 15
GenTron: Delving Deep into Diffusion Transformers for Image and Video Generation • Paper • 2312.04557 • Published • 13
NeRFiller: Completing Scenes via Generative 3D Inpainting • Paper • 2312.04560 • Published • 12
Large Language Models for Mathematicians • Paper • 2312.04556 • Published • 12
Gen2Det: Generate to Detect • Paper • 2312.04566 • Published • 10
Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation • Paper • 2312.04483 • Published • 7
Efficient Monotonic Multihead Attention • Paper • 2312.04515 • Published • 7
Generating Illustrated Instructions • Paper • 2312.04552 • Published • 8
LEGO: Learning EGOcentric Action Frame Generation via Visual Instruction Tuning • Paper • 2312.03849 • Published • 6
Schrodinger Bridges Beat Diffusion Models on Text-to-Speech Synthesis • Paper • 2312.03491 • Published • 34
Relightable Gaussian Codec Avatars • Paper • 2312.03704 • Published • 30
Gaussian Head Avatar: Ultra High-fidelity Head Avatar via Dynamic Gaussians • Paper • 2312.03029 • Published • 25
MotionCtrl: A Unified and Flexible Motion Controller for Video Generation • Paper • 2312.03641 • Published • 21
Cache Me if You Can: Accelerating Diffusion Models through Block Caching • Paper • 2312.03209 • Published • 18
HiFi4G: High-Fidelity Human Performance Rendering via Compact Gaussian Splatting • Paper • 2312.03461 • Published • 16
Context Diffusion: In-Context Aware Image Generation • Paper • 2312.03584 • Published • 15
LooseControl: Lifting ControlNet for Generalized Depth Conditioning • Paper • 2312.03079 • Published • 13
DreamComposer: Controllable 3D Object Generation via Multi-View Conditions • Paper • 2312.03611 • Published • 8
MagicStick: Controllable Video Editing via Control Handle Transformations • Paper • 2312.03047 • Published • 10
Self-conditioned Image Generation via Generating Representations • Paper • 2312.03701 • Published • 8
Generative agent-based modeling with actions grounded in physical, social, or digital space using Concordia • Paper • 2312.03664 • Published • 9
Language-Informed Visual Concept Learning • Paper • 2312.03587 • Published • 6
X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model • Paper • 2312.02238 • Published • 26
LivePhoto: Real Image Animation with Text-guided Motion Control • Paper • 2312.02928 • Published • 17
Describing Differences in Image Sets with Natural Language • Paper • 2312.02974 • Published • 14
Orthogonal Adaptation for Modular Customization of Diffusion Models • Paper • 2312.02432 • Published • 13
DragVideo: Interactive Drag-style Video Editing • Paper • 2312.02216 • Published • 11
MVHumanNet: A Large-scale Dataset of Multi-view Daily Dressing Human Captures • Paper • 2312.02963 • Published • 10
Fine-grained Controllable Video Generation via Object Appearance and Context • Paper • 2312.02919 • Published • 11
ReconFusion: 3D Reconstruction with Diffusion Priors • Paper • 2312.02981 • Published • 9
Training Chain-of-Thought via Latent-Variable Inference • Paper • 2312.02179 • Published • 9
Alchemist: Parametric Control of Material Properties with Diffusion Models • Paper • 2312.02970 • Published • 8
LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models • Paper • 2312.02949 • Published • 12
GPT4Point: A Unified Framework for Point-Language Understanding and Generation • Paper • 2312.02980 • Published • 8
Generating Fine-Grained Human Motions Using ChatGPT-Refined Descriptions • Paper • 2312.02772 • Published • 7
Magicoder: Source Code Is All You Need • Paper • 2312.02120 • Published • 80
VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models • Paper • 2312.00845 • Published • 37
DeepCache: Accelerating Diffusion Models for Free • Paper • 2312.00858 • Published • 22
Nash Learning from Human Feedback • Paper • 2312.00886 • Published • 15
DiffiT: Diffusion Vision Transformers for Image Generation • Paper • 2312.02139 • Published • 14
GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis • Paper • 2312.02155 • Published • 13
Object Recognition as Next Token Prediction • Paper • 2312.02142 • Published • 12
GIVT: Generative Infinite-Vocabulary Transformers • Paper • 2312.02116 • Published • 11
Paper • 2312.00860 • Published • 9
RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback • Paper • 2312.00849 • Published • 9
Style Aligned Image Generation via Shared Attention • Paper • 2312.02133 • Published • 9
Generative Rendering: Controllable 4D-Guided Video Generation with 2D Diffusion Models • Paper • 2312.01409 • Published • 9
VideoRF: Rendering Dynamic Radiance Fields as 2D Feature Video Streams • Paper • 2312.01407 • Published • 7
Customize your NeRF: Adaptive Source Driven 3D Scene Editing via Local-Global Iterative Training • Paper • 2312.01663 • Published • 4
Mamba: Linear-Time Sequence Modeling with Selective State Spaces • Paper • 2312.00752 • Published • 140
Merlin:Empowering Multimodal LLMs with Foresight Minds • Paper • 2312.00589 • Published • 25
VideoBooth: Diffusion-based Video Generation with Image Prompts • Paper • 2312.00777 • Published • 22
SeaLLMs -- Large Language Models for Southeast Asia • Paper • 2312.00738 • Published • 24
MoMask: Generative Masked Modeling of 3D Human Motions • Paper • 2312.00063 • Published • 16
GraphDreamer: Compositional 3D Scene Synthesis from Scene Graphs • Paper • 2312.00093 • Published • 15
HiFi Tuner: High-Fidelity Subject-Driven Fine-Tuning for Diffusion Models • Paper • 2312.00079 • Published • 15
Dolphins: Multimodal Language Model for Driving • Paper • 2312.00438 • Published • 13
Instruction-tuning Aligns LLMs to the Human Brain • Paper • 2312.00575 • Published • 12
StyleCrafter: Enhancing Stylized Text-to-Video Generation with Style Adapter • Paper • 2312.00330 • Published • 11
Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering • Paper • 2312.00109 • Published • 10
PyNeRF: Pyramidal Neural Radiance Fields • Paper • 2312.00252 • Published • 9
Towards Accurate Differential Diagnosis with Large Language Models • Paper • 2312.00164 • Published • 9
FSGS: Real-Time Few-shot View Synthesis using Gaussian Splatting • Paper • 2312.00451 • Published • 10
Text-Guided 3D Face Synthesis -- From Generation to Editing • Paper • 2312.00375 • Published • 9
X-Dreamer: Creating High-quality 3D Content by Bridging the Domain Gap Between Text-to-2D and Text-to-3D Generation • Paper • 2312.00085 • Published • 7
FusionFrames: Efficient Architectural Aspects for Text-to-Video Generation Pipeline • Paper • 2311.13073 • Published • 57
LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes • Paper • 2311.13384 • Published • 51
ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs • Paper • 2311.13600 • Published • 44
Diffusion Model Alignment Using Direct Preference Optimization • Paper • 2311.12908 • Published • 48
Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model • Paper • 2311.13231 • Published • 27
PG-Video-LLaVA: Pixel Grounding Large Video-Language Models • Paper • 2311.13435 • Published • 17
Visual In-Context Prompting • Paper • 2311.13601 • Published • 17
Diffusion360: Seamless 360 Degree Panoramic Image Generation based on Diffusion Models • Paper • 2311.13141 • Published • 14
MagicDance: Realistic Human Dance Video Generation with Motions & Facial Expressions Transfer • Paper • 2311.12052 • Published • 31
PhysGaussian: Physics-Integrated 3D Gaussians for Generative Dynamics • Paper • 2311.12198 • Published • 21
NeuroPrompts: An Adaptive Framework to Optimize Prompts for Text-to-Image Generation • Paper • 2311.12229 • Published • 26
Exponentially Faster Language Modelling • Paper • 2311.10770 • Published • 118
Make Pixels Dance: High-Dynamic Video Generation • Paper • 2311.10982 • Published • 68
Orca 2: Teaching Small Language Models How to Reason • Paper • 2311.11045 • Published • 73
System 2 Attention (is something you might need too) • Paper • 2311.11829 • Published • 40
MultiLoRA: Democratizing LoRA for Better Multi-Task Learning • Paper • 2311.11501 • Published • 34
Text-to-Sticker: Style Tailoring Latent Diffusion Models for Human Expression • Paper • 2311.10794 • Published • 26
AutoStory: Generating Diverse Storytelling Images with Minimal Human Effort • Paper • 2311.11243 • Published • 15
Drivable 3D Gaussian Avatars • Paper • 2311.08581 • Published • 47
GRIM: GRaph-based Interactive narrative visualization for gaMes • Paper • 2311.09213 • Published • 13
UNcommonsense Reasoning: Abductive Reasoning about Uncommon Situations • Paper • 2311.08469 • Published • 11
PEARL: Personalizing Large Language Model Writing Assistants with Generation-Calibrated Retrievers • Paper • 2311.09180 • Published • 8
Fast Chain-of-Thought: A Glance of Future from Parallel Decoding Leads to Answers Faster • Paper • 2311.08263 • Published • 16