Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection Paper • 2405.10300 • Published 3 days ago • 16
Compositional Text-to-Image Generation with Dense Blob Representations Paper • 2405.08246 • Published 6 days ago • 10
MotionLCM: Real-time Controllable Motion Generation via Latent Consistency Model Paper • 2404.19759 • Published 19 days ago • 21
PuLID: Pure and Lightning ID Customization via Contrastive Alignment Paper • 2404.16022 • Published 25 days ago • 16
ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with Reward Feedback Learning Paper • 2404.15449 • Published 26 days ago • 11
MotionMaster: Training-free Camera Motion Transfer For Video Generation Paper • 2404.15789 • Published 25 days ago • 10
LLM Comparison/Test: Llama 3 Instruct 70B + 8B HF/GGUF/EXL2 (20 versions tested and compared!) Article • By wolfram • 25 days ago • 39
TextHawk: Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models Paper • 2404.09204 • Published Apr 14 • 10
BRAVE: Broadening the visual encoding of vision-language models Paper • 2404.07204 • Published Apr 10 • 14
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction Paper • 2404.02905 • Published Apr 3 • 59
TextCraftor: Your Text Encoder Can be Image Quality Controller Paper • 2403.18978 • Published Mar 27 • 12
FlexEdit: Flexible and Controllable Diffusion-based Object-centric Image Editing Paper • 2403.18605 • Published Mar 27 • 5
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models Paper • 2403.18814 • Published Mar 27 • 37
EgoLifter: Open-world 3D Segmentation for Egocentric Perception Paper • 2403.18118 • Published Mar 26 • 7
AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation Paper • 2403.17694 • Published Mar 26 • 10
SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions Paper • 2403.16627 • Published Mar 25 • 20
NeuFlow: Real-time, High-accuracy Optical Flow Estimation on Robots Using Edge Devices Paper • 2403.10425 • Published Mar 15 • 2
SOTOPIA-π: Interactive Learning of Socially Intelligent Language Agents Paper • 2403.08715 • Published Mar 13 • 19
Follow-Your-Click: Open-domain Regional Image Animation via Short Prompts Paper • 2403.08268 • Published Mar 13 • 15
GaussianImage: 1000 FPS Image Representation and Compression by 2D Gaussian Splatting Paper • 2403.08551 • Published Mar 13 • 8
CRM: Single Image to 3D Textured Mesh with Convolutional Reconstruction Model Paper • 2403.05034 • Published Mar 8 • 17
CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion Paper • 2403.05121 • Published Mar 8 • 16
DeepSeek-VL: Towards Real-World Vision-Language Understanding Paper • 2403.05525 • Published Mar 8 • 38
Radiative Gaussian Splatting for Efficient X-ray Novel View Synthesis Paper • 2403.04116 • Published Mar 7 • 3
Feast Your Eyes: Mixture-of-Resolution Adaptation for Multimodal Large Language Models Paper • 2403.03003 • Published Mar 5 • 8
RePLan: Robotic Replanning with Perception and Language Models Paper • 2401.04157 • Published Jan 8 • 3
MOSAIC: A Modular System for Assistive and Interactive Cooking Paper • 2402.18796 • Published Feb 29 • 22
GPTVQ: The Blessing of Dimensionality for LLM Quantization Paper • 2402.15319 • Published Feb 23 • 19
PaLM2-VAdapter: Progressively Aligned Language Model Makes a Strong Vision-language Adapter Paper • 2402.10896 • Published Feb 16 • 13
Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively Paper • 2401.02955 • Published Jan 5 • 16
M3DBench: Let's Instruct Large Models with Multi-modal 3D Prompts Paper • 2312.10763 • Published Dec 17, 2023 • 17
General Object Foundation Model for Images and Videos at Scale Paper • 2312.09158 • Published Dec 14, 2023 • 8
UniDream: Unifying Diffusion Priors for Relightable Text-to-3D Generation Paper • 2312.08754 • Published Dec 14, 2023 • 6
"I Want It That Way": Enabling Interactive Decision Support Using Large Language Models and Constraint Programming Paper • 2312.06908 • Published Dec 12, 2023 • 5
TextGenSHAP: Scalable Post-hoc Explanations in Text Generation with Long Documents Paper • 2312.01279 • Published Dec 3, 2023 • 3
M²UGen: Multi-modal Music Understanding and Generation with the Power of Large Language Models Paper • 2311.11255 • Published Nov 19, 2023 • 3
Distilling and Retrieving Generalizable Knowledge for Robot Manipulation via Language Corrections Paper • 2311.10678 • Published Nov 17, 2023 • 5
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection Paper • 2311.10122 • Published Nov 16, 2023 • 25
Recent models: last 100 repos, sorted by creation date Collection The last 100 repos I have created. Sorted by creation date descending, so the most recently created repos appear at the top. • 121 items • Updated Jan 31 • 446
OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models Paper • 2308.13137 • Published Aug 25, 2023 • 14