Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch Paper • 2311.03099 • Published Nov 6, 2023 • 28
ULIP-2: Towards Scalable Multimodal Pre-training For 3D Understanding Paper • 2305.08275 • Published May 14, 2023 • 2
Small Models are Valuable Plug-ins for Large Language Models Paper • 2305.08848 • Published May 15, 2023 • 3
Symbol tuning improves in-context learning in language models Paper • 2305.08298 • Published May 15, 2023 • 3
Make-An-Animation: Large-Scale Text-conditional 3D Human Motion Generation Paper • 2305.09662 • Published May 16, 2023 • 3
CodeT5+: Open Code Large Language Models for Code Understanding and Generation Paper • 2305.07922 • Published May 13, 2023 • 4
Towards Expert-Level Medical Question Answering with Large Language Models Paper • 2305.09617 • Published May 16, 2023 • 5
TinyStories: How Small Can Language Models Be and Still Speak Coherent English? Paper • 2305.07759 • Published May 12, 2023 • 30
Universal Source Separation with Weakly Labelled Data Paper • 2305.07447 • Published May 11, 2023 • 2
DialogGen: Multi-modal Interactive Dialogue System for Multi-turn Text-to-Image Generation Paper • 2403.08857 • Published Mar 13, 2024 • 3
MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers Paper • 2305.07185 • Published May 12, 2023 • 9
EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention Paper • 2305.07027 • Published May 11, 2023 • 3
Exploiting Diffusion Prior for Real-World Image Super-Resolution Paper • 2305.07015 • Published May 11, 2023 • 4
InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning Paper • 2305.06500 • Published May 11, 2023 • 4
Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers Paper • 2305.07011 • Published May 11, 2023 • 4
CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model Paper • 2305.06908 • Published May 11, 2023 • 5
Generative AI meets 3D: A Survey on Text-to-3D in AIGC Era Paper • 2305.06131 • Published May 10, 2023 • 2
AudioSlots: A slot-centric generative model for audio separation Paper • 2305.05591 • Published May 9, 2023 • 3
To Compress or Not to Compress- Self-Supervised Learning and Information Theory: A Review Paper • 2304.09355 • Published Apr 19, 2023 • 4
SUR-adapter: Enhancing Text-to-Image Pre-trained Diffusion Models with Large Language Models Paper • 2305.05189 • Published May 9, 2023 • 2
FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance Paper • 2305.05176 • Published May 9, 2023 • 4
COLA: How to adapt vision-language models to Compose Objects Localized with Attributes? Paper • 2305.03689 • Published May 5, 2023 • 2
Otter: A Multi-Modal Model with In-Context Instruction Tuning Paper • 2305.03726 • Published May 5, 2023 • 6
Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs Paper • 2305.03111 • Published May 4, 2023 • 7
ROS-LLM: A ROS framework for embodied AI with task feedback and structured reasoning Paper • 2406.19741 • Published 5 days ago • 46
Step-Controlled DPO: Leveraging Stepwise Error for Enhanced Mathematical Reasoning Paper • 2407.00782 • Published 2 days ago • 19
MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation Paper • 2407.00468 • Published 4 days ago • 33
SVG: 3D Stereoscopic Video Generation via Denoising Frame Matrix Paper • 2407.00367 • Published 4 days ago • 6
UnUnlearning: Unlearning is not sufficient for content regulation in advanced generative AI Paper • 2407.00106 • Published 6 days ago • 4
E3 TTS: Easy End-to-End Diffusion-based Text to Speech Paper • 2311.00945 • Published Nov 2, 2023 • 12
Towards Robust Speech Representation Learning for Thousands of Languages Paper • 2407.00837 • Published 2 days ago • 5
InstantStyle-Plus: Style Transfer with Content-Preserving in Text-to-Image Generation Paper • 2407.00788 • Published 2 days ago • 13
E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS Paper • 2406.18009 • Published 7 days ago • 13
ColPali: Efficient Document Retrieval with Vision Language Models Paper • 2407.01449 • Published 6 days ago • 20
DiffIR2VR-Zero: Zero-Shot Video Restoration with Diffusion-based Image Restoration Models Paper • 2407.01519 • Published 1 day ago • 18
RegMix: Data Mixture as Regression for Language Model Pre-training Paper • 2407.01492 • Published 1 day ago • 22
We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning? Paper • 2407.01284 • Published 1 day ago • 60
Adaptive Guidance: Training-free Acceleration of Conditional Diffusion Models Paper • 2312.12487 • Published Dec 19, 2023 • 8
I2V-Adapter: A General Image-to-Video Adapter for Video Diffusion Models Paper • 2312.16693 • Published Dec 27, 2023 • 13
MobileVLM: A Fast, Reproducible and Strong Vision Language Assistant for Mobile Devices Paper • 2312.16886 • Published Dec 28, 2023 • 19
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action Paper • 2312.17172 • Published Dec 28, 2023 • 26
Principled Instructions Are All You Need for Questioning LLaMA-1/2, GPT-3.5/4 Paper • 2312.16171 • Published Dec 26, 2023 • 32
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling Paper • 2312.15166 • Published Dec 23, 2023 • 56
HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale Paper • 2406.19280 • Published 6 days ago • 49
Scaling Synthetic Data Creation with 1,000,000,000 Personas Paper • 2406.20094 • Published 4 days ago • 69
Direct Preference Knowledge Distillation for Large Language Models Paper • 2406.19774 • Published 5 days ago • 15
GaussianDreamerPro: Text to Manipulable 3D Gaussians with Highly Enhanced Quality Paper • 2406.18462 • Published 7 days ago • 8