Tutor CoPilot: A Human-AI Approach for Scaling Real-Time Expertise Paper • 2410.03017 • Published 15 days ago • 24
Prithvi WxC: Foundation Model for Weather and Climate Paper • 2409.13598 • Published 28 days ago • 35
Source2Synth: Synthetic Data Generation and Curation Grounded in Real Data Sources Paper • 2409.08239 • Published Sep 12 • 15
Platypus: A Generalized Specialist Model for Reading Text in Various Forms Paper • 2408.14805 • Published Aug 27 • 12
MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine Paper • 2408.02900 • Published Aug 6 • 25
Transformer Explainer: Interactive Learning of Text-Generative Models Paper • 2408.04619 • Published Aug 8 • 154
CoverBench: A Challenging Benchmark for Complex Claim Verification Paper • 2408.03325 • Published Aug 6 • 14
Tails Tell Tales: Chapter-Wide Manga Transcriptions with Character Names Paper • 2408.00298 • Published Aug 1 • 9
Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language Paper • 2406.05629 • Published Jun 9 • 7
FontStudio: Shape-Adaptive Diffusion Model for Coherent and Consistent Font Effect Generation Paper • 2406.08392 • Published Jun 12 • 18
Make It Count: Text-to-Image Generation with an Accurate Number of Objects Paper • 2406.10210 • Published Jun 14 • 76
DialSim: A Real-Time Simulator for Evaluating Long-Term Dialogue Understanding of Conversational Agents Paper • 2406.13144 • Published Jun 19 • 11
MotionBooth: Motion-Aware Customized Text-to-Video Generation Paper • 2406.17758 • Published Jun 25 • 18
ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation Paper • 2406.18522 • Published Jun 26 • 40
Scaling Synthetic Data Creation with 1,000,000,000 Personas Paper • 2406.20094 • Published Jun 28 • 94
Chain-of-Knowledge: Integrating Knowledge Reasoning into Large Language Models by Learning from Knowledge Graphs Paper • 2407.00653 • Published Jun 30 • 11
OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation Paper • 2407.02371 • Published Jul 2 • 49
PicoAudio: Enabling Precise Timestamp and Frequency Controllability of Audio Events in Text-to-audio Generation Paper • 2407.02869 • Published Jul 3 • 18
LLaMAX: Scaling Linguistic Horizons of LLM by Enhancing Translation Capabilities Beyond 100 Languages Paper • 2407.05975 • Published Jul 8 • 34
MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions Paper • 2407.06358 • Published Jul 8 • 17
VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers Paper • 2406.05370 • Published Jun 8 • 14
AutoCrawler: A Progressive Understanding Web Agent for Web Crawler Generation Paper • 2404.12753 • Published Apr 19 • 41
Navarasa 2.0 Models Collection Collection of Navarasa 2.0 models fine-tuned with Gemma on 15 Indian languages • 5 items • Updated Mar 18 • 13
RakutenAI-7B: Extending Large Language Models for Japanese Paper • 2403.15484 • Published Mar 21 • 12
Common Corpus Collection The largest public domain dataset for training LLMs. • 27 items • Updated Jul 17 • 112
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models Paper • 2403.03100 • Published Mar 5 • 34
ShortGPT: Layers in Large Language Models are More Redundant Than You Expect Paper • 2403.03853 • Published Mar 6 • 63
In Search of Needles in a 10M Haystack: Recurrent Memory Finds What LLMs Miss Paper • 2402.10790 • Published Feb 16 • 40
MVDiffusion++: A Dense High-resolution Multi-view Diffusion Model for Single or Sparse-view 3D Object Reconstruction Paper • 2402.12712 • Published Feb 20 • 14
Lumos : Empowering Multimodal LLMs with Scene Text Recognition Paper • 2402.08017 • Published Feb 12 • 24
BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data Paper • 2402.08093 • Published Feb 12 • 54
GALA3D: Towards Text-to-3D Complex Scene Generation via Layout-guided Generative Gaussian Splatting Paper • 2402.07207 • Published Feb 11 • 7
AutoMathText: Autonomous Data Selection with Language Models for Mathematical Texts Paper • 2402.07625 • Published Feb 12 • 11
PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs Paper • 2402.07872 • Published Feb 12 • 15
Making Flow-Matching-Based Zero-Shot Text-to-Speech Laugh as You Like Paper • 2402.07383 • Published Feb 12 • 13
OS-Copilot: Towards Generalist Computer Agents with Self-Improvement Paper • 2402.07456 • Published Feb 12 • 41
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss Paper • 2402.05008 • Published Feb 7 • 19
LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation Paper • 2402.05054 • Published Feb 7 • 25
Tag-LLM: Repurposing General-Purpose LLMs for Specialized Domains Paper • 2402.05140 • Published Feb 6 • 20
ConsistI2V: Enhancing Visual Consistency for Image-to-Video Generation Paper • 2402.04324 • Published Feb 6 • 23
ScreenAI: A Vision-Language Model for UI and Infographics Understanding Paper • 2402.04615 • Published Feb 7 • 36
StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback Paper • 2402.01391 • Published Feb 2 • 41
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models Paper • 2402.03300 • Published Feb 5 • 67
Direct-a-Video: Customized Video Generation with User-Directed Camera Movement and Object Motion Paper • 2402.03162 • Published Feb 5 • 17
AnimateLCM: Accelerating the Animation of Personalized Diffusion Models and Adapters with Decoupled Consistency Learning Paper • 2402.00769 • Published Feb 1 • 20
TravelPlanner: A Benchmark for Real-World Planning with Language Agents Paper • 2402.01622 • Published Feb 2 • 33
InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model Paper • 2401.16420 • Published Jan 29 • 54
StrokeNUWA: Tokenizing Strokes for Vector Graphic Synthesis Paper • 2401.17093 • Published Jan 30 • 18