mGTE: Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval Paper • 2407.19669 • Published Jul 29 • 19
Visual Riddles: a Commonsense and World Knowledge Challenge for Large Vision and Language Models Paper • 2407.19474 • Published Jul 28 • 22
MMAU: A Holistic Benchmark of Agent Capabilities Across Diverse Domains Paper • 2407.18961 • Published Jul 18 • 38
MindSearch: Mimicking Human Minds Elicits Deep AI Searcher Paper • 2407.20183 • Published Jul 29 • 37
SeaLLMs 3: Open Foundation and Chat Multilingual Large Language Models for Southeast Asian Languages Paper • 2407.19672 • Published Jul 29 • 54
Adapting Safe-for-Work Classifier for Malaysian Language Text: Enhancing Alignment in LLM-Ops Framework Paper • 2407.20729 • Published Jul 30 • 25
A Large Encoder-Decoder Family of Foundation Models For Chemical Language Paper • 2407.20267 • Published Jul 24 • 31
Knesset-DictaBERT: A Hebrew Language Model for Parliamentary Proceedings Paper • 2407.20581 • Published Jul 30 • 23
Futga: Towards Fine-grained Music Understanding through Temporally-enhanced Generative Augmentation Paper • 2407.20445 • Published Jul 29 • 20
SHIC: Shape-Image Correspondences with no Keypoint Supervision Paper • 2407.18907 • Published Jul 26 • 39
AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents Paper • 2407.18901 • Published Jul 26 • 31
Bridging the Gap: Studio-like Avatar Creation from a Monocular Phone Capture Paper • 2407.19593 • Published Jul 28 • 12
Integrating Large Language Models into a Tri-Modal Architecture for Automated Depression Classification Paper • 2407.19340 • Published Jul 27 • 56
ImagiNet: A Multi-Content Dataset for Generalizable Synthetic Image Detection via Contrastive Learning Paper • 2407.20020 • Published Jul 29 • 19
Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge Paper • 2407.19594 • Published Jul 28 • 19
Cycle3D: High-quality and Consistent Image-to-3D Generation via Generation-Reconstruction Cycle Paper • 2407.19548 • Published Jul 28 • 22
SaulLM-54B & SaulLM-141B: Scaling Up Domain Adaptation for the Legal Domain Paper • 2407.19584 • Published Jul 28 • 60
MagicDance: Realistic Human Dance Video Generation with Motions & Facial Expressions Transfer Paper • 2311.12052 • Published Nov 18, 2023 • 32
GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning Paper • 2311.12631 • Published Nov 21, 2023 • 13
PF-LRM: Pose-Free Large Reconstruction Model for Joint Pose and Shape Prediction Paper • 2311.12024 • Published Nov 20, 2023 • 18
PhysGaussian: Physics-Integrated 3D Gaussians for Generative Dynamics Paper • 2311.12198 • Published Nov 20, 2023 • 22
HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-shot Speech Synthesis Paper • 2311.12454 • Published Nov 21, 2023 • 29
Adaptive Shells for Efficient Neural Radiance Field Rendering Paper • 2311.10091 • Published Nov 16, 2023 • 18
Open-Sourcing Highly Capable Foundation Models: An evaluation of risks, benefits, and alternative methods for pursuing open-source objectives Paper • 2311.09227 • Published Sep 29, 2023 • 6
UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs Paper • 2311.09257 • Published Nov 14, 2023 • 45
The Chosen One: Consistent Characters in Text-to-Image Diffusion Models Paper • 2311.10093 • Published Nov 16, 2023 • 57
Consistent4D: Consistent 360° Dynamic Object Generation from Monocular Video Paper • 2311.02848 • Published Nov 6, 2023 • 3
Tailoring Self-Rationalizers with Multi-Reward Distillation Paper • 2311.02805 • Published Nov 6, 2023 • 3
Co-training and Co-distillation for Quality Improvement and Compression of Language Models Paper • 2311.02849 • Published Nov 6, 2023 • 3
Relax: Composable Abstractions for End-to-End Dynamic Machine Learning Paper • 2311.02103 • Published Nov 1, 2023 • 16
Tell Your Model Where to Attend: Post-hoc Attention Steering for LLMs Paper • 2311.02262 • Published Nov 3, 2023 • 10
Levels of AGI: Operationalizing Progress on the Path to AGI Paper • 2311.02462 • Published Nov 4, 2023 • 32
BioCoder: A Benchmark for Bioinformatics Code Generation with Contextual Pragmatic Knowledge Paper • 2308.16458 • Published Aug 31, 2023 • 10
Point-Bind & Point-LLM: Aligning Point Cloud with Multi-modality for 3D Understanding, Generation, and Instruction Following Paper • 2309.00615 • Published Sep 1, 2023 • 11
CityDreamer: Compositional Generative Model of Unbounded 3D Cities Paper • 2309.00610 • Published Sep 1, 2023 • 18
CroissantLLM: A Truly Bilingual French-English Language Model Paper • 2402.00786 • Published Feb 1 • 25
Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers Paper • 2305.07011 • Published May 11, 2023 • 5
VideoGen: A Reference-Guided Latent Diffusion Approach for High Definition Text-to-Video Generation Paper • 2309.00398 • Published Sep 1, 2023 • 20
Large Content And Behavior Models To Understand, Simulate, And Optimize Content And Behavior Paper • 2309.00359 • Published Sep 1, 2023 • 20
FACET: Fairness in Computer Vision Evaluation Benchmark Paper • 2309.00035 • Published Aug 31, 2023 • 16
YaRN: Efficient Context Window Extension of Large Language Models Paper • 2309.00071 • Published Aug 31, 2023 • 65
TokenFlow: Consistent Diffusion Features for Consistent Video Editing Paper • 2307.10373 • Published Jul 19, 2023 • 57
Meta-Transformer: A Unified Framework for Multimodal Learning Paper • 2307.10802 • Published Jul 20, 2023 • 43
Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion Paper • 2407.01392 • Published Jul 1 • 39