- YAYI 2: Multilingual Open-Source Large Language Models
  Paper • 2312.14862 • Published • 12
- SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
  Paper • 2312.15166 • Published • 55
- TrustLLM: Trustworthiness in Large Language Models
  Paper • 2401.05561 • Published • 62
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
  Paper • 2401.06066 • Published • 36
Collections including paper arxiv:2404.07965
- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 135
- BitNet: Scaling 1-bit Transformers for Large Language Models
  Paper • 2310.11453 • Published • 94
- ReFT: Representation Finetuning for Language Models
  Paper • 2404.03592 • Published • 74
- LLM in a flash: Efficient Large Language Model Inference with Limited Memory
  Paper • 2312.11514 • Published • 255

- TinyGSM: achieving >80% on GSM8k with small language models
  Paper • 2312.09241 • Published • 34
- ShortGPT: Layers in Large Language Models are More Redundant Than You Expect
  Paper • 2403.03853 • Published • 61
- Gamba: Marry Gaussian Splatting with Mamba for single view 3D reconstruction
  Paper • 2403.18795 • Published • 17
- Diffusion-RWKV: Scaling RWKV-Like Architectures for Diffusion Models
  Paper • 2404.04478 • Published • 11

- The Goldilocks of Pragmatic Understanding: Fine-Tuning Strategy Matters for Implicature Resolution by LLMs
  Paper • 2210.14986 • Published • 4
- Camels in a Changing Climate: Enhancing LM Adaptation with Tulu 2
  Paper • 2311.10702 • Published • 17
- Large Language Models as Optimizers
  Paper • 2309.03409 • Published • 72
- From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting
  Paper • 2309.04269 • Published • 29

- VideoBooth: Diffusion-based Video Generation with Image Prompts
  Paper • 2312.00777 • Published • 19
- MotionCtrl: A Unified and Flexible Motion Controller for Video Generation
  Paper • 2312.03641 • Published • 19
- GenTron: Delving Deep into Diffusion Transformers for Image and Video Generation
  Paper • 2312.04557 • Published • 12
- DreamVideo: Composing Your Dream Videos with Customized Subject and Motion
  Paper • 2312.04433 • Published • 9

- UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs
  Paper • 2311.09257 • Published • 43
- Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference
  Paper • 2310.04378 • Published • 19
- QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models
  Paper • 2309.14717 • Published • 43
- Exponentially Faster Language Modelling
  Paper • 2311.10770 • Published • 117

- Ultra-Long Sequence Distributed Transformer
  Paper • 2311.02382 • Published • 2
- Ziya2: Data-centric Learning is All LLMs Need
  Paper • 2311.03301 • Published • 16
- Relax: Composable Abstractions for End-to-End Dynamic Machine Learning
  Paper • 2311.02103 • Published • 15
- Extending Context Window of Large Language Models via Semantic Compression
  Paper • 2312.09571 • Published • 12

- The Generative AI Paradox: "What It Can Create, It May Not Understand"
  Paper • 2311.00059 • Published • 17
- Teaching Large Language Models to Reason with Reinforcement Learning
  Paper • 2403.04642 • Published • 46
- Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM
  Paper • 2403.07816 • Published • 37
- PERL: Parameter Efficient Reinforcement Learning from Human Feedback
  Paper • 2403.10704 • Published • 56

- CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages
  Paper • 2309.09400 • Published • 77
- PDFTriage: Question Answering over Long, Structured Documents
  Paper • 2309.08872 • Published • 51
- Chain-of-Verification Reduces Hallucination in Large Language Models
  Paper • 2309.11495 • Published • 37
- LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models
  Paper • 2309.12307 • Published • 83

- MADLAD-400: A Multilingual And Document-Level Large Audited Dataset
  Paper • 2309.04662 • Published • 21
- Neurons in Large Language Models: Dead, N-gram, Positional
  Paper • 2309.04827 • Published • 16
- Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs
  Paper • 2309.05516 • Published • 8
- DrugChat: Towards Enabling ChatGPT-Like Capabilities on Drug Molecule Graphs
  Paper • 2309.03907 • Published • 6