The FinBen: An Holistic Financial Benchmark for Large Language Models Paper • 2402.12659 • Published Feb 20, 2024 • 13
TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization Paper • 2402.13249 • Published Feb 20, 2024 • 10
MEGAVERSE: Benchmarking Large Language Models Across Languages, Modalities, Models and Tasks Paper • 2311.07463 • Published Nov 13, 2023 • 13
MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers Paper • 2305.07185 • Published May 12, 2023 • 8
QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models Paper • 2309.14717 • Published Sep 26, 2023 • 43
PEFTDebias: Capturing debiasing information using PEFTs Paper • 2312.00434 • Published Dec 1, 2023 • 1
From PEFT to DEFT: Parameter Efficient Finetuning for Reducing Activation Density in Transformers Paper • 2402.01911 • Published Feb 2, 2024 • 2
Empirical Study of PEFT techniques for Winter Wheat Segmentation Paper • 2310.01825 • Published Oct 3, 2023 • 2
LoRA: Low-Rank Adaptation of Large Language Models Paper • 2106.09685 • Published Jun 17, 2021 • 24
L4Q: Parameter Efficient Quantization-Aware Training on Large Language Models via LoRA-wise LSQ Paper • 2402.04902 • Published Feb 7, 2024 • 1
Self-Instruct: Aligning Language Model with Self Generated Instructions Paper • 2212.10560 • Published Dec 20, 2022 • 5
Efficient Training of Language Models to Fill in the Middle Paper • 2207.14255 • Published Jul 28, 2022 • 1
When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method Paper • 2402.17193 • Published Feb 27, 2024 • 23
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits Paper • 2402.17764 • Published Feb 27, 2024 • 560
DiffuseKronA: A Parameter Efficient Fine-tuning Method for Personalized Diffusion Model Paper • 2402.17412 • Published Feb 27, 2024 • 21
MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT Paper • 2402.16840 • Published Feb 26, 2024 • 23
LLM Comparator: Visual Analytics for Side-by-Side Evaluation of Large Language Models Paper • 2402.10524 • Published Feb 16, 2024 • 19
FinTral: A Family of GPT-4 Level Multimodal Financial Large Language Models Paper • 2402.10986 • Published Feb 16, 2024 • 73
Beyond Language Models: Byte Models are Digital World Simulators Paper • 2402.19155 • Published Feb 29, 2024 • 44
Training-Free Long-Context Scaling of Large Language Models Paper • 2402.17463 • Published Feb 27, 2024 • 16
Design2Code: How Far Are We From Automating Front-End Engineering? Paper • 2403.03163 • Published Mar 5, 2024 • 92
EasyQuant: An Efficient Data-free Quantization Algorithm for LLMs Paper • 2403.02775 • Published Mar 5, 2024 • 11
OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset Paper • 2402.10176 • Published Feb 15, 2024 • 32
OneBit: Towards Extremely Low-bit Large Language Models Paper • 2402.11295 • Published Feb 17, 2024 • 21
Adding NVMe SSDs to Enable and Accelerate 100B Model Fine-tuning on a Single GPU Paper • 2403.06504 • Published Mar 11, 2024 • 52
An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models Paper • 2403.06764 • Published Mar 11, 2024 • 24
V3D: Video Diffusion Models are Effective 3D Generators Paper • 2403.06738 • Published Mar 11, 2024 • 27
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context Paper • 2403.05530 • Published Mar 8, 2024 • 49
MoAI: Mixture of All Intelligence for Large Language and Vision Models Paper • 2403.07508 • Published Mar 12, 2024 • 69
GiT: Towards Generalist Vision Transformer through Universal Language Interface Paper • 2403.09394 • Published Mar 14, 2024 • 24
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training Paper • 2403.09611 • Published Mar 14, 2024 • 119
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking Paper • 2403.09629 • Published Mar 14, 2024 • 54
ORPO: Monolithic Preference Optimization without Reference Model Paper • 2403.07691 • Published Mar 12, 2024 • 53
LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models Paper • 2403.13372 • Published Mar 20, 2024 • 50
Matryoshka: Stealing Functionality of Private ML Data by Hiding Models in Model Paper • 2206.14371 • Published Jun 29, 2022 • 3
Model Stock: All we need is just a few fine-tuned models Paper • 2403.19522 • Published Mar 28, 2024 • 9
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models Paper • 2404.02258 • Published Apr 2, 2024 • 98
Long-context LLMs Struggle with Long In-context Learning Paper • 2404.02060 • Published Apr 2, 2024 • 31
Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding Paper • 2403.04797 • Published Mar 5, 2024 • 1
ReFT: Representation Finetuning for Language Models Paper • 2404.03592 • Published Apr 4, 2024 • 66
Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM Paper • 2403.07816 • Published Mar 12, 2024 • 37
OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models Paper • 2402.01739 • Published Jan 29, 2024 • 26
Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order Paper • 2404.00399 • Published Mar 30, 2024 • 39
Latxa: An Open Language Model and Evaluation Suite for Basque Paper • 2403.20266 • Published Mar 29, 2024 • 3
LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders Paper • 2404.05961 • Published Apr 9, 2024 • 61
Learn Your Reference Model for Real Good Alignment Paper • 2404.09656 • Published Apr 15, 2024 • 79
A Review of Modern Recommender Systems Using Generative Models (Gen-RecSys) Paper • 2404.00579 • Published Mar 31, 2024 • 1
RoFormer: Enhanced Transformer with Rotary Position Embedding Paper • 2104.09864 • Published Apr 20, 2021 • 7
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences Paper • 2404.03715 • Published Apr 4, 2024 • 57
Parameter Efficient Fine Tuning: A Comprehensive Analysis Across Applications Paper • 2404.13506 • Published Apr 21, 2024 • 1