mikecovlee committed
Commit 866a94c
1 Parent(s): eb06a94

Update README.md

Files changed (1): README.md (+22, -7)
 
---
license: apache-2.0
datasets:
- yahma/alpaca-cleaned
---

# MixLoRA: Resource-Efficient Model with Mix-of-Experts Architecture for Enhanced LoRA Performance

<div align="left"><img src="./assets/MixLoRA.png" width="60%"></div>

Large Language Models (LLMs) have showcased exceptional performance across a wide array of Natural Language Processing (NLP) tasks. Fine-tuning techniques are commonly used to tailor pre-trained models to specific applications. While methods like LoRA effectively tackle the GPU memory constraints of fine-tuning, their applicability is often limited. On the other hand, Mixture-of-Experts (MoE) models such as Mixtral 8x7B demonstrate remarkable performance while keeping the number of parameters activated per token low. However, the resource requirements of these models remain a challenge, particularly for consumer-grade GPUs.

To address this challenge, we propose MixLoRA, an approach for constructing a resource-efficient sparse MoE model. MixLoRA inserts multiple LoRA-based experts into the feed-forward network block of a frozen pre-trained dense model and fine-tunes them with a top-k routing strategy. Unlike other LoRA-based MoE methods, MixLoRA further improves performance by attaching independently configurable LoRA adapters to the attention layers, supporting LoRA and its variants for constructing the experts, and applying an auxiliary load-balancing loss to counter router imbalance.
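
To make the mechanism concrete, here is a minimal PyTorch sketch of the idea (it is not the official m-LoRA implementation, and all class names, ranks, and expert counts are illustrative). Each expert wraps the same frozen projection with its own low-rank LoRA update, and a trainable router dispatches every token to its top-k experts:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LoRAExpert(nn.Module):
    """One expert: a frozen base projection plus a trainable rank-r update B @ A."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base                                    # frozen, shared by all experts
        self.lora_a = nn.Linear(base.in_features, r, bias=False)
        self.lora_b = nn.Linear(r, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)                  # experts start equal to the base layer
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * self.lora_b(self.lora_a(x))

class MixLoRABlock(nn.Module):
    """A frozen projection turned into a sparse MoE via LoRA experts and a top-k router.

    A real transformer FFN has up/down (and often gate) projections, each of which
    would get its own LoRA pair; a single projection is used to keep the sketch short.
    """
    def __init__(self, base: nn.Linear, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        base.requires_grad_(False)                          # only the LoRA pairs and router train
        self.experts = nn.ModuleList(LoRAExpert(base) for _ in range(num_experts))
        self.router = nn.Linear(base.in_features, num_experts, bias=False)
        self.top_k = top_k

    def forward(self, x: torch.Tensor):
        # x: (tokens, hidden_dim), batch and sequence dims flattened together
        logits = self.router(x)                             # (tokens, num_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)      # per-token expert choices
        weights = F.softmax(weights, dim=-1)                # renormalize over the top-k
        out = x.new_zeros(x.size(0), self.experts[0].base.out_features)
        for e, expert in enumerate(self.experts):
            rows, slots = (idx == e).nonzero(as_tuple=True)  # tokens routed to expert e
            if rows.numel() == 0:
                continue                                    # expert inactive for this batch
            out[rows] += weights[rows, slots].unsqueeze(-1) * expert(x[rows])
        return out, logits                                  # logits feed the auxiliary loss
```

Only the LoRA matrices and the router receive gradients, which is what keeps the memory footprint close to that of plain LoRA even though the block behaves like a sparse MoE.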
 
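The router logits returned above also drive the auxiliary load-balancing loss. The sketch below uses the common Switch-Transformer-style formulation (fraction of tokens dispatched to each expert, weighted by the mean router probability); the exact formulation in MixLoRA may differ, so treat this as an assumption:

```python
import torch
import torch.nn.functional as F

def load_balancing_loss(router_logits: torch.Tensor, top_k: int) -> torch.Tensor:
    """Switch-style auxiliary loss; minimized when tokens spread evenly over experts.

    router_logits: (tokens, num_experts) raw scores from the router.
    """
    num_experts = router_logits.size(-1)
    probs = F.softmax(router_logits, dim=-1)                    # (tokens, E)
    _, idx = router_logits.topk(top_k, dim=-1)                  # (tokens, k)
    # f_e: fraction of routing slots dispatched to each expert (non-differentiable)
    dispatch = F.one_hot(idx, num_experts).float().sum(dim=1)   # (tokens, E)
    f = dispatch.mean(dim=0) / top_k
    # p_e: mean router probability mass assigned to each expert
    p = probs.mean(dim=0)
    # equals 1.0 for a perfectly uniform router, grows when routing collapses
    return num_experts * torch.sum(f * p)

# During training this would be combined with the task loss, e.g.:
# loss = task_loss + 0.01 * load_balancing_loss(logits, top_k=2)
```
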
In experiments, MixLoRA achieves strong performance across all evaluation metrics in both single-task and multi-task learning scenarios. Implemented within the m-LoRA framework, MixLoRA enables parallel fine-tuning, inference, and evaluation of multiple mixture-of-experts models on a single 24GB consumer-grade GPU without quantization, reducing GPU memory consumption by 41% and training latency by 17%.

| PEFT Method | # Params (%) | ARC-e | ARC-c | BoolQ | OBQA | PIQA | AVG. |
|-------------|--------------|-------|-------|-------|------|------|------|
| LoRA        | 2.6%         | 73.8  | 50.9  | 62.2  | 80.4 | 69.9 | 67.4 |
| DoRA        | 2.6%         | 76.5  | 59.8  | 71.7  | 80.6 | 78.8 | 73.5 |
| **MixLoRA** | 2.6%         | 76.5  | 58.1  | 73.8  | 84.4 | 82.6 | 75.1 |
| **MixDoRA** | 2.6%         | 78.3  | 59.6  | 74.2  | 84.4 | 83.6 | 76.0 |

The table above compares MixLoRA and MixDoRA against LoRA and DoRA fine-tuning. The results show that the MixLoRA-tuned models achieve strong performance across all evaluation tasks. All methods are fine-tuned and evaluated with [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) on m-LoRA, and all metrics are reported as accuracy.

  ## Citation
If MixLoRA has been useful for your work, please consider citing it:
  ```bibtex
@misc{MixLoRA,
  author = {Li, Dengchun and Ma, Yingzi and Wang, Naizheng and Duan, Lei and Zuo, Jie and Tang, Mingjie},
  title = {MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA based Mixture of Experts},
  year = {2024},
  publisher = {GitHub},
  howpublished = {\url{https://github.com/mikecovlee/mlora}},
}

@misc{alpaca-mixlora-7b,
  author = {Li, Dengchun and Ma, Yingzi and Wang, Naizheng and Duan, Lei and Zuo, Jie and Tang, Mingjie},
  title = {MixLoRA LoRA MoE adapter based on AlpacaCleaned dataset and LLaMA-2-7B base model},
}
```

  ## Copyright
Copyright © 2023-2024 All Rights Reserved.

m-LoRA and the weights of alpaca-mixlora-7b are licensed under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).

```
Licensed under the Apache License, Version 2.0 (the "License");