mikecovlee committed
Commit
eb06a94
1 Parent(s): 0455091

Update README.md

Files changed (1)
  1. README.md +7 -125
README.md CHANGED
@@ -3,142 +3,24 @@ license: apache-2.0
  datasets:
  - yahma/alpaca-cleaned
  ---
- # MixLoRA: Resource-Efficient Model with Mix-of-Experts Architecture for Enhanced LoRA Performance

  <div align="left"><img src="https://huggingface.co/scu-kdde/alpaca-mixlora-7b/resolve/main/MixLoRA.png" width="60%"></div>

- GitHub: https://github.com/mikecovlee/mlora

- The fundamental concept of MixLoRA is to take a pre-trained model with all parameters frozen, such as LLaMA-7B, and train multiple LoRA expert modules on top of its feed-forward network (FFN) layers. At the same time, a routing layer (a gating linear layer) is trained, yielding a more powerful Mixture-of-Experts (MoE) language model. In theory, this approach can achieve performance similar to existing MoE models with far fewer resources.

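In other words, each transformer block keeps its pre-trained FFN frozen, adds a small gating layer, and sums the outputs of the top-k selected LoRA experts on top of the frozen FFN output for every token. Below is a minimal PyTorch sketch of this idea, assuming a simplified single-module FFN; the class and variable names are illustrative and do not reflect the actual m-LoRA implementation.

```python
# Minimal sketch of the MixLoRA idea; names are illustrative, not m-LoRA's API.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LoRAExpert(nn.Module):
    """One expert: a low-rank residual (B @ A) on top of the shared frozen FFN."""
    def __init__(self, hidden_size: int, r: int = 8, alpha: int = 16):
        super().__init__()
        self.lora_a = nn.Linear(hidden_size, r, bias=False)
        self.lora_b = nn.Linear(r, hidden_size, bias=False)
        nn.init.zeros_(self.lora_b.weight)          # experts start as a zero update
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.lora_b(self.lora_a(x)) * self.scaling

class MixLoRABlock(nn.Module):
    """Frozen FFN plus a trainable gate that routes each token to its top-k LoRA experts."""
    def __init__(self, frozen_ffn: nn.Module, hidden_size: int,
                 num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.ffn = frozen_ffn                       # pre-trained FFN, parameters frozen
        self.gate = nn.Linear(hidden_size, num_experts, bias=False)
        self.experts = nn.ModuleList(LoRAExpert(hidden_size) for _ in range(num_experts))
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (num_tokens, hidden_size)
        weights, idx = torch.topk(F.softmax(self.gate(x), dim=-1), self.top_k, dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize kept weights
        base = self.ffn(x)                          # shared frozen computation
        delta = torch.zeros_like(base)
        for e, expert in enumerate(self.experts):
            for k in range(self.top_k):
                mask = idx[:, k] == e               # tokens whose k-th choice is expert e
                if mask.any():
                    delta[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return base + delta
```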
- In addition, MixLoRA also allows simultaneous fine-tuning of the attention layers, which contributes to improved fine-tuning outcomes. In our experiments, when the attention layers are fine-tuned simultaneously, the MixLoRA model composed of 8 experts shows a faster rate of loss reduction than the MixLoRA model with 9 experts.

- MixLoRA exists within m-LoRA as a specific adapter form, so m-LoRA can load, train, and fine-tune multiple distinct MixLoRA models simultaneously. Note, however, that these models must be based on the same pre-trained model.
-
- ## MMLU Scores
-
- |Model|Configuration|MMLU Average|STEM|Social Sciences|Humanities|Other|
- |-----------------|---------------------------------|--------|--------|--------|--------|--------|
- |Alpaca-LoRA-7B |LoRA Rank = 16, QKVO | 24.2 | 24.1 |**25.0**| 25.2 | 22.7 |
- |Alpaca-MixLoRA-7B|LoRA Rank = 8, Top-2 of 8 Experts|**25.5**|**26.1**| 23.3 |**25.3**|**26.9**|
-
- ## Configuration of MixLoRA
-
- Compared with LoRA, MixLoRA has some additional configuration items.
- ```json
- {
-     "name": "lora_0",
-     "optim": "adamw",
-     "lr": 1e-5,
-     "batch_size": 16,
-     "micro_batch_size": 2,
-     "test_batch_size": 64,
-     "num_epochs": 3,
-     "r": 8,
-     "lora_alpha": 16,
-     "lora_dropout": 0.05,
-     "target_modules": {
-         "q_proj": true,
-         "k_proj": false,
-         "v_proj": true,
-         "o_proj": false,
-         "w1_proj": false,
-         "w2_proj": false,
-         "w3_proj": false
-     },
-     "data": "yahma/alpaca-cleaned",
-     "prompt": "template/alpaca.json",
-     "group_by_length": false,
-     "expand_side": "right"
- }
- ```
- This is an example of a plain LoRA training configuration; see [README.md](./README.md) for detailed instructions.
-
- MixLoRA offers two routing strategies: top-k routing (as in *Mixtral*) and top-1 switch routing (as in *Switch Transformers*), which can be selected with `"routing_strategy": "mixtral"` or `"routing_strategy": "switch"`.
-
- **Top-k Routing**
- ```
- {
-     ...
-     "routing_strategy": "mixtral",
-     "num_experts": 8,
-     "act_fn": "silu",
-     "top_k": 2,
-     ...
- }
- ```
-
- **Top-1 Switch Routing**
- ```
- {
-     ...
-     "routing_strategy": "switch",
-     "num_experts": 8,
-     "act_fn": "gelu_new",
-     "expert_capacity": 32,
-     "jitter_noise": 0.1,
-     "ffn_dropout": 0.1,
-     ...
- }
- ```
- You can add these items to the training configuration to enable the MixLoRA architecture.
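For the switch strategy, `expert_capacity` caps how many tokens each expert may take per batch and `jitter_noise` randomly perturbs the router during training. A rough sketch of how such a top-1 router could behave, shown only to clarify these options; the function and its signature are assumptions, not m-LoRA's API:

```python
# Illustrative top-1 "switch" router with capacity and jitter; not m-LoRA's API.
import torch
import torch.nn.functional as F

def switch_route(router_logits: torch.Tensor,
                 expert_capacity: int = 32,
                 jitter_noise: float = 0.1,
                 training: bool = True):
    """Pick one expert per token; tokens beyond an expert's capacity are dropped (pass through)."""
    if training and jitter_noise > 0:
        # multiplicative jitter on the routing input, in the spirit of Switch Transformers
        noise = torch.empty_like(router_logits).uniform_(1.0 - jitter_noise, 1.0 + jitter_noise)
        router_logits = router_logits * noise
    probs = F.softmax(router_logits, dim=-1)                    # (num_tokens, num_experts)
    gate, expert_idx = probs.max(dim=-1)                        # top-1 probability and expert index
    one_hot = F.one_hot(expert_idx, probs.size(-1))             # (num_tokens, num_experts)
    position = (one_hot.cumsum(dim=0) * one_hot).sum(dim=-1)    # 1-based slot inside the chosen expert
    keep = position <= expert_capacity                          # overflow tokens keep only the frozen FFN output
    return expert_idx, gate, keep
```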
- ## Create MixLoRA model
-
- Basic command for creating a baseline model on the [Alpaca Cleaned](https://github.com/gururise/AlpacaDataCleaned) dataset:
- ```bash
- CUDA_VISIBLE_DEVICES=0 python mlora.py \
-     --base_model yahma/llama-7b-hf \
-     --config ./config/alpaca_mixlora.json \
-     --load_8bit
- ```
- Please note that once the MixLoRA model is created, the number of experts in the model cannot be changed.
-
- ## Fine-tuning MixLoRA model
-
- The MixLoRA model can also undergo further fine-tuning.
- Basic command for fine-tuning a model on the [Alpaca Cleaned](https://github.com/gururise/AlpacaDataCleaned) dataset:
- ```bash
- CUDA_VISIBLE_DEVICES=0 python mlora.py \
-     --base_model yahma/llama-7b-hf \
-     --config ./config/alpaca_mixlora.json \
-     --load_8bit \
-     --load_adapter
- ```
-
- ## Evaluate MixLoRA model
-
- Currently, MixLoRA supports evaluation only through the m-LoRA framework.
- ```bash
- CUDA_VISIBLE_DEVICES=0 python mlora.py \
-     --base_model yahma/llama-7b-hf \
-     --config ./config/alpaca_mixlora.json \
-     --load_8bit \
-     --inference
- ```
- This approach allows inference with multiple MixLoRA and LoRA adapters simultaneously. We also provide a WebUI and an example script for inference.
- ```bash
- # Run WebUI of Inference
- CUDA_VISIBLE_DEVICES=0 python inference.py \
-     --base_model yahma/llama-7b-hf \
-     --lora_weights scu-kdde/alpaca-mixlora-7b \
-     --template template/alpaca.json \
-     --load_8bit
-
- # Simply Generate
- CUDA_VISIBLE_DEVICES=0 python generate.py \
-     --base_model yahma/llama-7b-hf \
-     --lora_weights scu-kdde/alpaca-mixlora-7b \
-     --template template/alpaca.json \
-     --load_8bit \
-     --instruction "What is m-LoRA?"
- ```
 
  ## Citation
  Please cite this repository if you use its code.
  ```bibtex
  @misc{alpaca-mixlora-7b,
- author = {Dengchun Li and Tingfeng Lan and Zhengmao Ye and Lei Duan and Mingjie Tang},
- title = {MixLoRA MoE model based on AlpacaCleaned dataset and LLaMA-7B base model},
  year = {2024},
  publisher = {HuggingFace Hub},
  howpublished = {\url{https://huggingface.co/scu-kdde/alpaca-mixlora-7b}},
 
  datasets:
  - yahma/alpaca-cleaned
  ---
+ # MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts

  <div align="left"><img src="https://huggingface.co/scu-kdde/alpaca-mixlora-7b/resolve/main/MixLoRA.png" width="60%"></div>

+ GitHub: [mikecovlee/mlora](https://github.com/mikecovlee/mlora)

+ Large Language Models (LLMs) have showcased exceptional performance across a wide array of Natural Language Processing (NLP) tasks. Fine-tuning techniques are commonly used to tailor pre-trained models to specific applications. While methods like LoRA effectively tackle GPU memory constraints during fine-tuning, their applicability is often restricted. On the other hand, Mixture-of-Experts (MoE) models, such as Mixtral 8x7B, demonstrate remarkable performance while maintaining a reduced parameter count. However, the resource requirements of these models pose challenges, particularly for consumer-grade GPUs.

+ To address this challenge, we propose MixLoRA, an approach for constructing a resource-efficient sparse MoE model. MixLoRA inserts multiple LoRA-based experts within the feed-forward network block of a frozen pre-trained dense model via fine-tuning, and employs a top-k routing strategy. Unlike other LoRA-based MoE methods, MixLoRA enhances model performance by utilizing independently configurable attention-layer LoRA adapters, supporting LoRA and its variants for the construction of experts, and applying an auxiliary load-balance loss to address the imbalance problem of the router.

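The auxiliary load-balance loss mentioned here is typically computed in the style of Switch Transformers, scaling the product of each expert's token fraction and mean routing probability. A small sketch of that formulation, as an illustration only (not necessarily MixLoRA's exact loss):

```python
# Switch-style auxiliary balance loss; a sketch, not necessarily MixLoRA's exact formulation.
import torch
import torch.nn.functional as F

def load_balance_loss(router_logits: torch.Tensor, expert_idx: torch.Tensor, num_experts: int) -> torch.Tensor:
    """num_experts * sum_e (fraction of tokens sent to e) * (mean router probability of e)."""
    probs = F.softmax(router_logits, dim=-1)                              # (num_tokens, num_experts)
    token_fraction = F.one_hot(expert_idx, num_experts).float().mean(dim=0)
    mean_prob = probs.mean(dim=0)
    return num_experts * torch.sum(token_fraction * mean_prob)
```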
+ In experiments, MixLoRA achieves commendable performance across all evaluation metrics in both single-task and multi-task learning scenarios. Implemented within the m-LoRA framework, MixLoRA enables parallel fine-tuning, inference, and evaluation of multiple mixture-of-experts models on a single 24GB consumer-grade GPU without quantization, reducing GPU memory consumption by 41% and training latency by 17%.

  ## Citation
  Please cite this repository if you use its code.
  ```bibtex
  @misc{alpaca-mixlora-7b,
+ author = {Dengchun Li and Yingzi Ma and Naizheng Wang and Lei Duan and Jie Zuo and Mingjie Tang},
+ title = {MixLoRA LoRA MoE adapter based on AlpacaCleaned dataset and LLaMA-2-7B base model},
  year = {2024},
  publisher = {HuggingFace Hub},
  howpublished = {\url{https://huggingface.co/scu-kdde/alpaca-mixlora-7b}},