---
license: apache-2.0
datasets:
- yahma/alpaca-cleaned
---
# MixLoRA: Resource-Efficient Model with Mix-of-Experts Architecture for Enhanced LoRA Performance

<div align="left"><img src="https://huggingface.co/scu-kdde/alpaca-mixlora-7b/resolve/main/MixLoRA.png" width="60%"></div>

GitHub: https://github.com/mikecovlee/mlora

The fundamental concept of MixLoRA is to take a pre-trained model with all parameters frozen, such as LLaMA-7B, and train multiple LoRA expert modules on top of its feed-forward (FFN) layers. At the same time, a routing layer (a gate linear layer) is trained, turning the model into a more powerful Mixture-of-Experts (MoE) language model. In theory, this approach can reach performance similar to existing MoE models while using fewer resources.
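
To make the idea concrete, below is a minimal, illustrative PyTorch sketch of one MixLoRA block. This is **not** the m-LoRA implementation: it simplifies each expert to a LoRA delta added on top of the shared frozen FFN output (the real adapter applies LoRA to the FFN projections themselves), and the class and variable names (`LoRAExpert`, `MixLoRABlock`) are made up for this example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LoRAExpert(nn.Module):
    """A single LoRA expert: a low-rank update on top of the frozen FFN."""
    def __init__(self, hidden_dim: int, r: int = 8, alpha: int = 16):
        super().__init__()
        self.lora_a = nn.Linear(hidden_dim, r, bias=False)
        self.lora_b = nn.Linear(r, hidden_dim, bias=False)
        nn.init.zeros_(self.lora_b.weight)   # the delta starts as a no-op
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.lora_b(self.lora_a(x)) * self.scaling

class MixLoRABlock(nn.Module):
    """Frozen FFN + trainable gate linear + top-k LoRA experts (conceptual)."""
    def __init__(self, frozen_ffn: nn.Module, hidden_dim: int,
                 num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.ffn = frozen_ffn
        for p in self.ffn.parameters():       # base model stays frozen
            p.requires_grad = False
        self.gate = nn.Linear(hidden_dim, num_experts, bias=False)
        self.experts = nn.ModuleList(
            [LoRAExpert(hidden_dim) for _ in range(num_experts)])
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, hidden_dim); the router picks top-k experts per token
        base = self.ffn(x)
        probs = F.softmax(self.gate(x), dim=-1)
        weights, selected = probs.topk(self.top_k, dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)
        delta = torch.zeros_like(base)
        for e, expert in enumerate(self.experts):
            hit = (selected == e)             # (tokens, top_k) selection mask
            token_mask = hit.any(dim=-1)
            if token_mask.any():
                w = (weights * hit).sum(dim=-1)[token_mask].unsqueeze(-1)
                delta[token_mask] += w * expert(x[token_mask])
        return base + delta

# toy usage with a stand-in FFN
ffn = nn.Sequential(nn.Linear(64, 256), nn.SiLU(), nn.Linear(256, 64))
block = MixLoRABlock(ffn, hidden_dim=64)
out = block(torch.randn(10, 64))              # (10, 64)
```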

In addition, MixLoRA allows the attention layers to be fine-tuned simultaneously, which further improves fine-tuning results. In experiments, when the attention layers are fine-tuned as well, the MixLoRA model composed of 8 experts reduces its loss faster than the MixLoRA model with 9 experts.

MixLoRA is implemented as a specific adapter form within m-LoRA. Consequently, m-LoRA can load, train, and fine-tune multiple distinct MixLoRA models simultaneously; note, however, that these models must be based on the same pre-trained model.

## MMLU Scores

|Model|Configuration|MMLU Average|STEM|Social Sciences|Humanities|Other|
|-----------------|---------------------------------|--------|--------|--------|--------|--------|
|Alpaca-LoRA-7B   |LoRA Rank = 16, QKVO             |  24.2  |  24.1  |**25.0**|  25.2  |  22.7  |
|Alpaca-MixLoRA-7B|LoRA Rank = 8, Top-2 of 8 Experts|**25.5**|**26.1**|  23.3  |**25.3**|**26.9**|

## Configuration of MixLoRA

Compared with LoRA, MixLoRA has some additional configuration items.
```json
{
  "name": "lora_0",
  "optim": "adamw",
  "lr": 1e-5,
  "batch_size": 16,
  "micro_batch_size": 2,
  "test_batch_size": 64,
  "num_epochs": 3,
  "r": 8,
  "lora_alpha": 16,
  "lora_dropout": 0.05,
  "target_modules": {
      "q_proj": true,
      "k_proj": false,
      "v_proj": true,
      "o_proj": false,
      "w1_proj": false,
      "w2_proj": false,
      "w3_proj": false
  },
  "data": "yahma/alpaca-cleaned",
  "prompt": "template/alpaca.json",
  "group_by_length": false,
  "expand_side": "right"
}
```
This is an example of a LoRA training configuration. You can find instructions in [README.md](./README.md).

MixLoRA has two routing strategies: top-k routing (as in *Mixtral*) and top-1 switch routing (as in *Switch Transformers*). They can be selected with `"routing_strategy": "mixtral"` or `"routing_strategy": "switch"`.

**Top-k Routing**
```
{
  ...
  "routing_strategy": "mixtral",
  "num_experts": 8,
  "act_fn": "silu",
  "top_k": 2,
  ...
}
```

**Top-1 Switch Routing**
```
{
  ...
  "routing_strategy": "switch",
  "num_experts": 8,
  "act_fn": "gelu_new",
  "expert_capacity": 32,
  "jitter_noise": 0.1,
  "ffn_dropout": 0.1,
  ...
}
```
You can add these items to the training configuration to enable the MixLoRA architecture.
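
As a rough illustration of how the two strategies differ, the following Python sketch (not code from m-LoRA, *Mixtral*, or *Switch Transformers*) shows the core of each routing decision. The jitter-noise and capacity handling are simplified approximations: the real switch router applies the noise to its input hidden states rather than the logits.

```python
import torch
import torch.nn.functional as F

def topk_route(gate_logits: torch.Tensor, top_k: int = 2):
    """Mixtral-style top-k routing: every token is sent to its top_k experts,
    and the selected probabilities are renormalised as mixing weights."""
    probs = F.softmax(gate_logits, dim=-1)
    weights, experts = probs.topk(top_k, dim=-1)
    weights = weights / weights.sum(dim=-1, keepdim=True)
    return experts, weights          # shapes: (tokens, top_k)

def switch_route(gate_logits: torch.Tensor, expert_capacity: int = 32,
                 jitter_noise: float = 0.1, training: bool = True):
    """Switch-Transformers-style top-1 routing: each token goes to one expert,
    the router is jittered during training, and experts that exceed their
    capacity drop the overflow tokens (their weight is zeroed)."""
    if training and jitter_noise > 0:
        noise = torch.empty_like(gate_logits).uniform_(1 - jitter_noise,
                                                       1 + jitter_noise)
        gate_logits = gate_logits * noise
    probs = F.softmax(gate_logits, dim=-1)
    weights, experts = probs.max(dim=-1)
    keep = torch.ones_like(weights, dtype=torch.bool)
    for e in experts.unique():
        overflow = (experts == e).nonzero(as_tuple=True)[0][expert_capacity:]
        keep[overflow] = False
    return experts, weights * keep   # shapes: (tokens,)

# toy usage: 6 tokens routed over 8 experts
logits = torch.randn(6, 8)
print(topk_route(logits, top_k=2))
print(switch_route(logits, expert_capacity=2))
```
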
## Create MixLoRA model

Basic command for creating a baseline model on the [Alpaca Cleaned](https://github.com/gururise/AlpacaDataCleaned) dataset:
```bash
CUDA_VISIBLE_DEVICES=0 python mlora.py \
  --base_model yahma/llama-7b-hf \
  --config ./config/alpaca_mixlora.json \
  --load_8bit
```
Please note that once the MixLoRA model is created, the number of experts in the model cannot be changed.

## Fine-tuning MixLoRA model

The MixLoRA model can also undergo further fine-tuning.
Basic command for fine-tuning a model on the [Alpaca Cleaned](https://github.com/gururise/AlpacaDataCleaned) dataset:
```bash
CUDA_VISIBLE_DEVICES=0 python mlora.py \
  --base_model yahma/llama-7b-hf \
  --config ./config/alpaca_mixlora.json \
  --load_8bit \
  --load_adapter
```

## Evaluate MixLoRA model

Currently, MixLoRA supports evaluation only through the m-LoRA framework.
```bash
CUDA_VISIBLE_DEVICES=0 python mlora.py \
  --base_model yahma/llama-7b-hf \
  --config ./config/alpaca_mixlora.json \
  --load_8bit \
  --inference
```
This approach allows running inference with multiple MixLoRA and LoRA adapters simultaneously. We also provide a WebUI and an example script for inference.
```bash
# Run the inference WebUI
CUDA_VISIBLE_DEVICES=0 python inference.py \
  --base_model yahma/llama-7b-hf \
  --lora_weights scu-kdde/alpaca-mixlora-7b \
  --template template/alpaca.json \
  --load_8bit

# Generate directly from the command line
CUDA_VISIBLE_DEVICES=0 python generate.py \
  --base_model yahma/llama-7b-hf \
  --lora_weights scu-kdde/alpaca-mixlora-7b \
  --template template/alpaca.json \
  --load_8bit \
  --instruction "What is m-LoRA?"
```

## Citation
Please cite this repository if you use the code in it.
```bibtex
@misc{alpaca-mixlora-7b,
  author = {Li, Dengchun and Lan, Tingfeng and Ye, Zhengmao and Duan, Lei and Tang, Mingjie},
  title = {MixLoRA MoE model based on AlpacaCleaned dataset and LLaMA-7B base model},
  year = {2024},
  publisher = {HuggingFace Hub},
  howpublished = {\url{https://huggingface.co/scu-kdde/alpaca-mixlora-7b}},
}
```

## Copyright
Copyright © 2023-2024 All Rights Reserved.

This project is licensed under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).

```
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
```