<div align="left"><img src="https://huggingface.co/scu-kdde/alpaca-mixlora-7b/resolve/main/MixLoRA.png" width="60%"></div>

GitHub: https://github.com/mikecovlee/mlora

The fundamental concept of MixLoRA is to start from a pre-trained model with all parameters frozen, such as LLaMA-7B, and train multiple LoRA expert modules on top of its feed-forward network (FFN) layers. Simultaneously, a routing layer (a gate linear layer) is trained, yielding a more powerful Mixture-of-Experts (MoE) language model. In theory, this approach can match the performance of existing MoE models while using fewer resources.
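
As a rough illustration of this design, below is a minimal PyTorch-style sketch. It is not the m-LoRA implementation; the class and parameter names are invented for clarity, and real implementations route tokens sparsely instead of evaluating every expert:

```python
import torch
import torch.nn as nn

class LoRAExpert(nn.Module):
    """One LoRA expert: a low-rank delta added on top of the frozen FFN."""
    def __init__(self, dim: int, rank: int = 8, alpha: int = 16):
        super().__init__()
        self.down = nn.Linear(dim, rank, bias=False)
        self.up = nn.Linear(rank, dim, bias=False)
        self.scale = alpha / rank
        nn.init.zeros_(self.up.weight)  # each expert starts as a zero delta

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.up(self.down(x)) * self.scale

class MixLoRABlock(nn.Module):
    """Frozen FFN + trainable gate (router) + LoRA experts, top-k routing."""
    def __init__(self, ffn: nn.Module, dim: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.ffn = ffn
        for p in self.ffn.parameters():  # the pre-trained FFN stays frozen
            p.requires_grad_(False)
        self.gate = nn.Linear(dim, num_experts, bias=False)  # routing layer
        self.experts = nn.ModuleList(LoRAExpert(dim) for _ in range(num_experts))
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, dim). Keep only each token's top-k experts,
        # renormalize their gate weights, and add the weighted deltas.
        scores = torch.softmax(self.gate(x), dim=-1)
        _, top_idx = scores.topk(self.top_k, dim=-1)
        keep = torch.zeros_like(scores).scatter(-1, top_idx, 1.0)
        scores = scores * keep
        scores = scores / scores.sum(dim=-1, keepdim=True)
        out = self.ffn(x)
        for e, expert in enumerate(self.experts):
            out = out + scores[:, e:e+1] * expert(x)  # weight is 0 if unrouted
        return out
```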

In addition, MixLoRA also allows simultaneous fine-tuning of the attention layer.

MixLoRA exists within m-LoRA as a specific adapter type. Consequently, m-LoRA can simultaneously load, train, and fine-tune multiple distinct MixLoRA models, provided they are all based on the same pre-trained model.
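
For example (continuing the invented sketch above, so this is illustrative only), two adapters can wrap one shared frozen FFN, because each keeps only its own gate and expert parameters:

```python
import torch.nn as nn

# Two independent MixLoRA adapters (see the MixLoRABlock sketch above)
# sharing one frozen base FFN; only the gate and expert weights differ.
dim, hidden = 4096, 11008  # LLaMA-7B FFN sizes
shared_ffn = nn.Sequential(nn.Linear(dim, hidden), nn.SiLU(), nn.Linear(hidden, dim))
adapter_a = MixLoRABlock(shared_ffn, dim, num_experts=8, top_k=2)  # e.g. task A
adapter_b = MixLoRABlock(shared_ffn, dim, num_experts=8, top_k=2)  # e.g. task B
```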
## Configuration of MixLoRA
Compared with LoRA, MixLoRA has some additional configuration items.
```json
{
    "name": "lora_0",
    "optim": "adamw",
    "lr": 1e-5,
    "batch_size": 16,
    "micro_batch_size": 2,
    "test_batch_size": 64,
    "num_epochs": 3,
    "r": 8,
    "lora_alpha": 16,
    "lora_dropout": 0.05,
    "target_modules": {
        "q_proj": true,
        "k_proj": false,
        "v_proj": true,
        "o_proj": false,
        "w1_proj": false,
        "w2_proj": false,
        "w3_proj": false
    },
    "data": "yahma/alpaca-cleaned",
    "prompt": "template/alpaca.json",
    "group_by_length": false,
    "expand_side": "right"
}
```
This is an example of a LoRA training configuration. You can find instructions for these items in [README.md](./README.md).
MixLoRA has two routing strategies: top-k routing (as in *Mixtral*) and top-1 switch routing (as in *Switch Transformers*), selected with `"routing_strategy": "mixtral"` or `"routing_strategy": "switch"` respectively.
**Top-k Routing**
```json
{
    ...
    "routing_strategy": "mixtral",
    "num_experts": 8,
    "act_fn": "silu",
    "top_k": 2,
    ...
}
```
**Top-1 Switch Routing**
```json
{
    ...
    "routing_strategy": "switch",
    "num_experts": 8,
    "act_fn": "gelu_new",
    "expert_capacity": 32,
    "jitter_noise": 0.1,
    "ffn_dropout": 0.1,
    ...
}
```
You can add these items to a training configuration to enable the MixLoRA architecture.
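
For intuition about the switch-specific fields, here is a hedged sketch of top-1 routing with jitter noise and an expert capacity limit. It mirrors the Switch Transformers scheme rather than m-LoRA's actual code, and all names are invented:

```python
import torch
import torch.nn as nn

def switch_route(gate: nn.Linear, x: torch.Tensor, num_experts: int,
                 expert_capacity: int = 32, jitter_noise: float = 0.1,
                 training: bool = True):
    """Top-1 switch routing sketch: jitter inputs during training, send each
    token to its single best expert, and drop tokens past expert capacity."""
    if training and jitter_noise > 0:
        # multiplicative jitter on the router input, as in Switch Transformers
        x = x * torch.empty_like(x).uniform_(1.0 - jitter_noise, 1.0 + jitter_noise)
    probs = torch.softmax(gate(x), dim=-1)   # (num_tokens, num_experts)
    weight, expert_idx = probs.max(dim=-1)   # top-1 expert per token
    assignments = []
    for e in range(num_experts):
        token_ids = (expert_idx == e).nonzero(as_tuple=True)[0]
        assignments.append(token_ids[:expert_capacity])  # overflow is dropped
    return weight, expert_idx, assignments
```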
## Create MixLoRA model
Basic command for creating a baseline model on the [Alpaca Cleaned](https://github.com/gururise/AlpacaDataCleaned) dataset:
```bash
CUDA_VISIBLE_DEVICES=0 python mlora.py \
    --base_model yahma/llama-7b-hf \
    --config ./config/alpaca_mixlora.json \
    --load_8bit
```
Please note that once the MixLoRA model is created, the number of experts in the model cannot be changed.

## Fine-tune MixLoRA model

The MixLoRA model can also undergo further fine-tuning.
Basic command for fine-tuning a model on the [Alpaca Cleaned](https://github.com/gururise/AlpacaDataCleaned) dataset:
```bash
CUDA_VISIBLE_DEVICES=0 python mlora.py \
    --base_model yahma/llama-7b-hf \
    --config ./config/alpaca_mixlora.json \
    --load_8bit \
    --load_adapter
```
## Evaluate MixLoRA model
Currently, MixLoRA supports evaluation only through the m-LoRA framework.
```bash
CUDA_VISIBLE_DEVICES=0 python mlora.py \
    --base_model yahma/llama-7b-hf \
    --config ./config/alpaca_mixlora.json \
    --load_8bit \
    --inference
```
This approach allows running inference with multiple MixLoRA and LoRA adapters simultaneously. We also provide a WebUI and a generation example for inference.
```bash
# Run the inference WebUI
CUDA_VISIBLE_DEVICES=0 python inference.py \
    --base_model yahma/llama-7b-hf \
    --lora_weights scu-kdde/alpaca-mixlora-7b \
    --template template/alpaca.json \
    --load_8bit

# Simply generate
CUDA_VISIBLE_DEVICES=0 python generate.py \
    --base_model yahma/llama-7b-hf \
    --lora_weights scu-kdde/alpaca-mixlora-7b \
    --template template/alpaca.json \
    --load_8bit \
    --instruction "What is m-LoRA?"
```
## Citation