---
license: apache-2.0
---
# X-LoRA: Mixture of Low-Rank Adapter Experts, a Flexible Framework for Large Language Models
X-LoRA works by learning scaling values for LoRA adapters. These learned scaling values are used to gate the LoRA experts in a dense fashion. Additionally, all LoRA adapters and the base model are frozen, allowing efficient fine-tuning due to a low trainable parameter count.
X-LoRA is easily applied to any HuggingFace Transformers model.
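
To make the idea of dense gating concrete, here is a minimal pure-Python sketch of a single linear layer in which every LoRA expert contributes to the output, weighted by its scaling value. It is an illustration under stated assumptions, not the library's implementation: the function name `xlora_layer_sketch`, the softmax normalization of the scalings, and the toy matrices are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def xlora_layer_sketch(x, base_weight, adapters, gate_logits):
    """Frozen base output plus a densely gated sum of LoRA updates.

    adapters: list of (A, B) pairs, where A is r x d_in and B is d_out x r.
    gate_logits: one logit per adapter (stand-in for a scaling-head output).
    """
    # Assumption: scalings are softmax-normalized across adapters.
    scalings = softmax(gate_logits)
    out = matvec(base_weight, x)  # frozen base model path
    for scale, (A, B) in zip(scalings, adapters):
        delta = matvec(B, matvec(A, x))  # low-rank update B(Ax)
        out = [o + scale * d for o, d in zip(out, delta)]
    return out, scalings


# Toy example: two rank-1 adapters with equal gate logits.
x = [1.0, 2.0]
base_weight = [[1.0, 0.0], [0.0, 1.0]]
adapters = [
    ([[1.0, 0.0]], [[1.0], [0.0]]),  # acts on the first coordinate
    ([[0.0, 1.0]], [[0.0], [1.0]]),  # acts on the second coordinate
]
out, scalings = xlora_layer_sketch(x, base_weight, adapters, [0.0, 0.0])
print(out, scalings)
```

In this sketch no expert is ever switched off: each adapter's update is always computed and then scaled, which is what distinguishes dense gating from sparse top-k routing.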
## Features
- Effective: Dense gating of experts allows effective mixing
- Efficient fine-tuning: low trainable parameter count
- Hierarchical encapsulated strategy: Re-use existing trained models or model sections to address complex tasks that cut across experts, following a bio-inspired strategy
- Easy-to-use API: `add_xlora_to_model`, broad compatibility
- Dynamically mix LoRA adapters: Deep layer-wise combinations of adapters
## X-LoRA source code
Install directly from source:
```
pip install git+https://github.com/EricLBuehler/xlora.git -U
```
![image/png](https://cdn-uploads.huggingface.co/production/uploads/623ce1c6b66fedf374859fe7/JVzaFIISQ780X92VqaHKD.png)
Further details on installation, packages with source code, API details and more examples:
[https://github.com/EricLBuehler/xlora](https://github.com/EricLBuehler/xlora)
## Converting and loading a model
Example for model conversion:
```python
import torch
import xlora
from transformers import AutoConfig, AutoModelForCausalLM  # type: ignore

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.1",
    trust_remote_code=True,
    use_flash_attention_2=False,
    device_map="cuda:0",
    torch_dtype=torch.bfloat16,
)

config = AutoConfig.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.1",
    trust_remote_code=True,
    use_flash_attention_2=False,
    device_map="auto",
)

# Convert the model to X-LoRA
model_created = xlora.add_xlora_to_model(
    model=model,
    xlora_config=xlora.xLoRAConfig(config.hidden_size, xlora_depth=8, device=torch.device("cuda")),
    verbose=True,
    adapters={
        "adapter_1": "./path/to/the/checkpoint_adapter_1/",
        "adapter_2": "./path/to/the/checkpoint_adapter_2/",
        "adapter_n": "./path/to/the/checkpoint_adapter_3/",
    },
)
```
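
Because the `adapters` dictionary maps names to checkpoint paths, a wrong path tends to surface only after the base model has been loaded. The following is a small optional pre-flight check, sketched under the assumption that the adapters are local directories saved with PEFT's `save_pretrained` (which writes an `adapter_config.json` file). The helper `find_missing_adapters` is hypothetical and not part of the `xlora` API; it does not apply to adapters referenced by Hub repository ID.

```python
from pathlib import Path


def find_missing_adapters(adapters):
    """Return the names of adapters whose directory lacks adapter_config.json.

    Assumes each value in `adapters` is a local directory path.
    """
    missing = []
    for name, path in adapters.items():
        if not (Path(path) / "adapter_config.json").is_file():
            missing.append(name)
    return missing


# Placeholder paths, as in the conversion example
adapters = {
    "adapter_1": "./path/to/the/checkpoint_adapter_1/",
    "adapter_2": "./path/to/the/checkpoint_adapter_2/",
    "adapter_n": "./path/to/the/checkpoint_adapter_3/",
}
print("Adapters without a config file:", find_missing_adapters(adapters))
```

Running the check before `from_pretrained` keeps a typo in a path from costing a full model load.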
## Loading a trained X-LoRA model from scratch
```python
import torch
import xlora
from transformers import AutoConfig, AutoModelForCausalLM  # type: ignore

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.1",
    trust_remote_code=True,
    use_flash_attention_2=False,
    device_map="cuda:0",
    torch_dtype=torch.bfloat16,
)

config = AutoConfig.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.1",
    trust_remote_code=True,
    use_flash_attention_2=False,
    device_map="auto",
)

model = xlora.from_pretrained(
    "./path/to/saved/model",
    model,
    {
        "adapter_1": "./path/to/the/checkpoint/",
        "adapter_2": "./path/to/the/checkpoint/",
        "adapter_n": "./path/to/the/checkpoint/",
    },
    "cuda",
)
```
## Loading a pre-trained X-LoRA model
```python
import torch
from xlora.xlora_utils import load_model  # type: ignore

XLoRA_model_name = "lamm-mit/x-lora/X-LoRA"

model, tokenizer = load_model(
    model_name="HuggingFaceH4/zephyr-7b-beta",
    device="cuda:0",
    dtype=torch.bfloat16,
    fine_tune_model_name=XLoRA_model_name,
    adapters={
        "adapter_1": "lamm-mit/x-lora/X-LoRA_adapters/1/",
        "adapter_2": "lamm-mit/x-lora/X-LoRA_adapters/2/",
        "adapter_3": "lamm-mit/x-lora/X-LoRA_adapters/3/",
        "adapter_4": "lamm-mit/x-lora/X-LoRA_adapters/4/",
        "adapter_5": "lamm-mit/x-lora/X-LoRA_adapters/5/",
        "adapter_6": "lamm-mit/x-lora/X-LoRA_adapters/6/",
        "adapter_7": "lamm-mit/x-lora/X-LoRA_adapters/7/",
        "adapter_8": "lamm-mit/x-lora/X-LoRA_adapters/8/",
        "adapter_9": "lamm-mit/x-lora/X-LoRA_adapters/9/",
    },
)
```
Inference:
```python
import torch


def generate_response(
    model,
    tokenizer,
    text_input="What is the best biomaterial for superior strength?",
    num_return_sequences=1,
    temperature=0.75,
    max_new_tokens=127,
    num_beams=1,
    top_k=50,
    top_p=0.9,
    repetition_penalty=1.0,
    eos_token_id=2,
    add_special_tokens=True,
):
    # Tokenize to PyTorch tensors and move them to the model's device
    device = next(model.parameters()).device
    inputs = tokenizer(
        text_input, add_special_tokens=add_special_tokens, return_tensors="pt"
    ).to(device)
    with torch.no_grad():
        outputs = model.generate(
            input_ids=inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
            max_new_tokens=max_new_tokens,
            temperature=temperature,
            num_beams=num_beams,
            top_k=top_k,
            top_p=top_p,
            num_return_sequences=num_return_sequences,
            eos_token_id=eos_token_id,
            pad_token_id=eos_token_id,
            do_sample=True,
            repetition_penalty=repetition_penalty,
        )
    # Drop the prompt tokens so only the newly generated text is decoded
    prompt_length = inputs["input_ids"].shape[1]
    return tokenizer.batch_decode(outputs[:, prompt_length:], skip_special_tokens=True)


# Example prompt and end-of-sequence token taken from the loaded tokenizer
txt = "What is the best biomaterial for superior strength?"
eos_token = tokenizer.eos_token_id

output_text = generate_response(
    model,
    tokenizer,
    text_input=txt,
    eos_token_id=eos_token,
    num_return_sequences=1,
    repetition_penalty=1.1,
    top_p=0.9,
    top_k=512,
    temperature=0.5,
    max_new_tokens=256,
)
print(output_text[0])
```
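
The `temperature`, `top_k`, and `top_p` arguments control how the next token is sampled. As a rough illustration of how they interact, the sketch below applies them to a toy list of logits in pure Python. It is a simplified model of the usual order (temperature scaling, then top-k, then nucleus filtering) and not the Transformers implementation; the function name `filter_next_token_probs` is hypothetical.

```python
import math


def filter_next_token_probs(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Return {token_index: probability} after temperature, top-k and top-p.

    Simplified illustration: top_k=0 disables top-k, top_p=1.0 disables top-p.
    """
    # Temperature below 1 sharpens the distribution, above 1 flattens it.
    scaled = [v / temperature for v in logits]
    m = max(scaled)
    exps = [math.exp(v - m) for v in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Rank tokens from most to least likely and keep at most top_k of them.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k > 0:
        order = order[:top_k]

    # Keep the smallest prefix whose renormalized cumulative mass reaches top_p.
    norm = sum(probs[i] for i in order)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i] / norm
        if cumulative >= top_p:
            break

    kept_total = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_total for i in kept}


# Four-token toy vocabulary
print(filter_next_token_probs([2.0, 1.0, 0.0, -1.0], temperature=0.5, top_k=3, top_p=0.9))
```

With a large `top_k` such as the 512 used above, the nucleus threshold `top_p` is typically the filter that decides how many candidate tokens remain.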
## Dataset
See [lamm-mit/x-lora-dataset](https://huggingface.co/datasets/lamm-mit/x-lora-dataset) for the dataset used to train the X-LoRA model. Details on the datasets used to train the original adapters are included in the paper (see reference below).
## Sample results
![image/png](https://cdn-uploads.huggingface.co/production/uploads/623ce1c6b66fedf374859fe7/GRbDJcIqkZZrQAVXyKB2H.png)
## Acknowledgements
This work is built on the Hugging Face [PEFT library](https://github.com/huggingface/peft/tree/main/) and other components in the Hugging Face ecosystem. We acknowledge the authors of this excellent library and related methods.
## Original paper and citation
Cite this work as:
```bibtex
@article{Buehler_XLoRA_2024,
title = {X-LoRA: Mixture of Low-Rank Adapter Experts, a Flexible Framework for Large Language Models with Applications in Protein Mechanics and Design},
author = {E.L. Buehler and M.J. Buehler},
journal = {},
year = {2024},
volume = {},
pages = {},
url = {https://arxiv.org/abs/2402.07148}
}
```