We fine-tuned Llama2-7b-chat using LoRA. We used a batch size of 1 and a chunk size of 2048. Training used the AdamW optimizer with a learning rate of 2e-5 and 8 gradient accumulation steps. We trained for a single epoch with a warm-up ratio of 0.03 and a weight decay of 0.001, and the learning rate followed a cosine schedule. LoRA adapters with a rank of 8, an alpha value of 32, and a dropout rate of 0.1 were applied after all self-attention blocks and fully-connected layers. This results in a total of 17,891,328 trainable parameters, roughly 0.26% of the base model's parameters. To optimize training performance, we used bf16 mixed-precision training and data parallelism across 4 Nvidia A100 (80GB) GPUs hosted on the Microsoft Azure platform. One epoch of training takes roughly 42 GPU hours.
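As a rough illustration, the hyperparameters above map onto the following `peft`/`transformers` configuration. This is a minimal sketch, not the authors' training script: the target module names and `output_dir` are assumptions, since the card does not list them.

```python
# Hypothetical sketch of the fine-tuning setup described above;
# the exact training script is not part of this model card.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, TrainingArguments

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

# LoRA adapters: rank 8, alpha 32, dropout 0.1, applied to the
# self-attention projections and fully-connected (MLP) layers.
lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],  # assumed module names
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # ~17.9M trainable, ~0.26% of the base model

# Optimizer and schedule settings matching the description above.
training_args = TrainingArguments(
    output_dir="braingpt-lora",        # hypothetical path
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=2e-5,
    num_train_epochs=1,
    warmup_ratio=0.03,
    weight_decay=0.001,
    lr_scheduler_type="cosine",
    optim="adamw_torch",
    bf16=True,
)
```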
## Training data:
Please refer to the dataset card: https://huggingface.co/datasets/BrainGPT/train_valid_split_pmc_neuroscience_2002-2022_filtered_subset
## Load and use model:
```python
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer

# Read the adapter config to find the base model it was trained on
config = PeftConfig.from_pretrained("BrainGPT/BrainGPT-7B-v0.1")

# Load the base model, then attach the BrainGPT LoRA adapter
model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path)
model = PeftModel.from_pretrained(model, "BrainGPT/BrainGPT-7B-v0.1")

# Load the tokenizer of the base model
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
```
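Once loaded, the adapter-augmented model behaves like any other causal language model. A minimal generation example, continuing from the snippet above (the prompt and generation settings are illustrative only, not from the model card):

```python
# Illustrative usage; prompt and generation settings are assumptions.
inputs = tokenizer("The hippocampus is involved in", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```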