jordiclive committed
Commit eadbb15 • 1 Parent(s): 732a0ac
Update README.md
README.md CHANGED
@@ -6,7 +6,11 @@ datasets:
 - yahma/alpaca-cleaned
 ---
 
-This repo contains a low-rank adapter for LLaMA-7b fit on
+This repo contains a low-rank adapter for **LLaMA-7b** fit on
+- `Nebulous/gpt4all_pruned`
+- `sahil2801/CodeAlpaca-20k`
+- `yahma/alpaca-cleaned`
+- datasets part of the OpenAssistant project.
 
 
 This version of the weights was trained with the following hyperparameters:
@@ -15,5 +19,8 @@ This version of the weights was trained with the following hyperparameters:
 - Batch size: 128
 - Max Length: 2048
 - Learning rate: 4e-6
-- Lora _r_:
-- Lora
+- Lora _r_: 8
+- Lora Alpha: 32
+- Lora target modules: q_proj, k_proj, v_proj, o_proj
+
+The model was trained with flash attention and gradient checkpointing.
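
To get a feel for what the LoRA settings in the updated README imply, the following is a minimal sketch in plain Python, not the training code used for this adapter. The `LoraSettings` class and `lora_param_count` helper are hypothetical names introduced here for illustration. The sketch assumes the common LoRA convention that the adapter update is scaled by alpha / r, and it assumes the usual LLaMA-7B shape (hidden size 4096, 32 decoder layers, square attention projections); neither assumption is stated in the README.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraSettings:
    """Hypothetical container for the LoRA values listed in the README."""
    r: int = 8            # "Lora _r_: 8"
    alpha: int = 32       # "Lora Alpha: 32"
    target_modules: tuple = ("q_proj", "k_proj", "v_proj", "o_proj")

    @property
    def scaling(self) -> float:
        # Assumed convention: the low-rank update is multiplied by alpha / r.
        return self.alpha / self.r


def lora_param_count(settings: LoraSettings,
                     hidden_size: int = 4096,   # assumed LLaMA-7B hidden size
                     num_layers: int = 32) -> int:  # assumed LLaMA-7B depth
    """Count adapter parameters, assuming every target module is a
    hidden_size x hidden_size linear layer with matrices A (r x in)
    and B (out x r)."""
    per_module = settings.r * (hidden_size + hidden_size)
    return per_module * len(settings.target_modules) * num_layers


settings = LoraSettings()
print(settings.scaling)
print(lora_param_count(settings))
```

Under these assumptions the adapter adds roughly 8.4 million trainable parameters, a small fraction of the base model's size, which is the main reason a low-rank adapter is cheap to train and distribute.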