Nanobit committed
Commit db73b94 • Parent: 00dfe43

Add image. Add quickstart. Simplify dataset.

Files changed (1)
  1. README.md +61 -11
README.md CHANGED
@@ -1,10 +1,18 @@
 # Axolotl
 
-A centralized repo to train multiple architectures with different dataset types using a simple yaml file.
-
-Go ahead and axolotl questions!!
-
-## Support Matrix
+<div align="center">
+  <img src="image/axolotl.png" alt="axolotl" width="160">
+  <div>
+    <p>
+      <b>One repo to finetune them all!</b>
+    </p>
+    <p>
+      Go ahead and axolotl questions!!
+    </p>
+  </div>
+</div>
+
+## Axolotl supports
 
 | | fp16/fp32 | fp16/fp32 w/ lora | 4bit-quant | 4bit-quant w/flash attention | flash attention | xformers attention |
 |----------|:----------|:------------------|------------|------------------------------|-----------------|--------------------|
@@ -14,7 +22,22 @@ Go ahead and axolotl questions!!
 | mpt | ✅ | ❌ | ❌ | ❌ | ❌ | ❓ |
 
 
-## Getting Started
+## Quick start
+
+**Requirements**: Python 3.9.
+
+```bash
+git clone https://github.com/OpenAccess-AI-Collective/axolotl
+
+pip3 install -e .[int4]
+
+accelerate config
+accelerate launch scripts/finetune.py examples/4bit-lora-7b/config.yml
+```
+
+
+
+## Requirements and Installation
 
 ### Environment
 
@@ -39,6 +62,23 @@ Go ahead and axolotl questions!!
 
 Have dataset(s) in one of the following format (JSONL recommended):
 
+- `alpaca`: instruction; input(optional)
+```json
+{"instruction": "...", "input": "...", "output": "..."}
+```
+- `sharegpt`: conversations
+```json
+{"conversations": [{"from": "...", "value": "..."}]}
+```
+- `completion`: raw corpus
+```json
+{"text": "..."}
+```
+
+<details>
+
+<summary>See all formats</summary>
+
 - `alpaca`: instruction; input(optional)
 ```json
 {"instruction": "...", "input": "...", "output": "..."}
@@ -68,11 +108,13 @@ Have dataset(s) in one of the following format (JSONL recommended):
 {"text": "..."}
 ```
 
+</details>
+
 Optionally, download some datasets, see [data/README.md](data/README.md)
 
 ### Config
 
-See sample configs in [configs](configs) folder. It is recommended to duplicate and modify to your needs. The most important options are:
+See sample configs in [configs](configs) folder or [examples](examples) for quick start. It is recommended to duplicate and modify to your needs. The most important options are:
 
 - model
 ```yaml
@@ -84,7 +126,7 @@ See sample configs in [configs](configs) folder. It is recommended to duplicate
 ```yaml
 datasets:
   - path: vicgalle/alpaca-gpt4 # local or huggingface repo
-    type: alpaca # format from above
+    type: alpaca # format from earlier
 ```
 
 - loading
@@ -147,6 +189,8 @@ datasets:
   - path: vicgalle/alpaca-gpt4
     # The type of prompt to use for training. [alpaca, sharegpt, gpteacher, oasst, reflection]
     type: alpaca
+    data_files: # path to source data files
+
 # axolotl attempts to save the dataset as an arrow after packing the data together so
 # subsequent training attempts load faster, relative path
 dataset_prepared_path: data/last_run_prepared
@@ -260,7 +304,13 @@ debug:
 
 ### Accelerate
 
-Configure accelerate using `accelerate config` or update `~/.cache/huggingface/accelerate/default_config.yaml`
+Configure accelerate
+
+```bash
+accelerate config
+
+# nano ~/.cache/huggingface/accelerate/default_config.yaml
+```
 
 ### Train
 
@@ -275,10 +325,10 @@ Add `--inference` flag to train command above
 
 If you are inferencing a pretrained LORA, pass
 ```bash
---lora_model_dir path/to/lora
+--lora_model_dir ./completed-model
 ```
 
-### Merge LORA to base
+### Merge LORA to base (Dev branch 🔧)
 
 Add `--merge_lora --lora_model_dir="path/to/lora"` flag to train command above
 
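
A note on the quick-start install added in this commit: this is an observation about shell behaviour, not something the commit states. In shells that glob square brackets (zsh, for example), the `.[int4]` extras spec may need quoting.

```bash
# Quoting the extras spec avoids "no matches found" errors in shells such as zsh,
# which treat [int4] as a glob pattern; plain bash usually passes it through as-is.
pip3 install -e '.[int4]'
```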
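
To make the `alpaca` dataset format concrete, here is a minimal sketch of a one-record JSONL file. The path `data/alpaca-sample.jsonl` is only an illustrative choice, not a location the repo defines.

```bash
# One alpaca-format record matching the {"instruction", "input", "output"} schema
# from the dataset formats list; "input" is optional and left empty here.
mkdir -p data
cat > data/alpaca-sample.jsonl <<'EOF'
{"instruction": "Name a salamander that never metamorphoses.", "input": "", "output": "The axolotl."}
EOF
```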
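
The inference section says to add `--inference` (and, for a pretrained LORA, `--lora_model_dir`) to the train command. A hedged sketch of the combined call, reusing the quick-start entrypoint; swap the sample config for your own.

```bash
# Sketch: inference with a pretrained LORA, combining the flags from the diff
# with the scripts/finetune.py entrypoint used in the quick start.
accelerate launch scripts/finetune.py examples/4bit-lora-7b/config.yml \
    --inference --lora_model_dir="./completed-model"
```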
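
Likewise for "Merge LORA to base": a sketch of the merge invocation, under the same assumption that the flags are appended to the train command; the config path is again illustrative.

```bash
# Sketch: merge a trained LORA back into the base model weights using the
# --merge_lora and --lora_model_dir flags described in the diff.
accelerate launch scripts/finetune.py examples/4bit-lora-7b/config.yml \
    --merge_lora --lora_model_dir="path/to/lora"
```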