Ian Walton committed
Commit • 541d020 • 1 Parent(s): 831413d

Initial commit.

Files changed:
- README.md (+121, -0)
- RWKV-14B-WizardLM.pth (+3, -0)
- RWKV-LoRA.pth (+3, -0)
- train_log.txt (+52, -0)
README.md
CHANGED
---
license: apache-2.0
datasets:
- ehartford/WizardLM_alpaca_evol_instruct_70k_unfiltered
tags:
- uncensored
- rwkv
---

# RWKV 14B WizardLM LoRA

The model in this repository was trained for 10.25 hours at a cost of $18.

* LoRA Rank: 32
* LoRA Alpha: 64
* Real Epochs: 3
* Learning Rate: 1e-4
* Context Length: 1024
* Training Tokens: 22,771,425
* Training Dataset: [WizardLM_alpaca_evol_instruct_70k_unfiltered](https://huggingface.co/datasets/ehartford/WizardLM_alpaca_evol_instruct_70k_unfiltered)
* RWKV Model License: apache-2.0

This is an unrestricted model. Be aware that its outputs can be extremely harmful, even when it is not prompted for harmful content. Use discretion when deploying it so you do not expose yourself to liability arising from unwanted or harmful outputs. I am not responsible for anything that happens when you use this model.

The training data may carry more restrictive licenses. Depending on your jurisdiction and local laws, it may be unwise to use this model for commercial purposes. It is currently unclear how training-data licenses govern trained models, and this may change in the near future.

## Preparing Data

Repo: [RWKV-v2-RNN-Pile](https://github.com/BlinkDL/RWKV-v2-RNN-Pile)
Directory: RWKV-v3

You need to create a file called `train.txt`. Separate each entry with `<|endoftext|>`. Here is some example code:

```python
import json

# run as: python make_train_txt.py > train.txt  (script name is illustrative)
with open("WizardLM_alpaca_evol_instruct_70k_unfiltered.json", "r") as fh:
    data = json.load(fh)

for item in data:
    # skip entries with a missing or empty instruction or output
    if item.get("instruction") and item.get("output"):
        print(item["instruction"])
        print("\n### Response:", end="")
        print(item["output"])
        print("<|endoftext|>")
```

Then run:

```bash
python prepare_data.py
```

The resulting file will be `train.npy`. Keep track of the number of tokens; you can read it back as shown below.
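
A minimal sketch for counting the tokens (assuming `train.npy` is the tokenized corpus written by `prepare_data.py`):

```python
import numpy as np

tokens = np.load("train.npy")  # 1-D array of token ids
print(len(tokens))             # total training tokens; 22,771,425 for this run
```
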
## Training

Repo: [RWKV-LM-LoRA](https://github.com/Blealtan/RWKV-LM-LoRA)
Directory: RWKV-v4neo

Trained using a Runpod A100 80 GB instance (Torch 2).

Install dependencies:

```bash
apt install screen ncdu htop vim
wget https://huggingface.co/BlinkDL/rwkv-4-pile-14b/resolve/main/RWKV-4-Pile-14B-20230313-ctx8192-test1050.pth
# in the two files below, replace the `inf` import from torch._six with an import from math
vim /usr/local/lib/python3.10/dist-packages/deepspeed/runtime/utils.py
vim /usr/local/lib/python3.10/dist-packages/deepspeed/runtime/zero/stage_1_and_2.py
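# non-interactive equivalent of the two edits above (an untested sketch):
# sed -i 's/from torch._six import inf/from math import inf/' \
#   /usr/local/lib/python3.10/dist-packages/deepspeed/runtime/utils.py \
#   /usr/local/lib/python3.10/dist-packages/deepspeed/runtime/zero/stage_1_and_2.py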
pip install pytorch-lightning==1.9.0 deepspeed==0.7.0
pip install torch==2.0.0+cu118 torchvision==0.15.1+cu118 torchaudio==2.0.0 --extra-index-url https://download.pytorch.org/whl/cu118
apt install cuda-nvcc-11-8 libcusparse-11-8 libcusparse-dev-11-8 libcublas-dev-11-8 libcublas-11-8 libcusolver-dev-11-8 libcusolver-11-8
apt remove cuda-nvcc-11-6
```

Run training:

Note:

* `n_layer` and `n_embd` are dependent on the specific model you choose.
* `lora_alpha` must be the same in training and in the `merge_lora.py` command.
* `epoch_count` is calculated from `tokens / (ctx_len * micro_bsz * epoch_steps) * actual_epochs`, rounded up (see the worked example after this list).
* Make sure your checkpoints folder exists.
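
Plugging in this run's numbers (a quick check; rounding up is an assumption that matches `--epoch_count 34` below):

```python
import math

tokens = 22_771_425                              # training tokens (see model card)
ctx_len, micro_bsz, epoch_steps = 1024, 2, 1000  # values used in the command below
actual_epochs = 3

print(math.ceil(tokens / (ctx_len * micro_bsz * epoch_steps) * actual_epochs))  # 34
```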
85 |
+
```bash
|
86 |
+
python3 train.py \
|
87 |
+
--load_model ./RWKV-4-Pile-3B-20221110-ctx4096.pth \
|
88 |
+
--proj_dir ./checkpoints-wizardlm \
|
89 |
+
--data_file ./train.npy \
|
90 |
+
--data_type numpy \
|
91 |
+
--vocab_size 50277 \
|
92 |
+
--ctx_len 1024 \
|
93 |
+
--epoch_steps 1000 \
|
94 |
+
--epoch_count 34 \
|
95 |
+
--epoch_begin 0 \
|
96 |
+
--epoch_save 5 \
|
97 |
+
--micro_bsz 2 \
|
98 |
+
--n_layer 40 \
|
99 |
+
--n_embd 5120 \
|
100 |
+
--pre_ffn 0 \
|
101 |
+
--head_qk 0 \
|
102 |
+
--lr_init 1e-4 \
|
103 |
+
--lr_final 5e-7 \
|
104 |
+
--warmup_steps 0 \
|
105 |
+
--beta1 0.9 \
|
106 |
+
--beta2 0.999 \
|
107 |
+
--adam_eps 1e-8 \
|
108 |
+
--lora \
|
109 |
+
--lora_r 32 \
|
110 |
+
--lora_alpha 64 \
|
111 |
+
--lora_dropout 0.05 \
|
112 |
+
--lora_parts=att,ffn,time,ln \
|
113 |
+
--accelerator gpu \
|
114 |
+
--devices 1 \
|
115 |
+
--precision bf16 \
|
116 |
+
--grad_cp 0 \
|
117 |
+
--strategy deepspeed_stage_2
|
118 |
+
```

Merge the weights (since LoRA isn't supported in most implementations):

```bash
# arguments: <lora_alpha> <base_model.pth> <lora_checkpoint.pth> <output.pth>
python merge_lora.py 64 RWKV-4-Pile-14B-20230313-ctx8192-test1050.pth rwkv-45.pth RWKV-14B-WizardLM.pth
```
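
Once merged, the output loads like any other RWKV-4 Pile checkpoint. A minimal inference sketch using the `rwkv` pip package from ChatRWKV (paths, tokenizer file, and sampling settings here are illustrative assumptions, not part of this repo):

```python
from rwkv.model import RWKV
from rwkv.utils import PIPELINE, PIPELINE_ARGS

# merged weights from merge_lora.py above, ".pth" extension omitted;
# 20B_tokenizer.json ships with the ChatRWKV repo
model = RWKV(model="RWKV-14B-WizardLM", strategy="cuda fp16")
pipeline = PIPELINE(model, "20B_tokenizer.json")

# mirror the training format: instruction, blank line, then "### Response:"
prompt = "Write a short poem about mountains.\n\n### Response:"
print(pipeline.generate(prompt, token_count=200,
                        args=PIPELINE_ARGS(temperature=1.0, top_p=0.7)))
```
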
RWKV-14B-WizardLM.pth
ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:67e1029553afc5096a2ee4e863eeaeffb13de9caf3b94c7961ac61cd728dd562
size 28297430329
RWKV-LoRA.pth
ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:d7625fb23acba32eb5281098621fb73c7abaee5c0677d2ba3c38a166c9fcd735
size 240567331
train_log.txt
ADDED
NEW RUN 2023-05-18-02-13-24
{'load_model': './RWKV-4-Pile-14B-20230313-ctx8192-test1050.pth', 'wandb': '', 'proj_dir': './checkpoints-wizardlm', 'random_seed': -1, 'data_file': './train.npy', 'data_type': 'numpy', 'vocab_size': 50277, 'ctx_len': 2048, 'epoch_steps': 1000, 'epoch_count': 17, 'epoch_begin': 0, 'epoch_save': 5, 'micro_bsz': 2, 'n_layer': 40, 'n_embd': 5120, 'dim_att': 5120, 'dim_ffn': 20480, 'pre_ffn': 0, 'head_qk': 0, 'tiny_att_dim': 0, 'tiny_att_layer': -999, 'lr_init': 0.0001, 'lr_final': 5e-07, 'warmup_steps': 0, 'beta1': 0.9, 'beta2': 0.999, 'adam_eps': 1e-08, 'grad_cp': 0, 'my_pile_stage': 0, 'my_pile_shift': -1, 'my_pile_edecay': 0, 'layerwise_lr': 1, 'ds_bucket_mb': 200, 'my_img_version': 0, 'my_img_size': 0, 'my_img_bit': 0, 'my_img_clip': 'x', 'my_img_clip_scale': 1, 'my_img_l1_scale': 0, 'my_img_encoder': 'x', 'my_sample_len': 0, 'my_ffn_shift': 1, 'my_att_shift': 1, 'my_pos_emb': 0, 'load_partial': 0, 'magic_prime': 0, 'my_qa_mask': 0, 'my_testing': '', 'lora': True, 'lora_load': '', 'lora_r': 32, 'lora_alpha': 64.0, 'lora_dropout': 0.05, 'lora_parts': 'att,ffn,time,ln', 'logger': False, 'enable_checkpointing': False, 'default_root_dir': None, 'gradient_clip_val': 1.0, 'gradient_clip_algorithm': None, 'num_nodes': 1, 'num_processes': None, 'devices': '1', 'gpus': None, 'auto_select_gpus': None, 'tpu_cores': None, 'ipus': None, 'enable_progress_bar': True, 'overfit_batches': 0.0, 'track_grad_norm': -1, 'check_val_every_n_epoch': 100000000000000000000, 'fast_dev_run': False, 'accumulate_grad_batches': None, 'max_epochs': -1, 'min_epochs': None, 'max_steps': -1, 'min_steps': None, 'max_time': None, 'limit_train_batches': None, 'limit_val_batches': None, 'limit_test_batches': None, 'limit_predict_batches': None, 'val_check_interval': None, 'log_every_n_steps': 100000000000000000000, 'accelerator': 'gpu', 'strategy': 'deepspeed_stage_2', 'sync_batchnorm': False, 'precision': 'bf16', 'enable_model_summary': True, 'num_sanity_val_steps': 0, 'resume_from_checkpoint': None, 'profiler': None, 'benchmark': None, 'reload_dataloaders_every_n_epochs': 0, 'auto_lr_find': False, 'replace_sampler_ddp': False, 'detect_anomaly': False, 'auto_scale_batch_size': False, 'plugins': None, 'amp_backend': None, 'amp_level': None, 'move_metrics_to_cpu': False, 'multiple_trainloader_mode': 'max_size_cycle', 'inference_mode': True, 'my_timestamp': '2023-05-18-02-13-24', 'betas': (0.9, 0.999), 'real_bsz': 2, 'run_name': '50277 ctx2048 L40 D5120'}
{'zero_allow_untested_optimizer': True, 'zero_optimization': {'stage': 2, 'contiguous_gradients': True, 'overlap_comm': True, 'allgather_partitions': True, 'reduce_scatter': True, 'allgather_bucket_size': 200000000, 'reduce_bucket_size': 200000000, 'sub_group_size': 1000000000000}, 'activation_checkpointing': {'partition_activations': False, 'cpu_checkpointing': False, 'contiguous_memory_optimization': False, 'synchronize_checkpoint_boundary': False}, 'aio': {'block_size': 1048576, 'queue_depth': 8, 'single_submit': False, 'overlap_events': True, 'thread_count': 1}, 'gradient_accumulation_steps': 1, 'train_micro_batch_size_per_gpu': 2, 'gradient_clipping': 1.0, 'bf16': {'enabled': True}, 'compression_training': {'weight_quantization': {'shared_parameters': {}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {}, 'different_groups': {}}}}
NEW RUN 2023-05-18-02-18-22
{'load_model': './RWKV-4-Pile-14B-20230313-ctx8192-test1050.pth', 'wandb': '', 'proj_dir': './checkpoints-wizardlm', 'random_seed': -1, 'data_file': './train.npy', 'data_type': 'numpy', 'vocab_size': 50277, 'ctx_len': 1024, 'epoch_steps': 1000, 'epoch_count': 34, 'epoch_begin': 0, 'epoch_save': 5, 'micro_bsz': 2, 'n_layer': 40, 'n_embd': 5120, 'dim_att': 5120, 'dim_ffn': 20480, 'pre_ffn': 0, 'head_qk': 0, 'tiny_att_dim': 0, 'tiny_att_layer': -999, 'lr_init': 0.0001, 'lr_final': 5e-07, 'warmup_steps': 0, 'beta1': 0.9, 'beta2': 0.999, 'adam_eps': 1e-08, 'grad_cp': 0, 'my_pile_stage': 0, 'my_pile_shift': -1, 'my_pile_edecay': 0, 'layerwise_lr': 1, 'ds_bucket_mb': 200, 'my_img_version': 0, 'my_img_size': 0, 'my_img_bit': 0, 'my_img_clip': 'x', 'my_img_clip_scale': 1, 'my_img_l1_scale': 0, 'my_img_encoder': 'x', 'my_sample_len': 0, 'my_ffn_shift': 1, 'my_att_shift': 1, 'my_pos_emb': 0, 'load_partial': 0, 'magic_prime': 0, 'my_qa_mask': 0, 'my_testing': '', 'lora': True, 'lora_load': '', 'lora_r': 32, 'lora_alpha': 64.0, 'lora_dropout': 0.05, 'lora_parts': 'att,ffn,time,ln', 'logger': False, 'enable_checkpointing': False, 'default_root_dir': None, 'gradient_clip_val': 1.0, 'gradient_clip_algorithm': None, 'num_nodes': 1, 'num_processes': None, 'devices': '1', 'gpus': None, 'auto_select_gpus': None, 'tpu_cores': None, 'ipus': None, 'enable_progress_bar': True, 'overfit_batches': 0.0, 'track_grad_norm': -1, 'check_val_every_n_epoch': 100000000000000000000, 'fast_dev_run': False, 'accumulate_grad_batches': None, 'max_epochs': -1, 'min_epochs': None, 'max_steps': -1, 'min_steps': None, 'max_time': None, 'limit_train_batches': None, 'limit_val_batches': None, 'limit_test_batches': None, 'limit_predict_batches': None, 'val_check_interval': None, 'log_every_n_steps': 100000000000000000000, 'accelerator': 'gpu', 'strategy': 'deepspeed_stage_2', 'sync_batchnorm': False, 'precision': 'bf16', 'enable_model_summary': True, 'num_sanity_val_steps': 0, 'resume_from_checkpoint': None, 'profiler': None, 'benchmark': None, 'reload_dataloaders_every_n_epochs': 0, 'auto_lr_find': False, 'replace_sampler_ddp': False, 'detect_anomaly': False, 'auto_scale_batch_size': False, 'plugins': None, 'amp_backend': None, 'amp_level': None, 'move_metrics_to_cpu': False, 'multiple_trainloader_mode': 'max_size_cycle', 'inference_mode': True, 'my_timestamp': '2023-05-18-02-18-22', 'betas': (0.9, 0.999), 'real_bsz': 2, 'run_name': '50277 ctx1024 L40 D5120'}
{'zero_allow_untested_optimizer': True, 'zero_optimization': {'stage': 2, 'contiguous_gradients': True, 'overlap_comm': True, 'allgather_partitions': True, 'reduce_scatter': True, 'allgather_bucket_size': 200000000, 'reduce_bucket_size': 200000000, 'sub_group_size': 1000000000000}, 'activation_checkpointing': {'partition_activations': False, 'cpu_checkpointing': False, 'contiguous_memory_optimization': False, 'synchronize_checkpoint_boundary': False}, 'aio': {'block_size': 1048576, 'queue_depth': 8, 'single_submit': False, 'overlap_events': True, 'thread_count': 1}, 'gradient_accumulation_steps': 1, 'train_micro_batch_size_per_gpu': 2, 'gradient_clipping': 1.0, 'bf16': {'enabled': True}, 'compression_training': {'weight_quantization': {'shared_parameters': {}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {}, 'different_groups': {}}}}
0 0.974954 2.6510 0.00008557 2023-05-18 02:35:03.067663 0
1 0.924752 2.5212 0.00007322 2023-05-18 02:48:33.964340 1
2 0.910025 2.4844 0.00006266 2023-05-18 03:02:04.389023 2
3 0.896199 2.4503 0.00005362 2023-05-18 03:15:34.393033 3
4 0.891289 2.4383 0.00004588 2023-05-18 03:29:04.556924 4
5 0.887027 2.4279 0.00003926 2023-05-18 03:42:35.363919 5
6 0.879992 2.4109 0.00003359 2023-05-18 03:56:07.111067 6
7 0.873740 2.3959 0.00002875 2023-05-18 04:09:40.671492 7
8 0.876330 2.4021 0.00002460 2023-05-18 04:23:14.937769 8
9 0.859262 2.3614 0.00002105 2023-05-18 04:36:47.992056 9
10 0.857725 2.3578 0.00001801 2023-05-18 04:50:21.573587 10
11 0.854920 2.3512 0.00001541 2023-05-18 05:03:55.170411 11
12 0.853412 2.3476 0.00001319 2023-05-18 05:17:28.057519 12
13 0.841703 2.3203 0.00001129 2023-05-18 05:31:01.106771 13
14 0.854889 2.3511 0.00000966 2023-05-18 05:44:33.658881 14
15 0.850775 2.3415 0.00000826 2023-05-18 05:58:08.160597 15
16 0.851631 2.3435 0.00000707 2023-05-18 06:11:46.384325 16
17 0.839762 2.3158 0.00000605 2023-05-18 06:25:27.382862 17
18 0.850289 2.3403 0.00000518 2023-05-18 06:39:08.960637 18
19 0.841697 2.3203 0.00000443 2023-05-18 06:52:46.666994 19
20 0.839498 2.3152 0.00000379 2023-05-18 07:06:22.116873 20
21 0.842402 2.3219 0.00000324 2023-05-18 07:20:03.309937 21
22 0.830740 2.2950 0.00000278 2023-05-18 07:33:35.949427 22
23 0.838361 2.3126 0.00000238 2023-05-18 07:47:07.854821 23
24 0.843396 2.3242 0.00000203 2023-05-18 08:00:38.640102 24
25 0.833445 2.3012 0.00000174 2023-05-18 08:14:09.935184 25
26 0.835568 2.3061 0.00000149 2023-05-18 08:27:42.156833 26
27 0.842768 2.3228 0.00000127 2023-05-18 08:41:15.406521 27
28 0.840123 2.3167 0.00000109 2023-05-18 08:54:47.893448 28
29 0.834012 2.3025 0.00000093 2023-05-18 09:08:20.361893 29
30 0.833059 2.3003 0.00000080 2023-05-18 09:21:53.894218 30
31 0.838252 2.3123 0.00000068 2023-05-18 09:35:35.929064 31
32 0.834691 2.3041 0.00000058 2023-05-18 09:49:17.445363 32
33 0.849344 2.3381 0.00000050 2023-05-18 10:02:56.926832 33
34 0.832176 2.2983 0.00000050 2023-05-18 10:16:30.915617 34
35 0.833998 2.3025 0.00000050 2023-05-18 10:30:03.204456 35
36 0.844871 2.3277 0.00000050 2023-05-18 10:43:36.046350 36
37 0.844629 2.3271 0.00000050 2023-05-18 10:57:08.279003 37
38 0.840990 2.3187 0.00000050 2023-05-18 11:10:36.765075 38
39 0.835910 2.3069 0.00000050 2023-05-18 11:24:05.882433 39
40 0.836146 2.3075 0.00000050 2023-05-18 11:37:36.168271 40
41 0.835250 2.3054 0.00000050 2023-05-18 11:51:05.636469 41
42 0.833586 2.3016 0.00000050 2023-05-18 12:04:35.482199 42
43 0.838982 2.3140 0.00000050 2023-05-18 12:18:05.848889 43
44 0.837347 2.3102 0.00000050 2023-05-18 12:31:35.710046 44
45 0.842637 2.3225 0.00000050 2023-05-18 12:45:05.453466 45