adarshxs committed 77e62ee (parent: de455c9): Update README.md

Files changed (1): README.md (+92 -2)
---
library_name: peft
---

We trained this LoRA adapter for the base `Llama-2-7b-hf` model on the `mhenrichsen/alpaca_2k_test` dataset.

Visit us at: https://tensoic.com

Check out [Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl) to merge the adapter and run inference.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/644bf6ef778ecbfb977e8e84/C0btqRI3eCz0kNYGQoa9k.png)
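
For reference, here is a minimal sketch of attaching this adapter to the base model with PEFT and generating a completion. It is not a script from this repo: `adapter_path` is a placeholder for wherever the adapter weights live (a local directory or a Hub id), and the Alpaca-style prompt is an assumption based on the dataset's `type: alpaca`.

```python
# Hedged sketch: load Llama-2-7b-hf, attach this LoRA adapter via PEFT,
# merge the adapter into the base weights, and generate one completion.
# `adapter_path` is a placeholder, not an official repository id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-2-7b-hf"
adapter_path = "path/to/this-lora-adapter"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_path)
model = model.merge_and_unload()  # fold the LoRA deltas into the base weights

# Alpaca-style prompt (assumption: matches the `type: alpaca` dataset format)
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nExplain what a LoRA adapter is.\n\n### Response:\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Axolotl also ships its own merge and inference commands; the PEFT route above is simply the most library-agnostic one.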

## Training Setup

```
Number of GPUs: 8x NVIDIA V100
GPU Memory: 32 GB each (SXM2 form factor)
```

## Training Configuration

```yaml
base_model: meta-llama/Llama-2-7b-hf
base_model_config: meta-llama/Llama-2-7b-hf
model_type: LlamaForCausalLM
tokenizer_type: LlamaTokenizer
is_llama_derived_model: true

load_in_8bit: true
load_in_4bit: false
strict: false

datasets:
  - path: mhenrichsen/alpaca_2k_test
    type: alpaca
dataset_prepared_path: last_run_prepared
val_set_size: 0.01
output_dir: ./lora-out

sequence_len: 4096
sample_packing: false
pad_to_sequence_len: true

adapter: lora
lora_model_dir:
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:

wandb_project:
wandb_entity:
wandb_watch:
wandb_run_id:
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 3
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: false
fp16: true
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention: true
flash_attention: false

warmup_steps: 10
eval_steps: 20
save_steps:
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  bos_token: "<s>"
  eos_token: "</s>"
  unk_token: "<unk>"
```
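
As a quick sanity check on the schedule above: with `micro_batch_size: 2`, `gradient_accumulation_steps: 4`, and 8 GPUs, the effective global batch size is 2 × 4 × 8 = 64 sequences per optimizer step. The `lora_*` settings map onto PEFT roughly as in the sketch below; Axolotl builds this object internally, so this is only an illustration, and the explicit `target_modules` list is an assumption standing in for `lora_target_linear: true`.

```python
# Hedged sketch: a PEFT LoraConfig roughly equivalent to the lora_* settings
# in the Axolotl config above. Not the project's actual training code.
from peft import LoraConfig

lora_config = LoraConfig(
    r=32,               # lora_r
    lora_alpha=16,      # lora_alpha
    lora_dropout=0.05,  # lora_dropout
    bias="none",
    task_type="CAUSAL_LM",
    # Assumption: `lora_target_linear: true` targets all linear projections
    # in the Llama decoder blocks.
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
)
```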

## Training procedure

The following `bitsandbytes` quantization config was used during training:
- quant_method: bitsandbytes
- load_in_8bit: True
- bnb_4bit_quant_type: fp4
- bnb_4bit_use_double_quant: False
- bnb_4bit_compute_dtype: float32
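
For completeness, the settings listed above correspond roughly to the following `transformers` `BitsAndBytesConfig`. This is a hedged reconstruction, not code taken from the training run; the `bnb_4bit_*` values are the library defaults and stay inactive here because the adapter was trained with 8-bit loading rather than 4-bit.

```python
# Hedged reconstruction of the quantization config listed above.
# The bnb_4bit_* values are defaults and have no effect in 8-bit mode.
import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_8bit=True,                     # load_in_8bit: True
    bnb_4bit_quant_type="fp4",             # bnb_4bit_quant_type: fp4
    bnb_4bit_use_double_quant=False,       # bnb_4bit_use_double_quant: False
    bnb_4bit_compute_dtype=torch.float32,  # bnb_4bit_compute_dtype: float32
)
```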

### Framework versions

- PEFT 0.6.0.dev0