diabolic6045 committed
Commit ec25aaa
Parent: bac728e

End of training

Files changed (2):
  1. README.md +14 -22
  2. adapter_model.bin +1 -1
README.md CHANGED
@@ -1,21 +1,19 @@
  ---
- license: llama3
+ base_model: meta-llama/Meta-Llama-3-8B
  library_name: peft
+ license: llama3
  tags:
  - axolotl
  - generated_from_trainer
- base_model: meta-llama/Meta-Llama-3-8B
  model-index:
  - name: Sanskrit-llama
    results: []
- datasets:
- - diabolic6045/Sanskrit-llama
  ---
 
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment. -->
 
- [<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
+ [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
  <details><summary>See axolotl config</summary>
 
  axolotl version: `0.4.1`
@@ -24,7 +22,7 @@ axolotl version: `0.4.1`
  base_model: meta-llama/Meta-Llama-3-8B
  model_type: AutoModelForCausalLM
  tokenizer_type: AutoTokenizer
- max_steps: 2
+ max_steps:
  bnb_config_kwargs:
    llm_int8_has_fp16_weight: false
    bnb_4bit_quant_type: nf4
@@ -35,7 +33,7 @@ load_in_4bit: true
  strict: false
 
  datasets:
-   - path: diabolic6045/Sanskrit-llama
+   - path: VinitT/Sanskrit-Llama_Base-Dataset
      type: alpaca
  dataset_prepared_path:
  val_set_size: 0
@@ -46,7 +44,7 @@ hf_use_auth_token: true
  adapter: qlora
  lora_model_dir:
 
- sequence_len: 1024
+ sequence_len: 512
  sample_packing: true
  eval_sample_packing: false
  pad_to_sequence_len: true
@@ -56,21 +54,15 @@ lora_alpha: 16
  lora_dropout: 0.05
  lora_target_modules:
  lora_target_linear: true
- lora_fan_in_fan_out:
-
- wandb_project: संस्कृतम्-llama
- wandb_entity:
- wandb_watch: all
- wandb_name: संस्कृतम्-llama
- wandb_log_model:
+ lora_fan_in_fan_out:
 
- gradient_accumulation_steps: 4
+ gradient_accumulation_steps: 8
  micro_batch_size: 2
  num_epochs: 1
  optimizer: paged_adamw_8bit
  lr_scheduler: cosine
  cosine_min_lr_ratio: 0.2
- learning_rate: 2e-5
+ learning_rate: 5e-5
 
  train_on_inputs: false
  group_by_length: false
@@ -132,19 +124,19 @@ More information needed
  ### Training hyperparameters
 
  The following hyperparameters were used during training:
- - learning_rate: 2e-05
+ - learning_rate: 5e-05
  - train_batch_size: 2
  - eval_batch_size: 2
  - seed: 42
  - distributed_type: multi-GPU
  - num_devices: 2
- - gradient_accumulation_steps: 4
- - total_train_batch_size: 16
+ - gradient_accumulation_steps: 8
+ - total_train_batch_size: 32
  - total_eval_batch_size: 4
  - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  - lr_scheduler_type: cosine
  - lr_scheduler_warmup_steps: 10
- - training_steps: 2
+ - num_epochs: 1
 
  ### Training results
 
@@ -153,7 +145,7 @@ The following hyperparameters were used during training:
  ### Framework versions
 
  - PEFT 0.11.1
- - Transformers 4.41.1
+ - Transformers 4.42.3
  - Pytorch 2.1.2
  - Datasets 2.19.1
  - Tokenizers 0.19.1
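
The updated hyperparameters are internally consistent: micro_batch_size 2 × gradient_accumulation_steps 8 × 2 devices gives the reported total_train_batch_size of 32, just as the previous 2 × 4 × 2 gave 16. For orientation, the sketch below shows one way the resulting QLoRA adapter could be loaded on top of the base model with PEFT. The adapter repository id, the 4-bit compute dtype, and the prompt are illustrative assumptions, not details stated in this commit.

```python
# Minimal sketch (not part of the commit): load Meta-Llama-3-8B in 4-bit and
# attach the QLoRA adapter produced by this training run.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

BASE = "meta-llama/Meta-Llama-3-8B"
ADAPTER = "diabolic6045/Sanskrit-llama"  # assumed adapter repo id

# nf4 quantization mirrors the bnb settings in the axolotl config above;
# the bfloat16 compute dtype is chosen here only for illustration.
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(
    BASE, quantization_config=bnb, device_map="auto"
)
model = PeftModel.from_pretrained(model, ADAPTER)

# Illustrative prompt; the training data is an alpaca-format Sanskrit dataset.
prompt = "Translate to Sanskrit: The sun rises in the east."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```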
adapter_model.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:cd548dd24c84749e0d56af20848f2169dbca0ab2b67242a58f27e68c2db79019
+ oid sha256:d59031ba061534bc251ce97171f3d11833bed818b498853a2f16aa29c16509f7
  size 167843194
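
adapter_model.bin is tracked with Git LFS, so the change above swaps only the pointer's sha256 object id while the payload size stays at 167843194 bytes. A minimal sketch for checking that a locally downloaded adapter matches the new pointer, assuming the file sits in the working directory:

```python
# Sketch: verify a downloaded adapter_model.bin against the LFS pointer's sha256 oid.
import hashlib

EXPECTED_OID = "d59031ba061534bc251ce97171f3d11833bed818b498853a2f16aa29c16509f7"
path = "adapter_model.bin"  # placeholder: path to the locally downloaded file

sha = hashlib.sha256()
with open(path, "rb") as f:
    # Hash in 1 MiB chunks to avoid loading the whole ~168 MB file into memory.
    for chunk in iter(lambda: f.read(1 << 20), b""):
        sha.update(chunk)

print("match" if sha.hexdigest() == EXPECTED_OID else "mismatch")
```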