Tags: Text Generation · Transformers · PyTorch · English · llama · sft · Inference Endpoints · text-generation-inference
andreaskoepf committed 72f506c (1 parent: a95cfcd): Update README.md
Files changed (1): README.md (+60, -0)
README.md:
---
license: other
---

- [wandb](https://wandb.ai/open-assistant/supervised-finetuning/runs/2jfazjt9) (still internal, needs to be moved to public-sft)
- checkpoint: 3319 steps

## Note

To load this model you need to install a pre-release version of the Hugging Face transformers library.

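A pre-release build can typically be installed straight from the repository's main branch (shown here as a common approach, not a pinned requirement; pin a specific commit or tag if you need a reproducible setup):

```shell
# Install the current development (pre-release) version of transformers
# directly from GitHub; pin a commit or tag for reproducible environments.
pip install git+https://github.com/huggingface/transformers.git
```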
## Model Configuration
```
llama2_13b_orca_8k:
  rng_seed: 0xe1291f1a
  use_custom_sampler: true
  sort_by_length: false
  dtype: fp16
  log_dir: "llama2_log_13b_orca_8k"
  learning_rate: 1e-5
  model_name: /mnt/data/llama2/Llama-2-13b-hf/
  output_dir: llama2_13b_orca_8k
  deepspeed_config: configs/zero_config_pretrain.json
  weight_decay: 0.0
  max_length: 8192
  warmup_steps: 100
  use_flash_attention: true
  gradient_checkpointing: true
  gradient_accumulation_steps: 8
  per_device_train_batch_size: 2
  per_device_eval_batch_size: 1
  residual_dropout: 0.0
  eval_steps: 200
  save_steps: 1000 # (total steps: 3319)
  num_train_epochs: 1
  save_total_limit: 4
  superhot: true
  superhot_config:
    type: linear
    scale: 2
  datasets:
    # Dataset Composition:
    #   Train (sampled):
    #     orca-chat: 100.00% (188842)
    #     fanfics: 100.00% (47760)
    #     red_pajama: 25.00% (188262)
    #   Valid:
    #     orca-chat: 5000 (71.43%)
    #     fanfics: 1000 (14.29%)
    #     red_pajama: 1000 (14.29%)
    - orca-chat:
        max_val_set: 5000
    - fanfics:
        max_chunk_size: 65535
        max_val_set: 1000
    - red_pajama:
        fraction: 0.25
        max_val_set: 1000
        max_chunk_size: 65535
  peft_model: false
```
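The `superhot_config` entry (`type: linear`, `scale: 2`) extends the context window by dividing token positions before rotary position embeddings are applied, so an 8k sequence stays within the position range the base model was pretrained on. A minimal pure-Python sketch of the idea (`BASE_CTX` is an assumed value for Llama-2's pretraining context, not part of the config):

```python
# Sketch of linear ("SuperHOT"-style) RoPE position scaling, matching
# superhot_config: {type: linear, scale: 2} above. With scale = 2,
# positions 0..8191 are compressed into the 0..4095 range.

BASE_CTX = 4096  # Llama-2 pretraining context length (assumption)
SCALE = 2        # superhot_config.scale

def scaled_position(pos: int, scale: int = SCALE) -> float:
    """Divide the raw token position by the scale factor before applying RoPE."""
    return pos / scale

extended_ctx = BASE_CTX * SCALE  # matches max_length: 8192 in the config
print(extended_ctx)              # → 8192
print(scaled_position(8191))     # → 4095.5, still inside the pretrained range
```

In current transformers releases the equivalent setting is expressed on the model config as a linear `rope_scaling` factor rather than a `superhot` flag.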