mrm8488 committed
Commit 228fc22
1 Parent(s): 6ac4e60

Update README.md

Files changed (1)
  1. README.md +37 -44
README.md CHANGED
@@ -40,50 +40,43 @@ Phi-2 is a Transformer with **2.7 billion** parameters. It was trained using the
 
 
 
- ### LoRa config
-
- ```py
- config = LoraConfig(
-     r=32,
-     lora_alpha=64,
-     target_modules=[
-         "Wqkv",
-         "fc1",
-         "fc2",
-         "out_proj"
-     ],
-     bias="none",
-     lora_dropout=0.05,
-     task_type="CAUSAL_LM",
- )
- ```
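For context, the removed block above only listed the adapter settings. Below is a minimal sketch of how such a `LoraConfig` is typically attached to Phi-2 with PEFT; the base checkpoint name and the 8-bit loading flag are assumptions (the latter matches the quantization config recorded further down), not part of this commit:

```py
# Sketch only: standard PEFT wiring for the LoraConfig above.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-2",       # assumed base checkpoint
    load_in_8bit=True,       # assumed; matches "load_in_8bit: True" below
    trust_remote_code=True,  # Phi-2 originally shipped custom modeling code
)

config = LoraConfig(
    r=32,
    lora_alpha=64,
    target_modules=["Wqkv", "fc1", "fc2", "out_proj"],
    bias="none",
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, config)
model.print_trainable_parameters()  # sanity-check the trainable adapter size
```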
-
- ### Training hyperparameters ⚙
-
- ```py
- per_device_train_batch_size=4,
- gradient_accumulation_steps=32,
- num_train_epochs=2,
- learning_rate=2.5e-5,
- optim="paged_adamw_8bit",
- seed=66,
- load_best_model_at_end=True,
- save_strategy="steps",
- save_steps=50,
- evaluation_strategy="steps",
- eval_steps=50,
- ```
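The removed block above lists bare keyword arguments; they read as `transformers.TrainingArguments` fields. A hedged sketch of the corresponding `Trainer` setup follows, with `output_dir` and the dataset objects as placeholders (neither is recorded in this commit):

```py
# Sketch only: the kwargs above wrapped into TrainingArguments.
from transformers import Trainer, TrainingArguments

args = TrainingArguments(
    output_dir="phi-2-lora-out",  # placeholder, not from the commit
    per_device_train_batch_size=4,
    gradient_accumulation_steps=32,
    num_train_epochs=2,
    learning_rate=2.5e-5,
    optim="paged_adamw_8bit",
    seed=66,
    load_best_model_at_end=True,
    save_strategy="steps",
    save_steps=50,
    evaluation_strategy="steps",
    eval_steps=50,
)

trainer = Trainer(
    model=model,                  # the PEFT-wrapped model from the sketch above
    args=args,
    train_dataset=train_dataset,  # placeholder dataset objects
    eval_dataset=eval_dataset,
)
trainer.train()
```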
-
- ### Training results 🗒️
-
- | Step | Training Loss | Validation Loss |
- |------|---------------|-----------------|
- | 50   | 0.763100      | 0.717398        |
- | 100  | 0.673500      | 0.694871        |
- | 150  | 0.696000      | 0.689336        |
- | 200  | 0.786100      | 0.687515        |
- | 250  | 0.734600      | 0.686658        |
+ ### Training procedure
+
+ The following `bitsandbytes` quantization config was used during training:
+ - quant_method: bitsandbytes
+ - load_in_8bit: True
+ - load_in_4bit: False
+ - llm_int8_threshold: 6.0
+ - llm_int8_skip_modules: None
+ - llm_int8_enable_fp32_cpu_offload: False
+ - llm_int8_has_fp16_weight: False
+ - bnb_4bit_quant_type: fp4
+ - bnb_4bit_use_double_quant: False
+ - bnb_4bit_compute_dtype: float32
+
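These flags are the `bitsandbytes` defaults plus `load_in_8bit=True`; the `bnb_4bit_*` fields are inactive because `load_in_4bit` is False. A sketch of the equivalent `transformers.BitsAndBytesConfig` (the 4-bit fields are left at their defaults, and the base checkpoint name is an assumption):

```py
# Sketch only: the quantization flags above as a BitsAndBytesConfig.
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_threshold=6.0,
    llm_int8_skip_modules=None,
    llm_int8_enable_fp32_cpu_offload=False,
    llm_int8_has_fp16_weight=False,
)

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-2",               # assumed base checkpoint
    quantization_config=bnb_config,
    trust_remote_code=True,
)
```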
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 2.5e-05
+ - train_batch_size: 4
+ - eval_batch_size: 8
+ - seed: 66
+ - gradient_accumulation_steps: 32
+ - total_train_batch_size: 128
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: linear
+ - num_epochs: 2
+
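As a quick consistency check (not part of the generated card), the effective batch size and the epoch column of the results table follow directly from these values:

```py
# Sanity-check arithmetic for the hyperparameters above.
train_batch_size = 4
gradient_accumulation_steps = 32

total_train_batch_size = train_batch_size * gradient_accumulation_steps
assert total_train_batch_size == 128  # matches the reported value

# The results table pairs step 50 with epoch 0.36, so one epoch is roughly
# 50 / 0.36 ≈ 139 optimizer steps, i.e. about 139 * 128 ≈ 17.8k examples
# per epoch (the dataset size itself is not stated in the commit).
steps_per_epoch = 50 / 0.36
print(round(steps_per_epoch), round(steps_per_epoch * total_train_batch_size))
```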
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss |
+ |:-------------:|:-----:|:----:|:---------------:|
+ | 0.7631        | 0.36  | 50   | 0.7174          |
+ | 0.6735        | 0.71  | 100  | 0.6949          |
+ | 0.696         | 1.07  | 150  | 0.6893          |
+ | 0.7861        | 1.42  | 200  | 0.6875          |
+ | 0.7346        | 1.78  | 250  | 0.6867          |