LoRA rank-16 adapter fine-tuned from **`kfdong/STP_model_Lean_0320`** to assist …
| **Context length** | 1792 tokens |
| **Hardware** | 1 × H100 80 GB |
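
To try the adapter, a minimal loading sketch with 🤗 `transformers` and `peft` might look like the following. This is an assumption, not an official snippet from this card: `ADAPTER_ID` is a placeholder (the card does not state the adapter's repo id), and `attn_implementation="flash_attention_2"` is optional and requires `flash-attn`, matching the Flash-Attention v2 setting listed under Training Hyperparameters below.

```python
# Minimal loading sketch (assumed, not from this card).
# ADAPTER_ID is a placeholder -- substitute this adapter's actual repo id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_ID = "kfdong/STP_model_Lean_0320"   # base model named in this card
ADAPTER_ID = "<this-adapter-repo-id>"    # placeholder, not a real repo id

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
base = AutoModelForCausalLM.from_pretrained(
    BASE_ID,
    torch_dtype=torch.bfloat16,               # card trains in bf16
    attn_implementation="flash_attention_2",  # optional; requires flash-attn
    device_map="auto",
)
model = PeftModel.from_pretrained(base, ADAPTER_ID)  # attach rank-16 LoRA weights
model.eval()
```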

---

#### Training Hyperparameters

| Setting | Value |
|-------------------------------|---------------------------------------------|
| Precision / regime            | **bf16 mixed precision** |
| Epochs                        | **1** |
| Max sequence length           | 1792 tokens (right-padding) |
| Per-device train batch size   | 6 |
| Per-device eval batch size    | 2 |
| Gradient accumulation steps   | 1 (effective batch = 6) |
| Optimizer                     | AdamW |
| Learning rate schedule        | **2 × 10⁻⁴**, cosine, warm-up **3 %** |
| Weight decay                  | 0.01 |
| LoRA rank / α / dropout       | r = 16, α = 32 (2 × r), dropout = 0.05 |
| Gradient checkpointing        | Enabled (memory-efficient) |
| Flash-Attention v2            | Enabled |
| Logging                       | every 50 steps |
| Evaluation strategy           | once per epoch |
| Save strategy                 | once per epoch |
| Seed                          | 42 |
| Hardware                      | 1 × H100 80 GB |
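
For reference, here is a sketch of how the settings above map onto 🤗 `peft` / `transformers` config objects. This is an assumed reconstruction, not the author's actual training script: `output_dir` is a placeholder, `target_modules` is omitted because the card does not list it, and the 1792-token right-padded sequence length and Flash-Attention v2 are applied at tokenizer / model load rather than here.

```python
# Assumed reconstruction of the table above (not the author's script).
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=16,               # LoRA rank
    lora_alpha=32,      # alpha = 2 x r
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="stp-lean-lora",        # placeholder path
    num_train_epochs=1,
    per_device_train_batch_size=6,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=1,     # effective batch = 6
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,                 # 3 % warm-up
    weight_decay=0.01,
    optim="adamw_torch",               # AdamW
    bf16=True,                         # bf16 mixed precision
    gradient_checkpointing=True,
    logging_steps=50,
    eval_strategy="epoch",             # `evaluation_strategy` on older transformers
    save_strategy="epoch",
    seed=42,
)
```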

---

### Results (1 epoch, STP Lean corpus)

| Metric | Value |