Update README.md
README.md CHANGED

```diff
@@ -56,6 +56,7 @@ The following table shows the hyper-parameters we used in our training process.
 | Peak Learning Rate | 5e-5 |
 | Batch Size | 4M |
 | Weight Decay | 0.1 |
+| Context Length | 2k |
 
 **Second phase**: We further adjusted the training corpus ratio, incorporating more domain-specific datasets (e.g., Math, Coding), and continued training for 50B tokens.
 
@@ -66,6 +67,7 @@ The following table shows the hyper-parameters we used in our training process.
 | Peak Learning Rate | 5e-6 |
 | Batch Size | 4M |
 | Weight Decay | 0.01 |
+| Context Length | 4k |
 
 ## Performance Evaluation Results
 
```
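The two hunks above can be summarized as a per-phase configuration. The following is an illustrative sketch only: the repository does not publish its training script, so the dict layout, key names, and the `get_phase_config` helper are hypothetical, and "4M" / "2k" / "4k" are assumed to mean 4,000,000 batch tokens and 2048 / 4096 context tokens. Only the numeric values come from the tables in this commit.

```python
# Hypothetical config sketch of the two training phases described in the
# README diff. Values are from the commit; structure and names are assumed.
PHASES = {
    "first_phase": {
        "peak_learning_rate": 5e-5,
        "batch_size_tokens": 4_000_000,  # "4M" tokens per batch (assumed)
        "weight_decay": 0.1,
        "context_length": 2048,          # "2k", added in this commit (assumed 2048)
    },
    "second_phase": {                    # continued training for 50B tokens
        "peak_learning_rate": 5e-6,
        "batch_size_tokens": 4_000_000,
        "weight_decay": 0.01,
        "context_length": 4096,          # "4k", added in this commit (assumed 4096)
    },
}

def get_phase_config(phase: str) -> dict:
    """Return the hyper-parameter dict for a named training phase."""
    return PHASES[phase]
```

Note how the second phase lowers both the peak learning rate (5e-5 → 5e-6) and the weight decay (0.1 → 0.01) while doubling the context length, which is consistent with a gentler continued-training stage on domain-specific data.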