Update README.md
README.md
This version includes the optimizer, allowing you to resume training using the Hugging Face Trainer.
## Continual Training Tutorial

### Step 1: Modify the `trainer_state.json`
Due to the implementation of the Hugging Face Trainer, certain parameters are stored in the `trainer_state.json` file and cannot be modified through the Trainer's command-line arguments. You therefore need to update these parameters in `trainer_state.json` first, particularly:
- **`save_steps`**: The frequency of saving intermediate checkpoints.
- **`train_batch_size`**: The batch size per GPU (equivalent to `per_device_train_batch_size` in the Trainer). We used a batch size of 1008 (approximately 4M tokens) during the stable training stage; maintaining the same batch size is equally important for training effectiveness. A minimal way to patch both fields is sketched after this list.
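If you prefer to script the change rather than edit the file by hand, here is a minimal sketch. The checkpoint directory `checkpoint-1000` is a hypothetical placeholder for your own resume checkpoint, and the `save_steps` value is illustrative; 1008 is the batch size quoted above:

```python
import json

# Hypothetical path: point this at the trainer_state.json inside the
# checkpoint you intend to resume from.
state_path = "checkpoint-1000/trainer_state.json"

with open(state_path) as f:
    state = json.load(f)

# Update the fields that cannot be overridden via the Trainer's
# command-line arguments.
state["save_steps"] = 1000        # illustrative checkpoint-saving frequency
state["train_batch_size"] = 1008  # batch size used during the stable training stage

with open(state_path, "w") as f:
    json.dump(state, f, indent=2)
```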
Below is an example of a properly configured `trainer_state.json` file:
```json
{