This data is from the [13B-en training](https://github.com/bigscience-workshop/bigscience/tree/master/train/tr1-13B-base):
- indices - these are the Megatron-LM shuffled indices that the training used. They were generated once, the first time the training started, so replaying them via the dataloader yields the same sample order without actually running the training steps.
- the corresponding dataset is oscar-en, which is on JZ at `$six_ALL_CCFRWORK/datasets-custom/oscar-en`
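
To illustrate why fixed indices make the order replayable, here is a minimal sketch. It assumes the shuffled index can be treated as a 1-D array of sample positions consumed front to back; `replay_batches` and `toy_shuffle_idx` are hypothetical names for this example and not part of Megatron-LM. The real dataloader also deals with data-parallel ranks and the document/sample index mappings, which are left out here.

```python
import numpy as np


def replay_batches(shuffle_idx, batch_size, consumed_samples=0, num_batches=1):
    """Return the sample ids of the next `num_batches` full batches.

    Hypothetical helper: walks a fixed shuffled index in order, starting
    after `consumed_samples`, without running any training step.
    """
    shuffle_idx = np.asarray(shuffle_idx)
    batches = []
    for b in range(num_batches):
        start = consumed_samples + b * batch_size
        stop = start + batch_size
        if stop > len(shuffle_idx):
            break  # not enough samples left for a full batch
        batches.append(shuffle_idx[start:stop].tolist())
    return batches


# Toy stand-in for a saved shuffled index (assumption: a permutation of sample ids).
toy_shuffle_idx = np.random.RandomState(1234).permutation(12)

first_pass = replay_batches(toy_shuffle_idx, batch_size=4, num_batches=3)
second_pass = replay_batches(toy_shuffle_idx, batch_size=4, num_batches=3)
resumed = replay_batches(toy_shuffle_idx, batch_size=4, consumed_samples=4)
```

Because the index is stored rather than regenerated, both passes give identical batches, and resuming from a given number of consumed samples lands on the same batch the original run would have seen at that point.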