This data is from the 13B-en training.
indices - these are the shuffled Megatron-LM indices that the training used. They were generated the first time the training started, so replaying them through the dataloader, without actually running the training steps, yields the same sample order.
The corresponding dataset is oscar-en, which is on JZ at
$six_ALL_CCFRWORK/datasets-custom/oscar-en
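
To illustrate why saved shuffled indices make the sample order replayable, here is a minimal toy sketch. It is not Megatron-LM code and does not read the actual index files in this repo: the file name `toy_shuffle_idx.json`, the helpers `build_shuffle_idx` and `replay_batches`, the seed, and the sizes are all hypothetical. The idea it shows is only that the shuffle is computed once, stored, and then read back on every replay, so no training steps are needed to recover the order.

```python
import json
import os
import random
import tempfile


def build_shuffle_idx(num_samples, seed):
    """Build a shuffled list of sample ids once (hypothetical stand-in for the saved indices)."""
    rng = random.Random(seed)
    order = list(range(num_samples))
    rng.shuffle(order)
    return order


def replay_batches(shuffle_idx, batch_size):
    """Yield batches of sample ids in the stored order, without doing any training."""
    for start in range(0, len(shuffle_idx) - batch_size + 1, batch_size):
        yield shuffle_idx[start:start + batch_size]


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file name; the real index files are not read here.
    path = os.path.join(tmp, "toy_shuffle_idx.json")

    # Generated once, the first time "training" starts.
    with open(path, "w") as f:
        json.dump(build_shuffle_idx(num_samples=12, seed=1234), f)

    # Every later replay loads the same file, so the order is identical.
    with open(path) as f:
        first_run = list(replay_batches(json.load(f), batch_size=4))
    with open(path) as f:
        second_run = list(replay_batches(json.load(f), batch_size=4))

print(first_run == second_run)
```

Because the order lives in the stored file rather than in the training state, a given step can be mapped back to its samples simply by slicing the stored indices.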