GPT-NeoX model pretrained on a 177 MB Vietnamese dataset. Training took about 2 hours and 20 minutes to reach 4,000 iterations on an AWS p3.16xlarge instance.
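
As a rough sanity check, the reported wall-clock time and iteration count imply an average step time of about 2.1 seconds. The short sketch below only restates that arithmetic. The variable names are illustrative, and because the training time is approximate, the result is an estimate rather than a measured throughput.

```python
# Figures taken from the description above; the wall-clock time is approximate.
total_seconds = 2 * 3600 + 20 * 60  # about 2 hours 20 minutes
iterations = 4_000

# Average time per training iteration and iterations completed per hour.
seconds_per_iteration = total_seconds / iterations
iterations_per_hour = iterations / (total_seconds / 3600)

print(f"{seconds_per_iteration:.2f} s/iteration")   # 2.10 s/iteration
print(f"{iterations_per_hour:.0f} iterations/hour")  # 1714 iterations/hour
```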