This model is derived from phi 1.3B, using layer stacking to double the number of hidden layers. The stacked model was then trained for one epoch on data from the tiny-textbooks and tiny-lessons datasets.
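The card does not say which stacking scheme was used, so the following is only a minimal sketch of the general idea. It assumes a model can be treated as an ordered list of layer objects. `Block`, `stack_layers`, and the two `mode` options are hypothetical illustrations, not the actual training code. "repeat" appends a second copy of the whole stack, and "interleave" duplicates each layer in place.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Block:
    """Hypothetical stand-in for one transformer decoder layer."""
    index: int                                   # position of the layer in the source model
    weights: dict = field(default_factory=dict)  # placeholder for the layer's parameters


def stack_layers(layers, mode="repeat"):
    """Return a new list with twice as many layers, each an independent copy.

    mode="repeat"     -> [0, 1, ..., N-1, 0, 1, ..., N-1]
    mode="interleave" -> [0, 0, 1, 1, ..., N-1, N-1]
    Both schemes are assumptions; the card does not specify the one used.
    """
    if mode == "repeat":
        order = list(layers) + list(layers)
    elif mode == "interleave":
        order = [layer for layer in layers for _ in range(2)]
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    # Deep-copy every entry so duplicated layers do not share parameters
    # and can diverge during the subsequent training epoch.
    return [copy.deepcopy(layer) for layer in order]


if __name__ == "__main__":
    source = [Block(i, {"w": [float(i)]}) for i in range(4)]
    stacked = stack_layers(source, mode="repeat")
    print(len(stacked))                # 8
    print([b.index for b in stacked])  # [0, 1, 2, 3, 0, 1, 2, 3]
    stacked[4].weights["w"][0] = 99.0
    print(source[0].weights["w"][0])   # 0.0, the copies are independent
```

With a real checkpoint, the layer count recorded in the model configuration would also need to be doubled so that the stacked weights load correctly. This sketch leaves that step out.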
Weights & Biases project: https://wandb.ai/wing-lian/phi-2x-pt-tiny