
I pruned my TinyLlama 1.1B Cinder v2 from 22 layers down to 14. At 14 layers there was no coherent text, but there were emerging ideas of a response. I then trained for 1,000 steps on a step-by-step dataset and 10,000 steps on Reason-with-cinder. The loss was around 0.6 and the learning rate was still over 4. Performance is starting to improve, but this model still needs significant training, so I am releasing it as a base model that needs work. If you continue training it, please let me know on the TinyLlama Discord (https://discord.com/channels/1156883027805356072/1156883029671813122) or by email at Cinder.stem@gmail.com. I have some interesting plans for this model.
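For anyone who wants to reproduce this kind of layer pruning, the basic idea is to truncate the decoder stack and update the config to match. This is a minimal sketch using the Hugging Face `transformers` Llama classes; it builds a tiny stand-in config (the real TinyLlama 1.1B has 22 layers and hidden_size 2048, which would be slow to instantiate here), and the exact pruning recipe I used may differ in detail.

```python
import torch
from transformers import LlamaConfig, LlamaForCausalLM

# Tiny stand-in config so the sketch runs quickly; the real TinyLlama 1.1B
# uses 22 layers, hidden_size=2048, 32 attention heads, etc.
config = LlamaConfig(
    vocab_size=1000,
    hidden_size=64,
    intermediate_size=128,
    num_hidden_layers=22,
    num_attention_heads=4,
    num_key_value_heads=4,
)
model = LlamaForCausalLM(config)

# Keep only the first 14 decoder layers (slicing an nn.ModuleList
# returns an nn.ModuleList) and update the config to match.
keep = 14
model.model.layers = model.model.layers[:keep]
model.config.num_hidden_layers = keep

# The pruned model still runs a forward pass, even though its outputs
# will be incoherent until it is retrained.
ids = torch.randint(0, config.vocab_size, (1, 8))
logits = model(ids).logits
```

After pruning you would save with `model.save_pretrained(...)` and continue fine-tuning from that checkpoint, which is roughly what the training steps above picked up from.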

Downloads last month: 2,443
Model size: 748M params (Safetensors)
Tensor type: F32