This is a ~30-million-parameter model based on the Llama 2 architecture. It was trained on approximately 8 billion tokens of diverse web data drawn from the first 4,000,000 rows of the uncleaned English C4 dataset. The model has a context length of 2048 tokens.
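
As a rough sketch of how the training rows could be obtained (assuming the `allenai/c4` dataset with the `en.noclean` configuration on the Hugging Face Hub, which this card does not confirm, and leaving tokenization and sequence packing unspecified):

```python
from datasets import load_dataset

# Stream the uncleaned English C4 split and take the first 4,000,000 rows.
# The exact dataset revision and preprocessing used for training are not
# stated in this card; this is only an illustrative sketch.
c4_noclean = load_dataset("allenai/c4", "en.noclean", split="train", streaming=True)
train_rows = c4_noclean.take(4_000_000)

for example in train_rows:
    text = example["text"]  # raw web text used as training data
    # ... tokenize and pack into 2048-token sequences here
```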