No information about the training data is provided. How are we supposed to compare this model with the Llama 2 models, and how can we justify the change in perplexity on common datasets such as WikiText?
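For context, the perplexity figure being questioned is just the exponential of the mean negative log-likelihood per token, so any comparison across models only makes sense on the same tokenization and evaluation text. A minimal sketch of the computation (the per-token log-probabilities below are made up for illustration, not real model outputs):

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp of the average negative log-likelihood per token."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# Hypothetical per-token log-probabilities a model might assign
# to a short WikiText passage.
logprobs = [-2.1, -0.4, -3.0, -1.2, -0.7]
print(f"perplexity: {perplexity(logprobs):.2f}")
```

Note that two models with different tokenizers split the same text into different numbers of tokens, which changes the per-token average and hence the reported perplexity, another reason training and evaluation details matter for the comparison.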