arxiv:2309.05463

Textbooks Are All You Need II: phi-1.5 technical report

Published on Sep 11, 2023

Submitted by akhaliq on Sep 12, 2023

#1 Paper of the day

Authors:

Yuanzhi Li, Sébastien Bubeck, Ronen Eldan, Allie Del Giorno, Suriya Gunasekar, Yin Tat Lee

Abstract

We continue the investigation into the power of smaller Transformer-based language models as initiated by TinyStories (a 10 million parameter model that can produce coherent English) and the follow-up work on phi-1, a 1.3 billion parameter model with Python coding performance close to the state-of-the-art. The latter work proposed using existing Large Language Models (LLMs) to generate "textbook quality" data as a way to enhance the learning process compared to traditional web data. We follow the "Textbooks Are All You Need" approach, focusing this time on common sense reasoning in natural language, and create a new 1.3 billion parameter model named phi-1.5, with performance on natural language tasks comparable to models 5x larger, and surpassing most non-frontier LLMs on more complex reasoning tasks such as grade-school mathematics and basic coding. More generally, phi-1.5 exhibits many of the traits of much larger LLMs, both good (such as the ability to "think step by step" or perform some rudimentary in-context learning) and bad (including hallucinations and the potential for toxic and biased generations). Encouragingly, though, we are seeing improvement on that front thanks to the absence of web data. We open-source phi-1.5 to promote further research on these urgent topics.
