arxiv:2305.07759

TinyStories: How Small Can Language Models Be and Still Speak Coherent English?

Published on May 12, 2023 · Featured in Daily Papers on May 16, 2023

Authors:

Ronen Eldan, Yuanzhi Li

Abstract

Language models (LMs) are powerful tools for natural language processing, but they often struggle to produce coherent and fluent text when they are small. Models with around 125M parameters such as GPT-Neo (small) or GPT-2 (small) can rarely generate coherent and consistent English text beyond a few words even after extensive training. This raises the question of whether the emergence of the ability to produce coherent English text only occurs at larger scales (with hundreds of millions of parameters or more) and complex architectures (with many layers of global attention). In this work, we introduce TinyStories, a synthetic dataset of short stories that only contain words that typical 3- to 4-year-olds usually understand, generated by GPT-3.5 and GPT-4. We show that TinyStories can be used to train and evaluate LMs that are much smaller than the state-of-the-art models (below 10 million total parameters), or have much simpler architectures (with only one transformer block), yet still produce fluent and consistent stories with several paragraphs that are diverse and have almost perfect grammar, and demonstrate reasoning capabilities. We also introduce a new paradigm for the evaluation of language models: we suggest a framework which uses GPT-4 to grade the content generated by these models as if those were stories written by students and graded by a (human) teacher. This new paradigm overcomes the flaws of standard benchmarks, which often require the model's output to be very structured, and moreover provides a multidimensional score for the model, with separate scores for different capabilities such as grammar, creativity and consistency. We hope that TinyStories can facilitate the development, analysis and research of LMs, especially for low-resource or specialized domains, and shed light on the emergence of language capabilities in LMs.
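As a rough illustration of the grading idea described in the abstract, the sketch below builds a teacher-style grading prompt and parses per-dimension scores from a grader's reply. The prompt wording, the 1-10 scale, the `Dimension: N/10` reply format, and the helper names `build_grading_prompt`, `parse_scores` and `average_scores` are all assumptions made for illustration, not the authors' actual setup. The call to the grading model itself is deliberately left out.

```python
import re
from statistics import mean

# Dimensions named in the abstract; the 1-10 scale is an assumption.
DIMENSIONS = ("grammar", "creativity", "consistency")


def build_grading_prompt(story_beginning: str, completion: str) -> str:
    """Build a hypothetical teacher-style grading prompt for a grader model."""
    rubric = "\n".join(f"{d.capitalize()}: <score>/10" for d in DIMENSIONS)
    return (
        "The following story was written by a student. The part after '***' "
        "is the student's completion of a given beginning.\n\n"
        f"{story_beginning}***{completion}\n\n"
        "Grade the completion on each dimension, replying in this format:\n"
        f"{rubric}"
    )


def parse_scores(reply: str) -> dict[str, int]:
    """Extract 'Dimension: N/10' scores from the grader's reply (assumed format)."""
    scores = {}
    for d in DIMENSIONS:
        match = re.search(rf"{d}\s*:\s*(\d+)\s*/\s*10", reply, flags=re.IGNORECASE)
        if match is None:
            raise ValueError(f"missing score for {d!r}")
        scores[d] = int(match.group(1))
    return scores


def average_scores(all_scores: list[dict[str, int]]) -> dict[str, float]:
    """Average each dimension over several graded completions."""
    return {d: mean(s[d] for s in all_scores) for d in DIMENSIONS}
```

In use, the prompt would be sent to whichever grading model is available, each reply passed through `parse_scores`, and the per-dimension results averaged over many completions to give the multidimensional score for the model being evaluated.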

Community

AIWintermuteAI

May 18, 2023

Awesome job, trying it now.
Please fix the dataset encoding btw.

camsdixon

May 27, 2023

When playing with the system now, I'm not getting nearly the quality of responses that your paper shows. The model constantly stops after 5-10 words.

breadlicker45

Jun 2, 2023

how did they find the Grammar and Creativity scores?

SpiridonSunRotator

Jun 5, 2023

That's a great job.
Do you plan to release the repo with the training scripts?

davanstrien

15 days ago

@librarian-bot recommend

librarian-bot

15 days ago

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend

yumemio

about 10 hours ago (edited)

Microsoft has published a nice article describing how the daughter of one of the authors (Ronen) inspired him to work on this research, and how it led to the development of the Phi series of models. Anyone interested in the backstory of the paper should check it out!

https://news.microsoft.com/source/features/ai/the-phi-3-small-language-models-with-big-potential/