
First steps in scaling down language models for low-resource settings

Sasando

Sasando-1 is a tiny, highly experimental short-sequence text generator built using the Phi-3 architecture.

โ•Go straight to the gradio demoโ•

This repo contains the 7M version.

Preliminary research preview

🎻 Welcome!

Sasando-1 is a tiny, highly experimental Indonesian text generator built on the Phi-3 architecture. It comes in two microscopic sizes: 7M and 25M parameters. It is trained on a tightly controlled subset of the Indo4B dataset, filtered down to 18,000 unique words. The method is inspired by Microsoft's TinyStories paper, which demonstrates that a tiny language model can produce fluent text when trained on a tightly controlled dataset.
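The actual Indo4B preprocessing code is not published in this card, but the TinyStories-style vocabulary control can be sketched as: keep only sentences whose every word falls inside a fixed whitelist. The function name `filter_corpus` and the tokenization regex below are illustrative assumptions, not the real pipeline.

```python
import re

def filter_corpus(sentences, allowed_words, limit=18000):
    """Hypothetical sketch: keep only sentences fully covered by a capped vocabulary."""
    vocab = set(list(allowed_words)[:limit])  # cap the whitelist at `limit` unique words
    kept = []
    for s in sentences:
        words = re.findall(r"[a-zA-Z']+", s.lower())  # Indonesian uses Latin script
        # Drop the sentence if any word falls outside the allowed vocabulary
        if words and all(w in vocab for w in words):
            kept.append(s)
    return kept
```

For example, with the whitelist `{"saya", "makan", "nasi"}`, the sentence "saya makan nasi" survives while "saya makan quinoa" is dropped.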

🇮🇩 Context

Indonesia has over 700 languages, and many of them are dying at an alarming rate. Language technologies like generative AI can play a major role in language preservation. However, Indonesia faces several contextual issues:

  • Many languages, including those with millions of speakers, have low-volume digital resources
  • Running large models can be costly, while Indonesia is a middle-income country with little funding

Overcoming these challenges requires developers to work with what little data and money they have. Sasando-1 is a prototype demonstrating that thinly available resources can still be leveraged to develop generative models on cheap compute.

✨ Specs

  • Available in 7M and 25M parameter sizes
  • Based on the Phi-3 architecture
  • Embedding vocabulary of 4,096
  • Trained on ~257M tokens × 4 epochs
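The card does not include a loading snippet, so here is a minimal sketch using the Hugging Face `transformers` Auto classes. The repo id matches this card; the prompt and sampling settings are illustrative assumptions.

```python
MODEL_ID = "afrizalha/Sasando-1-7M"

def generate(prompt: str, max_new_tokens: int = 32) -> str:
    """Sketch: load the checkpoint and sample a short continuation."""
    # Imports kept inside the function so the sketch can be read without
    # transformers installed; the model is downloaded on first call.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    ids = tok(prompt, return_tensors="pt")
    out = model.generate(**ids, max_new_tokens=max_new_tokens,
                         do_sample=True, temperature=0.7)
    return tok.decode(out[0], skip_special_tokens=True)

# Example (hypothetical prompt):
#   print(generate("Ibukota Indonesia adalah"))
```

Since this is a short-sequence base model, expect sensible output only for brief completions, not long-form generation.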

🔭 Out-of-Scope Use

This is a research-preview base model. It is not instruction-tuned and has minimal safety curation. It is not intended for commercial or practical applications.

You are also not allowed to use this model without having fun.

Acknowledgments

  • Developed by: Afrizal Hasbi Azizy
  • License: MIT

Training log
