Update vocab and model size

#1
by terru3 - opened
Files changed (1)
  1. app.py +2 -2
app.py CHANGED
@@ -17,8 +17,8 @@ def main():
 
  st.markdown("""We used the dataset from the [TinyStories Research Paper](https://arxiv.org/pdf/2305.07759.pdf) (Ronen Eldan and Yuanzhi Li, Microsoft),
  which consists of 2.1 million synthetic short children's stories generated by GPT-4, to train a Transformer LLM that we built from scratch in PyTorch.""")
- st.markdown("""Our final model uses EleutherAI's [gpt-neo-1.3B tokenizer](https://huggingface.co/EleutherAI/gpt-neo-1.3B) (vocab size 50,256) and consists of 8 transformer blocks,
- 16 attention heads, and an embedding dimension of 768, for a total of 133M parameters. The model was trained on 8 H100 GPUs for ~7 hours, and has a cross-entropy validation loss of 1.16,
+ st.markdown("""Our final model uses EleutherAI's [gpt-neo-1.3B tokenizer](https://huggingface.co/EleutherAI/gpt-neo-1.3B) (vocab size 50,257) and consists of 8 transformer blocks,
+ 16 attention heads, and an embedding dimension of 768, for a total of ~56M non-embedding parameters. The model was trained on 8 H100 GPUs for ~7 hours, achieving a cross-entropy validation loss of 1.16,
  which is superior to any model in the TinyStories paper (likely due to a larger vocab size and far more compute).""")
  st.markdown("""Despite the simple themes and limited vocabulary present in the training data, the model is
  quite effective at generating new short stories. **Try it out below!**""")
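
For context on the two figures that change in this hunk: 50,257 matches the GPT-2 BPE vocabulary (50,000 merges + 256 byte tokens + `<|endoftext|>`), which the gpt-neo tokenizer reuses, and the old "133M" and new "~56M non-embedding" counts are consistent with each other under standard assumptions. Below is a minimal back-of-the-envelope sketch, assuming GPT-style blocks with roughly 12·d_model² parameters per layer and untied input/output embeddings; only the hyperparameters quoted in the app text are taken as given.

```python
# Rough parameter-count check for the configuration described in app.py.
# Assumptions (not confirmed by this repo): standard GPT-style blocks with
# ~12 * d_model^2 parameters each (attention ~4*d^2 + MLP ~8*d^2), untied
# input/output embeddings, and biases/layer norms/positional embeddings ignored.

vocab_size = 50_257   # gpt-neo-1.3B tokenizer vocabulary
n_layers = 8          # transformer blocks
d_model = 768         # embedding dimension

non_embedding = n_layers * 12 * d_model ** 2   # ~56.6M
embeddings = 2 * vocab_size * d_model          # ~77.2M (untied input + output)

print(f"non-embedding params: {non_embedding / 1e6:.1f}M")                 # ~56.6M
print(f"with embeddings:      {(non_embedding + embeddings) / 1e6:.1f}M")  # ~133.8M
```

Under these assumptions both numbers check out: the edit simply switches the app text from a total count (embeddings included, ~133M) to the non-embedding count (~56M).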