Kc-12 committed on
Commit
de247ac
β€’
1 Parent(s): 66f6dbb

Upload app.py

Browse files
Files changed (1) hide show
  1. app.py +19 -16
app.py CHANGED
@@ -9,34 +9,36 @@ def main():
9
  device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
10
  tokenizer, EMPTY_TOKENS = load_tokenizer(device)
11
 
12
- st.title("Short Story Transformer Demo")
13
- st.subheader("UCLA DSU Project, Fall 2023")
14
- st.markdown("By Daniel Mendelevitch, Terry Ming, Casey Tattersall, Sean Tjoa")
15
 
16
- st.header("Data and Training")
17
 
18
  st.markdown("""We used the dataset from Microsoft Research's [TinyStories Paper](https://arxiv.org/pdf/2305.07759.pdf) (Eldan and Li),
19
- which consists of 2.1 million synthetic short children's stories generated by GPT-4, to train a Transformer LLM that we built from scratch in PyTorch.""")
20
  st.markdown("""Our model uses EleutherAI's [gpt-neo-1.3B tokenizer](https://huggingface.co/EleutherAI/gpt-neo-1.3B) (vocab size 50,257) and consists of 8 transformer blocks,
21
- 16 attention heads, and an embedding dimension of 768, for a total of ~56M non-embedding parameters. The model was trained on 8 H100 GPUs for 7 hours, achieving a cross-entropy validation loss of 1.16,
22
- which is superior to all models in the TinyStories paper (likely due to a larger vocab size and far more compute).""")
23
  st.markdown("""Despite the simple themes and limited vocabulary present in the training data, the model is
24
  quite effective at generating new short stories. **Try it out below!**""")
25
 
26
- st.header("Prompting Tips")
27
  st.markdown(
28
- "The model can struggle with some prompts, especially those outside of its limited domain. If a response isn't satisfactory, try repeating the generation, or make the following modifications:"
 
 
 
 
29
  )
30
  st.markdown(
31
  """
32
- - Use simple vocabulary - words and themes that would appear in a children's story.
33
- - Avoid using idioms - for example, instead of "hit the gym", say "went to the gym".
34
- - Include plenty of descriptive adjectives.
35
- - The model often struggles with names. **Using common names and sticking with first names only can help.**
36
  """
37
  )
38
 
39
- st.header("Let's make some stories! πŸ“–")
40
 
41
  # Input from user
42
  user_input = st.text_input("Enter your prompt:", placeholder="Write a prompt to make a story of your own, or leave it empty for a random story!").strip()
@@ -71,8 +73,8 @@ def main():
71
  # model_version = st.radio("Which model would you like to use?", ["smoll", "beeg"])
72
  # small_model = load_casey_model(tokenizer, device)
73
  model = load_big_model(tokenizer, device)
74
- model.to('cuda')
75
- model.cuda()
76
 
77
 
78
  if st.button('Write my story!'):
@@ -121,6 +123,7 @@ def main():
121
  placeholder = st.empty()
122
 
123
  st.markdown('####')
 
124
  st.caption(r'Data Attribution: Tinystories (License: CDLA-Sharing-1.0) https://arxiv.org/abs/2305.07759')
125
 
126
 
 
9
  device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
10
  tokenizer, EMPTY_TOKENS = load_tokenizer(device)
11
 
12
+ st.title("TinyStories Transformer LLM Demo")
 
 
13
 
14
+ st.subheader("Data and Training")
15
 
16
  st.markdown("""We used the dataset from Microsoft Research's [TinyStories Paper](https://arxiv.org/pdf/2305.07759.pdf) (Eldan and Li),
17
+ which consists of 2.1 million synthetic short children's stories generated by GPT-4, to train a PyTorch Transformer LLM.""")
18
  st.markdown("""Our model uses EleutherAI's [gpt-neo-1.3B tokenizer](https://huggingface.co/EleutherAI/gpt-neo-1.3B) (vocab size 50,257) and consists of 8 transformer blocks,
19
+ 16 attention heads, and an embedding dimension of 768, for a total of ~56M non-embedding parameters. The model was trained overnight on 8 H100 GPUs, achieving a lower cross-entropy
20
+ validation loss than any of the models in the TinyStories paper (likely due to a larger vocab size).""")
21
  st.markdown("""Despite the simple themes and limited vocabulary present in the training data, the model is
22
  quite effective at generating new short stories. **Try it out below!**""")
23
 
24
+ st.subheader("How Do I Prompt?")
25
  st.markdown(
26
+ """
27
+ Instead of generating a new story from scratch, you can "prompt" the model by writing the first few words or sentences of a story, and let it finish from there. It can even jump in mid-sentence!
28
+
29
+ The model can struggle with some prompts, especially those outside of its limited domain. If a response isn't satisfactory, try repeating the generation, or make the following modifications:
30
+ """
31
  )
32
  st.markdown(
33
  """
34
+ - **Use simple vocabulary and syntax** - words, structures, and themes you'd see in a children's story.
35
+ - Use common first names only - the model can struggle with longer or uncommon names.
36
+
37
+ `SAMPLE PROMPT: One day, Timmy and Lily were playing at the park. They decided to`
38
  """
39
  )
40
 
41
+ st.subheader("Let's make some stories! πŸ“–")
42
 
43
  # Input from user
44
  user_input = st.text_input("Enter your prompt:", placeholder="Write a prompt to make a story of your own, or leave it empty for a random story!").strip()
 
73
  # model_version = st.radio("Which model would you like to use?", ["smoll", "beeg"])
74
  # small_model = load_casey_model(tokenizer, device)
75
  model = load_big_model(tokenizer, device)
76
+ #model.to('cuda')
77
+ #model.cuda()
78
 
79
 
80
  if st.button('Write my story!'):
 
123
  placeholder = st.empty()
124
 
125
  st.markdown('####')
126
+ st.caption('UCLA DSU Project Fall 2023: Daniel Mendelevitch, Terry Ming, Casey Tattersall, Sean Tjoa')
127
  st.caption(r'Data Attribution: Tinystories (License: CDLA-Sharing-1.0) https://arxiv.org/abs/2305.07759')
128
 
129