Kc-12 committed on
Commit
6fd8787
β€’
1 Parent(s): e724aac

Upload app.py

Browse files
Files changed (1) hide show
  1. app.py +19 -19
app.py CHANGED
@@ -1,54 +1,54 @@
1
  import streamlit as st
2
  import time
3
- import torch
4
 
5
  from better_transformer import *
6
 
7
-
8
  def main():
9
 
10
  # Enable CUDA if available and load in tokenizer
11
  device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
12
  tokenizer, EMPTY_TOKENS = load_tokenizer(device)
13
 
14
- st.title("Scaling Transformers")
15
  st.subheader("UCLA DSU Project, Fall 2023")
16
- st.markdown("Daniel Mendelevitch \n Terry Ming \n Casey Tattersall \n Sean Tjoa")
17
 
18
- st.header("What Are Transformers? πŸš—πŸ”„πŸ€–")
19
 
20
- header_text = """A transformer is a specific type of neural network that uses a mechanism called self-attention to learn the context (and
21
- thus meaning) of sequential data. Transformer-based models can be used in many different domains, such as processing language, predicting
22
- the weather, or even generating images. \n\n You might be familiar with ChatGPT, a Transformer-based model which cost over \$100 million to train. \n In contrast, we spent \$40*.
23
- """
24
- st.markdown(header_text)
 
 
25
 
26
  st.header("Let's make some stories! πŸ“–")
27
 
28
  # Input from user
29
- user_input = st.text_input("Enter your prompt:", placeholder="Write a prompt to make a story of your own or leave it empty for a random story!").strip()
30
 
31
  if st.checkbox("Show Prompting Tips"):
32
- st.markdown("Our model was trained on the TinyStories dataset, a collection of synthetic short stories generated by GPT-4. These stories only contain words and themes that a typical 3-4 year old would understand.")
33
  st.markdown(
34
  """
35
  - Use simple vocabulary - words and themes that would appear in a children's story
36
  - Avoid using idioms - for example, instead of "hit the gym", say "went to the gym"
37
  - Include plenty of descriptive adjectives
38
- - The model often struggles with names - using common names and only including a person's first name can help
39
  """
40
  )
41
  ## Default values for advanced settings
42
- user_seed = None # Set to a value if we want to rig the "random" demo
43
  generation_method = "top-k"
44
  specified_k = 5
45
  specified_nucleus = 0.5
46
  specified_temperature = 0.9
47
- max_tokens = 500
48
 
49
  if st.checkbox("Show Advanced Settings"):
50
  user_seed = st.number_input("Randomness Seed:", value = None, step = 1, placeholder="Use to replicate response", min_value = 1)
51
- generation_method = st.selectbox("Method of Generation:", ("top-k", "multinomial", "temperature", "greedy", "nucleus"), index = 0).strip()
52
 
53
  if generation_method == "top-k":
54
  specified_k = st.number_input("Value for k:", value = 5, step = 1)
@@ -59,7 +59,7 @@ def main():
59
  if generation_method == "temperature":
60
  specified_temperature = st.number_input("Value for temperature:", value = 0.9, step = 0.05, min_value = 0.0, max_value = 1.0)
61
 
62
- max_tokens = st.slider('Max Tokens Generated:', 100, 800, 500)
63
 
64
 
65
 
@@ -72,7 +72,6 @@ def main():
72
  model.cuda()
73
 
74
 
75
-
76
  if st.button('Write my story!'):
77
  placeholder = st.empty()
78
  # if model_version == 'smoll':
@@ -114,7 +113,8 @@ def main():
114
  if st.button('Clear Output'):
115
  placeholder = st.empty()
116
 
117
-
 
118
 
119
 
120
  if __name__ == "__main__":
 
1
  import streamlit as st
2
  import time
 
3
 
4
  from better_transformer import *
5
 
 
6
  def main():
7
 
8
  # Enable CUDA if available and load in tokenizer
9
  device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
10
  tokenizer, EMPTY_TOKENS = load_tokenizer(device)
11
 
12
+ st.title("Short Story Transformer Demo")
13
  st.subheader("UCLA DSU Project, Fall 2023")
14
+ st.markdown("By Daniel Mendelevitch, Terry Ming, Casey Tattersall, Sean Tjoa")
15
 
16
+ st.header("Data and Training")
17
 
18
+ st.markdown("""We used the dataset from the [TinyStories Research Paper](https://arxiv.org/pdf/2305.07759.pdf) (Ronen Eldan and Yuanzhi Li, Microsoft),
19
+ which consists of 2.1 million synthetic short children's stories generated by GPT-4, to train a Transformer LLM that we built from scratch in PyTorch.""")
20
+ st.markdown("""Our final model uses EleutherAI's [gpt-neo-1.3B tokenizer](https://huggingface.co/EleutherAI/gpt-neo-1.3B) (vocab size 50,256) and consists of 8 transformer blocks,
21
+ 16 attention heads, and an embedding dimension of 768, for a total of 133M parameters. The model was trained on 8 H100 GPUs for ~7 hours, and has a cross-entropy validation loss of 1.16,
22
+ which is superior to any model in the TinyStories paper (likely due to a larger vocab size and far more compute).""")
23
+ st.markdown("""Despite the simple themes and limited vocabulary present in the training data, the model is
24
+ quite effective at generating new short stories. **Try it out below!**""")
25
 
26
  st.header("Let's make some stories! πŸ“–")
27
 
28
  # Input from user
29
+ user_input = st.text_input("Enter your prompt:", placeholder="Write a prompt to make a story of your own, or leave it empty for a random story!").strip()
30
 
31
  if st.checkbox("Show Prompting Tips"):
32
+ st.markdown("The model can struggle with some prompts, especially those outside of its limited domain. If a response isn't satisfactory, try repeating the generation, or make the following modifications:")
33
  st.markdown(
34
  """
35
  - Use simple vocabulary - words and themes that would appear in a children's story
36
  - Avoid using idioms - for example, instead of "hit the gym", say "went to the gym"
37
  - Include plenty of descriptive adjectives
38
+ - The model often struggles with names. **Using common names and sticking with first names only can help.**
39
  """
40
  )
41
  ## Default values for advanced settings
42
+ user_seed = None # Remove if we're not rigging the "random" demo
43
  generation_method = "top-k"
44
  specified_k = 5
45
  specified_nucleus = 0.5
46
  specified_temperature = 0.9
47
+ max_tokens = 750
48
 
49
  if st.checkbox("Show Advanced Settings"):
50
  user_seed = st.number_input("Randomness Seed:", value = None, step = 1, placeholder="Use to replicate response", min_value = 1)
51
+ generation_method = st.selectbox("Method of Generation:", ("top-k", "nucleus", "temperature", "multinomial", "greedy"), index = 0).strip()
52
 
53
  if generation_method == "top-k":
54
  specified_k = st.number_input("Value for k:", value = 5, step = 1)
 
59
  if generation_method == "temperature":
60
  specified_temperature = st.number_input("Value for temperature:", value = 0.9, step = 0.05, min_value = 0.0, max_value = 1.0)
61
 
62
+ max_tokens = st.slider('Max Tokens Generated:', 50, 750, 750)
63
 
64
 
65
 
 
72
  model.cuda()
73
 
74
 
 
75
  if st.button('Write my story!'):
76
  placeholder = st.empty()
77
  # if model_version == 'smoll':
 
113
  if st.button('Clear Output'):
114
  placeholder = st.empty()
115
 
116
+ st.markdown('####')
117
+ st.caption(r'Data Attribution: Tinystories (License: CDLA-Sharing-1.0) https://arxiv.org/abs/2305.07759')
118
 
119
 
120
  if __name__ == "__main__":