Commit 30bd82e
Parent: 0a8544d

refine introduction

Files changed (1)
  1. app.py +38 -26
app.py CHANGED
@@ -24,39 +24,46 @@ def get_embeddings(text):
  st.title("Transformers: Tokenisers and Embeddings")
 
  preface_image, preface_text, = st.columns(2)
- # preface_image.image("https://static.streamlit.io/examples/dice.jpg")
- # preface_image.image("""https://assets.digitalocean.com/articles/alligator/boo.svg""")
  preface_text.write("""\
- *Transformers represent a revolutionary class of machine learning architectures that have sparked
- immense interest. While numerous insightful tutorials are available, the evolution of transformer architectures over
- the last few years has led to significant simplifications. These advancements have made it increasingly
- straightforward to understand their inner workings. In this series of articles, I aim to provide a direct, clear explanation of
- how and why modern transformers function, unburdened by the historical complexities associated with their inception.*
+ "*I think I can safely say that nobody understands quantum mechanics.*" R. Feynman
  """)
 
  divider()
 
  st.write("""\
- In order to understand the recent success in AI we need to understand the Transformer architecture. Its
- rise in the field of Natural Language Processing (NLP) is largely attributed to a combination of several key
- advancements:
-
- - Tokenisers and Embeddings
- - Attention and Self-Attention
- - Encoder-Decoder architecture
-
+ Did you know that the leading AI models powering speech recognition, language translation,
+ and even your email auto-responses owe their capabilities to a single, revolutionary concept: the Transformer
+ architecture?
+
+ Artificial Intelligence (AI) has seen remarkable progress in the last decade, and a significant part of that is due
+ to advancements in Natural Language Processing (NLP). NLP, a subset of AI, involves the interaction between computers
+ and human language, making it possible for AI to understand, interpret, and generate human language in a valuable
+ way. Within this realm of NLP, a game-changer has emerged: the Transformer model. With its innovative architecture
+ and remarkable performance, the Transformer model has revolutionised how machines understand and generate human
+ language.
+
+ However, the complexity of Transformer models can be daunting, making them seem inaccessible to those without
+ extensive technical expertise. This creates a barrier to understanding, utilising, and improving upon these powerful
+ tools.
+
+ That's why I'm embarking on this series of articles, breaking down the key components of Transformer models into
+ digestible, easy-to-understand concepts. I have chosen to dedicate the first article in this series solely to
+ Tokenisers and Embeddings. The article has the following structure:
+
+ - [Tokenisers](#tokenisers)
+ - [Embeddings](#embeddings)
+ - [Vector Databases](#vector-databases)
+ - [Dimensionality Reduction](#dimensionality-reduction)
+
  Understanding these foundational concepts is crucial to comprehending the overall structure and function of the
  Transformer model. They are the building blocks from which the rest of the model is constructed, and their roles
  within the architecture are essential to the model's ability to process and generate language. In my view,
- a comprehensive and simple explanation may give a reader a significant advantage in using LLMs. Feynman once said:
- "*I think I can safely say that nobody understands quantum mechanics.*". Because he couldn't explain it to a freshman.
-
- Given the importance and complexity of these concepts, I have chosen to dedicate the first article in this series
- solely to Tokenisation and embeddings. The decision to separate the topics into individual articles is driven by a
- desire to provide a thorough and in-depth understanding of each component of the Transformer model.
-
- Note: *HuggingFace provides an exceptional [tutorial on Transformer models](https://huggingface.co/docs/transformers/index).
- That tutorial is particularly beneficial for readers willing to dive into advanced topics.*
+ a comprehensive and simple explanation may give a reader a significant advantage in using LLMs.
+
+ Are you ready to take a deep dive into the world of Transformers? I promise that by the end of this series,
+ you'll have a clearer understanding of how these complex models work and how they contribute to the remarkable
+ capabilities of modern AI.
+
  """)
 
  with st.expander("Copernicus Museum in Warsaw"):
@@ -72,10 +79,15 @@ with st.expander("Copernicus Museum in Warsaw"):
  """)
  st.image("https://i.pinimg.com/originals/04/11/2c/04112c791a859d07a01001ac4f436e59.jpg")
 
+ st.write("""\
+ Note: *HuggingFace provides an exceptional [tutorial on Transformer models](https://huggingface.co/docs/transformers/index).
+ That tutorial is particularly beneficial for readers willing to dive into advanced topics.*
+ """)
+
  divider()
 
 
- st.header("Tokenisers and Tokenisation")
+ st.header("Tokenisers")
 
  st.write("""\
  Tokenisation is the initial step in the data preprocessing pipeline for natural language processing (NLP)
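[Editor's aside: a minimal sketch of the tokenisation step this hunk introduces, assuming the HuggingFace `transformers` library that the app's Note recommends; the "bert-base-uncased" checkpoint is an illustrative choice, not the app's actual configuration.]

```python
# Tokenisation: split raw text into sub-word tokens, then map them to integer ids.
from transformers import AutoTokenizer

# Assumed, illustrative checkpoint.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

text = "Transformers: Tokenisers and Embeddings"
tokens = tokenizer.tokenize(text)  # sub-word strings, e.g. ['transformers', ':', 'token', ...]
ids = tokenizer.encode(text)       # integer ids, with special tokens like [CLS]/[SEP] added

print(tokens)
print(ids)
```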
@@ -713,7 +725,7 @@ with st.expander("References:"):
 
  # *********************************************
  divider()
- st.header("Dimensionality Reduction (optional)")
+ st.header("Dimensionality Reduction")
 
  st.write("""\
  As was mentioned above, embedding vectors are learned in such a way that words with similar meanings
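[Editor's aside: a hedged sketch of the idea behind this Dimensionality Reduction section: embeddings place words with similar meanings near each other in a high-dimensional space, and a projection such as PCA makes that structure visible in 2-D. The model choice and the use of scikit-learn's PCA are assumptions, not the app's actual `get_embeddings` implementation.]

```python
# Project high-dimensional word embeddings down to 2-D so that
# "similar meaning => nearby vectors" can be inspected visually.
import numpy as np
from sklearn.decomposition import PCA
from transformers import AutoModel, AutoTokenizer

# Assumed, illustrative checkpoint.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

words = ["king", "queen", "apple", "orange"]
vectors = []
for word in words:
    inputs = tokenizer(word, return_tensors="pt")
    outputs = model(**inputs)
    # Mean-pool the last hidden state into a single vector per word.
    vectors.append(outputs.last_hidden_state.mean(dim=1).squeeze().detach().numpy())

# Reduce the 768-dimensional vectors to 2 components.
coords = PCA(n_components=2).fit_transform(np.array(vectors))
for word, (x, y) in zip(words, coords):
    print(f"{word}: ({x:.2f}, {y:.2f})")
```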
 