Commit 30bd82e
Parent: 0a8544d

refine introduction

Files changed (1)
  1. app.py +38 -26
app.py CHANGED
@@ -24,39 +24,46 @@ def get_embeddings(text):
  st.title("Transformers: Tokenisers and Embeddings")
 
  preface_image, preface_text, = st.columns(2)
- # preface_image.image("https://static.streamlit.io/examples/dice.jpg")
- # preface_image.image("""https://assets.digitalocean.com/articles/alligator/boo.svg""")
  preface_text.write("""\
- *Transformers represent a revolutionary class of machine learning architectures that have sparked
- immense interest. While numerous insightful tutorials are available, the evolution of transformer architectures over
- the last few years has led to significant simplifications. These advancements have made it increasingly
- straightforward to understand their inner workings. In this series of articles, I aim to provide a direct, clear explanation of
- how and why modern transformers function, unburdened by the historical complexities associated with their inception.*
+ "*I think I can safely say that nobody understands quantum mechanics.*" R. Feynman
  """)
 
  divider()
 
  st.write("""\
- In order to understand the recent success in AI we need to understand the Transformer architecture. Its
- rise in the field of Natural Language Processing (NLP) is largely attributed to a combination of several key
- advancements:
-
- - Tokenisers and Embeddings
- - Attention and Self-Attention
- - Encoder-Decoder architecture
-
+ Did you know that the leading AI models powering speech recognition, language translation,
+ and even your email auto-responses owe their capabilities to a single, revolutionary concept: the Transformer
+ architecture?
+
+ Artificial Intelligence (AI) has seen remarkable progress in the last decade, and a significant part of that is due
+ to advancements in Natural Language Processing (NLP). NLP, a subset of AI, involves the interaction between computers
+ and human language, making it possible for AI to understand, interpret, and generate human language in a valuable
+ way. Within this realm of NLP, a game-changer has emerged: the Transformer model. With its innovative architecture
+ and remarkable performance, the Transformer model has revolutionised how machines understand and generate human
+ language.
+
+ However, the complexity of Transformer models can be daunting, making them seem inaccessible to those without
+ extensive technical expertise. This creates a barrier to understanding, utilising, and improving upon these powerful
+ tools.
+
+ That's why I'm embarking on this series of articles, breaking down the key components of Transformer models into
+ digestible, easy-to-understand concepts. I have chosen to dedicate the first article in this series solely to
+ Tokenisers and Embeddings. The article has the following structure:
+
+ - [Tokenisers](#tokenisers)
+ - [Embeddings](#embeddings)
+ - [Vector Databases](#vector-databases)
+ - [Dimensionality Reduction](#dimensionality-reduction)
+
  Understanding these foundational concepts is crucial to comprehending the overall structure and function of the
  Transformer model. They are the building blocks from which the rest of the model is constructed, and their roles
  within the architecture are essential to the model's ability to process and generate language. In my view,
- a comprehensive and simple explanation may give a reader a significant advantage in using LLMs. Feynman once said:
- "*I think I can safely say that nobody understands quantum mechanics.*". Because he couldn't explain it to a freshman.
-
- Given the importance and complexity of these concepts, I have chosen to dedicate the first article in this series
- solely to Tokenisation and embeddings. The decision to separate the topics into individual articles is driven by a
- desire to provide a thorough and in-depth understanding of each component of the Transformer model.
-
- Note: *HuggingFace provides an exceptional [tutorial on Transformer models](https://huggingface.co/docs/transformers/index).
- That tutorial is particularly beneficial for readers willing to dive into advanced topics.*
+ a comprehensive and simple explanation may give a reader a significant advantage in using LLMs.
+
+ Are you ready to take a deep dive into the world of Transformers? I promise that by the end of this series,
+ you'll have a clearer understanding of how these complex models work and how they contribute to the remarkable
+ capabilities of modern AI.
+
  """)
 
  with st.expander("Copernicus Museum in Warsaw"):
@@ -72,10 +79,15 @@ with st.expander("Copernicus Museum in Warsaw"):
  """)
  st.image("https://i.pinimg.com/originals/04/11/2c/04112c791a859d07a01001ac4f436e59.jpg")
 
+ st.write("""\
+ Note: *HuggingFace provides an exceptional [tutorial on Transformer models](https://huggingface.co/docs/transformers/index).
+ That tutorial is particularly beneficial for readers willing to dive into advanced topics.*
+ """)
+
  divider()
 
 
- st.header("Tokenisers and Tokenisation")
+ st.header("Tokenisers")
 
  st.write("""\
  Tokenisation is the initial step in the data preprocessing pipeline for natural language processing (NLP)
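[Editor's aside: a minimal sketch of the tokenisation step this hunk introduces, assuming the HuggingFace `transformers` library that the app's Note recommends; the "bert-base-uncased" checkpoint is an illustrative choice, not the app's actual configuration.]

```python
# Tokenisation: split raw text into sub-word tokens, then map them to integer ids.
from transformers import AutoTokenizer

# Assumed, illustrative checkpoint.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

text = "Transformers: Tokenisers and Embeddings"
tokens = tokenizer.tokenize(text)  # sub-word strings, e.g. ['transformers', ':', 'token', ...]
ids = tokenizer.encode(text)       # integer ids, with special tokens like [CLS]/[SEP] added

print(tokens)
print(ids)
```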
@@ -713,7 +725,7 @@ with st.expander("References:"):
 
  # *********************************************
  divider()
- st.header("Dimensionality Reduction (optional)")
+ st.header("Dimensionality Reduction")
 
  st.write("""\
  As was mentioned above, embedding vectors are learned in such a way that words with similar meanings
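[Editor's aside: a hedged sketch of the idea behind this Dimensionality Reduction section: embeddings place words with similar meanings near each other in a high-dimensional space, and a projection such as PCA makes that structure visible in 2-D. The model choice and the use of scikit-learn's PCA are assumptions, not the app's actual `get_embeddings` implementation.]

```python
# Project high-dimensional word embeddings down to 2-D so that
# "similar meaning => nearby vectors" can be inspected visually.
import numpy as np
from sklearn.decomposition import PCA
from transformers import AutoModel, AutoTokenizer

# Assumed, illustrative checkpoint.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

words = ["king", "queen", "apple", "orange"]
vectors = []
for word in words:
    inputs = tokenizer(word, return_tensors="pt")
    outputs = model(**inputs)
    # Mean-pool the last hidden state into a single vector per word.
    vectors.append(outputs.last_hidden_state.mean(dim=1).squeeze().detach().numpy())

# Reduce the 768-dimensional vectors to 2 components.
coords = PCA(n_components=2).fit_transform(np.array(vectors))
for word, (x, y) in zip(words, coords):
    print(f"{word}: ({x:.2f}, {y:.2f})")
```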
 