| import streamlit as st | |
| st.markdown( | |
| """ | |
| <style> | |
| /* App Background */ | |
| .stApp { | |
| background: linear-gradient(to right, #EE82EE, #FFA500, #87CEEB); /* Horizontal gradient: violet, orange, sky blue */ | |
| color: #00FFFF; | |
| padding: 20px; | |
| } | |
| /* Align content to the left */ | |
| .block-container { | |
| text-align: left; /* Left align for content */ | |
| padding: 2rem; /* Padding for aesthetics */ | |
| } | |
| /* Header and Subheader Text */ | |
| h1 { | |
| color: #800080 !important; /* Custom styling for the main header */ | |
| font-family: 'Arial', sans-serif !important; | |
| font-weight: bold !important; | |
| text-align: center; | |
| } | |
| h2, h3, h4 { | |
| color: #FFFF00 !important; /* Custom styling for subheaders */ | |
| font-family: 'Arial', sans-serif !important; | |
| font-weight: bold !important; | |
| } | |
| /* Paragraph Text */ | |
| p { | |
| color: #0000FF !important; /* Custom styling for paragraphs */ | |
| font-family: 'Arial', sans-serif !important; | |
| line-height: 1.6; | |
| } | |
| </style> | |
| """, | |
| unsafe_allow_html=True | |
| ) | |
| st.markdown( | |
| """ | |
| <h1 style="text-align: center;">Basic Terminology in NLP</h1> | |
| """, | |
| unsafe_allow_html=True | |
| ) | |
| st.markdown( | |
| """ | |
| <h5>Before diving into NLP concepts, it helps to know the terms that come up most often.</h5> | |
| <h5 style="color: #00FF00;">1. Key Terminologies in NLP</h5> | |
| <ul style="color: #008000; line-height: 1.8;"> | |
| <li><b>Corpus:</b> A collection of text documents. Example: {d1, d2, d3, ...}</li> | |
| <li><b>Document:</b> A single unit of text (e.g., a sentence, paragraph, or article).</li> | |
| <li><b>Paragraph:</b> A collection of sentences.</li> | |
| <li><b>Sentence:</b> A collection of words forming a meaningful expression.</li> | |
| <li><b>Word:</b> A collection of characters.</li> | |
| <li><b>Character:</b> The smallest unit of text: a letter, digit, or special symbol.</li> | |
| </ul> | |
| """, | |
| unsafe_allow_html=True | |
| ) | |
| st.markdown( | |
| """ | |
| <h5 style="color: #00FFFF;">2. Tokenization</h5> | |
| <p style="color: #FFA500;">Tokenization is the process of breaking a piece of text into smaller units called tokens. Tokens can be sentences, words, subwords, or characters, depending on the granularity the task requires.</p> | |
| <h6>Types of Tokenization:</h6> | |
| <ul style="color: #d4e6f1; line-height: 1.8;"> | |
| <li><b>Sentence Tokenization:</b> Splitting text into sentences. <br> Example: "I love ice-cream. I love chocolate." &rarr; ["I love ice-cream.", "I love chocolate."]</li> | |
| <li><b>Word Tokenization:</b> Splitting sentences into words. <br> Example: "I love biryani" &rarr; ["I", "love", "biryani"]</li> | |
| <li><b>Character Tokenization:</b> Splitting words into characters. <br> Example: "Love" &rarr; ["L", "o", "v", "e"]</li> | |
| </ul> | |
| """, | |
| unsafe_allow_html=True | |
| ) | |
| st.markdown( | |
| """ | |
| <h5 style="color: #008080;">3. Stop Words</h5> | |
| <p style="color: #000080;">Stop words are commonly used words in a language that carry little or no meaningful information for text analysis.</p> | |
| <h6>Example:</h6> | |
| <p style="color: #d4e6f1;">"In Hyderabad, we can eat famous biryani." <br> Stop words: ["in", "we", "can"]</p> | |
| """, | |
| unsafe_allow_html=True | |
| ) | |
| st.markdown( | |
| """ | |
| <h5 style="color: #20B2AA;">4. Vectorization</h5> | |
| <p style="color: #d4e6f1;">Vectorization is the process of converting text data into numerical representations so that machine learning models can process and analyze it.</p> | |
| <h6>Types of Vectorization:</h6> | |
| <ul style="color: #d4e6f1; line-height: 1.8;"> | |
| <li><b>One-Hot Encoding:</b> Represents each word as a binary vector with a single 1 at that word's position in the vocabulary.</li> | |
| <li><b>Bag of Words (BoW):</b> Represents a document by how often each vocabulary word occurs, ignoring word order.</li> | |
| <li><b>TF-IDF:</b> Weights a word's frequency in a document by how rare the word is across the corpus.</li> | |
| <li><b>Word2Vec:</b> Embeds words in a vector space using a shallow neural network trained on word-context pairs.</li> | |
| <li><b>GloVe:</b> Learns embeddings from global word co-occurrence statistics.</li> | |
| <li><b>FastText:</b> Extends Word2Vec with subword (character n-gram) information.</li> | |
| </ul> | |
| """, | |
| unsafe_allow_html=True | |
| ) | |
| st.markdown( | |
| """ | |
| <h5 style="color: #20B2AA;">5. Stemming</h5> | |
| <p style="color: #d4e6f1;">Stemming reduces words to a base or root form by stripping affixes, usually suffixes, with rule-based heuristics. Because no dictionary is consulted, the resulting stem is not guaranteed to be a valid word.</p> | |
| <h6>Example:</h6> | |
| <ul style="color: #d4e6f1; line-height: 1.8;"> | |
| <li><b>Original Words:</b> "running", "runs"</li> | |
| <li><b>Stemmed Form:</b> "run" (whether a word like "runner" also reduces to "run" depends on the stemmer used)</li> | |
| </ul> | |
| """, | |
| unsafe_allow_html=True | |
| ) | |
| st.markdown( | |
| """ | |
| <h5 style="color: #20B2AA;">6. Lemmatization</h5> | |
| <p style="color: #d4e6f1;">Lemmatization reduces a word to its dictionary form (called a lemma) using linguistic rules and a vocabulary. Unlike stemming, lemmatization ensures that the result is a valid word in the language.</p> | |
| <h6>Example:</h6> | |
| <ul style="color: #d4e6f1; line-height: 1.8;"> | |
| <li><b>Original Words:</b> "studying", "better", "carrying"</li> | |
| <li><b>Lemmatized Form:</b> "study", "good", "carry" (assuming each word is tagged with the right part of speech, e.g. "better" as an adjective)</li> | |
| </ul> | |
| <p style="color: #d4e6f1;">Lemmatization is usually more accurate than stemming but more computationally expensive, because it relies on a dictionary and often on part-of-speech information.</p> | |
| """, | |
| unsafe_allow_html=True | |
| ) |
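The tokenization, stop-word, and Bag of Words ideas described on this page can be tried out with a few lines of plain Python. The block below is a minimal sketch, not part of the app itself: the helper names (`sentence_tokenize`, `word_tokenize`, `remove_stop_words`, `bag_of_words`) and the tiny `STOP_WORDS` set are hypothetical and exist only for illustration, and the regex-based splitting is a naive heuristic that will mishandle abbreviations and similar edge cases. Libraries such as NLTK or spaCy are the usual choice for real text.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"in", "we", "can", "i", "the", "a", "an"}


def sentence_tokenize(text):
    """Split after '.', '!' or '?' when followed by whitespace (naive heuristic)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def word_tokenize(sentence):
    """Return runs of letters, digits, apostrophes and hyphens as word tokens."""
    return re.findall(r"[A-Za-z0-9'-]+", sentence)


def remove_stop_words(tokens):
    """Drop tokens whose lowercase form is in STOP_WORDS."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]


def bag_of_words(tokens):
    """Count lowercase tokens; word order is discarded."""
    return Counter(t.lower() for t in tokens)


text = "I love ice-cream. I love chocolate."
sentences = sentence_tokenize(text)
words = word_tokenize("In Hyderabad, we can eat famous biryani.")
content_words = remove_stop_words(words)
chars = list("Love")  # character tokenization
counts = bag_of_words(word_tokenize(text))

print(sentences)
print(content_words)
print(chars)
print(counts)
```

Counting lowercase tokens with `Counter` keeps the sketch short; a vectorizer in a real pipeline would also fix a vocabulary order so that every document maps to a vector of the same length.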