import streamlit as st

# Page configuration
st.set_page_config(
    layout="wide",
    initial_sidebar_state="auto"
)
# Custom CSS for better styling
st.markdown("""
<style>
.main-title {
    font-size: 36px;
    color: #4A90E2;
    font-weight: bold;
    text-align: center;
}
.sub-title {
    font-size: 24px;
    color: #4A90E2;
    margin-top: 20px;
}
.section {
    background-color: #f9f9f9;
    padding: 15px;
    border-radius: 10px;
    margin-top: 20px;
}
.section h2 {
    font-size: 22px;
    color: #4A90E2;
}
.section p, .section ul {
    color: #666666;
}
.link {
    color: #4A90E2;
    text-decoration: none;
}
.benchmark-table {
    width: 100%;
    border-collapse: collapse;
    margin-top: 20px;
}
.benchmark-table th, .benchmark-table td {
    border: 1px solid #ddd;
    padding: 8px;
    text-align: left;
}
.benchmark-table th {
    background-color: #4A90E2;
    color: white;
}
.benchmark-table td {
    background-color: #f2f2f2;
}
</style>
""", unsafe_allow_html=True)
# Title
st.markdown('<div class="main-title">Introduction to RoBERTa Annotators in Spark NLP</div>', unsafe_allow_html=True)

# Introduction
st.markdown("""
<div class="section">
    <p>RoBERTa (A Robustly Optimized BERT Pretraining Approach) builds on BERT's language model by modifying key hyperparameters and pretraining techniques, and achieves state-of-the-art results across a wide range of NLP tasks. Below is an overview of the RoBERTa annotators for token classification, zero-shot classification, sequence classification, and question answering:</p>
</div>
""", unsafe_allow_html=True)

tab1, tab2, tab3, tab4 = st.tabs(["RoBERTa for Token Classification", "RoBERTa for Zero-Shot Classification", "RoBERTa for Sequence Classification", "RoBERTa for Question Answering"])
with tab1:
    st.markdown("""
    <div class="section">
        <h2>RoBERTa for Token Classification</h2>
        <p>The <strong>RoBertaForTokenClassification</strong> annotator is designed for Named Entity Recognition (NER) tasks using the RoBERTa model. This pretrained model is adapted from a Hugging Face model and imported into Spark NLP, offering robust performance in identifying and classifying entities in text. The RoBERTa model, with its large-scale pretraining, delivers state-of-the-art results on NER tasks.</p>
        <p>Token classification with RoBERTa enables:</p>
        <ul>
            <li><strong>Named Entity Recognition (NER):</strong> Identifying and classifying entities such as miscellaneous entities (MISC), organizations (ORG), locations (LOC), and persons (PER).</li>
            <li><strong>Information Extraction:</strong> Extracting key information from unstructured text for further analysis.</li>
            <li><strong>Text Categorization:</strong> Enhancing document retrieval and categorization based on entity recognition.</li>
        </ul>
        <p>Here is an example of how RoBERTa token classification works:</p>
        <table class="benchmark-table">
            <tr>
                <th>Entity</th>
                <th>Label</th>
            </tr>
            <tr>
                <td>Apple</td>
                <td>ORG</td>
            </tr>
            <tr>
                <td>Elon Musk</td>
                <td>PER</td>
            </tr>
            <tr>
                <td>California</td>
                <td>LOC</td>
            </tr>
        </table>
    </div>
    """, unsafe_allow_html=True)
    # RoBERTa Token Classification - NER Large
    st.markdown('<div class="sub-title">RoBERTa Token Classification - NER Large</div>', unsafe_allow_html=True)
    st.markdown("""
    <div class="section">
        <p>The <strong>roberta_ner_roberta_large_ner_english</strong> model is a RoBERTa model fine-tuned for token classification, specifically Named Entity Recognition (NER) on English text. It recognizes four entity types: locations (LOC), organizations (ORG), persons (PER), and miscellaneous entities (MISC).</p>
    </div>
    """, unsafe_allow_html=True)
    # How to Use the Model - Token Classification
    st.markdown('<div class="sub-title">How to Use the Model</div>', unsafe_allow_html=True)
    st.code('''
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline
from pyspark.sql.functions import col, expr

document_assembler = DocumentAssembler() \\
    .setInputCol("text") \\
    .setOutputCol("document")

sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \\
    .setInputCols(["document"]) \\
    .setOutputCol("sentence")

tokenizer = Tokenizer() \\
    .setInputCols(["sentence"]) \\
    .setOutputCol("token")

tokenClassifier = RoBertaForTokenClassification \\
    .pretrained("roberta_ner_roberta_large_ner_english", "en") \\
    .setInputCols(["sentence", "token"]) \\
    .setOutputCol("ner")

ner_converter = NerConverter() \\
    .setInputCols(['sentence', 'token', 'ner']) \\
    .setOutputCol('entities')

pipeline = Pipeline(stages=[
    document_assembler,
    sentenceDetector,
    tokenizer,
    tokenClassifier,
    ner_converter
])

data = spark.createDataFrame([["William Henry Gates III (born October 28, 1955) is an American business magnate, software developer, investor, and philanthropist. He is best known as the co-founder of Microsoft Corporation. During his career at Microsoft, Gates held the positions of chairman, chief executive officer (CEO), president and chief software architect, while also being the largest individual shareholder until May 2014. He is one of the best-known entrepreneurs and pioneers of the microcomputer revolution of the 1970s and 1980s. Born and raised in Seattle, Washington, Gates co-founded Microsoft with childhood friend Paul Allen in 1975, in Albuquerque, New Mexico; it went on to become the world's largest personal computer software company. Gates led the company as chairman and CEO until stepping down as CEO in January 2000, but he remained chairman and became chief software architect. During the late 1990s, Gates had been criticized for his business tactics, which have been considered anti-competitive. This opinion has been upheld by numerous court rulings. In June 2006, Gates announced that he would be transitioning to a part-time role at Microsoft and full-time work at the Bill & Melinda Gates Foundation, the private charitable foundation that he and his wife, Melinda Gates, established in 2000. He gradually transferred his duties to Ray Ozzie and Craig Mundie. He stepped down as chairman of Microsoft in February 2014 and assumed a new post as technology adviser to support the newly appointed CEO Satya Nadella."]]).toDF("text")

result = pipeline.fit(data).transform(data)

result.select(
    expr("explode(entities) as ner_chunk")
).select(
    col("ner_chunk.result").alias("chunk"),
    col("ner_chunk.metadata.entity").alias("ner_label")
).show(truncate=False)
''', language='python')
    # Results
    st.text("""
+-------------------------------+---------+
|chunk                          |ner_label|
+-------------------------------+---------+
|William Henry Gates III        |PER      |
|American                       |MISC     |
|Microsoft Corporation          |ORG      |
|Microsoft                      |ORG      |
|Gates                          |PER      |
|Seattle                        |LOC      |
|Washington                     |LOC      |
|Gates co-founded Microsoft     |PER      |
|Paul Allen                     |PER      |
|Albuquerque                    |LOC      |
|New Mexico                     |LOC      |
|Gates                          |PER      |
|Gates                          |PER      |
|Gates                          |PER      |
|Microsoft                      |ORG      |
|Bill & Melinda Gates Foundation|ORG      |
|Melinda Gates                  |PER      |
|Ray Ozzie                      |PER      |
|Craig Mundie                   |PER      |
|Microsoft                      |ORG      |
+-------------------------------+---------+
    """)
    # Model Info Section
    st.markdown('<div class="sub-title">Model Info</div>', unsafe_allow_html=True)
    st.markdown("""
    <div class="section">
        <ul>
            <li><strong>Model Name:</strong> roberta_ner_roberta_large_ner_english</li>
            <li><strong>Compatibility:</strong> Spark NLP 3.4.2+</li>
            <li><strong>License:</strong> Open Source</li>
            <li><strong>Edition:</strong> Official</li>
            <li><strong>Input Labels:</strong> [document, token]</li>
            <li><strong>Output Labels:</strong> [ner]</li>
            <li><strong>Language:</strong> English (en)</li>
            <li><strong>Size:</strong> 1.3 GB</li>
            <li><strong>Case Sensitive:</strong> True</li>
            <li><strong>Max Sentence Length:</strong> 128</li>
        </ul>
    </div>
    """, unsafe_allow_html=True)
    # References Section
    st.markdown('<div class="sub-title">References</div>', unsafe_allow_html=True)
    st.markdown("""
    <div class="section">
        <ul>
            <li><a class="link" href="https://huggingface.co/Jean-Baptiste/roberta-large-ner-english" target="_blank" rel="noopener">Jean-Baptiste's RoBERTa NER Model on Hugging Face</a></li>
            <li><a class="link" href="https://medium.com/@jean-baptiste.polle/lstm-model-for-email-signature-detection-8e990384fefa" target="_blank" rel="noopener">LSTM Model for Email Signature Detection</a></li>
        </ul>
    </div>
    """, unsafe_allow_html=True)
with tab2:
    # RoBERTa Zero-Shot Classification
    st.markdown("""
    <div class="section">
        <h2>RoBERTa for Zero-Shot Classification</h2>
        <p>The <strong>RoBertaForZeroShotClassification</strong> annotator is designed for zero-shot text classification in English. It uses a RoBERTa Base model fine-tuned on Natural Language Inference (NLI) tasks, which allows it to classify text into labels it has never seen during training.</p>
        <p>Key features of this model include:</p>
        <ul>
            <li><strong>Zero-Shot Classification:</strong> Classify text into dynamic categories defined at runtime, without requiring predefined classes.</li>
            <li><strong>Flexibility:</strong> Adapts to different classification scenarios by specifying candidate labels as needed.</li>
            <li><strong>Model Foundation:</strong> Based on RoBERTa and fine-tuned on NLI data for robust performance across a variety of tasks.</li>
        </ul>
        <p>This model is ideal for applications where predefined categories are unavailable or change frequently, offering flexibility and adaptability in text classification.</p>
        <table class="benchmark-table">
            <tr>
                <th>Text</th>
                <th>Predicted Category</th>
            </tr>
            <tr>
                <td>"I have a problem with my iPhone that needs to be resolved ASAP!!"</td>
                <td>Urgent</td>
            </tr>
            <tr>
                <td>"The latest advancements in technology are fascinating."</td>
                <td>Technology</td>
            </tr>
        </table>
    </div>
    """, unsafe_allow_html=True)
    # RoBERTa Zero-Shot Classification Base - NLI
    st.markdown('<div class="sub-title">RoBERTa Zero-Shot Classification Base - NLI</div>', unsafe_allow_html=True)
    st.markdown("""
    <div class="section">
        <p>The <strong>roberta_base_zero_shot_classifier_nli</strong> model is tailored for zero-shot text classification, enabling dynamic classification based on labels specified at runtime. Fine-tuned on Natural Language Inference (NLI) tasks, it leverages the RoBERTa Base architecture to provide flexible and robust classification capabilities.</p>
    </div>
    """, unsafe_allow_html=True)
    # How to Use the Model - Zero-Shot Classification
    st.markdown('<div class="sub-title">How to Use the Model</div>', unsafe_allow_html=True)
    st.code('''
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

document_assembler = DocumentAssembler() \\
    .setInputCol('text') \\
    .setOutputCol('document')

tokenizer = Tokenizer() \\
    .setInputCols(['document']) \\
    .setOutputCol('token')

zeroShotClassifier = RoBertaForZeroShotClassification \\
    .pretrained('roberta_base_zero_shot_classifier_nli', 'en') \\
    .setInputCols(['token', 'document']) \\
    .setOutputCol('class') \\
    .setCaseSensitive(False) \\
    .setMaxSentenceLength(512) \\
    .setCandidateLabels(["urgent", "mobile", "travel", "movie", "music", "sport", "weather", "technology"])

pipeline = Pipeline(stages=[
    document_assembler,
    tokenizer,
    zeroShotClassifier
])

example = spark.createDataFrame([['I have a problem with my iPhone that needs to be resolved ASAP!!']]).toDF("text")
result = pipeline.fit(example).transform(example)

result.select('document.result', 'class.result').show(truncate=False)
''', language='python')
st.text(""" | |
+------------------------------------------------------------------+------------+ | |
|result |result | | |
+------------------------------------------------------------------+------------+ | |
|[I have a problem with my iPhone that needs to be resolved ASAP!!]|[technology]| | |
+------------------------------------------------------------------+------------+ | |
""") | |
    # Model Information - Zero-Shot Classification
    st.markdown('<div class="sub-title">Model Information</div>', unsafe_allow_html=True)
    st.markdown("""
    <table class="benchmark-table">
        <tr>
            <th>Attribute</th>
            <th>Description</th>
        </tr>
        <tr>
            <td><strong>Model Name</strong></td>
            <td>roberta_base_zero_shot_classifier_nli</td>
        </tr>
        <tr>
            <td><strong>Compatibility</strong></td>
            <td>Spark NLP 4.4.2+</td>
        </tr>
        <tr>
            <td><strong>License</strong></td>
            <td>Open Source</td>
        </tr>
        <tr>
            <td><strong>Edition</strong></td>
            <td>Official</td>
        </tr>
        <tr>
            <td><strong>Input Labels</strong></td>
            <td>[token, document]</td>
        </tr>
        <tr>
            <td><strong>Output Labels</strong></td>
            <td>[multi_class]</td>
        </tr>
        <tr>
            <td><strong>Language</strong></td>
            <td>en</td>
        </tr>
        <tr>
            <td><strong>Size</strong></td>
            <td>466.4 MB</td>
        </tr>
        <tr>
            <td><strong>Case Sensitive</strong></td>
            <td>true</td>
        </tr>
    </table>
    """, unsafe_allow_html=True)
    # References - Zero-Shot Classification
    st.markdown('<div class="sub-title">References</div>', unsafe_allow_html=True)
    st.markdown("""
    <div class="section">
        <ul>
            <li><a class="link" href="https://github.com/huggingface/transformers" target="_blank" rel="noopener">Hugging Face Transformers</a></li>
            <li><a class="link" href="https://arxiv.org/abs/1907.11692" target="_blank" rel="noopener">RoBERTa: A Robustly Optimized BERT Pretraining Approach</a></li>
            <li><a class="link" href="https://huggingface.co/roberta-base" target="_blank" rel="noopener">Hugging Face RoBERTa Models</a></li>
        </ul>
    </div>
    """, unsafe_allow_html=True)
with tab3:
    # RoBERTa Sequence Classification
    st.markdown("""
    <div class="section">
        <h2>RoBERTa for Sequence Classification</h2>
        <p>The <strong>RoBertaForSequenceClassification</strong> annotator is designed for tasks such as sentiment analysis and sequence classification using the RoBERTa model. This model handles classification tasks efficiently and is adapted for production-readiness with Spark NLP.</p>
        <p>Sequence classification with RoBERTa enables:</p>
        <ul>
            <li><strong>Sentiment Analysis:</strong> Determining the sentiment expressed in text as negative, neutral, or positive.</li>
            <li><strong>Text Classification:</strong> Categorizing text into predefined classes such as sentiment or topic categories.</li>
            <li><strong>Document Analysis:</strong> Enhancing the analysis and categorization of documents based on content.</li>
        </ul>
        <p>Here is an example of how RoBERTa sequence classification works:</p>
        <table class="benchmark-table">
            <tr>
                <th>Text</th>
                <th>Label</th>
            </tr>
            <tr>
                <td>The new RoBERTa model shows significant improvements in performance.</td>
                <td>Positive</td>
            </tr>
            <tr>
                <td>The training was not very effective and did not yield desired results.</td>
                <td>Negative</td>
            </tr>
            <tr>
                <td>The overall feedback on the new features has been mixed.</td>
                <td>Neutral</td>
            </tr>
        </table>
    </div>
    """, unsafe_allow_html=True)
    # RoBERTa Sequence Classification - ACTS Feedback1
    st.markdown('<div class="sub-title">RoBERTa Sequence Classification - ACTS Feedback1</div>', unsafe_allow_html=True)
    st.markdown("""
    <div class="section">
        <p>The <strong>roberta_classifier_acts_feedback1</strong> model is a fine-tuned RoBERTa model for sequence classification, specifically adapted for English text. Originally trained by mp6kv, it is curated for scalability and production-readiness with Spark NLP. It classifies text into three categories: negative, neutral, and positive.</p>
    </div>
    """, unsafe_allow_html=True)
    # How to Use the Model - Sequence Classification
    st.markdown('<div class="sub-title">How to Use the Model</div>', unsafe_allow_html=True)
    st.code('''
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

document_assembler = DocumentAssembler() \\
    .setInputCol("text") \\
    .setOutputCol("document")

tokenizer = Tokenizer() \\
    .setInputCols(["document"]) \\
    .setOutputCol("token")

seq_classifier = RoBertaForSequenceClassification \\
    .pretrained("roberta_classifier_acts_feedback1", "en") \\
    .setInputCols(["document", "token"]) \\
    .setOutputCol("class")

pipeline = Pipeline(stages=[document_assembler, tokenizer, seq_classifier])

data = spark.createDataFrame([["I had a fantastic day at the park with my friends and family, enjoying the beautiful weather and fun activities."]]).toDF("text")
result = pipeline.fit(data).transform(data)

result.select('class.result').show(truncate=False)
''', language='python')
    # Results
    st.text("""
+----------+
|result    |
+----------+
|[positive]|
+----------+
    """)
    # Model Info Section
    st.markdown('<div class="sub-title">Model Info</div>', unsafe_allow_html=True)
    st.markdown("""
    <div class="section">
        <ul>
            <li><strong>Model Name:</strong> roberta_classifier_acts_feedback1</li>
            <li><strong>Compatibility:</strong> Spark NLP 5.2.0+</li>
            <li><strong>License:</strong> Open Source</li>
            <li><strong>Edition:</strong> Official</li>
            <li><strong>Input Labels:</strong> [document, token]</li>
            <li><strong>Output Labels:</strong> [class]</li>
            <li><strong>Language:</strong> en</li>
            <li><strong>Size:</strong> 424.8 MB</li>
            <li><strong>Case Sensitive:</strong> True</li>
            <li><strong>Max Sentence Length:</strong> 256</li>
        </ul>
    </div>
    """, unsafe_allow_html=True)
    # References Section
    st.markdown('<div class="sub-title">References</div>', unsafe_allow_html=True)
    st.markdown("""
    <div class="section">
        <ul>
            <li><a class="link" href="https://huggingface.co/mp6kv/ACTS_feedback1" target="_blank" rel="noopener">ACTS Feedback1 Model on Hugging Face</a></li>
            <li><a class="link" href="https://arxiv.org/abs/1907.11692" target="_blank" rel="noopener">RoBERTa: A Robustly Optimized BERT Pretraining Approach</a></li>
            <li><a class="link" href="https://github.com/huggingface/transformers" target="_blank" rel="noopener">Hugging Face Transformers</a></li>
        </ul>
    </div>
    """, unsafe_allow_html=True)
with tab4:
    st.markdown("""
    <div class="section">
        <h2>RoBERTa for Question Answering</h2>
        <p>The <strong>RoBertaForQuestionAnswering</strong> annotator is designed for extracting answers from a given context based on a specific question. This model leverages RoBERTa's capabilities to accurately find and provide answers, making it suitable for applications that require detailed information retrieval. Question answering with RoBERTa is especially useful for:</p>
        <ul>
            <li><strong>Building Advanced QA Systems:</strong> Developing systems capable of answering user queries with high accuracy.</li>
            <li><strong>Enhancing Customer Service:</strong> Providing precise answers to customer questions in support environments.</li>
            <li><strong>Improving Information Retrieval:</strong> Extracting specific answers from large text corpora.</li>
        </ul>
        <p>Utilizing this annotator can significantly enhance your ability to retrieve and deliver accurate answers from text data.</p>
        <table class="benchmark-table">
            <tr>
                <th>Context</th>
                <th>Question</th>
                <th>Predicted Answer</th>
            </tr>
            <tr>
                <td>"The Eiffel Tower is one of the most recognizable structures in the world. It was constructed in 1889 as the entrance arch to the 1889 World's Fair held in Paris, France."</td>
                <td>"When was the Eiffel Tower constructed?"</td>
                <td>1889</td>
            </tr>
            <tr>
                <td>"The Amazon rainforest, also known as Amazonia, is a vast tropical rainforest in South America. It is home to an incredible diversity of flora and fauna."</td>
                <td>"What is the Amazon rainforest also known as?"</td>
                <td>Amazonia</td>
            </tr>
            <tr>
                <td>"The Great Wall of China is a series of fortifications made of various materials, stretching over 13,000 miles across northern China."</td>
                <td>"How long is the Great Wall of China?"</td>
                <td>13,000 miles</td>
            </tr>
        </table>
    </div>
    """, unsafe_allow_html=True)
    # RoBERTa for Question Answering - icebert_finetuned_squad_10
    st.markdown('<div class="sub-title">icebert_finetuned_squad_10</div>', unsafe_allow_html=True)
    st.markdown("""
    <div class="section">
        <p>The <strong>icebert_finetuned_squad_10</strong> model is a pretrained RoBERTa model, adapted from Hugging Face and fine-tuned for question-answering tasks. Originally trained by gudjonk93, it is curated for scalability and production-readiness with Spark NLP.</p>
    </div>
    """, unsafe_allow_html=True)
    # How to Use the Model - Question Answering
    st.markdown('<div class="sub-title">How to Use the Model</div>', unsafe_allow_html=True)
    st.code('''
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Document Assembler for question/context pairs
document_assembler = MultiDocumentAssembler() \\
    .setInputCols(["question", "context"]) \\
    .setOutputCols(["document_question", "document_context"])

# RoBertaForQuestionAnswering
spanClassifier = RoBertaForQuestionAnswering.pretrained("icebert_finetuned_squad_10", "en") \\
    .setInputCols(["document_question", "document_context"]) \\
    .setOutputCol("answer")

# Pipeline
pipeline = Pipeline().setStages([
    document_assembler,
    spanClassifier
])

# Create example DataFrame
example = spark.createDataFrame([
    ["What's my name?", "My name is Clara and I live in Berkeley."]
]).toDF("question", "context")

# Fit and transform the data
pipelineModel = pipeline.fit(example)
result = pipelineModel.transform(example)

# Show results
result.select('document_question.result', 'answer.result').show(truncate=False)
''', language='python')
st.text(""" | |
+-----------------+-------+ | |
|result |result | | |
+-----------------+-------+ | |
|[What's my name?]|[Clara]| | |
+-----------------+-------+ | |
""") | |
    # Model Information - Question Answering
    st.markdown('<div class="sub-title">Model Information</div>', unsafe_allow_html=True)
    st.markdown("""
    <table class="benchmark-table">
        <tr>
            <th>Attribute</th>
            <th>Description</th>
        </tr>
        <tr>
            <td><strong>Model Name</strong></td>
            <td>icebert_finetuned_squad_10</td>
        </tr>
        <tr>
            <td><strong>Compatibility</strong></td>
            <td>Spark NLP 5.2.1+</td>
        </tr>
        <tr>
            <td><strong>License</strong></td>
            <td>Open Source</td>
        </tr>
        <tr>
            <td><strong>Edition</strong></td>
            <td>Official</td>
        </tr>
        <tr>
            <td><strong>Input Labels</strong></td>
            <td>[document_question, document_context]</td>
        </tr>
        <tr>
            <td><strong>Output Labels</strong></td>
            <td>[answer]</td>
        </tr>
        <tr>
            <td><strong>Language</strong></td>
            <td>en</td>
        </tr>
        <tr>
            <td><strong>Size</strong></td>
            <td>450.4 MB</td>
        </tr>
    </table>
    """, unsafe_allow_html=True)
    # References - Question Answering
    st.markdown('<div class="sub-title">References</div>', unsafe_allow_html=True)
    st.markdown("""
    <div class="section">
        <ul>
            <li><a class="link" href="https://huggingface.co/gudjonk93/IceBERT-finetuned-squad-10" target="_blank" rel="noopener">IceBERT Model on Hugging Face</a></li>
            <li><a class="link" href="https://arxiv.org/abs/1810.04805" target="_blank" rel="noopener">BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding</a></li>
            <li><a class="link" href="https://github.com/google-research/bert" target="_blank" rel="noopener">Google Research BERT</a></li>
        </ul>
    </div>
    """, unsafe_allow_html=True)
# Community & Support
st.markdown('<div class="sub-title">Community & Support</div>', unsafe_allow_html=True)
st.markdown("""
<div class="section">
    <ul>
        <li><a class="link" href="https://sparknlp.org/" target="_blank">Official Website</a>: Documentation and examples</li>
        <li><a class="link" href="https://join.slack.com/t/spark-nlp/shared_invite/zt-198dipu77-L3UWNe_AJ8xqDk0ivmih5Q" target="_blank">Slack</a>: Live discussion with the community and team</li>
        <li><a class="link" href="https://github.com/JohnSnowLabs/spark-nlp" target="_blank">GitHub</a>: Bug reports, feature requests, and contributions</li>
        <li><a class="link" href="https://medium.com/spark-nlp" target="_blank">Medium</a>: Spark NLP articles</li>
        <li><a class="link" href="https://www.youtube.com/channel/UCmFOjlpYEhxf_wJUDuz6xxQ/videos" target="_blank">YouTube</a>: Video tutorials</li>
    </ul>
</div>
""", unsafe_allow_html=True)