Spaces: rk2546/csgy-6613-project-rk2546

Ryan Kim committed on Apr 27, 2023

Commit fda5a48 (1 Parent(s): 06f9c28)

streamlined appearance of overall app

Files changed (2)

README.md +10 -39
src/main.py +70 -73

README.md CHANGED

@@ -86,48 +86,24 @@ class ModelImplementation(object):
 The main idea is that for every model that's needed, we create a new instance of this class. In each case, we can store a reference to the tokenizer, model, and pipeline; the model will then use that tokenizer, model, and pipeline in the `predict()` call. If the output of a model needs to be curated in some way (ex. we need to post-process the output of a model so that it's more human-readable), we can also pass a custom method alongside the other parameters too. This is useful when we are switching between models in the Sentiment Analysis page or between the Sentiment Analysis and Patent Acceptance Prediction page - we merely have to create or modify an instance of the `ModelImplementation` class with the proper tokenizer, model, pipeline, and post-process method (if needed). Placeholder text for any inputs can also be stored as well in an array.
-The Sentiment Analysis and Patent Acceptance Prediction pages are both stored on one interface, with a sidebar menu allowing a user to switch between the two. The page has a simple title, subtitle, and sidebar implementation through **Streamlit**:
 ````python
 # Title
 st.title("CSGY-6613 Project")
 # Subtitle
 st.markdown("_**Ryan Kim (rk2546)**_")
-st.markdown("---")
-def PageToHome():
-    st.session_state.page = "home"
-def PageToEmotion():
-    st.session_state.page = "emotion"
-def PageToPatent():
-    st.session_state.page = "patent"
-with st.sidebar:
-    st.subheader("Toolbox")
-    home_selected = st.button("Home", on_click=PageToHome)
-    emotion_selected = st.button(
-        "Emotion Analysis [Milestone #2]",
-        on_click=PageToEmotion
-    )
-    patent_selected = st.button(
-        "Patent Prediction [Milestone #3]",
-        on_click=PageToPatent
-    )
-````
-We store the current page of the user inside an `st.session_state` dictionary, which persists every time the page loads or changes. Because **Streamlit** will only re-render the page every time a change is made to the interface - this means that variables not stored in a session will be re-set. Alongside the current page, we also store models and user inputs inside of the session as well, which allows them to persist between **Streamlit** re-renderings.
-Whenever we switch between pages via the sidebar, a simple `if-else` statement ensure that the proper page is loaded:
-````python
-if st.session_state.page == "emotion":
     st.subheader("Sentiment Analysis")
-    if "emotion_model" not in st.session_state:
-        st.write("Loading model...")
-    else:
-        // ...
-elif st.session_state.page == "patent":
     st.subheader("USPTO Patent Evaluation")
     // ...
 ````
@@ -187,13 +163,8 @@ if submit:
         to_eval = st.session_state.emotion_model.placeholders[0]
     else:
         to_eval = text_input.strip()
-    st.write("You entered:")
-    st.markdown("> {}".format(to_eval))
-    st.write("Using the NLP model:")
-    st.markdown("> {}".format(st.session_state.emotion_model_name))
-    label, score = st.session_state.emotion_model.predict(to_eval)
-    st.markdown("#### Result:")
-    st.markdown("**{}**: {}".format(label,score))
 ````
 ### **USPTO Patent Acceptance Prediction**

 The main idea is that for every model that's needed, we create a new instance of this class. In each case, we can store a reference to the tokenizer, model, and pipeline; the model will then use that tokenizer, model, and pipeline in the `predict()` call. If the output of a model needs to be curated in some way (ex. we need to post-process the output of a model so that it's more human-readable), we can also pass a custom method alongside the other parameters too. This is useful when we are switching between models in the Sentiment Analysis page or between the Sentiment Analysis and Patent Acceptance Prediction page - we merely have to create or modify an instance of the `ModelImplementation` class with the proper tokenizer, model, pipeline, and post-process method (if needed). Placeholder text for any inputs can also be stored as well in an array.
+The Sentiment Analysis and Patent Acceptance Prediction pages are both stored on one interface, with a tab menu allowing a user to switch between the two.
 ````python
 # Title
 st.title("CSGY-6613 Project")
 # Subtitle
 st.markdown("_**Ryan Kim (rk2546)**_")
+sentimentTab, patentTab = st.tabs([
+    "Emotion Analysis [Milestone #2]",
+    "Patent Prediction [Milestone #3]"
+])
+with sentimentTab:
     st.subheader("Sentiment Analysis")
+    // ...
+with patentTab:
     st.subheader("USPTO Patent Evaluation")
     // ...
 ````
         to_eval = st.session_state.emotion_model.placeholders[0]
     else:
         to_eval = text_input.strip()
+    label, score, output_func = st.session_state.emotion_model.predict(to_eval)
+    output_func("**{}**: {}".format(label,score))
 ````
 ### **USPTO Patent Acceptance Prediction**
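
Note on the README text above: the paragraph describing `ModelImplementation` (one instance per model, holding the pipeline, an optional post-process method, and placeholder inputs) can be illustrated with a small stand-alone sketch. The constructor signature, `fake_pipeline`, and `parse_output` below are assumptions made for illustration; the real class in `src/main.py` is not fully shown in this diff and may differ.

```python
from typing import Any, Callable, List, Optional


class ModelImplementation:
    """Illustrative sketch: bundles a pipeline with an optional post-process step."""

    def __init__(
        self,
        model_name: str,
        pipeline: Callable[[str], Any],
        post_process: Optional[Callable[["ModelImplementation", Any], Any]] = None,
        placeholders: Optional[List[str]] = None,
    ):
        # Attribute name taken from the diff; the other names are assumptions.
        self.transformer_model_name = model_name
        self.pipeline = pipeline
        self.post_process = post_process
        self.placeholders = placeholders or []

    def predict(self, text: str) -> Any:
        raw = self.pipeline(text)
        if self.post_process is None:
            return raw
        # The post-process method receives the instance, mirroring
        # ParseEmotionOutput(self, result) in the diff.
        return self.post_process(self, raw)


def fake_pipeline(text: str):
    # Hypothetical stand-in for a Hugging Face pipeline; returns a fixed result.
    return [{"label": "LABEL_2", "score": 0.9}]


def parse_output(impl: ModelImplementation, result):
    # Hypothetical post-process step: unpack the top prediction.
    return result[0]["label"], result[0]["score"]


model = ModelImplementation("demo-model", fake_pipeline, parse_output, ["I love this!"])
label, score = model.predict(model.placeholders[0])
```

Swapping models then amounts to constructing a new instance with a different pipeline and post-process method, which matches the workflow the README describes.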

src/main.py CHANGED

@@ -6,10 +6,6 @@ import streamlit as st
 from transformers import TextClassificationPipeline, pipeline
 from transformers import AutoTokenizer, AutoModelForSequenceClassification, DistilBertTokenizerFast, DistilBertForSequenceClassification
-# We'll be using Torch this time around
-import torch
-import torch.nn.functional as F
 emotion_model_names = (
     "cardiffnlp/twitter-roberta-base-sentiment",
     "finiteautomata/beto-sentiment-analysis",
@@ -47,14 +43,45 @@ class ModelImplementation(object):
 def ParseEmotionOutput(self, result):
     label = result[0]['label']
     score = result[0]['score']
     if self.transformer_model_name == "cardiffnlp/twitter-roberta-base-sentiment":
         if label == "LABEL_0":
-            label = "Negative"
         elif label == "LABEL_2":
-            label = "Positive"
         else:
-            label = "Neutral"
-    return label, score
 def ParsePatentOutput(self, result):
     return result
@@ -115,28 +142,13 @@ if "patent_data" not in st.session_state:
 st.title("CSGY-6613 Project")
 # Subtitle
 st.markdown("_**Ryan Kim (rk2546)**_")
-st.markdown("---")
-def PageToHome():
-    st.session_state.page = "home"
-def PageToEmotion():
-    st.session_state.page = "emotion"
-def PageToPatent():
-    st.session_state.page = "patent"
-with st.sidebar:
-    st.subheader("Toolbox")
-    home_selected = st.button("Home", on_click=PageToHome)
-    emotion_selected = st.button(
-        "Emotion Analysis [Milestone #2]",
-        on_click=PageToEmotion
-    )
-    patent_selected = st.button(
-        "Patent Prediction [Milestone #3]",
-        on_click=PageToPatent
-    )
-if st.session_state.page == "emotion":
     st.subheader("Sentiment Analysis")
     if "emotion_model" not in st.session_state:
         st.write("Loading model...")
@@ -158,15 +170,10 @@ if st.session_state.page == "emotion":
                 to_eval = st.session_state.emotion_model.placeholders[0]
             else:
                 to_eval = text_input.strip()
-            st.write("You entered:")
-            st.markdown("> {}".format(to_eval))
-            st.write("Using the NLP model:")
-            st.markdown("> {}".format(st.session_state.emotion_model_name))
-            label, score = st.session_state.emotion_model.predict(to_eval)
-            st.markdown("#### Result:")
-            st.markdown("**{}**: {}".format(label,score))
-elif st.session_state.page == "patent":
     st.subheader("USPTO Patent Evaluation")
     st.markdown("Below are two inputs - one for an **ABSTRACT** and another for a list of **CLAIMS**. Enter both and select the \"Submit\" button to evaluate the patenteability of your idea.")
@@ -177,8 +184,6 @@ elif st.session_state.page == "patent":
         key="patent_num",
     )
-    print(patent_index_option)
     if "patent_abstract_model" not in st.session_state or "patent_claim_model" not in st.session_state:
         st.write("Loading models...")
     else:
@@ -188,13 +193,13 @@ elif st.session_state.page == "patent":
                 abstract_input = st.text_area(
                     "Enter the abstract of the patent below",
                     placeholder=st.session_state.patent_data[st.session_state.patent_num]["abstract"],
-                    height=400
                 )
             with col2:
                 claim_input = st.text_area(
                     "Enter the claims of the patent below",
                     placeholder=st.session_state.patent_data[st.session_state.patent_num]["claim"],
-                    height=400
                 )
             weight_val = st.slider(
                 "How much do the abstract and claims weight when aggregating a total softmax score?",
@@ -219,17 +224,8 @@ elif st.session_state.page == "patent":
                     claim_to_eval = claim_input.strip()
                     is_custom = True
-                #tokenized_claim = st.session_state.patent_claim_model.tokenizer.encode(claim_to_eval, padding=True, truncation=True, max_length=512, add_special_tokens = True)
-                #untokenized_claim = st.session_state.patent_claim_model.tokenizer.decode(tokenized_claim)
-                #claim_to_eval2 = untokenized_claim.replace("[CLS]","")
-                #claim_to_eval2 = claim_to_eval2.replace("[SEP]","")
-                #print(claim_to_eval2)
                 abstract_response = st.session_state.patent_abstract_model.predict(abstract_to_eval)
                 claim_response = st.session_state.patent_claim_model.predict(claim_to_eval)
-                print(abstract_response[0])
-                print(claim_response[0])
-                print(weight_val)
                 claim_weight = (1+weight_val)/2
                 abstract_weight = 1-claim_weight
@@ -238,36 +234,37 @@ elif st.session_state.page == "patent":
                     {'label':'ACCEPTED','score':abstract_response[0][1]['score']*abstract_weight + claim_response[0][1]['score']*claim_weight}
                 ]
                 aggregate_score_sorted = sorted(aggregate_score, key=lambda d: d['score'], reverse=True)
-                print(aggregate_score_sorted)
-                print(f'Original Rating: {st.session_state.patent_data[st.session_state.patent_num]["label"]}')
-                st.markdown("---")
-                answerCol1, answerCol2 = st.columns(2)
                 with answerCol1:
-                    st.markdown("### Abstract Ratings")
-                    st.markdown("""
-                        > **Reject**: {}
-                        > **Accept**: {}
-                    """.format(abstract_response[0][0]["score"], abstract_response[0][1]["score"]))
                 with answerCol2:
-                    st.markdown("### Claims Ratings")
-                    st.markdown("""
-                        > **Reject**: {}
-                        > **Accept**: {}
-                    """.format(claim_response[0][0]["score"], claim_response[0][1]["score"]))
-                st.markdown(f'### Final Rating: **{aggregate_score_sorted[0]["label"]}**')
-                st.markdown("""
-                    > **Reject**: {}
-                    > **Accept**: {}
-                """.format(aggregate_score[0]['score'], aggregate_score[1]['score']))
                 #if not is_custom:
                 #    st.markdown('**Original Score:**')
                 #    st.markdown(st.session_state.patent_data[st.session_state.patent_num]["label"])
-else:
-    st.write("To get started, access the sidebar on the left (click the arrow in the top-left corner of the screen) and select a tool.")
 st.write("")
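
Note on the aggregation step in this file: the weighting logic (`claim_weight = (1+weight_val)/2`, `abstract_weight = 1-claim_weight`) can be isolated into a pure function for testing outside Streamlit. This is a minimal sketch with two assumptions that the diff does not confirm: the slider value `weight_val` ranges over [-1, 1], and the first aggregate entry carries the label `REJECTED` (only the `ACCEPTED` entry is visible in the hunk). The function name `aggregate_scores` is hypothetical.

```python
def aggregate_scores(abstract_scores, claim_scores, weight_val):
    """Blend (reject, accept) score pairs from the abstract and claim models.

    weight_val is assumed to lie in [-1, 1]: -1 uses only the abstract,
    +1 uses only the claims, and 0 weights both equally.
    """
    claim_weight = (1 + weight_val) / 2
    abstract_weight = 1 - claim_weight
    labels = ("REJECTED", "ACCEPTED")  # "REJECTED" entry is an assumption
    aggregate = [
        {"label": lab, "score": a * abstract_weight + c * claim_weight}
        for lab, a, c in zip(labels, abstract_scores, claim_scores)
    ]
    # Highest score first, as in aggregate_score_sorted in the diff.
    return sorted(aggregate, key=lambda d: d["score"], reverse=True)


ranked = aggregate_scores((0.25, 0.75), (0.5, 0.5), weight_val=0.0)
```

With equal weighting, the accept score here is the plain average of the two models' accept scores.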

 from transformers import TextClassificationPipeline, pipeline
 from transformers import AutoTokenizer, AutoModelForSequenceClassification, DistilBertTokenizerFast, DistilBertForSequenceClassification
 emotion_model_names = (
     "cardiffnlp/twitter-roberta-base-sentiment",
     "finiteautomata/beto-sentiment-analysis",
 def ParseEmotionOutput(self, result):
     label = result[0]['label']
     score = result[0]['score']
+    output_func = st.info
     if self.transformer_model_name == "cardiffnlp/twitter-roberta-base-sentiment":
         if label == "LABEL_0":
+            label = "NEGATIVE"
+            output_func = st.error
         elif label == "LABEL_2":
+            label = "POSITIVE"
+            output_func = st.success
+        else:
+            label = "NEUTRAL"
+    elif self.transformer_model_name == "finiteautomata/beto-sentiment-analysis":
+        if label == "NEG":
+            label = "NEGATIVE"
+            output_func = st.error
+        elif label == "POS":
+            label = "POSITIVE"
+            output_func = st.success
         else:
+            label = "NEUTRAL"
+    elif self.transformer_model_name == "bhadresh-savani/distilbert-base-uncased-emotion":
+        if label == "sadness":
+            output_func = st.info
+        elif label == "joy":
+            output_func = st.success
+        elif label == "love":
+            output_func = st.success
+        elif label == "anger":
+            output_func = st.error
+        elif label == "fear":
+            output_func = st.info
+        elif label == "surprise":
+            output_func = st.error
+        label = label.upper()
+    elif self.transformer_model_name == "siebert/sentiment-roberta-large-english":
+        if label == "NEGATIVE":
+            output_func = st.error
+        elif label == "POSITIVE":
+            output_func = st.success
+    return label, score, output_func
 def ParsePatentOutput(self, result):
     return result
 st.title("CSGY-6613 Project")
 # Subtitle
 st.markdown("_**Ryan Kim (rk2546)**_")
+sentimentTab, patentTab = st.tabs([
+    "Emotion Analysis [Milestone #2]",
+    "Patent Prediction [Milestone #3]"
+])
+with sentimentTab:
     st.subheader("Sentiment Analysis")
     if "emotion_model" not in st.session_state:
         st.write("Loading model...")
                 to_eval = st.session_state.emotion_model.placeholders[0]
             else:
                 to_eval = text_input.strip()
+            label, score, output_func = st.session_state.emotion_model.predict(to_eval)
+            output_func("**{}**: {}".format(label,score))
+with patentTab:
     st.subheader("USPTO Patent Evaluation")
     st.markdown("Below are two inputs - one for an **ABSTRACT** and another for a list of **CLAIMS**. Enter both and select the \"Submit\" button to evaluate the patenteability of your idea.")
         key="patent_num",
     )
     if "patent_abstract_model" not in st.session_state or "patent_claim_model" not in st.session_state:
         st.write("Loading models...")
     else:
                 abstract_input = st.text_area(
                     "Enter the abstract of the patent below",
                     placeholder=st.session_state.patent_data[st.session_state.patent_num]["abstract"],
+                    height=200
                 )
             with col2:
                 claim_input = st.text_area(
                     "Enter the claims of the patent below",
                     placeholder=st.session_state.patent_data[st.session_state.patent_num]["claim"],
+                    height=200
                 )
             weight_val = st.slider(
                 "How much do the abstract and claims weight when aggregating a total softmax score?",
                     claim_to_eval = claim_input.strip()
                     is_custom = True
                 abstract_response = st.session_state.patent_abstract_model.predict(abstract_to_eval)
                 claim_response = st.session_state.patent_claim_model.predict(claim_to_eval)
                 claim_weight = (1+weight_val)/2
                 abstract_weight = 1-claim_weight
                     {'label':'ACCEPTED','score':abstract_response[0][1]['score']*abstract_weight + claim_response[0][1]['score']*claim_weight}
                 ]
                 aggregate_score_sorted = sorted(aggregate_score, key=lambda d: d['score'], reverse=True)
+                answerCol1, answerCol2, answerCol3 = st.columns(3)
                 with answerCol1:
+                    st.slider(
+                        "Abstract Acceptance Likelihood",
+                        min_value=0.0,
+                        max_value=100.0,
+                        value=abstract_response[0][1]["score"]*100.0,
+                        disabled=True
+                    )
                 with answerCol2:
+                    output_func = st.info
+                    if aggregate_score_sorted[0]["label"] == "REJECTED":
+                        output_func = st.error
+                    else:
+                        output_func = st.success
+                    output_func("""
+                        **Final Rating: {}**
+                        {}%
+                    """.format(aggregate_score_sorted[0]["label"],aggregate_score_sorted[0]["score"]*100.0))
+                with answerCol3:
+                    st.slider(
+                        "Claim Acceptance Likelihood",
+                        min_value=0.0,
+                        max_value=100.0,
+                        value=claim_response[0][1]["score"]*100.0,
+                        disabled=True
+                    )
                 #if not is_custom:
                 #    st.markdown('**Original Score:**')
                 #    st.markdown(st.session_state.patent_data[st.session_state.patent_num]["label"])
 st.write("")
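
Note on the new `ParseEmotionOutput`: the per-model `if`/`elif` chain could alternatively be written as a lookup table. The sketch below is one possible restructuring, not the code in this commit. It returns a style name (`"error"`, `"success"`, `"info"`) instead of the Streamlit function itself so that it runs without Streamlit. The names `LABEL_STYLES` and `parse_emotion_output` are hypothetical, and one behavior differs from the diff: an unrecognized label from the emotion model is returned unchanged rather than upper-cased.

```python
# Maps model name -> raw label -> (display label, style name).
# The None key is the fallback for models whose original code had an else branch.
LABEL_STYLES = {
    "cardiffnlp/twitter-roberta-base-sentiment": {
        "LABEL_0": ("NEGATIVE", "error"),
        "LABEL_2": ("POSITIVE", "success"),
        None: ("NEUTRAL", "info"),
    },
    "finiteautomata/beto-sentiment-analysis": {
        "NEG": ("NEGATIVE", "error"),
        "POS": ("POSITIVE", "success"),
        None: ("NEUTRAL", "info"),
    },
    "bhadresh-savani/distilbert-base-uncased-emotion": {
        "sadness": ("SADNESS", "info"),
        "joy": ("JOY", "success"),
        "love": ("LOVE", "success"),
        "anger": ("ANGER", "error"),
        "fear": ("FEAR", "info"),
        "surprise": ("SURPRISE", "error"),
    },
    "siebert/sentiment-roberta-large-english": {
        "NEGATIVE": ("NEGATIVE", "error"),
        "POSITIVE": ("POSITIVE", "success"),
    },
}


def parse_emotion_output(model_name, result):
    """Return (display label, score, style name) for the top prediction."""
    label = result[0]["label"]
    score = result[0]["score"]
    table = LABEL_STYLES.get(model_name, {})
    # Unknown labels fall back to the model's default, else to the raw label.
    fallback = table.get(None, (label, "info"))
    display, style = table.get(label, fallback)
    return display, score, style


parsed = parse_emotion_output(
    "cardiffnlp/twitter-roberta-base-sentiment",
    [{"label": "LABEL_0", "score": 0.8}],
)
```

In the app, the style name could be resolved with `getattr(st, style)`, since `st.error`, `st.success`, and `st.info` are the functions the commit already uses.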