Spaces: rk2546/csgy-6613-project-rk2546


Ryan Kim committed on Apr 27, 2023

Commit 06f9c28 • 1 Parent(s): 850858e

fixed some details in readme

Files changed (1)

README.md +8 -10

@@ -28,7 +28,7 @@ The USPTO application is divided into several directories. Overall, the importan
 Both `train.json` and `val.json` contain the original USPTO data, sized down to contain only the relevant data from each recorded patent and split between training and validation data. The validation data `val.json` is used in the online USPTO application as a set of pre-set patents that a user can select when using the USPTO patent prediction function.
-The primary code back-end is stored in `main.py`, which runs the application on the HuggingFace space UI. The application uses `Streamlit` to render UI elements on the screen. All models run off of Transformers and Tokenizers from HuggingFace.
 The application has two features: Sentiment Analysis (for Milestone #2) and USPTO Patent Acceptance Prediction (Milestone #3). Both run on `main.py`. Sentiment Analysis relies on pre-trained [models](https://huggingface.co/models) from HuggingFace's public [datasets](https://huggingface.co/datasets) - particularly 4 models:
@@ -86,7 +86,7 @@ class ModelImplementation(object):
 The main idea is that we create a new instance of this class for every model that's needed. Each instance stores a reference to its tokenizer, model, and pipeline, which are then used in the `predict()` call. If a model's output needs to be curated in some way (e.g. post-processed so that it's more human-readable), we can also pass a custom method alongside the other parameters. This is useful when switching between models on the Sentiment Analysis page, or between the Sentiment Analysis and Patent Acceptance Prediction pages: we merely have to create or modify an instance of the `ModelImplementation` class with the proper tokenizer, model, pipeline, and post-process method (if needed). Placeholder text for any inputs can also be stored in an array.
-The Sentiment Analysis and Patent Acceptance Prediction pages are both stored on one interface, with a sidebar menu allowing a user to switch between the two. The page has a simple title, subtitle, and sidebar implementation through `Streamlit`:
 ````python
 # Title
@@ -115,7 +115,7 @@ with st.sidebar:
     )
 ````
-We store the current page of the user inside an `st.session_state` dictionary, which persists every time the page loads or changes. Because `Streamlit` will only re-render the page every time a change is made to the interface - this means that variables not stored in a session will be re-set. Alongside the current page, we also store models and user inputs inside of the session as well, which allows them to persist between `Streamlit` re-renderings.
 Whenever we switch between pages via the sidebar, a simple `if-else` statement ensures that the proper page is loaded:
@@ -132,7 +132,7 @@ elif st.session_state.page == "patent":
     # ...
 ````
-#### **Sentiment Analysis**
 Sentiment Analysis is relatively simple. It uses the `ModelImplementation` class detailed above to switch between four pre-existing HuggingFace models for the sentiment analysis:
@@ -141,9 +141,7 @@ Sentiment Analysis is relatively simple. It uses the `ModelImplementation` class
 - [bhadresh-savani/distilbert-base-uncased-emotion](https://huggingface.co/bhadresh-savani/distilbert-base-uncased-emotion)
 - [siebert/sentiment-roberta-large-english](https://huggingface.co/siebert/sentiment-roberta-large-english)
-A method called `ParseEmotionOutput()` is used to process labels outputted by the [cardiffnlp/twitter-roberta-base-sentiment](https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment) model in particular.
-Upon loading, if a model hasn't been instantiated yet, the page will create a new model with a pre-set model name and cache it for later use:
 ````python
 def emotion_model_change():
@@ -163,7 +161,7 @@ if "emotion_model_name" not in st.session_state:
     emotion_model_change()
 ````
-The method `emotion_model_change()` can be called to switch between different models, based on the session-saved `emotion_model_name` value. By default, the [cardiffnlp/twitter-roberta-base-sentiment](https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment) model is used. To switch between models, we use a `Streamlit` `selectbox` module:
 ````python
 model_option = st.selectbox(
@@ -181,7 +179,7 @@ text_input = form.text_area(
 submit = form.form_submit_button('Submit')
 ````
-When the page loads, a placeholder from the current model is printed inside a `Streamlit` `text_area` module. When the user clicks on the `form_submit_button` button, the app will use whatever model is currently cached to generate output predictions. If no user input was provided, the placeholder value is passed as the input instead.
 ````python
 if submit:
@@ -198,7 +196,7 @@ if submit:
     st.markdown("**{}**: {}".format(label,score))
 ````
-#### **USPTO Patent Acceptance Prediction**
 The back-end for the USPTO Patent Acceptance Prediction is similar to that of **Sentiment Analysis**, but with some major differences.

 Both `train.json` and `val.json` contain the original USPTO data, sized down to contain only the relevant data from each recorded patent and split between training and validation data. The validation data `val.json` is used in the online USPTO application as a set of pre-set patents that a user can select when using the USPTO patent prediction function.
+The primary code back-end is stored in `main.py`, which runs the application on the HuggingFace space UI. The application uses **Streamlit** to render UI elements on the screen. All models run off of Transformers and Tokenizers from **HuggingFace**.
 The application has two features: Sentiment Analysis (for Milestone #2) and USPTO Patent Acceptance Prediction (Milestone #3). Both run on `main.py`. Sentiment Analysis relies on pre-trained [models](https://huggingface.co/models) from HuggingFace's public [datasets](https://huggingface.co/datasets) - particularly 4 models:
 The main idea is that we create a new instance of this class for every model that's needed. Each instance stores a reference to its tokenizer, model, and pipeline, which are then used in the `predict()` call. If a model's output needs to be curated in some way (e.g. post-processed so that it's more human-readable), we can also pass a custom method alongside the other parameters. This is useful when switching between models on the Sentiment Analysis page, or between the Sentiment Analysis and Patent Acceptance Prediction pages: we merely have to create or modify an instance of the `ModelImplementation` class with the proper tokenizer, model, pipeline, and post-process method (if needed). Placeholder text for any inputs can also be stored in an array.
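The wrapper idea described above can be sketched roughly as follows. The diff does not show the body of `ModelImplementation`, so `ModelWrapper`, `fake_pipeline`, and `round_scores` are hypothetical names used only for illustration, and the fake pipeline stands in for a real HuggingFace pipeline so the example runs offline.

```python
from typing import Any, Callable, Dict, List, Optional

Prediction = Dict[str, Any]


class ModelWrapper:
    """Hypothetical stand-in for the README's ModelImplementation class."""

    def __init__(
        self,
        pipeline: Callable[[str], List[Prediction]],
        tokenizer: Any = None,
        model: Any = None,
        post_process: Optional[Callable[[List[Prediction]], List[Prediction]]] = None,
        placeholders: Optional[List[str]] = None,
    ) -> None:
        # Keep references to everything predict() may need later.
        self.pipeline = pipeline
        self.tokenizer = tokenizer
        self.model = model
        self.post_process = post_process
        self.placeholders = placeholders or []

    def predict(self, text: str) -> List[Prediction]:
        # Run the pipeline, then apply the optional post-processing step.
        raw = self.pipeline(text)
        return self.post_process(raw) if self.post_process else raw


def fake_pipeline(text: str) -> List[Prediction]:
    # Assumption: mimics the list-of-dicts shape of a text-classification pipeline.
    return [{"label": "LABEL_2", "score": 0.98765}]


def round_scores(preds: List[Prediction]) -> List[Prediction]:
    # Example post-processor: make scores easier to read.
    return [{**p, "score": round(p["score"], 2)} for p in preds]


wrapper = ModelWrapper(
    fake_pipeline,
    post_process=round_scores,
    placeholders=["I love this class!"],
)
result = wrapper.predict(wrapper.placeholders[0])
```

Switching models then amounts to building another wrapper with a different pipeline and post-processor, which is likely why the README keeps all of these references on one object.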
+The Sentiment Analysis and Patent Acceptance Prediction pages are both stored on one interface, with a sidebar menu allowing a user to switch between the two. The page has a simple title, subtitle, and sidebar implementation through **Streamlit**:
 ````python
 # Title
     )
 ````
+We store the user's current page inside the `st.session_state` dictionary, which persists every time the page loads or changes. Because **Streamlit** re-renders the page every time a change is made to the interface, variables not stored in the session are reset. Alongside the current page, we also store models and user inputs in the session, which allows them to persist between **Streamlit** re-renderings.
 Whenever we switch between pages via the sidebar, a simple `if-else` statement ensures that the proper page is loaded:
     # ...
 ````
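As a rough illustration of the session-based page switching, the sketch below uses a plain dict in place of `st.session_state` so it can run without Streamlit. The `"patent"` key appears in the diff; the `"sentiment"` key, the default page, and the function names are assumptions.

```python
from typing import MutableMapping


def init_state(state: MutableMapping) -> None:
    # Set the default only when it is missing, so the value survives re-runs.
    if "page" not in state:
        state["page"] = "sentiment"  # assumed default page key


def current_page_title(state: MutableMapping) -> str:
    # Dispatch on the stored page, mirroring the if-else in the README.
    if state["page"] == "sentiment":
        return "Sentiment Analysis"
    elif state["page"] == "patent":
        return "USPTO Patent Acceptance Prediction"
    raise ValueError("unknown page: {}".format(state["page"]))


state: dict = {}          # in the app this would be st.session_state
init_state(state)         # first run
first = current_page_title(state)

state["page"] = "patent"  # the sidebar selection changes the stored page
init_state(state)         # simulated re-run: the stored page is not overwritten
second = current_page_title(state)
```

The guard in `init_state` is the important part: without it, every re-run would send the user back to the default page.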
+### **Sentiment Analysis**
 Sentiment Analysis is relatively simple. It uses the `ModelImplementation` class detailed above to switch between four pre-existing HuggingFace models for the sentiment analysis:
 - [bhadresh-savani/distilbert-base-uncased-emotion](https://huggingface.co/bhadresh-savani/distilbert-base-uncased-emotion)
 - [siebert/sentiment-roberta-large-english](https://huggingface.co/siebert/sentiment-roberta-large-english)
+A method called `ParseEmotionOutput()` processes the labels output by the [cardiffnlp/twitter-roberta-base-sentiment](https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment) model in particular. On load, if no model has been instantiated yet, the page creates one with a preset model name and caches it for later use:
 ````python
 def emotion_model_change():
     emotion_model_change()
 ````
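The body of `ParseEmotionOutput()` is not shown in this diff, so the following is only a guess at what such a post-processor could look like. It assumes the model's generic labels map as `LABEL_0` to negative, `LABEL_1` to neutral, and `LABEL_2` to positive, as described on the model card; `parse_labels` and `LABEL_NAMES` are hypothetical names.

```python
from typing import Any, Dict, List

# Assumed mapping from the model's generic labels to readable names.
LABEL_NAMES = {
    "LABEL_0": "negative",
    "LABEL_1": "neutral",
    "LABEL_2": "positive",
}


def parse_labels(preds: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    # Replace known generic labels; leave unrecognised labels untouched.
    return [
        {"label": LABEL_NAMES.get(p["label"], p["label"]), "score": p["score"]}
        for p in preds
    ]


raw = [
    {"label": "LABEL_0", "score": 0.05},
    {"label": "LABEL_2", "score": 0.9},
    {"label": "joy", "score": 0.7},
]
readable = parse_labels(raw)
```

Falling back to the original label keeps the same function safe to use with models that already emit readable labels.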
+The method `emotion_model_change()` can be called to switch between models, based on the session-saved `emotion_model_name` value. By default, the [cardiffnlp/twitter-roberta-base-sentiment](https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment) model is used. To switch between models, we use a **Streamlit** `selectbox` widget:
 ````python
 model_option = st.selectbox(
 submit = form.form_submit_button('Submit')
 ````
+When the page loads, a placeholder from the current model is shown inside a **Streamlit** `text_area` widget. When the user clicks the `form_submit_button`, the app uses whichever model is currently cached to generate output predictions. If no user input was provided, the placeholder value is passed as the input instead.
 ````python
 if submit:
     st.markdown("**{}**: {}".format(label,score))
 ````
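The placeholder fallback and the output formatting can be sketched as two small helpers. The names `resolve_input` and `format_prediction` are hypothetical, and treating whitespace-only input as empty is an assumption rather than something the README states; the format string matches the `st.markdown` call shown in the diff.

```python
def resolve_input(user_text: str, placeholder: str) -> str:
    # Assumption: whitespace-only input counts as "no input provided".
    return user_text if user_text.strip() else placeholder


def format_prediction(label: str, score: float) -> str:
    # Same Markdown format string as the README's st.markdown call.
    return "**{}**: {}".format(label, score)


text = resolve_input("", "I love this class!")
line = format_prediction("positive", 0.99)
```

In the app, `text` would be passed to the cached model's `predict()` call and each formatted line rendered with `st.markdown`.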
+### **USPTO Patent Acceptance Prediction**
 The back-end for the USPTO Patent Acceptance Prediction is similar to that of **Sentiment Analysis**, but with some major differences.