Ryan Kim committed on
Commit
06f9c28
1 Parent(s): 850858e

fixed some details in readme

Files changed (1)
  1. README.md +8 -10
README.md CHANGED
@@ -28,7 +28,7 @@ The USPTO application is divided into several directories. Overall, the importan
28
 
29
  Both `train.json` and `val.json` contain the original USPTO data, sized down to contain only the relevant data from each recorded patent and split between training and validation data. The validation data `val.json` is used in the online USPTO application as a set of pre-set patents that a user can select when using the USPTO patent prediction function.
30
 
31
- The primary code back-end is stored in `main.py`, which runs the application on the HuggingFace space UI. The application uses `Streamlit` to render UI elements on the screen. All models run off of Transformers and Tokenizers from HuggingFace.
32
 
33
  The application has two features: Sentiment Analysis (for Milestone #2) and USPTO Patent Acceptance Prediction (Milestone #3). Both run on `main.py`. Sentiment Analysis relies on pre-trained [models](https://huggingface.co/models) from HuggingFace's public [datasets](https://huggingface.co/datasets) - particularly 4 models:
34
 
@@ -86,7 +86,7 @@ class ModelImplementation(object):
86
 
87
  The main idea is that for every model that's needed, we create a new instance of this class. In each case, we can store a reference to the tokenizer, model, and pipeline; the model will then use that tokenizer, model, and pipeline in the `predict()` call. If the output of a model needs to be curated in some way (ex. we need to post-process the output of a model so that it's more human-readable), we can also pass a custom method alongside the other parameters too. This is useful when we are switching between models in the Sentiment Analysis page or between the Sentiment Analysis and Patent Acceptance Prediction page - we merely have to create or modify an instance of the `ModelImplementation` class with the proper tokenizer, model, pipeline, and post-process method (if needed). Placeholder text for any inputs can also be stored as well in an array.
88
 
89
- The Sentiment Analysis and Patent Acceptance Prediction pages are both stored on one interface, with a sidebar menu allowing a user to switch between the two. The page has a simple title, subtitle, and sidebar implementation through `Streamlit`:
90
 
91
  ````python
92
  # Title
@@ -115,7 +115,7 @@ with st.sidebar:
115
  )
116
  ````
117
 
118
- We store the current page of the user inside an `st.session_state` dictionary, which persists every time the page loads or changes. Because `Streamlit` will only re-render the page every time a change is made to the interface - this means that variables not stored in a session will be re-set. Alongside the current page, we also store models and user inputs inside of the session as well, which allows them to persist between `Streamlit` re-renderings.
119
 
120
  Whenever we switch between pages via the sidebar, a simple `if-else` statement ensures that the proper page is loaded:
121
 
@@ -132,7 +132,7 @@ elif st.session_state.page == "patent":
132
  # ...
133
  ````
134
 
135
- #### **Sentiment Analysis**
136
 
137
  Sentiment Analysis is relatively simple. It uses the `ModelImplementation` class detailed above to switch between four pre-existing HuggingFace models for the sentiment analysis:
138
 
@@ -141,9 +141,7 @@ Sentiment Analysis is relatively simple. It uses the `ModelImplementation` class
141
  - [bhadresh-savani/distilbert-base-uncased-emotion](https://huggingface.co/bhadresh-savani/distilbert-base-uncased-emotion)
142
  - [siebert/sentiment-roberta-large-english](https://huggingface.co/siebert/sentiment-roberta-large-english)
143
 
144
- A method called `ParseEmotionOutput()` is used to process labels outputted by the [cardiffnlp/twitter-roberta-base-sentiment](https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment) model in particular.
145
-
146
- Upon loading, if a model hasn't been instantiated yet, the page will create a new model with a pre-set model name and cache it for later use:
147
 
148
  ````python
149
  def emotion_model_change():
@@ -163,7 +161,7 @@ if "emotion_model_name" not in st.session_state:
163
  emotion_model_change()
164
  ````
165
 
166
- The method `emotion_model_change()` can be called to switch between different models, based on the session-saved `emotion_model_name` value. By default, the [cardiffnlp/twitter-roberta-base-sentiment](https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment) model is used. To switch between models, we use a `Streamlit` `selectbox` module:
167
 
168
  ````python
169
  model_option = st.selectbox(
@@ -181,7 +179,7 @@ text_input = form.text_area(
181
  submit = form.form_submit_button('Submit')
182
  ````
183
 
184
- When the page loads, a placeholder from the current model is printed inside a `Streamlit` `text_area` module. When the user clicks on the `form_submit_button` button, the app will use whatever model is currently cached to generate output predictions. If no user input was provided, the placeholder value is passed as the input instead.
185
 
186
  ````python
187
  if submit:
@@ -198,7 +196,7 @@ if submit:
198
  st.markdown("**{}**: {}".format(label,score))
199
  ````
200
 
201
- #### **USPTO Patent Acceptance Prediction**
202
 
203
  The back-end for the USPTO Patent Acceptance Prediction is similar to that of **Sentiment Analysis**, but with some major differences.
204
 
 
28
 
29
  Both `train.json` and `val.json` contain the original USPTO data, sized down to contain only the relevant data from each recorded patent and split between training and validation data. The validation data `val.json` is used in the online USPTO application as a set of pre-set patents that a user can select when using the USPTO patent prediction function.
30
 
31
+ The primary code back-end is stored in `main.py`, which runs the application on the HuggingFace space UI. The application uses **Streamlit** to render UI elements on the screen. All models run off of Transformers and Tokenizers from **HuggingFace**.
32
 
33
  The application has two features: Sentiment Analysis (for Milestone #2) and USPTO Patent Acceptance Prediction (Milestone #3). Both run on `main.py`. Sentiment Analysis relies on pre-trained [models](https://huggingface.co/models) from HuggingFace's public [datasets](https://huggingface.co/datasets) - particularly 4 models:
34
 
 
86
 
87
  The main idea is that for every model that's needed, we create a new instance of this class. In each case, we can store a reference to the tokenizer, model, and pipeline; the model will then use that tokenizer, model, and pipeline in the `predict()` call. If the output of a model needs to be curated in some way (ex. we need to post-process the output of a model so that it's more human-readable), we can also pass a custom method alongside the other parameters too. This is useful when we are switching between models in the Sentiment Analysis page or between the Sentiment Analysis and Patent Acceptance Prediction page - we merely have to create or modify an instance of the `ModelImplementation` class with the proper tokenizer, model, pipeline, and post-process method (if needed). Placeholder text for any inputs can also be stored as well in an array.
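
The class body itself is not part of this diff, so the following is only a minimal sketch of the pattern the paragraph above describes. The class name `ModelImplementationSketch`, its constructor parameters, and the `fake_pipeline` helper are all hypothetical and chosen for illustration; the real `ModelImplementation` class may differ.

```python
from typing import Any, Callable, List, Optional


class ModelImplementationSketch:
    """Hypothetical stand-in for the README's ModelImplementation class."""

    def __init__(
        self,
        pipeline: Callable[[str], Any],
        tokenizer: Any = None,
        model: Any = None,
        post_process: Optional[Callable[[Any], Any]] = None,
        placeholders: Optional[List[str]] = None,
    ) -> None:
        # Keep references so the same objects are reused on every predict() call.
        self.tokenizer = tokenizer
        self.model = model
        self.pipeline = pipeline
        self.post_process = post_process
        self.placeholders = placeholders or []

    def predict(self, text: str) -> Any:
        # Run the stored pipeline, then apply the optional post-processing step.
        raw = self.pipeline(text)
        return self.post_process(raw) if self.post_process else raw


# A fake pipeline (assumption) so the sketch runs without downloading a model.
def fake_pipeline(text: str) -> list:
    return [{"label": "LABEL_2", "score": 0.9}]


impl = ModelImplementationSketch(
    pipeline=fake_pipeline,
    post_process=lambda out: [(o["label"].lower(), o["score"]) for o in out],
    placeholders=["I love this!"],
)
result = impl.predict(impl.placeholders[0])
```

Switching models then amounts to building another instance with a different pipeline and, if needed, a different post-processing callable.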
88
 
89
+ The Sentiment Analysis and Patent Acceptance Prediction pages are both stored on one interface, with a sidebar menu allowing a user to switch between the two. The page has a simple title, subtitle, and sidebar implementation through **Streamlit**:
90
 
91
  ````python
92
  # Title
 
115
  )
116
  ````
117
 
118
+ We store the current page of the user inside an `st.session_state` dictionary, which persists every time the page loads or changes. Because **Streamlit** will only re-render the page every time a change is made to the interface - this means that variables not stored in a session will be re-set. Alongside the current page, we also store models and user inputs inside of the session as well, which allows them to persist between **Streamlit** re-renderings.
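
As a rough illustration of why values must live in the session to survive a re-render, the sketch below initializes defaults only when a key is missing. The helper `init_session`, the `"sentiment"` page value, and the `user_input` key are assumptions made for this example; a plain dict stands in for the dict-like `st.session_state` so the snippet runs without Streamlit installed.

```python
from typing import Any, Dict, MutableMapping


def init_session(state: MutableMapping[str, Any], defaults: Dict[str, Any]) -> None:
    """Set each default only if the key is missing, so earlier values survive reruns."""
    for key, value in defaults.items():
        if key not in state:
            state[key] = value


# A plain dict stands in for st.session_state (assumption for this sketch).
session: Dict[str, Any] = {}
init_session(session, {"page": "sentiment", "user_input": ""})

# For example, the user picks the other page in the sidebar.
session["page"] = "patent"

# Simulate a rerun: the script executes again, but stored values are kept.
init_session(session, {"page": "sentiment", "user_input": ""})
current_page = session["page"]
```

In the app, the same helper could be called with `st.session_state` as its first argument, since that object supports membership tests and item assignment.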
119
 
120
  Whenever we switch between pages via the sidebar, a simple `if-else` statement ensures that the proper page is loaded:
121
 
 
132
  # ...
133
  ````
134
 
135
+ ### **Sentiment Analysis**
136
 
137
  Sentiment Analysis is relatively simple. It uses the `ModelImplementation` class detailed above to switch between four pre-existing HuggingFace models for the sentiment analysis:
138
 
 
141
  - [bhadresh-savani/distilbert-base-uncased-emotion](https://huggingface.co/bhadresh-savani/distilbert-base-uncased-emotion)
142
  - [siebert/sentiment-roberta-large-english](https://huggingface.co/siebert/sentiment-roberta-large-english)
143
 
144
+ A method called `ParseEmotionOutput()` is used to process labels outputted by the [cardiffnlp/twitter-roberta-base-sentiment](https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment) model in particular. Upon loading, if a model hasn't been instantiated yet, the page will create a new model with a pre-set model name and cache it for later use:
 
 
145
 
146
  ````python
147
  def emotion_model_change():
 
161
  emotion_model_change()
162
  ````
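
The body of `ParseEmotionOutput()` is not shown in this diff. As a hedged sketch of what such a post-processing step might look like, the example below assumes the model's pipeline emits generic ids (`LABEL_0`, `LABEL_1`, `LABEL_2`) that stand for negative, neutral, and positive. The function name `parse_emotion_output_sketch` and the `LABEL_NAMES` table are hypothetical.

```python
from typing import Dict, List

# Assumed mapping for the cardiffnlp/twitter-roberta-base-sentiment label ids.
LABEL_NAMES: Dict[str, str] = {
    "LABEL_0": "Negative",
    "LABEL_1": "Neutral",
    "LABEL_2": "Positive",
}


def parse_emotion_output_sketch(outputs: List[dict]) -> List[dict]:
    """Replace generic label ids with readable names; unknown labels pass through."""
    return [
        {"label": LABEL_NAMES.get(o["label"], o["label"]), "score": o["score"]}
        for o in outputs
    ]


parsed = parse_emotion_output_sketch(
    [{"label": "LABEL_2", "score": 0.98}, {"label": "joy", "score": 0.5}]
)
```

Labels from the other models, which already use readable names, pass through unchanged in this sketch.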
163
 
164
+ The method `emotion_model_change()` can be called to switch between different models, based on the session-saved `emotion_model_name` value. By default, the [cardiffnlp/twitter-roberta-base-sentiment](https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment) model is used. To switch between models, we use a **Streamlit** `selectbox` module:
165
 
166
  ````python
167
  model_option = st.selectbox(
 
179
  submit = form.form_submit_button('Submit')
180
  ````
181
 
182
+ When the page loads, a placeholder from the current model is printed inside a **Streamlit** `text_area` module. When the user clicks on the `form_submit_button` button, the app will use whatever model is currently cached to generate output predictions. If no user input was provided, the placeholder value is passed as the input instead.
183
 
184
  ````python
185
  if submit:
 
196
  st.markdown("**{}**: {}".format(label,score))
197
  ````
198
 
199
+ ### **USPTO Patent Acceptance Prediction**
200
 
201
  The back-end for the USPTO Patent Acceptance Prediction is similar to that of **Sentiment Analysis**, but with some major differences.
202