devtrent committed
Commit f7a5664
1 Parent(s): 5cd1ac6

Asymmetric QA

Files changed (2)
  1. app.py +17 -10
  2. backend/config.py +1 -1
app.py CHANGED
@@ -13,16 +13,15 @@ st.markdown('''
 
 Hi! This is the demo for the [flax sentence embeddings](https://huggingface.co/flax-sentence-embeddings) created for the **Flax/JAX community week 🤗**. We are going to use three flax-sentence-embeddings models: a **distilroberta base**, a **mpnet base** and a **minilm-l6**. All were trained on all the dataset of the 1B+ train corpus with the v3 setup.
 
----
+''')
 
-**Instructions**: You can compare the similarity of a main text with other texts of your choice (in the sidebar). In the background, we'll create an embedding for each text, and then we'll use the cosine similarity function to calculate a similarity metric between our main sentence and the others.
+if menu == "Sentence Similarity":
+    st.header('Sentence Similarity')
+    st.markdown('''
+**Instructions**: You can compare the similarity of a main text with other texts of your choice. In the background, we'll create an embedding for each text, and then we'll use the cosine similarity function to calculate a similarity metric between our main sentence and the others.
 
 For more cool information on sentence embeddings, see the [sBert project](https://www.sbert.net/examples/applications/computing-embeddings/README.html).
-
-Please enjoy!!
 ''')
-
-if menu == "Sentence Similarity":
     select_models = st.multiselect("Choose models", options=list(MODELS_ID), default=list(MODELS_ID)[0])
 
     anchor = st.text_input(
@@ -45,7 +44,7 @@ if menu == "Sentence Similarity":
     results = {model: inference.text_similarity(anchor, inputs, model, MODELS_ID) for model in select_models}
     df_results = {model: results[model] for model in results}
 
-    index = [f"{idx}:{input[:min(15, len(input))]}..." for idx, input in enumerate(inputs)]
+    index = [f"{idx + 1}:{input[:min(15, len(input))]}..." for idx, input in enumerate(inputs)]
     df_total = pd.DataFrame(index=index)
     for key, value in df_results.items():
         df_total[key] = list(value['score'].values)
@@ -55,11 +54,19 @@ if menu == "Sentence Similarity":
     st.write('Visualize the results of each model:')
     st.line_chart(df_total)
 elif menu == "Asymmetric QA":
+    st.header('Asymmetric QA')
+    st.markdown('''
+**Instructions**: You can compare the Answer likeliness of a given Query with answer candidates of your choice. In the background, we'll create an embedding for each answers, and then we'll use the cosine similarity function to calculate a similarity metric between our query sentence and the others.
+`mpnet_asymmetric_qa` model works best for hard negative answers or distinguishing similar queries due to separate models applied for encoding questions and answers.
+
+For more cool information on sentence embeddings, see the [sBert project](https://www.sbert.net/examples/applications/computing-embeddings/README.html).
+''')
+
     select_models = st.multiselect("Choose models", options=list(QA_MODELS_ID), default=list(QA_MODELS_ID)[0])
 
     anchor = st.text_input(
         'Please enter here the query you want to compare with given answers:',
-        value="How many close friends do you have?"
+        value="What is the weather in Paris?"
     )
 
     n_texts = st.number_input(
@@ -69,7 +76,7 @@ elif menu == "Asymmetric QA":
 
     inputs = []
 
-    defaults = ["I have 10.", "How many children do you have?", "I have 3 brothers."]
+    defaults = ["It is raining in Paris right now with 70 F temperature.", "What is the weather in Berlin?", "I have 3 brothers."]
     for i in range(int(n_texts)):
         input = st.text_input(f'Answer {i + 1}:', value=defaults[i] if i < len(defaults) else "")
 
@@ -79,7 +86,7 @@ elif menu == "Asymmetric QA":
     results = {model: inference.text_similarity(anchor, inputs, model, QA_MODELS_ID) for model in select_models}
    df_results = {model: results[model] for model in results}
 
-    index = [f"{idx}:{input[:min(15, len(input))]}..." for idx, input in enumerate(inputs)]
+    index = [f"{idx + 1}:{input[:min(15, len(input))]}..." for idx, input in enumerate(inputs)]
     df_total = pd.DataFrame(index=index)
     for key, value in df_results.items():
         df_total[key] = list(value['score'].values)
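
The Asymmetric QA tab delegates scoring to `inference.text_similarity`, whose implementation is not part of this diff. As a rough illustration of what the new instructions describe (separate encoders for questions and answers, with cosine similarity used to rank candidates), here is a minimal sketch using the `sentence-transformers` library and the model IDs from `backend/config.py`; the function name and overall structure are illustrative assumptions, not the repo's actual code.

```python
# Minimal sketch of asymmetric QA scoring, assuming the sentence-transformers
# library is available; this is NOT the repo's inference.text_similarity.
from sentence_transformers import SentenceTransformer, util

# Separate encoders for questions (Q) and answers (A), per QA_MODELS_ID.
q_encoder = SentenceTransformer('flax-sentence-embeddings/multi-QA_v1-mpnet-asymmetric-Q')
a_encoder = SentenceTransformer('flax-sentence-embeddings/multi-QA_v1-mpnet-asymmetric-A')

def rank_answers(query: str, answers: list[str]) -> list[tuple[str, float]]:
    """Embed the query and candidate answers with their respective encoders,
    then rank the answers by cosine similarity to the query."""
    q_emb = q_encoder.encode(query, convert_to_tensor=True)
    a_emb = a_encoder.encode(answers, convert_to_tensor=True)
    scores = util.cos_sim(q_emb, a_emb)[0]  # one score per candidate answer
    return sorted(zip(answers, scores.tolist()), key=lambda x: x[1], reverse=True)

# Example mirroring the new defaults in app.py:
print(rank_answers(
    "What is the weather in Paris?",
    ["It is raining in Paris right now with 70 F temperature.",
     "What is the weather in Berlin?",
     "I have 3 brothers."],
))
```
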
backend/config.py CHANGED
@@ -3,8 +3,8 @@ MODELS_ID = dict(distilroberta = 'flax-sentence-embeddings/st-codesearch-distilr
                  minilm_l6 = 'flax-sentence-embeddings/all_datasets_v3_MiniLM-L6')
 
 QA_MODELS_ID = dict(
-    mpnet_qa = 'flax-sentence-embeddings/mpnet_stackexchange_v1',
     mpnet_asymmetric_qa = ['flax-sentence-embeddings/multi-QA_v1-mpnet-asymmetric-Q',
                            'flax-sentence-embeddings/multi-QA_v1-mpnet-asymmetric-A'],
+    mpnet_qa='flax-sentence-embeddings/mpnet_stackexchange_v1',
     distilbert_qa = 'flax-sentence-embeddings/multi-qa_v1-distilbert-cls_dot'
 )
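
Note that `mpnet_asymmetric_qa` maps to a two-element list (question encoder, answer encoder) while the other entries remain single model IDs, so consumers of `QA_MODELS_ID` must handle both shapes. A minimal sketch of one way such a lookup could be resolved (the helper name is hypothetical and not part of this repo):

```python
# Hypothetical helper showing one way to resolve QA_MODELS_ID entries,
# which are either a single model ID (str) or a [question_model, answer_model] pair.
from backend.config import QA_MODELS_ID

def resolve_qa_models(name: str) -> tuple[str, str]:
    """Return (question_model_id, answer_model_id) for a QA_MODELS_ID key.
    Symmetric models reuse the same ID for both roles."""
    entry = QA_MODELS_ID[name]
    if isinstance(entry, (list, tuple)):
        q_model, a_model = entry
        return q_model, a_model
    return entry, entry

print(resolve_qa_models('mpnet_asymmetric_qa'))  # two different encoders
print(resolve_qa_models('mpnet_qa'))             # same encoder for query and answers
```
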