Spaces: raynardj/x-language-search-ancient-with-modern-words
raynardj committed on Jan 18, 2022

Commit 67eeae3 • 1 Parent(s): 5e8b453

👜 baseline

Files changed (4)

README.md CHANGED

@@ -1,37 +1,13 @@
 ---
-title: X Language Search Ancient With Modern Words
-emoji: 🐠
-colorFrom: purple
 colorTo: purple
 sdk: streamlit
 app_file: app.py
 pinned: false
 ---
-# Configuration
-`title`: _string_
-Display title for the Space
-`emoji`: _string_
-Space emoji (emoji-only character allowed)
-`colorFrom`: _string_
-Color for Thumbnail gradient (red, yellow, green, blue, indigo, purple, pink, gray)
-`colorTo`: _string_
-Color for Thumbnail gradient (red, yellow, green, blue, indigo, purple, pink, gray)
-`sdk`: _string_
-Can be either `gradio`, `streamlit`, or `static`
-`sdk_version` : _string_
-Only applicable for `streamlit` SDK.
-See [doc](https://hf.co/docs/hub/spaces) for more info on supported versions.
-`app_file`: _string_
-Path to your main application file (which contains either `gradio` or `streamlit` Python code, or `static` html code).
-Path is relative to the root of the repository.
-`pinned`: _boolean_
-Whether the Space stays on top of your list.

 ---
+title: Cross language search
+emoji: ⚔️
+colorFrom: indigo
 colorTo: purple
 sdk: streamlit
 app_file: app.py
 pinned: false
 ---
+# Cross Language Search
+> Search ancient books with modern words

app.py ADDED

+import streamlit as st
+import pandas as pd
+from sentence_transformers import SentenceTransformer
+from forgebox.cosine import CosineSearch
+import numpy as np
+TAG = "raynardj/xlsearch-cross-lang-search-zh-vs-classicical-cn"
+@st.cache(allow_output_mutation=True)
+def load_encoder():
+    with st.spinner(f"Loading Transformer:{TAG}"):
+        encoder = SentenceTransformer(TAG)
+    return encoder
+encoder = load_encoder()
+@st.cache(allow_output_mutation=True)
+def load_book():
+    with st.spinner(f"📚 Loading Book..."):
+        df = pd.read_csv("grand_historian.csv")
+    return list(df.sentence)
+all_lines = load_book()
+@st.cache(allow_output_mutation=True)
+def encode_book():
+    with st.spinner(f"Encoding sentences for book《Records of the Grand Historian》"):
+        vec = encoder.encode(all_lines, batch_size=64, show_progress_bar=True)
+        cosine = CosineSearch(vec)
+    return cosine
+cosine = encode_book()
+def search(text):
+    enc = encoder.encode(text) # encode the search key
+    order = cosine(enc) # indices of book sentences, ranked by similarity to the query
+    sentence_df = pd.DataFrame({"sentence":np.array(all_lines)[order[:5]]})
+    return sentence_df
+keyword = st.text_input("用白话搜", "")  # label: "Search in vernacular (modern) Chinese"
+if st.button("搜索"):  # button label: "Search"
+    if keyword:
+        with st.spinner(f"🔍 Searching for {keyword}"):
+            df = search(keyword)
+            st.table(df)
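
The `search` function above encodes the query, asks `cosine` for a ranking, and keeps the first five sentences. The sketch below illustrates that ranking step in plain Python, assuming the ranking is by cosine similarity with the best match first. `ToyCosineSearch` and the 2-D vectors are hypothetical stand-ins for `CosineSearch` and the encoder output; this is not forgebox's actual implementation.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity.

    Assumes the vector is non-zero; a zero vector would divide by zero.
    """
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


class ToyCosineSearch:
    """Hypothetical stand-in for CosineSearch: ranks stored vectors against a query."""

    def __init__(self, base):
        # Normalize once up front, since the stored vectors never change.
        self.normed_base = [normalize(v) for v in base]

    def __call__(self, query):
        q = normalize(query)
        scores = [sum(a * b for a, b in zip(row, q)) for row in self.normed_base]
        # Indices of stored vectors, most similar first.
        return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Toy 2-D "embeddings" standing in for encoder.encode(all_lines).
lines = ["sentence A", "sentence B", "sentence C"]
vectors = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]

cosine = ToyCosineSearch(vectors)
order = cosine([0.9, 0.1])           # query vector standing in for encoder.encode(text)
top = [lines[i] for i in order[:2]]  # same slicing idea as order[:5] in app.py
print(top)
```

Normalizing the stored vectors once is what makes caching the index worthwhile: each query then costs only one pass of dot products over the book.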

grand_historian.csv ADDED

The diff for this file is too large to render.

requirements.txt ADDED

+torch==1.7.1
+sentence-transformers==2.1.0
+transformers==4.12.3
+pandas==1.3.5
+forgebox==0.4.20
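
Because every dependency is pinned with `==`, a local environment can be compared against the pins before running the app. The following is a minimal sketch using only the standard library; `parse_pins` and `check_pins` are hypothetical helper names, and the pins are copied inline from the file above so the example is self-contained.

```python
from importlib import metadata

# Pins copied from requirements.txt in this commit.
PINS = """\
torch==1.7.1
sentence-transformers==2.1.0
transformers==4.12.3
pandas==1.3.5
forgebox==0.4.20
"""


def parse_pins(text):
    """Return {package: version} for every `name==version` line."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, _, version = line.partition("==")
        pins[name.strip()] = version.strip()
    return pins


def check_pins(pins):
    """Map each package to (pinned, installed); installed is None if missing."""
    report = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        report[name] = (pinned, installed)
    return report


pins = parse_pins(PINS)
report = check_pins(pins)
for name, (pinned, installed) in report.items():
    status = "ok" if pinned == installed else "mismatch"
    print(f"{name}: pinned {pinned}, installed {installed} ({status})")
```

This only handles exact `==` pins, which is all this file uses; other specifiers such as `>=` would need a real requirements parser.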