kiyer committed on
Commit
374da48
•
1 Parent(s): 237026a

update readme bugfixes

Files changed (2)
  1. app.py +18 -2
  2. pages/2_arxiv_embedding.py +0 -5
app.py CHANGED
@@ -8,19 +8,35 @@ st.set_page_config(
  st.write("# Welcome to arXiv-GPT! 👋")
  
  st.sidebar.success("Select a function above.")
- st.sidebar.markdown("Current functions include visualizing papers in the arxiv embedding, or searching for similar papers to an input paper or prompt phrase.")
+ st.sidebar.markdown("Current functions include visualizing papers in the arxiv embedding, searching for similar papers to an input paper or prompt phrase, or answering quick questions.")
  
  st.markdown(
  """
  arXiv+GPT is a framework for searching and visualizing papers on
  the [arXiv](https://arxiv.org/) using the context sensitivity from modern
- large language models (LLMs) like GPT3 to better link paper contexts
+ large language models (LLMs) to better link paper contexts
  
  **👈 Select a tool from the sidebar** to see some examples
  of what this framework can do!
+
+ ### Page summary:
+ - `Paper search` looks for relevant papers given an arxiv id or a question.
+ - `Arxiv embedding` shows the landscape of current galaxy evolution papers (astro-ph.GA)
+ - `QA sources` brings it all together to give concise answers to questions with primary sources and relevant papers.
+
+ ### Coming soon:
+ - [AstroLLaMA](https://huggingface.co/spaces/universeTBD/astrollama) embeddings!
+ - export results
+ - daily updates to repo
+ - other fields apart from `astro-ph.GA`
+
  ### Want to learn more?
+ - Check out `AstroLLaMA` [paper](https://huggingface.co/papers/2309.06126)
  - Check out `chaotic_neural` [(link)](http://chaotic-neural.readthedocs.io/)
  - Jump into our [documentation](https://docs.streamlit.io)
  - Contribute!
+
+ arXiv+GPT is developed and maintained by [UniverseTBD](https://universetbd.org/). Updates on [huggingface](https://huggingface.co/universeTBD) or [twitter](https://twitter.com/universe_tbd).
+
  """
  )
pages/2_arxiv_embedding.py CHANGED
@@ -11,11 +11,6 @@ import pickle
  from scipy import stats
  from urllib.request import urlopen
  
- st.title("ArXiv+GPT3 embedding explorer")
- st.markdown('[Includes papers up to: `'+dateval+'`]')
- st.markdown("This is an explorer for astro-ph.GA papers on the arXiv (up to Apt 18th, 2023). The papers have been preprocessed with `chaotic_neural` [(link)](http://chaotic-neural.readthedocs.io/) after which the collected abstracts are run through `text-embedding-ada-002` with [langchain](https://python.langchain.com/en/latest/ecosystem/openai.html) to generate a unique vector correpsonding to each paper. These are then compressed using [umap](https://umap-learn.readthedocs.io/en/latest/) and shown here, and can be used for similarity searches with methods like [faiss](https://github.com/facebookresearch/faiss). The scatterplot here can be paired with a heatmap for more targeted searches looking at a specific topic or area (see sidebar). Upgrade to chaotic neural suggested by Jo Ciucă, thank you! More to come (hopefully) with GPT-4 and its applications!")
- st.markdown("Interpreting the UMAP plot: the algorithm creates a 2d embedding from the high-dim vector space that tries to conserve as much similarity information as possible. Nearby points in UMAP space are similar, and grow dissimiliar as you move farther away. The axes do not have any physical meaning.")
-
  @st.cache_data
  def get_feeds_data(url):
  # data = cp.load(urlopen(url))
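
Note on the description removed from `pages/2_arxiv_embedding.py`: it outlines a pipeline in which each abstract becomes a vector and those vectors are then used for similarity searches. Below is a minimal sketch of the search step only. It assumes toy 3-dimensional vectors in place of `text-embedding-ada-002` output and a brute-force cosine-similarity scan in place of faiss. The function names, paper ids and vectors are hypothetical and are not taken from the repository.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def most_similar(query, embeddings, k=2):
    """Return the k (paper_id, score) pairs closest to `query`, best first."""
    scored = [(pid, cosine_similarity(query, vec)) for pid, vec in embeddings.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings; real ones would have many more dimensions.
toy_embeddings = {
    "paper_a": [1.0, 0.0, 0.0],
    "paper_b": [0.9, 0.1, 0.0],
    "paper_c": [0.0, 1.0, 0.0],
}

# Rank the toy papers against a query vector that points mostly along the first axis.
results = most_similar([1.0, 0.05, 0.0], toy_embeddings, k=2)
for paper_id, score in results:
    print(paper_id, round(score, 4))
```

A linear scan like this is only practical for small collections; the removed text names faiss as one option for the same search at scale.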