Commit fde76b0 by lfoppiano · 1 parent: 5b25803

add documentation

Files changed (2)
  1. README.md +4 -2
  2. streamlit_app.py +3 -2
README.md CHANGED
@@ -12,13 +12,15 @@ license: apache-2.0

  # DocumentIQA: Scientific Document Insight QA

+ **Work in progress** :construction_worker:
+
  ## Introduction

  Question/Answering on scientific documents using LLMs (OpenAI, Mistral, ~~LLama2,~~ etc..).
  This application is the frontend for testing the RAG (Retrieval Augmented Generation) on scientific documents, that we are developing at NIMS.
- Differently to most of the project, we focus on scientific articles and we are using [Grobid](https://github.com/kermitt2/grobid) for text extraction instead of the raw PDF2Text converter (which is comparable with most of other solutions) allow to extract only full-text.
+ Differently to most of the project, we focus on scientific articles. We target only the full-text using [Grobid](https://github.com/kermitt2/grobid) that provide and cleaner results than the raw PDF2Text converter (which is comparable with most of other solutions).

- **Work in progress**
+ **NER in LLM response**: The responses from the LLMs are post-processed to extract <span stype="color:yellow">physical quantities, measurements</span> and <span stype="color:blue">materials</span> mentions.

  **Demos**:
  - (on HuggingFace spaces): https://lfoppiano-document-qa.hf.space/
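The updated README says the app indexes only the article's full text, extracted with Grobid, rather than the raw output of a PDF-to-text converter. Grobid's full-text service (`/api/processFulltextDocument`) returns TEI XML in which the body paragraphs sit under `<text>/<body>`, separate from the `<teiHeader>` and the `<back>` matter. The sketch below assumes such a TEI string is already in hand and shows one way to keep only the body paragraphs with Python's standard `xml.etree.ElementTree`. `extract_body_paragraphs` is a hypothetical name, not a function from this repository, and the app's actual extraction code is not part of this diff.

```python
import xml.etree.ElementTree as ET

# Namespace that Grobid's TEI output declares on its root <TEI> element.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def extract_body_paragraphs(tei_xml: str) -> list[str]:
    """Hypothetical helper: return the text of each <p> under <text>/<body>.

    Paragraphs in <teiHeader> (e.g. the abstract) and in <back>
    (e.g. acknowledgements) are not selected by the path below.
    """
    root = ET.fromstring(tei_xml)
    paragraphs = []
    for p in root.findall(".//tei:text/tei:body//tei:p", TEI_NS):
        # itertext() also yields text inside inline children such as <ref>;
        # split/join collapses the whitespace left over from the markup.
        text = " ".join("".join(p.itertext()).split())
        if text:
            paragraphs.append(text)
    return paragraphs


if __name__ == "__main__":
    sample = (
        '<TEI xmlns="http://www.tei-c.org/ns/1.0">'
        "<teiHeader><profileDesc><abstract><p>Abstract.</p></abstract></profileDesc></teiHeader>"
        "<text><body><div><head>Introduction</head>"
        '<p>First  paragraph citing <ref type="bibr">[1]</ref>.</p></div></body>'
        "<back><div><p>Acknowledgements.</p></div></back></text></TEI>"
    )
    print(extract_body_paragraphs(sample))  # ['First paragraph citing [1].']
```

Selecting by path rather than stripping unwanted sections keeps the sketch short; a real pipeline might also want section headings or the abstract, which this version deliberately drops.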
streamlit_app.py CHANGED
@@ -177,6 +177,7 @@ with st.sidebar:
  st.markdown(
  """After entering your API Key (Open AI or Huggingface). Upload a scientific article as PDF document. You will see a spinner or loading indicator while the processing is in progress. Once the spinner stops, you can proceed to ask your questions.""")

+ st.markdown('**NER on LLM responses**: The responses from the LLMs are post-processed to extract <span style="color:orange">physical quantities, measurements</span> and <span style="color:green">materials</span> mentions.', unsafe_allow_html=True)
  if st.session_state['git_rev'] != "unknown":
  st.markdown("**Revision number**: [" + st.session_state[
  'git_rev'] + "](https://github.com/lfoppiano/document-qa/commit/" + st.session_state['git_rev'] + ")")
@@ -231,8 +232,8 @@ if st.session_state.loaded_embeddings and question and len(question) > 0 and st.
  # for entity in entities:
  # entity
  decorated_text = decorate_text_with_annotations(text_response.strip(), entities)
- decorated_text = decorated_text.replace('class="label material"', 'style="color:blue"')
- decorated_text = re.sub(r'class="label[^"]+"', 'style="color:yellow"', decorated_text)
+ decorated_text = decorated_text.replace('class="label material"', 'style="color:green"')
+ decorated_text = re.sub(r'class="label[^"]+"', 'style="color:orange"', decorated_text)
  st.markdown(decorated_text, unsafe_allow_html=True)
  text_response = decorated_text
  else:
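The two replacement lines in the second hunk depend on their order: `material` spans are recoloured first with `str.replace`, and the `re.sub` pattern then catches every remaining `class="label ..."` attribute, so all other labels end up orange. The self-contained sketch below reproduces those two steps. `wrap_entities`, the `(start, end, label)` tuple format, and the `measurement` label are assumptions standing in for the repository's `decorate_text_with_annotations` and its entity objects, whose actual format is not shown in this diff.

```python
import re


def wrap_entities(text: str, entities: list[tuple[int, int, str]]) -> str:
    """Hypothetical stand-in for decorate_text_with_annotations.

    Wraps each (start, end, label) character span of `text` in
    <span class="label {label}">...</span>. Assumes spans do not overlap.
    """
    pieces = []
    cursor = 0
    for start, end, label in sorted(entities):
        pieces.append(text[cursor:start])
        pieces.append(f'<span class="label {label}">{text[start:end]}</span>')
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


def colorize(decorated: str) -> str:
    """Apply the two substitutions from the diff, in the same order."""
    # 1. Material mentions become green.
    decorated = decorated.replace('class="label material"', 'style="color:green"')
    # 2. Every remaining label class (quantities, measurements, ...) becomes orange.
    return re.sub(r'class="label[^"]+"', 'style="color:orange"', decorated)


if __name__ == "__main__":
    html = colorize(
        wrap_entities(
            "LiCoO2 has a capacity of 140 mAh/g",
            [(0, 6, "material"), (25, 34, "measurement")],
        )
    )
    print(html)
```

Swapping the two steps would turn materials orange as well, because the regular expression also matches `class="label material"`. The resulting HTML only renders as coloured text because the app passes it to `st.markdown(..., unsafe_allow_html=True)`; by default Streamlit escapes HTML found in Markdown.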