lfoppiano committed
Commit cd17f01
1 Parent(s): 56267ff

improve documentation

Files changed (1): README.md (+27 −9)

README.md CHANGED
@@ -28,20 +28,39 @@ Differently to most of the project, we focus on scientific articles. We target o
 
 ## Getting started
 
- - Select the model+embedding combination you want ot use ~~(for LLama2 you must acknowledge their licence both on meta.com and on huggingface. See [here](https://huggingface.co/meta-llama/Llama-2-70b-chat-hf))~~(Llama2 was removed due to API limitations).
 - Enter your API Key ([Open AI](https://platform.openai.com/account/api-keys) or [Huggingface](https://huggingface.co/docs/hub/security-tokens)).
 - Upload a scientific article as PDF document. You will see a spinner or loading indicator while the processing is in progress.
 - Once the spinner stops, you can proceed to ask your questions
 
 ![screenshot2.png](docs%2Fimages%2Fscreenshot2.png)
 
- ### Options
- #### Context size
- Allow to change the number of embedding chunks that are considered for responding. The text chunk are around 250 tokens, which uses around 1000 tokens for each question.
 
- #### Query mode
- By default, the mode is set to LLM (Language Model) which enables question/answering. You can directly ask questions related to the document content, and the system will answer the question using content from the document.
- If you switch the mode to "Embedding," the system will return specific chunks from the document that are semantically related to your query. This mode helps to test why sometimes the answers are not satisfying or incomplete.
 
 ## Development notes
 
@@ -59,10 +78,9 @@ To install the library with Pypi:
 - `pip install document-qa-engine`
 
 
-
 ## Acknowledgement
 
- This project is developed at the [National Institute for Materials Science](https://www.nims.go.jp) (NIMS) in Japan in collaboration with the [Lambard-ML-Team](https://github.com/Lambard-ML-Team).
 
 
 ## Getting started
 
+ - Select the model+embedding combination you want to use
 - Enter your API Key ([Open AI](https://platform.openai.com/account/api-keys) or [Huggingface](https://huggingface.co/docs/hub/security-tokens)).
 - Upload a scientific article as PDF document. You will see a spinner or loading indicator while the processing is in progress.
 - Once the spinner stops, you can proceed to ask your questions
 
 ![screenshot2.png](docs%2Fimages%2Fscreenshot2.png)
 
+ ## Documentation
+
+ ### Context size
+ Allows changing the number of blocks from the original document that are considered for responding.
+ The default size of each block is 250 tokens (this can be changed before uploading the first document).
+ With default settings, each question uses around 1000 tokens.
+
+ **NOTE**: if the chat answers something like "the information is not provided in the given context", **increasing the context size might help**.
+
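As a rough illustration of the figures above: the token budget per question is the number of retrieved blocks times the block size. The four-block figure below is an assumption inferred from the stated defaults (250-token blocks, ~1000 tokens per question), not a documented setting name.

```python
# Rough context budget implied by the default settings (illustrative only;
# the variable names are hypothetical, not the app's configuration keys).
block_size = 250       # tokens per block (default chunk size)
context_blocks = 4     # assumed number of retrieved blocks (context size)

context_tokens = block_size * context_blocks
print(context_tokens)  # 1000 tokens per question, matching the default above
```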
+ ### Chunk size
+ When uploaded, each document is split into blocks of a fixed size (250 tokens by default).
+ This setting allows users to modify the size of these blocks.
+ Smaller blocks yield a narrower context, with more precise sections of the document.
+ Larger blocks yield a broader context that is less tightly focused on the question.
+
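The splitting described above can be sketched as follows. This is a minimal illustration, not the project's actual code: whitespace-delimited words stand in for model tokens, and the helper name is hypothetical.

```python
# Minimal sketch of fixed-size chunking (hypothetical helper; a real
# implementation would count model tokens, not whitespace-split words).
def split_into_blocks(text, block_size=250):
    words = text.split()
    return [
        " ".join(words[i:i + block_size])
        for i in range(0, len(words), block_size)
    ]

document = "word " * 600  # a toy 600-word document
blocks = split_into_blocks(document, block_size=250)
print(len(blocks))  # 3 blocks: 250 + 250 + 100 words
```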
+ ### Query mode
+ Indicates whether the question is sent to the LLM (Language Model) or to the vector storage.
+ - LLM (default) enables question answering related to the document content.
+ - Embeddings: the response consists of the raw text blocks from the document that are related to the question (based on the embeddings). This mode helps diagnose why some answers are unsatisfying or incomplete.
+
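The "Embeddings" mode described above amounts to nearest-neighbour retrieval over block embeddings. A minimal sketch, assuming toy 3-d vectors in place of real sentence embeddings (the function names and vectors are illustrative, not the app's API):

```python
import math

# Sketch of embedding-based retrieval: return the document blocks whose
# embeddings are most similar (by cosine) to the question embedding.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k_blocks(query_vec, block_vecs, blocks, k=2):
    scored = sorted(
        zip(blocks, block_vecs),
        key=lambda pair: cosine(query_vec, pair[1]),
        reverse=True,
    )
    return [block for block, _ in scored[:k]]

blocks = ["Tc of MgB2 is 39 K", "The sample was annealed", "Resistivity dropped"]
vecs = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.1], [0.8, 0.2, 0.1]]
query = [1.0, 0.0, 0.0]
print(top_k_blocks(query, vecs, blocks, k=2))
# ['Tc of MgB2 is 39 K', 'Resistivity dropped']
```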
+ ### NER (Named Entity Recognition)
+
+ This feature is specifically crafted for people working with scientific documents in materials science.
+ It runs NER on the response from the LLM to identify mentions of materials and their properties (quantities, measurements).
+ This feature leverages the [grobid-quantities](https://github.com/kermitt2/grobid-quanities) and [grobid-superconductors](https://github.com/lfoppiano/grobid-superconductors) external services.
 
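To give a flavour of what this annotation step produces, here is a toy regex tagger for "value + unit" quantity mentions in an answer. This is purely illustrative: the real feature calls the grobid services above, whose output is far richer; nothing here reflects their API.

```python
import re

# Toy stand-in for quantity NER (NOT grobid's implementation): tag simple
# "<value> <unit>" mentions in an LLM answer.
QUANTITY = re.compile(r"\b(\d+(?:\.\d+)?)\s*(K|GPa|T|eV)\b")

def tag_quantities(answer):
    return [
        {"value": float(m.group(1)), "unit": m.group(2)}
        for m in QUANTITY.finditer(answer)
    ]

answer = "MgB2 becomes superconducting at 39 K under a pressure of 1.5 GPa."
print(tag_quantities(answer))
# [{'value': 39.0, 'unit': 'K'}, {'value': 1.5, 'unit': 'GPa'}]
```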
 ## Development notes
 
 - `pip install document-qa-engine`
 
 ## Acknowledgement
 
+ This project is developed at the [National Institute for Materials Science](https://www.nims.go.jp) (NIMS) in Japan in collaboration with the [Lambard-ML-Team](https://github.com/Lambard-ML-Team).