writinwaters committed on
Commit
7346a8e
·
1 Parent(s): 30090ae

Added release notes (#3969)


### What problem does this PR solve?



### Type of change

- [x] Documentation Update

docs/guides/configure_knowledge_base.md CHANGED
@@ -58,7 +58,7 @@ You can also change the chunk template for a particular file on the **Datasets**
 
 ### Select embedding model
 
- An embedding model converts chunks into embeddings. It cannot be changed once the knowledge base has chunks. To switch to a different embedding model, You must delete all chunks in the knowledge base. The obvious reason is that we *must* ensure that files in a specific knowledge base are converted to embeddings using the *same* embedding model (ensure that they are compared in the same embedding space).
+ An embedding model converts chunks into embeddings. It cannot be changed once the knowledge base has chunks. To switch to a different embedding model, you must delete all existing chunks in the knowledge base. The obvious reason is that we *must* ensure that files in a specific knowledge base are converted to embeddings using the *same* embedding model (ensure that they are compared in the same embedding space).
 
 The following embedding models can be deployed locally:

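The rule above (one embedding model per knowledge base) comes down to embeddings only being comparable within a single vector space. A minimal, hypothetical sketch in plain Python, with made-up vectors standing in for two different models' output:

```python
from math import sqrt

def cosine(a, b):
    # Cosine similarity is only defined for vectors in the same space:
    # embeddings from different models differ in dimension and geometry.
    if len(a) != len(b):
        raise ValueError("chunks were embedded by different models")
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm

model_a_chunk = [0.1, 0.3, 0.5]        # e.g. a 3-dimensional model
model_b_chunk = [0.2, 0.1, 0.4, 0.7]   # e.g. a 4-dimensional model

cosine(model_a_chunk, model_a_chunk)   # fine: same embedding space
# cosine(model_a_chunk, model_b_chunk) raises ValueError
```

Even when two models happen to share a dimension, their spaces are unrelated, which is why all chunks must be deleted before switching models.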
docs/release_notes.md CHANGED
@@ -13,7 +13,7 @@ Released on November 29, 2024.
 
 ### Improvements
 
- Adds [Infinity's configuration file](https://github.com/infiniflow/ragflow/blob/main/docker/infinity_conf.toml) to facilitate integration and customization of Infinity as a document engine. From this release onwards, updates to Infinity's configuration can be made directly within RAGFlow and will take effect immediately after restarting RAGFlow using `docker compose`. [#3715](https://github.com/infiniflow/ragflow/pull/3715)
+ Adds [Infinity's configuration file](https://github.com/infiniflow/ragflow/blob/main/docker/infinity_conf.toml) to facilitate integration and customization of [Infinity](https://github.com/infiniflow/infinity) as a document engine. From this release onwards, updates to Infinity's configuration can be made directly within RAGFlow and will take effect immediately after restarting RAGFlow using `docker compose`. [#3715](https://github.com/infiniflow/ragflow/pull/3715)
 
 ### Fixed issues
 
@@ -137,7 +137,7 @@ See [Upgrade RAGFlow](https://ragflow.io/docs/dev/upgrade_ragflow) for instructi
 
 ## v0.11.0
 
- Released on September 14, 2024
+ Released on September 14, 2024.
 
 ### New features
 
@@ -152,4 +152,100 @@ Released on September 14, 2024
 - Supports running retrieval benchmarking on the following datasets:
   - [ms_marco_v1.1](https://huggingface.co/datasets/microsoft/ms_marco)
   - [trivia_qa](https://huggingface.co/datasets/mandarjoshi/trivia_qa)
- - [miracl](https://huggingface.co/datasets/miracl/miracl)
+ - [miracl](https://huggingface.co/datasets/miracl/miracl)
+
+ ## v0.10.0
+
+ Released on August 26, 2024.
+
+ ### New features
+
+ - Introduces a text-to-SQL template in the Agent UI.
+ - Implements Agent APIs.
+ - Incorporates monitoring for the task executor.
+ - Introduces Agent tools **GitHub**, **DeepL**, **BaiduFanyi**, **QWeather**, and **GoogleScholar**.
+ - Supports chunking of EML files.
+ - Supports more LLMs or model services: **GPT-4o-mini**, **PerfXCloud**, **TogetherAI**, **Upstage**, **Novita.AI**, **01.AI**, **SiliconFlow**, **XunFei Spark**, **Baidu Yiyan**, and **Tencent Hunyuan**.
+
+ ## v0.9.0
+
+ Released on August 6, 2024.
+
+ ### New features
+
+ - Supports GraphRAG as a chunk method.
+ - Introduces Agent component **Keyword** and search tools, including **Baidu**, **DuckDuckGo**, **PubMed**, **Wikipedia**, **Bing**, and **Google**.
+ - Supports speech-to-text recognition for audio files.
+ - Supports model vendors **Gemini** and **Groq**.
+ - Supports inference frameworks, engines, and services including **LM Studio**, **OpenRouter**, **LocalAI**, and **Nvidia API**.
+ - Supports using reranker models in Xinference.
+
+ ## v0.8.0
+
+ Released on July 8, 2024.
+
+ ### New features
+
+ - Supports Agentic RAG, enabling graph-based workflow construction for RAG and agents.
+ - Supports model vendors **Mistral**, **MiniMax**, **Bedrock**, and **Azure OpenAI**.
+ - Supports DOCX files in the MANUAL chunk method.
+ - Supports DOCX, MD, and PDF files in the Q&A chunk method.
+
+ ## v0.7.0
+
+ Released on May 31, 2024.
+
+ ### New features
+
+ - Supports the use of reranker models.
+ - Integrates reranker and embedding models: [BCE](https://github.com/netease-youdao/BCEmbedding), [BGE](https://github.com/FlagOpen/FlagEmbedding), and [Jina](https://jina.ai/embeddings/).
+ - Supports LLMs Baichuan and VolcanoArk.
+ - Implements [RAPTOR](https://arxiv.org/html/2401.18059v1) for improved text retrieval.
+ - Supports HTML files in the GENERAL chunk method.
+ - Provides HTTP and Python APIs for deleting documents by ID.
+ - Supports ARM64 platforms.
+
+ :::danger IMPORTANT
+ While we also test RAGFlow on ARM64 platforms, we do not plan to maintain RAGFlow Docker images for ARM.
+
+ If you are on an ARM platform, follow [this guide](https://ragflow.io/docs/dev/build_docker_image) to build a RAGFlow Docker image.
+ :::
+
+ ### Related APIs
+
+ #### HTTP API
+
+ - [Delete documents](https://ragflow.io/docs/dev/http_api_reference#delete-documents)
+
+ #### Python API
+
+ - [Delete documents](https://ragflow.io/docs/dev/python_api_reference#delete-documents)
+
+ ## v0.6.0
+
+ Released on May 21, 2024.
+
+ ### New features
+
+ - Supports streaming output.
+ - Provides HTTP and Python APIs for retrieving document chunks.
+ - Supports monitoring of system components, including Elasticsearch, MySQL, Redis, and MinIO.
+ - Supports disabling **Layout Recognition** in the GENERAL chunk method to reduce file chunking time.
+
+ ### Related APIs
+
+ #### HTTP API
+
+ - [Retrieve chunks](https://ragflow.io/docs/dev/http_api_reference#retrieve-chunks)
+
+ #### Python API
+
+ - [Retrieve chunks](https://ragflow.io/docs/dev/python_api_reference#retrieve-chunks)
+
+ ## v0.5.0
+
+ Released on May 8, 2024.
+
+ ### New features
+
+ - Supports LLM DeepSeek.
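The v0.7.0 entry above mentions HTTP and Python APIs for deleting documents by ID. As a hedged illustration of what such a call involves, here is a standard-library sketch; the endpoint path, payload shape, and header names are assumptions to be checked against the linked HTTP API reference, not verified here:

```python
import json
import urllib.request

def build_delete_request(base_url, dataset_id, doc_ids, api_key):
    # Hypothetical endpoint shape; confirm against the HTTP API reference.
    url = f"{base_url}/api/v1/datasets/{dataset_id}/documents"
    payload = json.dumps({"ids": doc_ids}).encode("utf-8")
    req = urllib.request.Request(url, data=payload, method="DELETE")
    req.add_header("Content-Type", "application/json")
    req.add_header("Authorization", f"Bearer {api_key}")
    return req

req = build_delete_request("http://localhost", "dataset_1", ["doc_1"], "YOUR_KEY")
# urllib.request.urlopen(req) would send the request.
```

The Python API linked above wraps the same operation; see the respective reference pages for the authoritative signatures.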
web/src/locales/en.ts CHANGED
@@ -86,7 +86,7 @@ export default {
  namePlaceholder: 'Please input name!',
  doc: 'Docs',
  datasetDescription:
- '😉 Questions and answers can only be answered after the parsing is successful.',
+ '😉 Please wait for your file to finish parsing before starting an AI-powered chat.',
  addFile: 'Add file',
  searchFiles: 'Search your files',
  localFiles: 'Local files',
@@ -158,17 +158,17 @@ export default {
  topKTip: `K chunks will be fed into rerank models.`,
  delimiter: `Delimiter`,
  delimiterTip:
- 'Supports multiple characters as separators, and the multiple character separators are wrapped with `. For example, if it is configured like this: \n`##`; then the text will be separated by line breaks, two #s and a semicolon, and then assembled according to the size of the "token number".',
+ 'A delimiter or separator can consist of one or multiple special characters. If it is multiple characters, ensure they are enclosed in backticks (``). For example, if you configure your delimiters like this: \n`##`;, then your texts will be separated at line breaks, double hash symbols (##), or semicolons.',
  html4excel: 'Excel to HTML',
  html4excelTip: `When enabled, the spreadsheet will be parsed into HTML tables, and at most 256 rows for one table. Otherwise, it will be parsed into key-value pairs by row.`,
  autoKeywords: 'Auto-keyword',
- autoKeywordsTip: `Extract N keywords for each chunk to increase their ranking for queries containing those keywords. You can check or update the added keywords for a chunk from the chunk list. Be aware that extra tokens will be consumed by the LLM specified in 'System model settings'.`,
+ autoKeywordsTip: `Automatically extract N keywords for each chunk to increase their ranking for queries containing those keywords. You can check or update the added keywords for a chunk from the chunk list. Be aware that extra tokens will be consumed by the LLM specified in 'System model settings'.`,
  autoQuestions: 'Auto-question',
- autoQuestionsTip: `Extract N questions for each chunk to increase their ranking for queries containing those questions. You can check or update the added questions for a chunk from the chunk list. This feature will not disrupt the chunking process if an error occurs, except that it may add an empty result to the original chunk. Be aware that extra tokens will be consumed by the LLM specified in 'System model settings'.`,
+ autoQuestionsTip: `Automatically extract N questions for each chunk to increase their ranking for queries containing those questions. You can check or update the added questions for a chunk from the chunk list. This feature will not disrupt the chunking process if an error occurs, except that it may add an empty result to the original chunk. Be aware that extra tokens will be consumed by the LLM specified in 'System model settings'.`,
  },
  knowledgeConfiguration: {
  titleDescription:
- 'Update your knowledge base configurations here, particularly the chunk method.',
+ 'Update your knowledge base configuration here, particularly the chunk method.',
  name: 'Knowledge base name',
  photo: 'Knowledge base photo',
  description: 'Description',
@@ -180,13 +180,13 @@ export default {
  chunkTokenNumber: 'Chunk token number',
  chunkTokenNumberMessage: 'Chunk token number is required',
  embeddingModelTip:
- 'The model that converts chunks into embeddings. It cannot be changed once the knowledge base has chunks. To switch to a different embedding model, You must delete all chunks in the knowledge base.',
+ 'The model that converts chunks into embeddings. It cannot be changed once the knowledge base has chunks. To switch to a different embedding model, you must delete all existing chunks in the knowledge base.',
  permissionsTip:
  "If set to 'Team', all team members will be able to manage the knowledge base.",
  chunkTokenNumberTip:
  'It sets the token threshold for a chunk. A paragraph with fewer tokens than this threshold will be combined with the following paragraph until the token count exceeds the threshold, at which point a chunk is created.',
  chunkMethod: 'Chunk method',
- chunkMethodTip: 'Tips are on the right.',
+ chunkMethodTip: 'View the tips on the right.',
  upload: 'Upload',
  english: 'English',
  chinese: 'Chinese',
@@ -279,12 +279,12 @@ export default {
  </p>`,
  knowledgeGraph: `<p>Supported file formats are <b>DOCX, EXCEL, PPT, IMAGE, PDF, TXT, MD, JSON, EML</b>
 
- <p>This approach chunks files using the 'naive'/'General' method. It splits a document into segements and then combines adjacent segments until the token count exceeds the threshold specified by 'Chunk token number', at which point a chunk is created.</p>
+ <p>This approach chunks files using the 'naive'/'General' method. It splits a document into segments and then combines adjacent segments until the token count exceeds the threshold specified by 'Chunk token number', at which point a chunk is created.</p>
  <p>The chunks are then fed to the LLM to extract entities and relationships for a knowledge graph and a mind map.</p>
  <p>Ensure that you set the <b>Entity types</b>.</p>`,
  useRaptor: 'Use RAPTOR to enhance retrieval',
  useRaptorTip:
- 'Recursive Abstractive Processing for Tree-Organized Retrieval, see https://huggingface.co/papers/2401.18059 for more information',
+ 'Recursive Abstractive Processing for Tree-Organized Retrieval, see https://huggingface.co/papers/2401.18059 for more information.',
  prompt: 'Prompt',
  promptTip: 'LLM prompt used for summarization.',
  promptMessage: 'Prompt is required',
@@ -305,7 +305,7 @@ The above is the content you need to summarize.`,
  entityTypes: 'Entity types',
  vietnamese: 'Vietamese',
  pageRank: 'Page rank',
- pageRankTip: `This is used to boost the relevance score. The relevance score with all the retrieved chunks will plus this number, When you want to search the given knowledge base at first place, set a higher pagerank score than others.`,
+ pageRankTip: `This increases the relevance score of the knowledge base. Its value will be added to the relevance score of all retrieved chunks from this knowledge base. Useful when you are searching within multiple knowledge bases and want to assign a higher pagerank score to a specific one.`,
  },
  chunk: {
  chunk: 'Chunk',
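The revised delimiterTip above says a text is split wherever any configured delimiter occurs (line breaks, ##, or ;). A short illustrative Python sketch of that splitting behaviour, not RAGFlow's actual implementation:

```python
import re

def split_on_delimiters(text, delimiters):
    # Combine all configured delimiters into one alternation pattern,
    # escaping each so characters like '#' are matched literally.
    pattern = "|".join(re.escape(d) for d in delimiters)
    return [seg for seg in re.split(pattern, text) if seg.strip()]

# With delimiters \n, ## and ; (the tip's example configuration):
segments = split_on_delimiters("intro##body;more\nend", ["\n", "##", ";"])
# → ['intro', 'body', 'more', 'end']
```

The resulting segments would then be merged up to the configured chunk token number, as described in chunkTokenNumberTip.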