writinwaters committed
Commit · 7346a8e
Parent(s): 30090ae
Added release notes (#3969)
### What problem does this PR solve?
### Type of change
- [x] Documentation Update
- docs/guides/configure_knowledge_base.md +1 -1
- docs/release_notes.md +99 -3
- web/src/locales/en.ts +10 -10
docs/guides/configure_knowledge_base.md
CHANGED
@@ -58,7 +58,7 @@ You can also change the chunk template for a particular file on the **Datasets**
|
|
58 |
|
59 |
### Select embedding model
|
60 |
|
61 |
-
An embedding model converts chunks into embeddings. It cannot be changed once the knowledge base has chunks. To switch to a different embedding model,
|
62 |
|
63 |
The following embedding models can be deployed locally:
|
64 |
|
|
|
58 |
|
59 |
### Select embedding model
|
60 |
|
61 |
+
An embedding model converts chunks into embeddings. It cannot be changed once the knowledge base has chunks. To switch to a different embedding model, you must delete all existing chunks in the knowledge base. This is because all files in a specific knowledge base *must* be converted to embeddings using the *same* embedding model, so that they are compared in the same embedding space.
|
62 |
|
63 |
The following embedding models can be deployed locally:
|
64 |
|
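To illustrate why all chunks in one knowledge base must share an embedding model, here is a minimal TypeScript sketch. The `cosineSimilarity` helper and the sample vectors are hypothetical and are not part of RAGFlow; they only show that a similarity score is meaningful when both vectors come from the same embedding space.

```typescript
// Hypothetical helper, not RAGFlow code: cosine similarity between two embeddings.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    // Vectors of different dimensions cannot come from the same embedding model.
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Made-up vectors standing in for the output of two different models.
const chunkFromModelA = [0.1, 0.3, 0.5];
const queryFromModelA = [0.1, 0.3, 0.5];
const chunkFromModelB = [0.2, 0.4, 0.1, 0.9];
```

Comparing `chunkFromModelA` with `queryFromModelA` yields a valid score, while comparing `chunkFromModelB` with `queryFromModelA` throws. Note that two models with the same output dimension would not throw here, yet their scores would still be meaningless, which is why the restriction applies to the model rather than to the vector size.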
docs/release_notes.md
CHANGED
@@ -13,7 +13,7 @@ Released on November 29, 2024.
|
|
13 |
|
14 |
### Improvements
|
15 |
|
16 |
-
Adds [Infinity's configuration file](https://github.com/infiniflow/ragflow/blob/main/docker/infinity_conf.toml) to facilitate integration and customization of Infinity as a document engine. From this release onwards, updates to Infinity's configuration can be made directly within RAGFlow and will take effect immediately after restarting RAGFlow using `docker compose`. [#3715](https://github.com/infiniflow/ragflow/pull/3715)
|
17 |
|
18 |
### Fixed issues
|
19 |
|
@@ -137,7 +137,7 @@ See [Upgrade RAGFlow](https://ragflow.io/docs/dev/upgrade_ragflow) for instructi
|
|
137 |
|
138 |
## v0.11.0
|
139 |
|
140 |
-
Released on September 14, 2024
|
141 |
|
142 |
### New features
|
143 |
|
@@ -152,4 +152,100 @@ Released on September 14, 2024
|
|
152 |
- Supports running retrieval benchmarking on the following datasets:
|
153 |
- [ms_marco_v1.1](https://huggingface.co/datasets/microsoft/ms_marco)
|
154 |
- [trivia_qa](https://huggingface.co/datasets/mandarjoshi/trivia_qa)
|
155 |
-
- [miracl](https://huggingface.co/datasets/miracl/miracl)
|
|
|
13 |
|
14 |
### Improvements
|
15 |
|
16 |
+
Adds [Infinity's configuration file](https://github.com/infiniflow/ragflow/blob/main/docker/infinity_conf.toml) to facilitate integration and customization of [Infinity](https://github.com/infiniflow/infinity) as a document engine. From this release onwards, updates to Infinity's configuration can be made directly within RAGFlow and will take effect immediately after restarting RAGFlow using `docker compose`. [#3715](https://github.com/infiniflow/ragflow/pull/3715)
|
17 |
|
18 |
### Fixed issues
|
19 |
|
|
|
137 |
|
138 |
## v0.11.0
|
139 |
|
140 |
+
Released on September 14, 2024.
|
141 |
|
142 |
### New features
|
143 |
|
|
|
152 |
- Supports running retrieval benchmarking on the following datasets:
|
153 |
- [ms_marco_v1.1](https://huggingface.co/datasets/microsoft/ms_marco)
|
154 |
- [trivia_qa](https://huggingface.co/datasets/mandarjoshi/trivia_qa)
|
155 |
+
- [miracl](https://huggingface.co/datasets/miracl/miracl)
|
156 |
+
|
157 |
+
## v0.10.0
|
158 |
+
|
159 |
+
Released on August 26, 2024.
|
160 |
+
|
161 |
+
### New features
|
162 |
+
|
163 |
+
- Introduces a text-to-SQL template in the Agent UI.
|
164 |
+
- Implements Agent APIs.
|
165 |
+
- Incorporates monitoring for the task executor.
|
166 |
+
- Introduces Agent tools **GitHub**, **DeepL**, **BaiduFanyi**, **QWeather**, and **GoogleScholar**.
|
167 |
+
- Supports chunking of EML files.
|
168 |
+
- Supports more LLMs or model services: **GPT-4o-mini**, **PerfXCloud**, **TogetherAI**, **Upstage**, **Novita.AI**, **01.AI**, **SiliconFlow**, **XunFei Spark**, **Baidu Yiyan**, and **Tencent Hunyuan**.
|
169 |
+
|
170 |
+
## v0.9.0
|
171 |
+
|
172 |
+
Released on August 6, 2024.
|
173 |
+
|
174 |
+
### New features
|
175 |
+
|
176 |
+
- Supports GraphRAG as a chunk method.
|
177 |
+
- Introduces Agent component **Keyword** and search tools, including **Baidu**, **DuckDuckGo**, **PubMed**, **Wikipedia**, **Bing**, and **Google**.
|
178 |
+
- Supports speech-to-text recognition for audio files.
|
179 |
+
- Supports model vendors **Gemini** and **Groq**.
|
180 |
+
- Supports inference frameworks, engines, and services including **LM Studio**, **OpenRouter**, **LocalAI**, and **NVIDIA API**.
|
181 |
+
- Supports using reranker models in Xinference.
|
182 |
+
|
183 |
+
## v0.8.0
|
184 |
+
|
185 |
+
Released on July 8, 2024.
|
186 |
+
|
187 |
+
### New features
|
188 |
+
|
189 |
+
- Supports Agentic RAG, enabling graph-based workflow construction for RAG and agents.
|
190 |
+
- Supports model vendors **Mistral**, **MiniMax**, **Bedrock**, and **Azure OpenAI**.
|
191 |
+
- Supports DOCX files in the MANUAL chunk method.
|
192 |
+
- Supports DOCX, MD, and PDF files in the Q&A chunk method.
|
193 |
+
|
194 |
+
## v0.7.0
|
195 |
+
|
196 |
+
Released on May 31, 2024.
|
197 |
+
|
198 |
+
### New features
|
199 |
+
|
200 |
+
- Supports the use of reranker models.
|
201 |
+
- Integrates reranker and embedding models: [BCE](https://github.com/netease-youdao/BCEmbedding), [BGE](https://github.com/FlagOpen/FlagEmbedding), and [Jina](https://jina.ai/embeddings/).
|
202 |
+
- Supports LLMs Baichuan and VolcanoArk.
|
203 |
+
- Implements [RAPTOR](https://arxiv.org/html/2401.18059v1) for improved text retrieval.
|
204 |
+
- Supports HTML files in the GENERAL chunk method.
|
205 |
+
- Provides HTTP and Python APIs for deleting documents by ID.
|
206 |
+
- Supports ARM64 platforms.
|
207 |
+
|
208 |
+
:::danger IMPORTANT
|
209 |
+
While we also test RAGFlow on ARM64 platforms, we do not plan to maintain RAGFlow Docker images for ARM.
|
210 |
+
|
211 |
+
If you are on an ARM platform, follow [this guide](https://ragflow.io/docs/dev/build_docker_image) to build a RAGFlow Docker image.
|
212 |
+
:::
|
213 |
+
|
214 |
+
### Related APIs
|
215 |
+
|
216 |
+
#### HTTP API
|
217 |
+
|
218 |
+
- [Delete documents](https://ragflow.io/docs/dev/http_api_reference#delete-documents)
|
219 |
+
|
220 |
+
#### Python API
|
221 |
+
|
222 |
+
- [Delete documents](https://ragflow.io/docs/dev/python_api_reference#delete-documents)
|
223 |
+
|
224 |
+
## v0.6.0
|
225 |
+
|
226 |
+
Released on May 21, 2024.
|
227 |
+
|
228 |
+
### New features
|
229 |
+
|
230 |
+
- Supports streaming output.
|
231 |
+
- Provides HTTP and Python APIs for retrieving document chunks.
|
232 |
+
- Supports monitoring of system components, including Elasticsearch, MySQL, Redis, and MinIO.
|
233 |
+
- Supports disabling **Layout Recognition** in the GENERAL chunk method to reduce file chunking time.
|
234 |
+
|
235 |
+
### Related APIs
|
236 |
+
|
237 |
+
#### HTTP API
|
238 |
+
|
239 |
+
- [Retrieve chunks](https://ragflow.io/docs/dev/http_api_reference#retrieve-chunks)
|
240 |
+
|
241 |
+
#### Python API
|
242 |
+
|
243 |
+
- [Retrieve chunks](https://ragflow.io/docs/dev/python_api_reference#retrieve-chunks)
|
244 |
+
|
245 |
+
## v0.5.0
|
246 |
+
|
247 |
+
Released on May 8, 2024.
|
248 |
+
|
249 |
+
### New features
|
250 |
+
|
251 |
+
- Supports LLM DeepSeek.
|
web/src/locales/en.ts
CHANGED
@@ -86,7 +86,7 @@ export default {
|
|
86 |
namePlaceholder: 'Please input name!',
|
87 |
doc: 'Docs',
|
88 |
datasetDescription:
|
89 |
-
'😉
|
90 |
addFile: 'Add file',
|
91 |
searchFiles: 'Search your files',
|
92 |
localFiles: 'Local files',
|
@@ -158,17 +158,17 @@ export default {
|
|
158 |
topKTip: `K chunks will be fed into rerank models.`,
|
159 |
delimiter: `Delimiter`,
|
160 |
delimiterTip:
|
161 |
-
'
|
162 |
html4excel: 'Excel to HTML',
|
163 |
html4excelTip: `When enabled, the spreadsheet will be parsed into HTML tables, and at most 256 rows for one table. Otherwise, it will be parsed into key-value pairs by row.`,
|
164 |
autoKeywords: 'Auto-keyword',
|
165 |
-
autoKeywordsTip: `
|
166 |
autoQuestions: 'Auto-question',
|
167 |
-
autoQuestionsTip: `
|
168 |
},
|
169 |
knowledgeConfiguration: {
|
170 |
titleDescription:
|
171 |
-
'Update your knowledge base
|
172 |
name: 'Knowledge base name',
|
173 |
photo: 'Knowledge base photo',
|
174 |
description: 'Description',
|
@@ -180,13 +180,13 @@ export default {
|
|
180 |
chunkTokenNumber: 'Chunk token number',
|
181 |
chunkTokenNumberMessage: 'Chunk token number is required',
|
182 |
embeddingModelTip:
|
183 |
-
'The model that converts chunks into embeddings. It cannot be changed once the knowledge base has chunks. To switch to a different embedding model,
|
184 |
permissionsTip:
|
185 |
"If set to 'Team', all team members will be able to manage the knowledge base.",
|
186 |
chunkTokenNumberTip:
|
187 |
'It sets the token threshold for a chunk. A paragraph with fewer tokens than this threshold will be combined with the following paragraph until the token count exceeds the threshold, at which point a chunk is created.',
|
188 |
chunkMethod: 'Chunk method',
|
189 |
-
chunkMethodTip: '
|
190 |
upload: 'Upload',
|
191 |
english: 'English',
|
192 |
chinese: 'Chinese',
|
@@ -279,12 +279,12 @@ export default {
|
|
279 |
</p>`,
|
280 |
knowledgeGraph: `<p>Supported file formats are <b>DOCX, EXCEL, PPT, IMAGE, PDF, TXT, MD, JSON, EML</b>
|
281 |
|
282 |
-
<p>This approach chunks files using the 'naive'/'General' method. It splits a document into
|
283 |
<p>The chunks are then fed to the LLM to extract entities and relationships for a knowledge graph and a mind map.</p>
|
284 |
<p>Ensure that you set the <b>Entity types</b>.</p>`,
|
285 |
useRaptor: 'Use RAPTOR to enhance retrieval',
|
286 |
useRaptorTip:
|
287 |
-
'Recursive Abstractive Processing for Tree-Organized Retrieval, see https://huggingface.co/papers/2401.18059 for more information',
|
288 |
prompt: 'Prompt',
|
289 |
promptTip: 'LLM prompt used for summarization.',
|
290 |
promptMessage: 'Prompt is required',
|
@@ -305,7 +305,7 @@ The above is the content you need to summarize.`,
|
|
305 |
entityTypes: 'Entity types',
|
306 |
vietnamese: 'Vietnamese',
|
307 |
pageRank: 'Page rank',
|
308 |
-
pageRankTip: `This
|
309 |
},
|
310 |
chunk: {
|
311 |
chunk: 'Chunk',
|
|
|
86 |
namePlaceholder: 'Please input name!',
|
87 |
doc: 'Docs',
|
88 |
datasetDescription:
|
89 |
+
'😉 Please wait for your file to finish parsing before starting an AI-powered chat.',
|
90 |
addFile: 'Add file',
|
91 |
searchFiles: 'Search your files',
|
92 |
localFiles: 'Local files',
|
|
|
158 |
topKTip: `K chunks will be fed into rerank models.`,
|
159 |
delimiter: `Delimiter`,
|
160 |
delimiterTip:
|
161 |
+
'A delimiter or separator can consist of one or more special characters. If a delimiter has multiple characters, ensure that it is enclosed in backticks (``). For example, if you configure your delimiters like this: \n`##`;, then your texts will be separated at line breaks, double hash symbols (##), or semicolons.',
|
162 |
html4excel: 'Excel to HTML',
|
163 |
html4excelTip: `When enabled, the spreadsheet will be parsed into HTML tables, and at most 256 rows for one table. Otherwise, it will be parsed into key-value pairs by row.`,
|
164 |
autoKeywords: 'Auto-keyword',
|
165 |
+
autoKeywordsTip: `Automatically extract N keywords for each chunk to increase their ranking for queries containing those keywords. You can check or update the added keywords for a chunk from the chunk list. Be aware that extra tokens will be consumed by the LLM specified in 'System model settings'.`,
|
166 |
autoQuestions: 'Auto-question',
|
167 |
+
autoQuestionsTip: `Automatically extract N questions for each chunk to increase their ranking for queries containing those questions. You can check or update the added questions for a chunk from the chunk list. This feature will not disrupt the chunking process if an error occurs, except that it may add an empty result to the original chunk. Be aware that extra tokens will be consumed by the LLM specified in 'System model settings'.`,
|
168 |
},
|
169 |
knowledgeConfiguration: {
|
170 |
titleDescription:
|
171 |
+
'Update your knowledge base configuration here, particularly the chunk method.',
|
172 |
name: 'Knowledge base name',
|
173 |
photo: 'Knowledge base photo',
|
174 |
description: 'Description',
|
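The delimiter rule described in `delimiterTip` can be sketched as follows. This is a hedged illustration, not RAGFlow's actual parser: `parseDelimiters` and `splitByDelimiters` are hypothetical names, and the sketch assumes that backtick-enclosed runs are multi-character delimiters while every other character is a delimiter on its own.

```typescript
// Hypothetical sketch, not RAGFlow's implementation.
function parseDelimiters(config: string): string[] {
  const delimiters: string[] = [];
  // Backtick-enclosed runs are treated as multi-character delimiters.
  const rest = config.replace(/`([^`]+)`/g, (_match, multi: string) => {
    delimiters.push(multi);
    return "";
  });
  // Every remaining character is a single-character delimiter.
  for (const ch of rest.split("")) {
    delimiters.push(ch);
  }
  return delimiters;
}

function splitByDelimiters(text: string, delimiters: string[]): string[] {
  // Escape regex metacharacters and try longer delimiters first.
  const escaped = delimiters
    .slice()
    .sort((a, b) => b.length - a.length)
    .map((d) => d.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"));
  return text
    .split(new RegExp(escaped.join("|")))
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

// The configuration from the tip: a line break, `##`, and a semicolon.
const delimiters = parseDelimiters("\n`##`;");
const pieces = splitByDelimiters("intro\nfirst##second;third", delimiters);
```

With this input, `pieces` contains the four segments `intro`, `first`, `second`, and `third`, matching the behavior the tip describes for line breaks, double hash symbols, and semicolons.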
|
|
180 |
chunkTokenNumber: 'Chunk token number',
|
181 |
chunkTokenNumberMessage: 'Chunk token number is required',
|
182 |
embeddingModelTip:
|
183 |
+
'The model that converts chunks into embeddings. It cannot be changed once the knowledge base has chunks. To switch to a different embedding model, you must delete all existing chunks in the knowledge base.',
|
184 |
permissionsTip:
|
185 |
"If set to 'Team', all team members will be able to manage the knowledge base.",
|
186 |
chunkTokenNumberTip:
|
187 |
'It sets the token threshold for a chunk. A paragraph with fewer tokens than this threshold will be combined with the following paragraph until the token count exceeds the threshold, at which point a chunk is created.',
|
188 |
chunkMethod: 'Chunk method',
|
189 |
+
chunkMethodTip: 'View the tips on the right.',
|
190 |
upload: 'Upload',
|
191 |
english: 'English',
|
192 |
chinese: 'Chinese',
|
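The merging behavior described in `chunkTokenNumberTip` can be sketched like this. It is a minimal illustration under stated assumptions, not RAGFlow's chunker: `countTokens` and `mergeParagraphs` are hypothetical names, and whitespace-separated words stand in for real tokenizer output.

```typescript
// Hypothetical token counter: whitespace-separated words stand in for real tokens.
function countTokens(text: string): number {
  return text.split(/\s+/).filter((t) => t.length > 0).length;
}

// Combine consecutive paragraphs until the running token count exceeds the threshold.
function mergeParagraphs(paragraphs: string[], threshold: number): string[] {
  const chunks: string[] = [];
  let buffer: string[] = [];
  let tokens = 0;
  for (const paragraph of paragraphs) {
    buffer.push(paragraph);
    tokens += countTokens(paragraph);
    if (tokens > threshold) {
      // The threshold is exceeded, so a chunk is created at this point.
      chunks.push(buffer.join("\n"));
      buffer = [];
      tokens = 0;
    }
  }
  // Any leftover paragraphs form a final, smaller chunk.
  if (buffer.length > 0) {
    chunks.push(buffer.join("\n"));
  }
  return chunks;
}

const chunks = mergeParagraphs(
  ["one two", "three four five", "six", "seven eight nine ten"],
  4
);
```

With a threshold of 4, the first two paragraphs are merged into one chunk (5 tokens) and the last two into another (5 tokens). How the real chunker handles a trailing remainder is not stated in the tip; emitting it as its own chunk is an assumption of this sketch.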
|
|
279 |
</p>`,
|
280 |
knowledgeGraph: `<p>Supported file formats are <b>DOCX, EXCEL, PPT, IMAGE, PDF, TXT, MD, JSON, EML</b>
|
281 |
|
282 |
+
<p>This approach chunks files using the 'naive'/'General' method. It splits a document into segments and then combines adjacent segments until the token count exceeds the threshold specified by 'Chunk token number', at which point a chunk is created.</p>
|
283 |
<p>The chunks are then fed to the LLM to extract entities and relationships for a knowledge graph and a mind map.</p>
|
284 |
<p>Ensure that you set the <b>Entity types</b>.</p>`,
|
285 |
useRaptor: 'Use RAPTOR to enhance retrieval',
|
286 |
useRaptorTip:
|
287 |
+
'Recursive Abstractive Processing for Tree-Organized Retrieval, see https://huggingface.co/papers/2401.18059 for more information.',
|
288 |
prompt: 'Prompt',
|
289 |
promptTip: 'LLM prompt used for summarization.',
|
290 |
promptMessage: 'Prompt is required',
|
|
|
305 |
entityTypes: 'Entity types',
|
306 |
vietnamese: 'Vietnamese',
|
307 |
pageRank: 'Page rank',
|
308 |
+
pageRankTip: `This increases the relevance score of the knowledge base. Its value will be added to the relevance score of all retrieved chunks from this knowledge base. Useful when you are searching within multiple knowledge bases and want to assign a higher pagerank score to a specific one.`,
|
309 |
},
|
310 |
chunk: {
|
311 |
chunk: 'Chunk',
|
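The effect described in `pageRankTip` can be sketched as a simple additive boost. This is a hedged illustration only: the `RetrievedChunk` shape, the `applyPageRank` function, and the score values are hypothetical and do not reflect RAGFlow's internal data model or score scale.

```typescript
// Hypothetical shape of a retrieved chunk; not RAGFlow's data model.
interface RetrievedChunk {
  id: string;
  knowledgeBase: string;
  relevance: number;
}

// Add each knowledge base's page rank value to the relevance of its chunks,
// then sort from the highest to the lowest boosted score.
function applyPageRank(
  chunks: RetrievedChunk[],
  pageRank: Record<string, number>
): RetrievedChunk[] {
  return chunks
    .map((c) => ({ ...c, relevance: c.relevance + (pageRank[c.knowledgeBase] ?? 0) }))
    .sort((a, b) => b.relevance - a.relevance);
}

// Made-up scores: the "priority" knowledge base gets a boost of 0.5.
const ranked = applyPageRank(
  [
    { id: "a", knowledgeBase: "general", relevance: 0.75 },
    { id: "b", knowledgeBase: "priority", relevance: 0.5 },
  ],
  { priority: 0.5 }
);
```

Without the boost, chunk `a` would rank first. With it, chunk `b` from the boosted knowledge base moves ahead, which is the multi-knowledge-base scenario the tip refers to.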