writinwaters committed · Commit e587fd6 · Parent: 4c39067

DRAFT: Miscellaneous proofedits on Python APIs (#2903)

### Type of change

- [x] Documentation Update

api/python_api_reference.md (+167 −125) CHANGED
@@ -2,10 +2,14 @@
2 |
3 | **THE API REFERENCES BELOW ARE STILL UNDER DEVELOPMENT.**
4 |
5 | :::tip NOTE
6 | Dataset Management
7 | :::
8 |
9 | ## Create dataset
10 |
11 | ```python
@@ -55,11 +59,24 @@ The language setting of the dataset to create. Available options:
55 |
56 | #### permission
57 |
58 | - Specifies who can
59 |
60 | #### chunk_method, `str`
61 |
62 | - The
63 |
64 | #### parser_config
65 |
@@ -67,7 +84,7 @@ The parser configuration of the dataset. A `ParserConfig` object contains the fo
67 |
68 | - `chunk_token_count`: Defaults to `128`.
69 | - `layout_recognize`: Defaults to `True`.
70 | - - `delimiter`: Defaults to `
71 | - `task_page_size`: Defaults to `12`.
72 |
73 | ### Returns
@@ -81,7 +98,7 @@ The parser configuration of the dataset. A `ParserConfig` object contains the fo
81 | from ragflow import RAGFlow
82 |
83 | rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
84 | -
85 | ```
86 |
87 | ---
@@ -92,13 +109,13 @@ ds = rag_object.create_dataset(name="kb_1")
92 | RAGFlow.delete_datasets(ids: list[str] = None)
93 | ```
94 |
95 | - Deletes datasets
96 |
97 | ### Parameters
98 |
99 | - #### ids
100 |
101 | - The IDs of the datasets to delete.
102 |
103 | ### Returns
104 |
@@ -108,7 +125,7 @@ The IDs of the datasets to delete.
108 | ### Examples
109 |
110 | ```python
111 | -
112 | ```
113 |
114 | ---
@@ -132,15 +149,18 @@ Retrieves a list of datasets.
132 |
133 | #### page: `int`
134 |
135 | -
136 |
137 | #### page_size: `int`
138 |
139 | - The number of
140 |
141 | - ####
142 |
143 | - The field by which
144 |
145 | #### desc: `bool`
146 |
@@ -148,15 +168,15 @@ Indicates whether the retrieved datasets should be sorted in descending order. D
148 |
149 | #### id: `str`
150 |
151 | - The
152 |
153 | #### name: `str`
154 |
155 | - The name of the dataset to
156 |
157 | ### Returns
158 |
159 | - - Success: A list of `DataSet` objects
160 | - Failure: `Exception`.
161 |
162 | ### Examples
@@ -164,8 +184,8 @@ The name of the dataset to be got. Defaults to `None`.
164 | #### List all datasets
165 |
166 | ```python
167 | - for
168 | - print(
169 | ```
170 |
171 | #### Retrieve a dataset by ID
@@ -183,16 +203,18 @@ print(dataset[0])
183 | DataSet.update(update_message: dict)
184 | ```
185 |
186 | - Updates the current dataset.
187 |
188 | ### Parameters
189 |
190 | #### update_message: `dict[str, str|int]`, *Required*
191 |
192 | - `"name"`: `str` The name of the dataset to update.
193 | - - `"embedding_model"`: `str` The embedding model
194 | - Ensure that `"chunk_count"` is `0` before updating `"embedding_model"`.
195 | - - `"chunk_method"`: `str` The
196 | - `"naive"`: General
197 | - `"manual`: Manual
198 | - `"qa"`: Q&A
@@ -216,8 +238,8 @@ Updates the current dataset.
216 | ```python
217 | from ragflow import RAGFlow
218 |
219 | -
220 | - dataset =
221 | dataset.update({"embedding_model":"BAAI/bge-zh-v1.5", "chunk_method":"manual"})
222 | ```
223 |
@@ -239,7 +261,7 @@ Uploads documents to the current dataset.
239 |
240 | ### Parameters
241 |
242 | - #### document_list
243 |
244 | A list of dictionaries representing the documents to upload, each containing the following keys:
245 |
@@ -272,6 +294,8 @@ Updates configurations for the current document.
272 |
273 | #### update_message: `dict[str, str|dict[]]`, *Required*
274 |
275 | - `"name"`: `str` The name of the document to update.
276 | - `"parser_config"`: `dict[str, Any]` The parsing configuration for the document:
277 |   - `"chunk_token_count"`: Defaults to `128`.
@@ -302,9 +326,9 @@ Updates configurations for the current document.
302 | ```python
303 | from ragflow import RAGFlow
304 |
305 | -
306 | - dataset=
307 | - dataset=dataset[0]
308 | doc = dataset.list_documents(id="wdfxb5t547d")
309 | doc = doc[0]
310 | doc.update([{"parser_config": {"chunk_token_count": 256}}, {"chunk_method": "manual"}])
@@ -318,7 +342,7 @@ doc.update([{"parser_config": {"chunk_token_count": 256}}, {"chunk_method": "manual"}])
318 | Document.download() -> bytes
319 | ```
320 |
321 | - Downloads the current document
322 |
323 | ### Returns
324 |
@@ -350,30 +374,30 @@ Retrieves a list of documents from the current dataset.
350 |
351 | ### Parameters
352 |
353 | - #### id
354 |
355 | The ID of the document to retrieve. Defaults to `None`.
356 |
357 | - #### keywords
358 |
359 | The keywords to match document titles. Defaults to `None`.
360 |
361 | - #### offset
362 |
363 | - The
364 |
365 | - #### limit
366 |
367 | -
368 |
369 | - #### orderby
370 |
371 | - The field by which
372 |
373 | - - `"create_time"` (
374 | - `"update_time"`
375 |
376 | - #### desc
377 |
378 | Indicates whether the retrieved documents should be sorted in descending order. Defaults to `True`.
379 |
@@ -384,22 +408,24 @@ Indicates whether the retrieved documents should be sorted in descending order.
384 |
385 | A `Document` object contains the following attributes:
386 |
387 | - - `id
388 | - - `
389 | - - `
390 | - - `
391 | - - `
392 | - - `
393 | - - `
394 | - - `
395 | - - `
396 | - - `size`: `int`
397 | - - `token_count`: `int`
398 | - - `chunk_count`: `int`
399 | - - `progress`: `float`
400 | - - `progress_msg`: `str`
401 | - - `process_begin_at`: `datetime`
402 | - - `process_duation`: `float` Duration of the processing in seconds or minutes
403 |
404 | ### Examples
405 |
@@ -410,11 +436,10 @@ rag = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
410 | dataset = rag.create_dataset(name="kb_1")
411 |
412 | filename1 = "~/ragflow.txt"
413 | - blob=open(filename1 , "rb").read()
414 | -
415 | - dataset.
416 | -
417 | - print(d)
418 | ```
419 |
420 | ---
@@ -425,7 +450,13 @@ for d in dataset.list_documents(keywords="rag", offset=0, limit=12):
425 | DataSet.delete_documents(ids: list[str] = None)
426 | ```
427 |
428 | - Deletes
429 |
430 | ### Returns
431 |
@@ -437,10 +468,10 @@ Deletes specified documents or all documents from the current dataset.
437 | ```python
438 | from ragflow import RAGFlow
439 |
440 | -
441 | -
442 | -
443 | -
444 | ```
445 |
446 | ---
@@ -453,7 +484,7 @@ DataSet.async_parse_documents(document_ids:list[str]) -> None
453 |
454 | ### Parameters
455 |
456 | - #### document_ids: `list[str]
457 |
458 | The IDs of the documents to parse.
459 |
@@ -465,23 +496,20 @@ The IDs of the documents to parse.
465 | ### Examples
466 |
467 | ```python
468 | -
469 | -
470 | - ds = rag.create_dataset(name="dataset_name")
471 | documents = [
472 |     {'name': 'test1.txt', 'blob': open('./test_data/test1.txt',"rb").read()},
473 |     {'name': 'test2.txt', 'blob': open('./test_data/test2.txt',"rb").read()},
474 |     {'name': 'test3.txt', 'blob': open('./test_data/test3.txt',"rb").read()}
475 | ]
476 | -
477 | - documents=
478 | - ids=[]
479 | for document in documents:
480 |     ids.append(document.id)
481 | -
482 | - print("Async bulk parsing initiated")
483 | - ds.async_cancel_parse_documents(ids)
484 | - print("Async bulk parsing cancelled")
485 | ```
486 |
487 | ---
|
@@ -494,9 +522,9 @@ DataSet.async_cancel_parse_documents(document_ids:list[str])-> None
|
|
494 |
|
495 |
### Parameters
|
496 |
|
497 |
-
#### document_ids: `list[str]
|
498 |
|
499 |
-
The IDs of the documents
|
500 |
|
501 |
### Returns
|
502 |
|
@@ -506,23 +534,22 @@ The IDs of the documents to stop parsing.
|
|
506 |
### Examples
|
507 |
|
508 |
```python
|
509 |
-
|
510 |
-
|
511 |
-
ds = rag.create_dataset(name="dataset_name")
|
512 |
documents = [
|
513 |
{'name': 'test1.txt', 'blob': open('./test_data/test1.txt',"rb").read()},
|
514 |
{'name': 'test2.txt', 'blob': open('./test_data/test2.txt',"rb").read()},
|
515 |
{'name': 'test3.txt', 'blob': open('./test_data/test3.txt',"rb").read()}
|
516 |
]
|
517 |
-
|
518 |
-
documents=
|
519 |
-
ids=[]
|
520 |
for document in documents:
|
521 |
ids.append(document.id)
|
522 |
-
|
523 |
-
print("Async bulk parsing initiated")
|
524 |
-
|
525 |
-
print("Async bulk parsing cancelled")
|
526 |
```
|
527 |
|
528 |
---
|
@@ -533,19 +560,21 @@ print("Async bulk parsing cancelled")
|
|
533 |
Document.list_chunks(keywords: str = None, offset: int = 0, limit: int = -1, id : str = None) -> list[Chunk]
|
534 |
```
|
535 |
|
|
|
|
|
536 |
### Parameters
|
537 |
|
538 |
-
#### keywords
|
539 |
|
540 |
List chunks whose name has the given keywords. Defaults to `None`
|
541 |
|
542 |
-
#### offset
|
543 |
|
544 |
-
The
|
545 |
|
546 |
#### limit
|
547 |
|
548 |
-
|
549 |
|
550 |
#### id
|
551 |
|
@@ -553,19 +582,20 @@ The ID of the chunk to retrieve. Default: `None`
|
|
553 |
|
554 |
### Returns
|
555 |
|
556 |
-
list
|
|
|
557 |
|
558 |
### Examples
|
559 |
|
560 |
```python
|
561 |
from ragflow import RAGFlow
|
562 |
|
563 |
-
|
564 |
-
|
565 |
-
|
566 |
-
|
567 |
-
for
|
568 |
-
print(
|
569 |
```
|
570 |
|
571 |
## Add chunk
|
@@ -578,7 +608,7 @@ Document.add_chunk(content:str) -> Chunk
578 |
579 | #### content: *Required*
580 |
581 | - The
582 |
583 | #### important_keywords :`list[str]`
584 |
@@ -609,11 +639,13 @@ chunk = doc.add_chunk(content="xxxxxxx")
609 | Document.delete_chunks(chunk_ids: list[str])
610 | ```
611 |
612 | ### Parameters
613 |
614 | - #### chunk_ids
615 |
616 | -
617 |
618 | ### Returns
619 |
@@ -642,15 +674,17 @@ doc.delete_chunks(["id_1","id_2"])
642 | Chunk.update(update_message: dict)
643 | ```
644 |
645 | - Updates the current chunk.
646 |
647 | ### Parameters
648 |
649 | #### update_message: `dict[str, str|list[str]|int]` *Required*
650 |
651 | - `"content"`: `str` Content of the chunk.
652 | - `"important_keywords"`: `list[str]` A list of key terms to attach to the chunk.
653 | - - `"available"`: `int` The chunk's availability status in the dataset.
654 |   - `0`: Unavailable
655 |   - `1`: Available
656 |
@@ -697,11 +731,11 @@ The documents to search from. `None` means no limitation. Defaults to `None`.
697 |
698 | #### offset: `int`
699 |
700 | - The
701 |
702 | #### limit: `int`
703 |
704 | - The maximum number of chunks to
705 |
706 | #### Similarity_threshold: `float`
707 |
@@ -764,6 +798,8 @@ for c in rag_object.retrieve(question="What's ragflow?",
764 | Chat Assistant Management
765 | :::
766 |
767 | ## Create chat assistant
768 |
769 | ```python
@@ -856,15 +892,17 @@ assi = rag.create_chat("Miss R", knowledgebases=list_kb)
856 | Chat.update(update_message: dict)
857 | ```
858 |
859 | - Updates the current chat assistant.
860 |
861 | ### Parameters
862 |
863 | - #### update_message: `dict[str,
864 |
865 | - `"name"`: `str` The name of the chat assistant to update.
866 | - `"avatar"`: `str` Base64 encoding of the avatar. Defaults to `""`
867 | - - `"knowledgebases"`: `list[str]` datasets to update.
868 | - `"llm"`: `dict` The LLM settings:
869 |   - `"model_name"`, `str` The chat model name.
870 |   - `"temperature"`, `float` Controls the randomness of the model's predictions.
@@ -906,17 +944,17 @@ assistant.update({"name": "Stefan", "llm": {"temperature": 0.8}, "prompt": {"top
906 |
907 | ## Delete chats
908 |
909 | - Deletes specified chat assistants.
910 | -
911 | ```python
912 | RAGFlow.delete_chats(ids: list[str] = None)
913 | ```
914 |
915 | ### Parameters
916 |
917 | - #### ids
918 |
919 | - IDs of the chat assistants to delete. If not specified, all chat assistants will be deleted.
920 |
921 | ### Returns
922 |
@@ -953,11 +991,11 @@ Retrieves a list of chat assistants.
953 |
954 | #### page
955 |
956 | - Specifies the page on which the
957 |
958 | #### page_size
959 |
960 | - The number of
961 |
962 | #### order_by
963 |
@@ -985,8 +1023,8 @@ The name of the chat to retrieve. Defaults to `None`.
985 | ```python
986 | from ragflow import RAGFlow
987 |
988 | -
989 | - for assistant in
990 |     print(assistant)
991 | ```
992 |
@@ -996,6 +1034,8 @@ for assistant in rag.list_chats():
996 | Chat-session APIs
997 | :::
998 |
999 | ## Create session
1000 |
1001 | ```python
@@ -1036,12 +1076,14 @@ session = assistant.create_session()
1036 | Session.update(update_message: dict)
1037 | ```
1038 |
1039 | - Updates the current session.
1040 |
1041 | ### Parameters
1042 |
1043 | #### update_message: `dict[str, Any]`, *Required*
1044 |
1045 | - `"name"`: `str` The name of the session to update.
1046 |
1047 | ### Returns
@@ -1169,17 +1211,17 @@ Lists sessions associated with the current chat assistant.
1169 |
1170 | #### page
1171 |
1172 | - Specifies the page on which
1173 |
1174 | #### page_size
1175 |
1176 | - The number of
1177 |
1178 | #### orderby
1179 |
1180 | - The field by which
1181 |
1182 | - - `"create_time"` (
1183 | - `"update_time"`
1184 |
1185 | #### desc
@@ -1204,8 +1246,8 @@ The name of the chat to retrieve. Defaults to `None`.
1204 | ```python
1205 | from ragflow import RAGFlow
1206 |
1207 | -
1208 | - assistant =
1209 | assistant = assistant[0]
1210 | for session in assistant.list_sessions():
1211 |     print(session)
@@ -1219,13 +1261,13 @@ for session in assistant.list_sessions():
1219 | Chat.delete_sessions(ids:list[str] = None)
1220 | ```
1221 |
1222 | - Deletes
1223 |
1224 | ### Parameters
1225 |
1226 | - #### ids
1227 |
1228 | - IDs of the sessions to delete. If not specified, all sessions associated with the current chat assistant will be deleted.
1229 |
1230 | ### Returns
1231 |
2 |
3 | **THE API REFERENCES BELOW ARE STILL UNDER DEVELOPMENT.**
4 |
5 | + ---
6 | +
7 | :::tip NOTE
8 | Dataset Management
9 | :::
10 |
11 | + ---
12 | +
13 | ## Create dataset
14 |
15 | ```python
59 |
60 | #### permission
61 |
62 | + Specifies who can access the dataset to create. You can set it only to `"me"` for now.
63 |
64 | #### chunk_method, `str`
65 |
66 | + The chunking method of the dataset to create. Available options:
67 | +
68 | + - `"naive"`: General (default)
69 | + - `"manual"`: Manual
70 | + - `"qa"`: Q&A
71 | + - `"table"`: Table
72 | + - `"paper"`: Paper
73 | + - `"book"`: Book
74 | + - `"laws"`: Laws
75 | + - `"presentation"`: Presentation
76 | + - `"picture"`: Picture
77 | + - `"one"`: One
78 | + - `"knowledge_graph"`: Knowledge Graph
79 | + - `"email"`: Email
80 |
81 | #### parser_config
82 |
84 |
85 | - `chunk_token_count`: Defaults to `128`.
86 | - `layout_recognize`: Defaults to `True`.
87 | + - `delimiter`: Defaults to `"\n!?。;!?"`.
88 | - `task_page_size`: Defaults to `12`.
89 |
90 | ### Returns
98 | from ragflow import RAGFlow
99 |
100 | rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
101 | + dataset = rag_object.create_dataset(name="kb_1")
102 | ```
103 |
104 | ---
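The parser defaults listed above can be captured in a plain-Python sketch. The helper below is hypothetical (not part of the RAGFlow SDK); only the four default values come from the reference.

```python
# Hypothetical helper: merge user overrides onto the documented
# parser_config defaults (chunk_token_count, layout_recognize,
# delimiter, task_page_size).
DEFAULT_PARSER_CONFIG = {
    "chunk_token_count": 128,
    "layout_recognize": True,
    "delimiter": "\n!?。;!?",
    "task_page_size": 12,
}

def build_parser_config(**overrides):
    """Return a parser_config dict with defaults applied."""
    config = dict(DEFAULT_PARSER_CONFIG)  # copy so defaults stay intact
    config.update(overrides)
    return config

config = build_parser_config(chunk_token_count=256)
print(config["chunk_token_count"])  # 256
print(config["task_page_size"])     # 12
```

Unspecified keys keep their documented defaults, so callers only state what they change.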
109 |
RAGFlow.delete_datasets(ids: list[str] = None)
|
110 |
```
|
111 |
|
112 |
+
Deletes specified datasets or all datasets in the system.
|
113 |
|
114 |
### Parameters
|
115 |
|
116 |
+
#### ids: `list[str]`
|
117 |
|
118 |
+
The IDs of the datasets to delete. Defaults to `None`. If not specified, all datasets in the system will be deleted.
|
119 |
|
120 |
### Returns
|
121 |
|
|
|
125 |
### Examples
|
126 |
|
127 |
```python
|
128 |
+
rag_object.delete_datasets(ids=["id_1","id_2"])
|
129 |
```
|
130 |
|
131 |
---
|
|
|
149 |
|
150 |
#### page: `int`
|
151 |
|
152 |
+
Specifies the page on which the datasets will be displayed. Defaults to `1`.
|
153 |
|
154 |
#### page_size: `int`
|
155 |
|
156 |
+
The number of datasets on each page. Defaults to `1024`.
|
157 |
|
158 |
+
#### orderby: `str`
|
159 |
|
160 |
+
The field by which datasets should be sorted. Available options:
|
161 |
+
|
162 |
+
- `"create_time"` (default)
|
163 |
+
- `"update_time"`
|
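The `page`/`page_size` semantics above translate to simple 0-based slicing. A minimal sketch (the helper name is invented, not an SDK call):

```python
# Hypothetical helper: convert the documented 1-based page number and
# page size into start/end indices over a result list.
def page_slice(page: int, page_size: int) -> tuple:
    """Return (start, end) indices for the given page."""
    start = (page - 1) * page_size
    return start, start + page_size

start, end = page_slice(page=2, page_size=10)
print(start, end)  # 10 20
```

With the defaults (`page=1`, `page_size=1024`) this yields indices 0 through 1024, i.e. the first 1024 datasets.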
164 |
165 | #### desc: `bool`
166 |
168 |
169 | #### id: `str`
170 |
171 | + The ID of the dataset to retrieve. Defaults to `None`.
172 |
173 | #### name: `str`
174 |
175 | + The name of the dataset to retrieve. Defaults to `None`.
176 |
177 | ### Returns
178 |
179 | + - Success: A list of `DataSet` objects.
180 | - Failure: `Exception`.
181 |
182 | ### Examples
184 | #### List all datasets
185 |
186 | ```python
187 | + for dataset in rag_object.list_datasets():
188 | +     print(dataset)
189 | ```
190 |
191 | #### Retrieve a dataset by ID
203 | DataSet.update(update_message: dict)
204 | ```
205 |
206 | + Updates configurations for the current dataset.
207 |
208 | ### Parameters
209 |
210 | #### update_message: `dict[str, str|int]`, *Required*
211 |
212 | + A dictionary representing the attributes to update, with the following keys:
213 | +
214 | - `"name"`: `str` The name of the dataset to update.
215 | + - `"embedding_model"`: `str` The embedding model name to update.
216 |   - Ensure that `"chunk_count"` is `0` before updating `"embedding_model"`.
217 | + - `"chunk_method"`: `str` The chunking method for the dataset. Available options:
218 |   - `"naive"`: General
219 |   - `"manual`: Manual
220 |   - `"qa"`: Q&A
238 | ```python
239 | from ragflow import RAGFlow
240 |
241 | + rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
242 | + dataset = rag_object.list_datasets(name="kb_name")[0]
243 | dataset.update({"embedding_model":"BAAI/bge-zh-v1.5", "chunk_method":"manual"})
244 | ```
245 |
261 |
262 | ### Parameters
263 |
264 | + #### document_list: `list[dict]`, *Required*
265 |
266 | A list of dictionaries representing the documents to upload, each containing the following keys:
267 |
294 |
295 | #### update_message: `dict[str, str|dict[]]`, *Required*
296 |
297 | + A dictionary representing the attributes to update, with the following keys:
298 | +
299 | - `"name"`: `str` The name of the document to update.
300 | - `"parser_config"`: `dict[str, Any]` The parsing configuration for the document:
301 |   - `"chunk_token_count"`: Defaults to `128`.
326 | ```python
327 | from ragflow import RAGFlow
328 |
329 | + rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
330 | + dataset = rag_object.list_datasets(id='id')
331 | + dataset = dataset[0]
332 | doc = dataset.list_documents(id="wdfxb5t547d")
333 | doc = doc[0]
334 | doc.update([{"parser_config": {"chunk_token_count": 256}}, {"chunk_method": "manual"}])
342 | Document.download() -> bytes
343 | ```
344 |
345 | + Downloads the current document.
346 |
347 | ### Returns
348 |
374 |
375 | ### Parameters
376 |
377 | + #### id: `str`
378 |
379 | The ID of the document to retrieve. Defaults to `None`.
380 |
381 | + #### keywords: `str`
382 |
383 | The keywords to match document titles. Defaults to `None`.
384 |
385 | + #### offset: `int`
386 |
387 | + The starting index for the documents to retrieve. Typically used in conjunction with `limit`. Defaults to `0`.
388 |
389 | + #### limit: `int`
390 |
391 | + The maximum number of documents to retrieve. Defaults to `1024`. A value of `-1` indicates that all documents should be returned.
392 |
393 | + #### orderby: `str`
394 |
395 | + The field by which documents should be sorted. Available options:
396 |
397 | + - `"create_time"` (default)
398 | - `"update_time"`
399 |
400 | + #### desc: `bool`
401 |
402 | Indicates whether the retrieved documents should be sorted in descending order. Defaults to `True`.
403 |
408 |
409 | A `Document` object contains the following attributes:
410 |
411 | + - `id`: The document ID. Defaults to `""`.
412 | + - `name`: The document name. Defaults to `""`.
413 | + - `thumbnail`: The thumbnail image of the document. Defaults to `None`.
414 | + - `knowledgebase_id`: The dataset ID associated with the document. Defaults to `None`.
415 | + - `chunk_method`: The chunk method name. Defaults to `""`. ?????naive??????
416 | + - `parser_config`: `ParserConfig` Configuration object for the parser. Defaults to `{"pages": [[1, 1000000]]}`.
417 | + - `source_type`: The source type of the document. Defaults to `"local"`.
418 | + - `type`: Type or category of the document???????????. Defaults to `""`.
419 | + - `created_by`: `str` The creator of the document. Defaults to `""`.
420 | + - `size`: `int` The document size in bytes. Defaults to `0`.
421 | + - `token_count`: `int` The number of tokens in the document. Defaults to `0`.
422 | + - `chunk_count`: `int` The number of chunks that the document is split into. Defaults to `0`.
423 | + - `progress`: `float` The current processing progress as a percentage. Defaults to `0.0`.
424 | + - `progress_msg`: `str` A message indicating the current progress status. Defaults to `""`.
425 | + - `process_begin_at`: `datetime` The start time of document processing. Defaults to `None`.
426 | + - `process_duation`: `float` Duration of the processing in seconds or minutes.??????? Defaults to `0.0`.
427 | + - `run`: `str` ?????????????????? Defaults to `"0"`.
428 | + - `status`: `str` ??????????????????? Defaults to `"1"`.
429 |
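For illustration only, the attribute list above can be mirrored as a dataclass. This is a sketch of the documented fields and defaults, not the SDK's actual `Document` class (which may differ); `process_duation` keeps the spelling used in the reference.

```python
# Illustrative sketch mirroring the documented Document attributes.
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

@dataclass
class DocumentSketch:
    id: str = ""
    name: str = ""
    thumbnail: Optional[str] = None
    knowledgebase_id: Optional[str] = None
    chunk_method: str = ""
    # Mutable default needs default_factory, not a shared dict.
    parser_config: dict = field(default_factory=lambda: {"pages": [[1, 1000000]]})
    source_type: str = "local"
    type: str = ""
    created_by: str = ""
    size: int = 0
    token_count: int = 0
    chunk_count: int = 0
    progress: float = 0.0
    progress_msg: str = ""
    process_begin_at: Optional[datetime] = None
    process_duation: float = 0.0  # spelling as in the reference
    run: str = "0"
    status: str = "1"

doc = DocumentSketch(name="ragflow.txt", size=1024)
print(doc.source_type)  # local
```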
430 | ### Examples
431 |
436 | dataset = rag.create_dataset(name="kb_1")
437 |
438 | filename1 = "~/ragflow.txt"
439 | + blob = open(filename1 , "rb").read()
440 | + dataset.upload_documents([{"name":filename1,"blob":blob}])
441 | + for doc in dataset.list_documents(keywords="rag", offset=0, limit=12):
442 | +     print(doc)
443 | ```
444 |
445 | ---
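The example above builds the `{"name", "blob"}` payload from a file on disk. A self-contained sketch of the same payload shape, built from in-memory bytes (the helper is hypothetical, not an SDK function):

```python
# Hypothetical helper: build one entry of the document_list payload
# that upload_documents() expects.
def make_document(name: str, blob: bytes) -> dict:
    if not isinstance(blob, bytes):
        raise TypeError("blob must be raw bytes, e.g. open(path, 'rb').read()")
    return {"name": name, "blob": blob}

document_list = [
    make_document("test1.txt", b"hello"),
    make_document("test2.txt", b"world"),
]
print([d["name"] for d in document_list])  # ['test1.txt', 'test2.txt']
```

The type check catches the common mistake of passing a file path or `str` instead of the raw bytes.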
450 | DataSet.delete_documents(ids: list[str] = None)
451 | ```
452 |
453 | + Deletes documents by ID.
454 | +
455 | + ### Parameters
456 | +
457 | + #### ids: `list[str]`
458 | +
459 | + The IDs of the documents to delete. Defaults to `None`. If not specified, all documents in the dataset will be deleted.
460 |
461 | ### Returns
462 |
468 | ```python
469 | from ragflow import RAGFlow
470 |
471 | + rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
472 | + dataset = rag_object.list_datasets(name="kb_1")
473 | + dataset = dataset[0]
474 | + dataset.delete_documents(ids=["id_1","id_2"])
475 | ```
476 |
477 | ---
484 |
485 | ### Parameters
486 |
487 | + #### document_ids: `list[str]`, *Required*
488 |
489 | The IDs of the documents to parse.
490 |
496 | ### Examples
497 |
498 | ```python
499 | + rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
500 | + dataset = rag_object.create_dataset(name="dataset_name")
501 | documents = [
502 |     {'name': 'test1.txt', 'blob': open('./test_data/test1.txt',"rb").read()},
503 |     {'name': 'test2.txt', 'blob': open('./test_data/test2.txt',"rb").read()},
504 |     {'name': 'test3.txt', 'blob': open('./test_data/test3.txt',"rb").read()}
505 | ]
506 | + dataset.upload_documents(documents)
507 | + documents = dataset.list_documents(keywords="test")
508 | + ids = []
509 | for document in documents:
510 |     ids.append(document.id)
511 | + dataset.async_parse_documents(ids)
512 | + print("Async bulk parsing initiated.")
513 | ```
514 |
515 | ---
522 |
523 | ### Parameters
524 |
525 | + #### document_ids: `list[str]`, *Required*
526 |
527 | + The IDs of the documents for which parsing should be stopped.
528 |
529 | ### Returns
530 |
534 | ### Examples
535 |
536 | ```python
537 | + rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
538 | + dataset = rag_object.create_dataset(name="dataset_name")
539 | documents = [
540 |     {'name': 'test1.txt', 'blob': open('./test_data/test1.txt',"rb").read()},
541 |     {'name': 'test2.txt', 'blob': open('./test_data/test2.txt',"rb").read()},
542 |     {'name': 'test3.txt', 'blob': open('./test_data/test3.txt',"rb").read()}
543 | ]
544 | + dataset.upload_documents(documents)
545 | + documents = dataset.list_documents(keywords="test")
546 | + ids = []
547 | for document in documents:
548 |     ids.append(document.id)
549 | + dataset.async_parse_documents(ids)
550 | + print("Async bulk parsing initiated.")
551 | + dataset.async_cancel_parse_documents(ids)
552 | + print("Async bulk parsing cancelled.")
553 | ```
554 |
555 | ---
560 | Document.list_chunks(keywords: str = None, offset: int = 0, limit: int = -1, id : str = None) -> list[Chunk]
561 | ```
562 |
563 | + Retrieves a list of document chunks.
564 | +
565 | ### Parameters
566 |
567 | + #### keywords: `str`
568 |
569 | List chunks whose name has the given keywords. Defaults to `None`
570 |
571 | + #### offset: `int`
572 |
573 | + The starting index for the chunks to retrieve. Defaults to `1`.
574 |
575 | #### limit
576 |
577 | + The maximum number of chunks to retrieve. Default: `30`.
578 |
579 | #### id
580 |
582 |
583 | ### Returns
584 |
585 | + - Success: A list of `Chunk` objects.
586 | + - Failure: `Exception`.
587 |
588 | ### Examples
589 |
590 | ```python
591 | from ragflow import RAGFlow
592 |
593 | + rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
594 | + dataset = rag_object.list_datasets("123")
595 | + dataset = dataset[0]
596 | + dataset.async_parse_documents(["wdfxb5t547d"])
597 | + for chunk in dataset.list_documents(id="wdfxb5t547d")[0].list_chunks(keywords="rag", offset=0, limit=12):
598 | +     print(chunk)
599 | ```
600 |
601 | ## Add chunk
608 |
609 | #### content: *Required*
610 |
611 | + The text content of the chunk.
612 |
613 | #### important_keywords :`list[str]`
614 |
639 | Document.delete_chunks(chunk_ids: list[str])
640 | ```
641 |
642 | + Deletes chunks by ID.
643 | +
644 | ### Parameters
645 |
646 | + #### chunk_ids: `list[str]`
647 |
648 | + The IDs of the chunks to delete. Defaults to `None`. If not specified, all chunks of the current document will be deleted.
649 |
650 | ### Returns
651 |
Chunk.update(update_message: dict)
|
675 |
```
|
676 |
|
677 |
+
Updates content or configurations for the current chunk.
|
678 |
|
679 |
### Parameters
|
680 |
|
681 |
#### update_message: `dict[str, str|list[str]|int]` *Required*
|
682 |
|
683 |
+
A dictionary representing the attributes to update, with the following keys:
|
684 |
+
|
685 |
- `"content"`: `str` Content of the chunk.
|
686 |
- `"important_keywords"`: `list[str]` A list of key terms to attach to the chunk.
|
687 |
+
- `"available"`: `int` The chunk's availability status in the dataset. Value options:
|
688 |
- `0`: Unavailable
|
689 |
- `1`: Available
|
690 |
|
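A sketch of composing an `update_message` for `Chunk.update()`. The helper is hypothetical; the keys and the `0`/`1` availability codes follow the list above.

```python
# Hypothetical helper: build a Chunk.update() message, validating the
# documented 0/1 availability codes.
UNAVAILABLE, AVAILABLE = 0, 1

def chunk_update_message(content=None, important_keywords=None, available=None):
    message = {}
    if content is not None:
        message["content"] = content
    if important_keywords is not None:
        message["important_keywords"] = list(important_keywords)
    if available is not None:
        if available not in (UNAVAILABLE, AVAILABLE):
            raise ValueError("available must be 0 (unavailable) or 1 (available)")
        message["available"] = available
    return message

msg = chunk_update_message(content="ragflow", available=AVAILABLE)
print(msg)  # {'content': 'ragflow', 'available': 1}
```

Only the keys you pass end up in the message, matching the "attributes to update" semantics.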
731 |
732 | #### offset: `int`
733 |
734 | + The starting index for the documents to retrieve. Defaults to `0`??????.
735 |
736 | #### limit: `int`
737 |
738 | + The maximum number of chunks to retrieve. Defaults to `6`.
739 |
740 | #### Similarity_threshold: `float`
741 |
798 | Chat Assistant Management
799 | :::
800 |
801 | + ---
802 | +
803 | ## Create chat assistant
804 |
805 | ```python
892 | Chat.update(update_message: dict)
893 | ```
894 |
895 | + Updates configurations for the current chat assistant.
896 |
897 | ### Parameters
898 |
899 | + #### update_message: `dict[str, str|list[str]|dict[]]`, *Required*
900 | +
901 | + A dictionary representing the attributes to update, with the following keys:
902 |
903 | - `"name"`: `str` The name of the chat assistant to update.
904 | - `"avatar"`: `str` Base64 encoding of the avatar. Defaults to `""`
905 | + - `"knowledgebases"`: `list[str]` The datasets to update.
906 | - `"llm"`: `dict` The LLM settings:
907 |   - `"model_name"`, `str` The chat model name.
908 |   - `"temperature"`, `float` Controls the randomness of the model's predictions.
944 |
945 | ## Delete chats
946 |
947 | ```python
948 | RAGFlow.delete_chats(ids: list[str] = None)
949 | ```
950 |
951 | + Deletes chat assistants by ID.
952 | +
953 | ### Parameters
954 |
955 | + #### ids: `list[str]`
956 |
957 | + The IDs of the chat assistants to delete. Defaults to `None`. If not specified, all chat assistants in the system will be deleted.
958 |
959 | ### Returns
960 |
991 |
992 | #### page
993 |
994 | + Specifies the page on which the chat assistants will be displayed. Defaults to `1`.
995 |
996 | #### page_size
997 |
998 | + The number of chat assistants on each page. Defaults to `1024`.
999 |
1000 | #### order_by
1001 |
1023 | ```python
1024 | from ragflow import RAGFlow
1025 |
1026 | + rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
1027 | + for assistant in rag_object.list_chats():
1028 |     print(assistant)
1029 | ```
1030 |
1034 | Chat-session APIs
1035 | :::
1036 |
1037 | + ---
1038 | +
1039 | ## Create session
1040 |
1041 | ```python
1076 | Session.update(update_message: dict)
1077 | ```
1078 |
1079 | + Updates the current session name.
1080 |
1081 | ### Parameters
1082 |
1083 | #### update_message: `dict[str, Any]`, *Required*
1084 |
1085 | + A dictionary representing the attributes to update, with only one key:
1086 | +
1087 | - `"name"`: `str` The name of the session to update.
1088 |
1089 | ### Returns
1211 |
1212 | #### page
1213 |
1214 | + Specifies the page on which the sessions will be displayed. Defaults to `1`.
1215 |
1216 | #### page_size
1217 |
1218 | + The number of sessions on each page. Defaults to `1024`.
1219 |
1220 | #### orderby
1221 |
1222 | + The field by which sessions should be sorted. Available options:
1223 |
1224 | + - `"create_time"` (default)
1225 | - `"update_time"`
1226 |
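The `orderby`/`desc` behavior described above can be sketched in plain Python over mock session records (the dicts stand in for real `Session` objects; `sort_sessions` is not an SDK function):

```python
# Sketch of orderby/desc semantics: sort mock session records by the
# documented fields, descending by default.
sessions = [
    {"name": "s1", "create_time": 3, "update_time": 9},
    {"name": "s2", "create_time": 1, "update_time": 5},
    {"name": "s3", "create_time": 2, "update_time": 7},
]

def sort_sessions(records, orderby="create_time", desc=True):
    if orderby not in ("create_time", "update_time"):
        raise ValueError("orderby must be 'create_time' or 'update_time'")
    return sorted(records, key=lambda r: r[orderby], reverse=desc)

ordered = sort_sessions(sessions)
print([s["name"] for s in ordered])  # ['s1', 's3', 's2']
```

Passing `desc=False` reverses the order, mirroring the `desc` parameter below.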
1227 | #### desc
1246 | ```python
1247 | from ragflow import RAGFlow
1248 |
1249 | + rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
1250 | + assistant = rag_object.list_chats(name="Miss R")
1251 | assistant = assistant[0]
1252 | for session in assistant.list_sessions():
1253 |     print(session)
1261 | Chat.delete_sessions(ids:list[str] = None)
1262 | ```
1263 |
1264 | + Deletes sessions by ID.
1265 |
1266 | ### Parameters
1267 |
1268 | + #### ids: `list[str]`
1269 |
1270 | + The IDs of the sessions to delete. Defaults to `None`. If not specified, all sessions associated with the current chat assistant will be deleted.
1271 |
1272 | ### Returns
1273 |