Update README.md
README.md
CHANGED
@@ -36,11 +36,27 @@ language:
|
|
36 |
<a href="https://github.com/netease-youdao/BCEmbedding">GitHub</a>
|
37 |
</p>
|
38 |
|
39 |
-
|
40 |
-
|
41 |
-
-
|
42 |
-
|
43 |
-
-
|
44 |
|
45 |
Related link for **EmbeddingModel** : [bce-embedding-base_v1](https://huggingface.co/maidalun1020/bce-embedding-base_v1)
|
46 |
|
|
|
36 |
<a href="https://github.com/netease-youdao/BCEmbedding">GitHub</a>
|
37 |
</p>
|
38 |
|
39 |
+
### Our Goals
|
40 |
+
|
41 |
+
Provide the RAG community with a repository of bilingual and crosslingual two-stage retrieval models, `EmbeddingModel` and `RerankerModel`, that can be used directly without finetuning:
|
42 |
+
|
43 |
+
- One Model: `EmbeddingModel` handles **bilingual and crosslingual** retrieval tasks in English and Chinese. `RerankerModel` supports **English, Chinese, Japanese and Korean**.
|
44 |
+
- One Model: **Covers common business application scenarios with RAG optimization**, e.g. Education, Medical, Law, Finance, Literature, FAQ, Textbook, Wikipedia, and General Conversation.
|
45 |
+
- Easy to Integrate: We provide **APIs** in `BCEmbedding` for LlamaIndex and LangChain integration.
|
46 |
+
- Other Points:
|
47 |
+
  - `RerankerModel` supports **reranking of long passages (more than 512 tokens)**;
|
48 |
+
  - `RerankerModel` provides a **meaningful relevance score** that helps filter out low-quality passages;
|
49 |
+
  - `EmbeddingModel` **does not need specific instructions**.
|
50 |
+
|
51 |
+
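Give the RAG community a repository of Chinese-English bilingual and crosslingual two-stage retrieval models that can be used out of the box, with as little user finetuning as possible, including `EmbeddingModel` and `RerankerModel`.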
|
52 |
+
|
53 |
+
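- One Model: `EmbeddingModel` covers **Chinese-English bilingual and crosslingual** retrieval tasks, with particular strength in crosslingual retrieval. `RerankerModel` supports **Chinese, English, Japanese and Korean**, including crosslingual combinations of these four languages.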
|
54 |
+
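- One Model: **Covers common business application domains** (optimized for many common RAG scenarios), such as education, medicine, law, finance, research papers, customer service (FAQ), books, encyclopedias, and general QA. Users can apply it directly in these domains without finetuning.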
|
55 |
+
- Easy to Integrate: `EmbeddingModel` and `RerankerModel` provide **integration interfaces** for LlamaIndex and LangChain, so users can easily integrate them into existing products.
|
56 |
+
- Other Points:
|
57 |
+
  - `RerankerModel` supports **reranking of long passages (more than 512)**;
|
58 |
+
  - `RerankerModel` provides a meaningful **relevance score** that helps **filter out low-quality recalled passages**;
|
59 |
+
  - `EmbeddingModel` **does not need a "carefully designed" instruction** and recalls as many useful passages as possible.
|
60 |
|
61 |
Related link for **EmbeddingModel** : [bce-embedding-base_v1](https://huggingface.co/maidalun1020/bce-embedding-base_v1)
|
62 |
|
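
The two-stage flow named in the goals above (embedding recall followed by reranking, with low-scoring passages filtered out) can be sketched roughly as follows. This is a minimal illustration only, not the `BCEmbedding` API: `embed`, `cosine`, `rerank_score`, and `two_stage_search` are hypothetical stand-ins built on word counts, and the `top_k` and `min_score` values are arbitrary assumptions. In practice a real embedding model and reranker would replace the two toy scoring functions.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model (assumption): a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rerank_score(query, passage):
    """Toy stand-in for a reranker (assumption): fraction of query terms present in the passage."""
    q_terms = set(query.lower().split())
    p_terms = set(passage.lower().split())
    return len(q_terms & p_terms) / len(q_terms) if q_terms else 0.0


def two_stage_search(query, passages, top_k=3, min_score=0.5):
    """Recall top_k passages by embedding similarity, then rerank and filter by score."""
    # Stage 1: cheap recall over the whole collection using embedding similarity.
    q_vec = embed(query)
    recalled = sorted(
        passages, key=lambda p: cosine(q_vec, embed(p)), reverse=True
    )[:top_k]

    # Stage 2: score each recalled passage against the query and sort by that score.
    scored = [(p, rerank_score(query, p)) for p in recalled]
    scored.sort(key=lambda item: item[1], reverse=True)

    # Drop passages whose relevance score falls below the threshold.
    return [(p, s) for p, s in scored if s >= min_score]


if __name__ == "__main__":
    docs = [
        "the cat sat on the mat",
        "dogs chase the cat",
        "stock markets fell today",
        "a cat and a mat",
    ]
    for passage, score in two_stage_search("cat mat", docs, top_k=3, min_score=0.6):
        print(f"{score:.2f}  {passage}")
```

The threshold step is where a meaningful relevance score matters: a passage that survives recall but scores poorly in the second stage is removed before it reaches the downstream generator.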