maidalun1020/bce-embedding-base_v1

@@ -12,7 +12,7 @@ license: apache-2.0
 <h1 align="center">BCEmbedding: Bilingual and Crosslingual Embedding for RAG</h1>
 <p align="center">
-  <a href="https://github.com/netease-youdao/BCEmbedding/LICENSE">
     <img src="https://img.shields.io/badge/license-Apache--2.0-yellow">
   </a>
   <a href="https://twitter.com/YDopensource">
@@ -27,24 +27,24 @@ license: apache-2.0
 <details open="open">
 <summary>Click to Open Contents</summary>
-- <a href="#t1">🌐 Bilingual and Crosslingual Superiority</a>
-- <a href="#t2">💡 Key Features</a>
-- <a href="#t3">🚀 Latest Updates</a>
-- <a href="#t4">🍎 Model List</a>
-- <a href="#t5">📖 Manual</a>
-  - <a href="#installation">Installation</a>
-  - <a href="#quick-start">Quick Start</a>
-- <a href="#t6">⚙️ Evaluation</a>
-  - <a href="#evaluate-semantic-representation-by-mteb">Evaluate Semantic Representation by MTEB</a>
-  - <a href="#evaluate-rag-by-llamaindex">Evaluate RAG by LlamaIndex</a>
-- <a href="#t7">📈 Leaderboard</a>
-  - <a href="#semantic-representation-evaluations-in-mteb">Semantic Representation Evaluations in MTEB</a>
-  - <a href="#rag-evaluations-in-llamaindex">RAG Evaluations in LlamaIndex</a>
-- <a href="#t8">🛠 Youdao's BCEmbedding API</a>
-- <a href="#t9">🧲 WeChat Group</a>
-- <a href="#t10">✏️ Citation</a>
-- <a href="#t11">🔐 License</a>
-- <a href="#t12">🔗 Related Links</a>
 </details>
 <br>
@@ -54,18 +54,17 @@ license: apache-2.0
 `BCEmbedding` serves as the cornerstone of Youdao's Retrieval Augmented Generation (RAG) implementation, notably [QAnything](http://qanything.ai) [[github](https://github.com/netease-youdao/qanything)], an open-source implementation widely integrated in various Youdao products like [Youdao Speed Reading](https://read.youdao.com/#/home) and [Youdao Translation](https://fanyi.youdao.com/download-Mac?keyfrom=fanyiweb_navigation).
 Distinguished for its bilingual and crosslingual proficiency, `BCEmbedding` excels in bridging Chinese and English linguistic gaps, achieving
-- **High performance on <a href=#semantic-representation-evaluations-in-mteb>Semantic Representation Evaluations in MTEB</a>**;
-- **A new benchmark in the realm of <a href=#rag-evaluations-in-llamaindex>RAG Evaluations in LlamaIndex</a>**.
   `BCEmbedding`是由网易有道开发的双语和跨语种语义表征算法模型库，其中包含`EmbeddingModel`和`RerankerModel`两类基础模型。`EmbeddingModel`专门用于生成语义向量，在语义搜索和问答中起着关键作用，而`RerankerModel`擅长优化语义搜索结果和语义相关顺序精排。
   `BCEmbedding`作为有道的检索增强生成式应用（RAG）的基石，特别是在[QAnything](http://qanything.ai) [[github](https://github.com/netease-youdao/qanything)]中发挥着重要作用。QAnything作为一个网易有道开源项目，在有道许多产品中有很好的应用实践，比如[有道速读](https://read.youdao.com/#/home)和[有道翻译](https://fanyi.youdao.com/download-Mac?keyfrom=fanyiweb_navigation)
   `BCEmbedding`以其出色的双语和跨语种能力而著称，在语义检索中消除中英语言之间的差异，从而实现：
-  - **强大的双语和跨语种语义表征能力【<a href=#t7-1>基于MTEB的语义表征评测指标</a>】。**
-  - **基于LlamaIndex的RAG评测，表现SOTA【<a href=#t7-2>基于LlamaIndex的RAG评测指标</a>】。**
-<t id="t1"></t>
 ## 🌐 Bilingual and Crosslingual Superiority
 Existing embedding models often encounter performance challenges in bilingual and crosslingual scenarios, particularly in Chinese, English and their crosslingual tasks. `BCEmbedding`, leveraging the strength of Youdao's translation engine, excels in delivering superior performance across monolingual, bilingual, and crosslingual settings.
@@ -76,7 +75,6 @@ Existing embedding models often encounter performance challenges in bilingual an
   `EmbeddingModel`支持***中文和英文***（之后会支持更多语种）；`RerankerModel`支持***中文，英文，日文和韩文***。
-<t id="t2"></t>
 ## 💡 Key Features
 - **Bilingual and Crosslingual Proficiency**: Powered by Youdao's translation engine, excelling in Chinese, English and their crosslingual retrieval task, with upcoming support for additional languages.
@@ -95,7 +93,7 @@ Existing embedding models often encounter performance challenges in bilingual an
   - **双语和跨语种能力**：基于有道翻译引擎的强大能力，我们的`BCEmbedding`具备强大的中英双语和跨语种语义表征能力。
-  - **RAG适配**：面向RAG做了针对性优化，可以适配大多数相关任务，比如**翻译，摘要，问答**等。此外，针对**问题理解**（query understanding）也做了针对优化，详见 <a href=#t7-2>基于LlamaIndex的RAG评测指标</a>。
   - **高效且精确的语义检索**：`EmbeddingModel`采用双编码器，可以在第一阶段实现高效的语义检索。`RerankerModel`采用交叉编码器，可以在第二阶段实现更高精度的语义顺序精排。
@@ -107,18 +105,16 @@ Existing embedding models often encounter performance challenges in bilingual an
   - **产品化检验**：`BCEmbedding`已经被有道众多真实产品检验。
-<t id="t3"></t>
 ## 🚀 Latest Updates
 - ***2024-01-03***: **Model Releases** - [bce-embedding-base_v1](https://huggingface.co/maidalun1020/bce-embedding-base_v1) and [bce-reranker-base_v1](https://huggingface.co/maidalun1020/bce-reranker-base_v1) are available.
 - ***2024-01-03***: **Eval Datasets** [[CrosslingualMultiDomainsDataset](https://huggingface.co/datasets/maidalun1020/CrosslingualMultiDomainsDataset)] - Evaluate the performance of RAG, using [LlamaIndex](https://github.com/run-llama/llama_index).
-- ***2024-01-03***: **Eval Datasets** [[Details](https://github.com/netease-youdao/BCEmbedding/BCEmbedding/evaluation/c_mteb/Retrieval.py)] - Evaluate the performance of crosslingual semantic representation, using [MTEB](https://github.com/embeddings-benchmark/mteb).
   - ***2024-01-03***: **模型发布** - [bce-embedding-base_v1](https://huggingface.co/maidalun1020/bce-embedding-base_v1)和[bce-reranker-base_v1](https://huggingface.co/maidalun1020/bce-reranker-base_v1)已发布.
   - ***2024-01-03***: **RAG评测数据** [[CrosslingualMultiDomainsDataset](https://huggingface.co/datasets/maidalun1020/CrosslingualMultiDomainsDataset)] - 基于[LlamaIndex](https://github.com/run-llama/llama_index)的RAG评测数据已发布。
-  - ***2024-01-03***: **跨语种语义表征评测数据** [[详情](https://github.com/netease-youdao/BCEmbedding/BCEmbedding/evaluation/c_mteb/Retrieval.py)] - 基于[MTEB](https://github.com/embeddings-benchmark/mteb)的跨语种评测数据已发布.
-<t id="t4"></t>
 ## 🍎 Model List
 | Model Name | Model Type | Languages | Parameters | Weights |
@@ -126,7 +122,6 @@ Existing embedding models often encounter performance challenges in bilingual an
 | bce-embedding-base_v1 | `EmbeddingModel` | ch, en | 279M | [download](https://huggingface.co/maidalun1020/bce-embedding-base_v1) |
 | bce-reranker-base_v1 | `RerankerModel` | ch, en, ja, ko | 279M | [download](https://huggingface.co/maidalun1020/bce-reranker-base_v1) |
-<t id="t5"></t>
 ## 📖 Manual
 ### Installation
@@ -151,7 +146,7 @@ pip install -v -e .
 ### Quick Start
-Use `EmbeddingModel` by `BCEmbedding`, and `cls` [pooler](https://github.com/netease-youdao/BCEmbedding/BCEmbedding/models/embedding.py#L24) is default.
 ```python
 from BCEmbedding import EmbeddingModel
@@ -188,7 +183,6 @@ scores = model.compute_score(sentence_pairs)
 rerank_results = model.rerank(query, passages)
 ```
-<t id="t6"></t>
 ## ⚙️ Evaluation
 ### Evaluate Semantic Representation by MTEB
@@ -240,9 +234,9 @@ The evaluation tasks contain ***12 datastes*** of **"Reranking"**.
 #### 3. Metrics Visualization Tool
-We provide a one-click script to summarize evaluation results of `embedding` and `reranker` models as [Embedding Models Evaluation Summary](https://github.com/netease-youdao/BCEmbedding/Docs/EvaluationSummary/embedding_eval_summary.md) and [Reranker Models Evaluation Summary](https://github.com/netease-youdao/BCEmbedding/Docs/EvaluationSummary/reranker_eval_summary.md).
-  我们提供了`embedding`和`reranker`模型的指标可视化一键脚本，输出一个markdown文件，详见[Embedding模型指标汇总](https://github.com/netease-youdao/BCEmbedding/Docs/EvaluationSummary/embedding_eval_summary.md)和[Reranker模型指标汇总](https://github.com/netease-youdao/BCEmbedding/Docs/EvaluationSummary/reranker_eval_summary.md)。
 ```bash
 python BCEmbedding/evaluation/mteb/summarize_eval_results.py --results_dir {your_embedding_results_dir | your_reranker_results_dir}
@@ -293,12 +287,12 @@ Then, sumarize the evaluation results by:
 python BCEmbedding/tools/eval_rag/summarize_eval_results.py --results_dir results/rag_reproduce_results
 ```
-Results Reproduced from the LlamaIndex Blog can be checked in ***[Reproduced Summary of RAG Evaluation](https://github.com/netease-youdao/BCEmbedding/Docs/EvaluationSummary/rag_eval_reproduced_summary.md)***, with some obvious ***conclusions***:
 - In the `WithoutReranker` setting, our `bce-embedding-base_v1` outperforms all the other embedding models.
 - With the embedding model fixed, our `bce-reranker-base_v1` achieves the best performance.
 - ***The combination of `bce-embedding-base_v1` and `bce-reranker-base_v1` is SOTA.***
-  输出的指标汇总详见 ***[LlamaIndex RAG评测结果复现](https://github.com/netease-youdao/BCEmbedding/Docs/EvaluationSummary/rag_eval_reproduced_summary.md)***。从该复现结果中，可以看出：
   - 在`WithoutReranker`设置下（**竖排对比**），`bce-embedding-base_v1`比其他embedding模型效果都要好。
   - 在固定embedding模型设置下，对比不同reranker效果（**横排对比**），`bce-reranker-base_v1`比其他reranker模型效果都要好。
   - ***`bce-embedding-base_v1`和`bce-reranker-base_v1`组合，表现SOTA。***
@@ -323,7 +317,6 @@ python BCEmbedding/tools/eval_rag/summarize_eval_results.py --results_dir result
 The summary of multiple domains evaluations can be seen in <a href=#1-multiple-domains-scenarios>Multiple Domains Scenarios</a>.
-<t id="t7"></t>
 ## 📈 Leaderboard
 ### Semantic Representation Evaluations in MTEB
@@ -344,14 +337,14 @@ The summary of multiple domains evaluations can be seen in <a href=#1-multiple-d
 ***NOTE:***
 - Our ***bce-embedding-base_v1*** outperforms other open-source embedding models of various sizes.
 - ***114 datasets*** of **"Retrieval", "STS", "PairClassification", "Classification", "Reranking" and "Clustering"** in `["en", "zh", "en-zh", "zh-en"]` setting.
-- The [crosslingual evaluation datasets](https://github.com/netease-youdao/BCEmbedding/BCEmbedding/evaluation/c_mteb/Retrieval.py) we released belong to the `Retrieval` task.
-- For more evaluation details, see [Embedding Models Evaluation Summary](https://github.com/netease-youdao/BCEmbedding/Docs/EvaluationSummary/embedding_eval_summary.md).
   ***要点：***
   - 对比所有开源的各种规模的embedding模型，***bce-embedding-base_v1*** 表现最好。
   - 评测包含 **"Retrieval"， "STS"， "PairClassification"， "Classification"， "Reranking"和"Clustering"** 这六大类任务的共 ***114个数据集***。
-  - 我们开源的[跨语种语义表征评测数据](https://github.com/netease-youdao/BCEmbedding/BCEmbedding/evaluation/c_mteb/Retrieval.py)属于`Retrieval`任务。
-  - 更详细的评测结果详见[Embedding模型指标汇总](https://github.com/netease-youdao/BCEmbedding/Docs/EvaluationSummary/embedding_eval_summary.md)。
 #### 2. Reranker Models
@@ -364,12 +357,12 @@ The summary of multiple domains evaluations can be seen in <a href=#1-multiple-d
 ***NOTE:***
 - Our ***bce-reranker-base_v1*** outperforms other open-source reranker models.
 - ***12 datasets*** of **"Reranking"** in `["en", "zh", "en-zh", "zh-en"]` setting.
-- For more evaluation details, see [Reranker Models Evaluation Summary](https://github.com/netease-youdao/BCEmbedding/Docs/EvaluationSummary/reranker_eval_summary.md).
   ***要点：***
   - ***bce-reranker-base_v1*** 优于其他开源reranker模型。
   - 评测包含 **"Reranking"** 任务的 ***12个数据集***。
-  - 更详细的评测结果详见[Reranker模型指标汇总](https://github.com/netease-youdao/BCEmbedding/Docs/EvaluationSummary/reranker_eval_summary.md)
 ### RAG Evaluations in LlamaIndex
@@ -381,8 +374,9 @@ The summary of multiple domains evaluations can be seen in <a href=#1-multiple-d
 | bge-large-en-v1.5 | 52.67/34.69 | 64.59/52.11 | 64.71/52.05 | **65.36/55.50** |
 | bge-large-zh-v1.5 | 69.81/47.38 | 79.37/62.13 | 80.11/63.95 | **81.19/68.50** |
 | llm-embedder | 50.85/33.26 | 63.62/51.45 | 63.54/51.32 | **64.47/54.98** |
-| CohereV3 | 53.10/35.39 | 65.75/52.80 | 66.29/53.31 | **66.91/56.93** |
-| JinaAI-Base | 50.27/32.31 | 63.97/51.10 | 64.28/51.83 | **64.82/54.98** |
 | ***bce-embedding-base_v1*** | **85.91/62.36** | **91.25/69.38** | **91.80/71.13** | ***93.46/77.02*** |
 ***NOTE:***
@@ -395,23 +389,20 @@ The summary of multiple domains evaluations can be seen in <a href=#1-multiple-d
   - 在固定Embedding模型设置下，对比不同reranker效果（**横排对比**），`bce-reranker-base_v1`比其他reranker模型效果都要好，包括开源和闭源。
   - ***`bce-embedding-base_v1`和`bce-reranker-base_v1`组合，表现SOTA。***
-<t id="t8"></t>
 ## 🛠 Youdao's BCEmbedding API
 For users who prefer a hassle-free experience without the need to download and configure the model on their own systems, `BCEmbedding` is readily accessible through Youdao's API. This option offers a streamlined and efficient way to integrate BCEmbedding into your projects, bypassing the complexities of manual setup and maintenance. Detailed instructions and comprehensive API documentation are available at [Youdao BCEmbedding API](https://ai.youdao.com/DOCSIRMA/html/aigc/api/embedding/index.html). Here, you'll find all the necessary guidance to easily implement `BCEmbedding` across a variety of use cases, ensuring a smooth and effective integration for optimal results.
   对于那些更喜欢直接调用api的用户，有道提供方便的`BCEmbedding`调用api。该方式是一种简化和高效的方式，将`BCEmbedding`集成到您的项目中，避开了手动设置和系统维护的复杂性。更详细的api调用接口说明详见[有道BCEmbedding API](https://ai.youdao.com/DOCSIRMA/html/aigc/api/embedding/index.html)。
-<t id="t9"></t>
 ## 🧲 WeChat Group
 Scan the QR code below to join the WeChat group.
   欢迎大家扫码加入官方微信交流群。
-<img src="https://github.com/netease-youdao/BCEmbedding/Docs/assets/Wechat.jpg" width="20%" height="auto">
-<t id="t10"></t>
 ## ✏️ Citation
 If you use `BCEmbedding` in your research or project, please feel free to cite and star it:
@@ -427,12 +418,10 @@ If you use `BCEmbedding` in your research or project, please feel free to cite a
 }
 ```
-<t id="t11"></t>
 ## 🔐 License
-`BCEmbedding` is licensed under [Apache 2.0 License](https://github.com/netease-youdao/BCEmbedding/LICENSE)
-<t id="t12"></t>
 ## 🔗 Related Links
 [Netease Youdao - QAnything](https://github.com/netease-youdao/qanything)

 <h1 align="center">BCEmbedding: Bilingual and Crosslingual Embedding for RAG</h1>
 <p align="center">
+  <a href="https://github.com/netease-youdao/BCEmbedding/blob/master/LICENSE">
     <img src="https://img.shields.io/badge/license-Apache--2.0-yellow">
   </a>
   <a href="https://twitter.com/YDopensource">
 <details open="open">
 <summary>Click to Open Contents</summary>
+- <a href="#-bilingual-and-crosslingual-superiority" target="_Self">🌐 Bilingual and Crosslingual Superiority</a>
+- <a href="#-key-features" target="_Self">💡 Key Features</a>
+- <a href="#-latest-updates" target="_Self">🚀 Latest Updates</a>
+- <a href="#-model-list" target="_Self">🍎 Model List</a>
+- <a href="#-manual" target="_Self">📖 Manual</a>
+  - <a href="#installation" target="_Self">Installation</a>
+  - <a href="#quick-start" target="_Self">Quick Start</a>
+- <a href="#%EF%B8%8F-evaluation" target="_Self">⚙️ Evaluation</a>
+  - <a href="#evaluate-semantic-representation-by-mteb" target="_Self">Evaluate Semantic Representation by MTEB</a>
+  - <a href="#evaluate-rag-by-llamaindex" target="_Self">Evaluate RAG by LlamaIndex</a>
+- <a href="#-leaderboard" target="_Self">📈 Leaderboard</a>
+  - <a href="#semantic-representation-evaluations-in-mteb" target="_Self">Semantic Representation Evaluations in MTEB</a>
+  - <a href="#rag-evaluations-in-llamaindex" target="_Self">RAG Evaluations in LlamaIndex</a>
+- <a href="#-youdaos-bcembedding-api" target="_Self">🛠 Youdao's BCEmbedding API</a>
+- <a href="#-wechat-group" target="_Self">🧲 WeChat Group</a>
+- <a href="#%EF%B8%8F-citation" target="_Self">✏️ Citation</a>
+- <a href="#-license" target="_Self">🔐 License</a>
+- <a href="#-related-links" target="_Self">🔗 Related Links</a>
 </details>
 <br>
 `BCEmbedding` serves as the cornerstone of Youdao's Retrieval Augmented Generation (RAG) implementation, notably [QAnything](http://qanything.ai) [[github](https://github.com/netease-youdao/qanything)], an open-source implementation widely integrated in various Youdao products like [Youdao Speed Reading](https://read.youdao.com/#/home) and [Youdao Translation](https://fanyi.youdao.com/download-Mac?keyfrom=fanyiweb_navigation).
 Distinguished for its bilingual and crosslingual proficiency, `BCEmbedding` excels in bridging Chinese and English linguistic gaps, achieving
+- **High performance on <a href="#semantic-representation-evaluations-in-mteb">Semantic Representation Evaluations in MTEB</a>**;
+- **A new benchmark in the realm of <a href="#rag-evaluations-in-llamaindex">RAG Evaluations in LlamaIndex</a>**.
   `BCEmbedding`是由网易有道开发的双语和跨语种语义表征算法模型库，其中包含`EmbeddingModel`和`RerankerModel`两类基础模型。`EmbeddingModel`专门用于生成语义向量，在语义搜索和问答中起着关键作用，而`RerankerModel`擅长优化语义搜索结果和语义相关顺序精排。
   `BCEmbedding`作为有道的检索增强生成式应用（RAG）的基石，特别是在[QAnything](http://qanything.ai) [[github](https://github.com/netease-youdao/qanything)]中发挥着重要作用。QAnything作为一个网易有道开源项目，在有道许多产品中有很好的应用实践，比如[有道速读](https://read.youdao.com/#/home)和[有道翻译](https://fanyi.youdao.com/download-Mac?keyfrom=fanyiweb_navigation)
   `BCEmbedding`以其出色的双语和跨语种能力而著称，在语义检索中消除中英语言之间的差异，从而实现：
+  - **强大的双语和跨语种语义表征能力【<a href="#semantic-representation-evaluations-in-mteb">基于MTEB的语义表征评测指标</a>】。**
+  - **基于LlamaIndex的RAG评测，表现SOTA【<a href="#rag-evaluations-in-llamaindex">基于LlamaIndex的RAG评测指标</a>】。**
 ## 🌐 Bilingual and Crosslingual Superiority
 Existing embedding models often encounter performance challenges in bilingual and crosslingual scenarios, particularly in Chinese, English and their crosslingual tasks. `BCEmbedding`, leveraging the strength of Youdao's translation engine, excels in delivering superior performance across monolingual, bilingual, and crosslingual settings.
   `EmbeddingModel`支持***中文和英文***（之后会支持更多语种）；`RerankerModel`支持***中文，英文，日文和韩文***。
 ## 💡 Key Features
 - **Bilingual and Crosslingual Proficiency**: Powered by Youdao's translation engine, excelling in Chinese, English and their crosslingual retrieval task, with upcoming support for additional languages.
   - **双语和跨语种能力**：基于有道翻译引擎的强大能力，我们的`BCEmbedding`具备强大的中英双语和跨语种语义表征能力。
+  - **RAG适配**：面向RAG做了针对性优化，可以适配大多数相关任务，比如**翻译，摘要，问答**等。此外，针对**问题理解**（query understanding）也做了针对优化，详见 <a href="#rag-evaluations-in-llamaindex">基于LlamaIndex的RAG评测指标</a>。
   - **高效且精确的语义检索**：`EmbeddingModel`采用双编码器，可以在第一阶段实现高效的语义检索。`RerankerModel`采用交叉编码器，可以在第二阶段实现更高精度的语义顺序精排。
   - **产品化检验**：`BCEmbedding`已经被有道众多真实产品检验。
 ## 🚀 Latest Updates
 - ***2024-01-03***: **Model Releases** - [bce-embedding-base_v1](https://huggingface.co/maidalun1020/bce-embedding-base_v1) and [bce-reranker-base_v1](https://huggingface.co/maidalun1020/bce-reranker-base_v1) are available.
 - ***2024-01-03***: **Eval Datasets** [[CrosslingualMultiDomainsDataset](https://huggingface.co/datasets/maidalun1020/CrosslingualMultiDomainsDataset)] - Evaluate the performance of RAG, using [LlamaIndex](https://github.com/run-llama/llama_index).
+- ***2024-01-03***: **Eval Datasets** [[Details](https://github.com/netease-youdao/BCEmbedding/blob/master/BCEmbedding/evaluation/c_mteb/Retrieval.py)] - Evaluate the performance of crosslingual semantic representation, using [MTEB](https://github.com/embeddings-benchmark/mteb).
   - ***2024-01-03***: **模型发布** - [bce-embedding-base_v1](https://huggingface.co/maidalun1020/bce-embedding-base_v1)和[bce-reranker-base_v1](https://huggingface.co/maidalun1020/bce-reranker-base_v1)已发布.
   - ***2024-01-03***: **RAG评测数据** [[CrosslingualMultiDomainsDataset](https://huggingface.co/datasets/maidalun1020/CrosslingualMultiDomainsDataset)] - 基于[LlamaIndex](https://github.com/run-llama/llama_index)的RAG评测数据已发布。
+  - ***2024-01-03***: **跨语种语义表征评测数据** [[详情](https://github.com/netease-youdao/BCEmbedding/blob/master/BCEmbedding/evaluation/c_mteb/Retrieval.py)] - 基于[MTEB](https://github.com/embeddings-benchmark/mteb)的跨语种评测数据已发布.
 ## 🍎 Model List
 | Model Name | Model Type | Languages | Parameters | Weights |
 | bce-embedding-base_v1 | `EmbeddingModel` | ch, en | 279M | [download](https://huggingface.co/maidalun1020/bce-embedding-base_v1) |
 | bce-reranker-base_v1 | `RerankerModel` | ch, en, ja, ko | 279M | [download](https://huggingface.co/maidalun1020/bce-reranker-base_v1) |
 ## 📖 Manual
 ### Installation
 ### Quick Start
+Use `EmbeddingModel` by `BCEmbedding`, and `cls` [pooler](https://github.com/netease-youdao/BCEmbedding/blob/master/BCEmbedding/models/embedding.py#L24) is default.
 ```python
 from BCEmbedding import EmbeddingModel
 rerank_results = model.rerank(query, passages)
 ```
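The two-stage flow behind the Quick Start calls (embedding-based retrieval first, then pairwise reranking) can be sketched without loading any model. The snippet below is only an illustration under stated assumptions: the vectors are hand-written toy values rather than `EmbeddingModel` output, and `toy_pair_score` is a hypothetical word-overlap stand-in for a cross-encoder score, not the `RerankerModel` API.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, passage_vecs, top_k):
    """Stage 1: rank passages by similarity of independently computed vectors."""
    scored = [(cosine(query_vec, vec), idx) for idx, vec in enumerate(passage_vecs)]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [idx for _, idx in scored[:top_k]]


def rerank(query, passages, candidate_ids, score_pair):
    """Stage 2: rescore each (query, passage) pair jointly and reorder."""
    scored = [(score_pair(query, passages[idx]), idx) for idx in candidate_ids]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [idx for _, idx in scored]


def toy_pair_score(query, passage):
    """Hypothetical stand-in for a cross-encoder: fraction of query words found."""
    query_words = set(query.lower().split())
    passage_words = set(passage.lower().split())
    return len(query_words & passage_words) / len(query_words) if query_words else 0.0


passages = [
    "embedding models map text to vectors",
    "rerankers score query passage pairs",
    "the weather is sunny today",
]
# Toy vectors (assumed values); a real pipeline would get these from an embedding model.
passage_vecs = [[0.9, 0.1, 0.0], [0.7, 0.6, 0.0], [0.0, 0.1, 0.9]]
query = "how do rerankers score pairs"
query_vec = [0.8, 0.3, 0.0]

candidates = retrieve(query_vec, passage_vecs, top_k=2)
final_order = rerank(query, passages, candidates, toy_pair_score)
print(candidates, final_order)
```

In this toy run the first stage keeps passages 0 and 1, and the second stage moves passage 1 ahead of passage 0 because it scores the query and passage together. The first stage is cheap because passage vectors can be computed once and reused, while the second stage runs only on the shortlisted candidates.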
 ## ⚙️ Evaluation
 ### Evaluate Semantic Representation by MTEB
 #### 3. Metrics Visualization Tool
+We provide a one-click script to summarize evaluation results of `embedding` and `reranker` models as [Embedding Models Evaluation Summary](https://github.com/netease-youdao/BCEmbedding/blob/master/Docs/EvaluationSummary/embedding_eval_summary.md) and [Reranker Models Evaluation Summary](https://github.com/netease-youdao/BCEmbedding/blob/master/Docs/EvaluationSummary/reranker_eval_summary.md).
+  我们提供了`embedding`和`reranker`模型的指标可视化一键脚本，输出一个markdown文件，详见[Embedding模型指标汇总](https://github.com/netease-youdao/BCEmbedding/blob/master/Docs/EvaluationSummary/embedding_eval_summary.md)和[Reranker模型指标汇总](https://github.com/netease-youdao/BCEmbedding/blob/master/Docs/EvaluationSummary/reranker_eval_summary.md)。
 ```bash
 python BCEmbedding/evaluation/mteb/summarize_eval_results.py --results_dir {your_embedding_results_dir | your_reranker_results_dir}
 python BCEmbedding/tools/eval_rag/summarize_eval_results.py --results_dir results/rag_reproduce_results
 ```
+Results Reproduced from the LlamaIndex Blog can be checked in ***[Reproduced Summary of RAG Evaluation](https://github.com/netease-youdao/BCEmbedding/blob/master/Docs/EvaluationSummary/rag_eval_reproduced_summary.md)***, with some obvious ***conclusions***:
 - In the `WithoutReranker` setting, our `bce-embedding-base_v1` outperforms all the other embedding models.
 - With the embedding model fixed, our `bce-reranker-base_v1` achieves the best performance.
 - ***The combination of `bce-embedding-base_v1` and `bce-reranker-base_v1` is SOTA.***
+  输出的指标汇总详见 ***[LlamaIndex RAG评测结果复现](https://github.com/netease-youdao/BCEmbedding/blob/master/Docs/EvaluationSummary/rag_eval_reproduced_summary.md)***。从该复现结果中，可以看出：
   - 在`WithoutReranker`设置下（**竖排对比**），`bce-embedding-base_v1`比其他embedding模型效果都要好。
   - 在固定embedding模型设置下，对比不同reranker效果（**横排对比**），`bce-reranker-base_v1`比其他reranker模型效果都要好。
   - ***`bce-embedding-base_v1`和`bce-reranker-base_v1`组合，表现SOTA。***
 The summary of multiple domains evaluations can be seen in <a href=#1-multiple-domains-scenarios>Multiple Domains Scenarios</a>.
 ## 📈 Leaderboard
 ### Semantic Representation Evaluations in MTEB
 ***NOTE:***
 - Our ***bce-embedding-base_v1*** outperforms other open-source embedding models of various sizes.
 - ***114 datasets*** of **"Retrieval", "STS", "PairClassification", "Classification", "Reranking" and "Clustering"** in `["en", "zh", "en-zh", "zh-en"]` setting.
+- The [crosslingual evaluation datasets](https://github.com/netease-youdao/BCEmbedding/blob/master/BCEmbedding/evaluation/c_mteb/Retrieval.py) we released belong to the `Retrieval` task.
+- For more evaluation details, see [Embedding Models Evaluation Summary](https://github.com/netease-youdao/BCEmbedding/blob/master/Docs/EvaluationSummary/embedding_eval_summary.md).
   ***要点：***
   - 对比所有开源的各种规模的embedding模型，***bce-embedding-base_v1*** 表现最好。
   - 评测包含 **"Retrieval"， "STS"， "PairClassification"， "Classification"， "Reranking"和"Clustering"** 这六大类任务的共 ***114个数据集***。
+  - 我们开源的[跨语种语义表征评测数据](https://github.com/netease-youdao/BCEmbedding/blob/master/BCEmbedding/evaluation/c_mteb/Retrieval.py)属于`Retrieval`任务。
+  - 更详细的评测结果详见[Embedding模型指标汇总](https://github.com/netease-youdao/BCEmbedding/blob/master/Docs/EvaluationSummary/embedding_eval_summary.md)。
 #### 2. Reranker Models
 ***NOTE:***
 - Our ***bce-reranker-base_v1*** outperforms other open-source reranker models.
 - ***12 datasets*** of **"Reranking"** in `["en", "zh", "en-zh", "zh-en"]` setting.
+- For more evaluation details, see [Reranker Models Evaluation Summary](https://github.com/netease-youdao/BCEmbedding/blob/master/Docs/EvaluationSummary/reranker_eval_summary.md).
   ***要点：***
   - ***bce-reranker-base_v1*** 优于其他开源reranker模型。
   - 评测包含 **"Reranking"** 任务的 ***12个数据集***。
+  - 更详细的评测结果详见[Reranker模型指标汇总](https://github.com/netease-youdao/BCEmbedding/blob/master/Docs/EvaluationSummary/reranker_eval_summary.md)
 ### RAG Evaluations in LlamaIndex
 | bge-large-en-v1.5 | 52.67/34.69 | 64.59/52.11 | 64.71/52.05 | **65.36/55.50** |
 | bge-large-zh-v1.5 | 69.81/47.38 | 79.37/62.13 | 80.11/63.95 | **81.19/68.50** |
 | llm-embedder | 50.85/33.26 | 63.62/51.45 | 63.54/51.32 | **64.47/54.98** |
+| CohereV3-en | 53.10/35.39 | 65.75/52.80 | 66.29/53.31 | **66.91/56.93** |
+| CohereV3-multilingual | 79.80/57.22 | 86.34/66.62 | 86.76/68.56 | **88.35/73.73** |
+| JinaAI-v2-Base-en | 50.27/32.31 | 63.97/51.10 | 64.28/51.83 | **64.82/54.98** |
 | ***bce-embedding-base_v1*** | **85.91/62.36** | **91.25/69.38** | **91.80/71.13** | ***93.46/77.02*** |
 ***NOTE:***
   - 在固定Embedding模型设置下，对比不同reranker效果（**横排对比**），`bce-reranker-base_v1`比其他reranker模型效果都要好，包括开源和闭源。
   - ***`bce-embedding-base_v1`和`bce-reranker-base_v1`组合，表现SOTA。***
 ## 🛠 Youdao's BCEmbedding API
 For users who prefer a hassle-free experience without the need to download and configure the model on their own systems, `BCEmbedding` is readily accessible through Youdao's API. This option offers a streamlined and efficient way to integrate BCEmbedding into your projects, bypassing the complexities of manual setup and maintenance. Detailed instructions and comprehensive API documentation are available at [Youdao BCEmbedding API](https://ai.youdao.com/DOCSIRMA/html/aigc/api/embedding/index.html). Here, you'll find all the necessary guidance to easily implement `BCEmbedding` across a variety of use cases, ensuring a smooth and effective integration for optimal results.
   对于那些更喜欢直接调用api的用户，有道提供方便的`BCEmbedding`调用api。该方式是一种简化和高效的方式，将`BCEmbedding`集成到您的项目中，避开了手动设置和系统维护的复杂性。更详细的api调用接口说明详见[有道BCEmbedding API](https://ai.youdao.com/DOCSIRMA/html/aigc/api/embedding/index.html)。
 ## 🧲 WeChat Group
 Scan the QR code below to join the WeChat group.
   欢迎大家扫码加入官方微信交流群。
+<img src="https://github.com/netease-youdao/BCEmbedding/blob/master/Docs/assets/Wechat.jpg" width="20%" height="auto">
 ## ✏️ Citation
 If you use `BCEmbedding` in your research or project, please feel free to cite and star it:
 }
 ```
 ## 🔐 License
+`BCEmbedding` is licensed under [Apache 2.0 License](https://github.com/netease-youdao/BCEmbedding/blob/master/LICENSE)
 ## 🔗 Related Links
 [Netease Youdao - QAnything](https://github.com/netease-youdao/qanything)