thenlper committed on
Commit
7c27bfe
1 Parent(s): c153a79

Upload README.md

Files changed (1)
  1. README.md +21 -1
README.md CHANGED
@@ -1061,7 +1061,7 @@ language:
  license: mit
  ---
 
- # gte-large-zh
+ # gte-base-zh
 
  General Text Embeddings (GTE) model. [Towards General Text Embeddings with Multi-stage Contrastive Learning](https://arxiv.org/abs/2308.03281)
 
@@ -1084,6 +1084,25 @@ The GTE models are trained by Alibaba DAMO Academy. They are mainly based on the
  We compared the performance of the GTE models with other popular text embedding models on the MTEB (CMTEB for Chinese language) benchmark. For more detailed comparison results, please refer to the [MTEB leaderboard](https://huggingface.co/spaces/mteb/leaderboard).
 
 
+ - Evaluation results on CMTEB
+
+ | Model | Model Size (GB) | Embedding Dimensions | Sequence Length | Average (35 datasets) | Classification (9 datasets) | Clustering (4 datasets) | Pair Classification (2 datasets) | Reranking (4 datasets) | Retrieval (8 datasets) | STS (8 datasets) |
+ | ------------------- | -------------- | -------------------- | ---------------- | --------------------- | ------------------------------------ | ------------------------------ | --------------------------------------- | ------------------------------ | ---------------------------- | ------------------------ |
+ | **gte-large-zh** | 0.65 | 1024 | 512 | **66.72** | 71.34 | 53.07 | 81.14 | 67.42 | 72.49 | 57.82 |
+ | gte-base-zh | 0.20 | 768 | 512 | 65.92 | 71.26 | 53.86 | 80.44 | 67.00 | 71.71 | 55.96 |
+ | stella-large-zh-v2 | 0.65 | 1024 | 1024 | 65.13 | 69.05 | 49.16 | 82.68 | 66.41 | 70.14 | 58.66 |
+ | stella-large-zh | 0.65 | 1024 | 1024 | 64.54 | 67.62 | 48.65 | 78.72 | 65.98 | 71.02 | 58.3 |
+ | bge-large-zh-v1.5 | 1.3 | 1024 | 512 | 64.53 | 69.13 | 48.99 | 81.6 | 65.84 | 70.46 | 56.25 |
+ | stella-base-zh-v2 | 0.21 | 768 | 1024 | 64.36 | 68.29 | 49.4 | 79.96 | 66.1 | 70.08 | 56.92 |
+ | stella-base-zh | 0.21 | 768 | 1024 | 64.16 | 67.77 | 48.7 | 76.09 | 66.95 | 71.07 | 56.54 |
+ | piccolo-large-zh | 0.65 | 1024 | 512 | 64.11 | 67.03 | 47.04 | 78.38 | 65.98 | 70.93 | 58.02 |
+ | piccolo-base-zh | 0.2 | 768 | 512 | 63.66 | 66.98 | 47.12 | 76.61 | 66.68 | 71.2 | 55.9 |
+ | gte-small-zh | 0.1 | 512 | 512 | 60.08 | 64.49 | 48.95 | 69.99 | 66.21 | 65.50 | 49.72 |
+ | bge-small-zh-v1.5 | 0.1 | 512 | 512 | 57.82 | 63.96 | 44.18 | 70.4 | 60.92 | 61.77 | 49.1 |
+ | m3e-base | 0.41 | 768 | 512 | 57.79 | 67.52 | 47.68 | 63.99 | 59.54 | 56.91 | 50.47 |
+ | text-embedding-ada-002(openai) | - | 1536 | 8192 | 53.02 | 64.31 | 45.68 | 69.56 | 54.28 | 52.0 | 43.35 |
+
+
  ## Usage
 
  Code example
@@ -1116,6 +1135,7 @@ print(scores.tolist())
  ```
 
  Use with sentence-transformers:
+
  ```python
  from sentence_transformers import SentenceTransformer
  from sentence_transformers.util import cos_sim