license: mit
---

# gte-base-zh

General Text Embeddings (GTE) model. [Towards General Text Embeddings with Multi-stage Contrastive Learning](https://arxiv.org/abs/2308.03281)

The GTE models are trained by Alibaba DAMO Academy. They are mainly based on the BERT framework.

We compared the performance of the GTE models with other popular text embedding models on the MTEB (CMTEB for Chinese language) benchmark. For more detailed comparison results, please refer to the [MTEB leaderboard](https://huggingface.co/spaces/mteb/leaderboard).
- Evaluation results on CMTEB

| Model | Model Size (GB) | Embedding Dimensions | Sequence Length | Average (35 datasets) | Classification (9 datasets) | Clustering (4 datasets) | Pair Classification (2 datasets) | Reranking (4 datasets) | Retrieval (8 datasets) | STS (8 datasets) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| **gte-large-zh** | 0.65 | 1024 | 512 | **66.72** | 71.34 | 53.07 | 81.14 | 67.42 | 72.49 | 57.82 |
| gte-base-zh | 0.20 | 768 | 512 | 65.92 | 71.26 | 53.86 | 80.44 | 67.00 | 71.71 | 55.96 |
| stella-large-zh-v2 | 0.65 | 1024 | 1024 | 65.13 | 69.05 | 49.16 | 82.68 | 66.41 | 70.14 | 58.66 |
| stella-large-zh | 0.65 | 1024 | 1024 | 64.54 | 67.62 | 48.65 | 78.72 | 65.98 | 71.02 | 58.3 |
| bge-large-zh-v1.5 | 1.3 | 1024 | 512 | 64.53 | 69.13 | 48.99 | 81.6 | 65.84 | 70.46 | 56.25 |
| stella-base-zh-v2 | 0.21 | 768 | 1024 | 64.36 | 68.29 | 49.4 | 79.96 | 66.1 | 70.08 | 56.92 |
| stella-base-zh | 0.21 | 768 | 1024 | 64.16 | 67.77 | 48.7 | 76.09 | 66.95 | 71.07 | 56.54 |
| piccolo-large-zh | 0.65 | 1024 | 512 | 64.11 | 67.03 | 47.04 | 78.38 | 65.98 | 70.93 | 58.02 |
| piccolo-base-zh | 0.2 | 768 | 512 | 63.66 | 66.98 | 47.12 | 76.61 | 66.68 | 71.2 | 55.9 |
| gte-small-zh | 0.1 | 512 | 512 | 60.08 | 64.49 | 48.95 | 69.99 | 66.21 | 65.50 | 49.72 |
| bge-small-zh-v1.5 | 0.1 | 512 | 512 | 57.82 | 63.96 | 44.18 | 70.4 | 60.92 | 61.77 | 49.1 |
| m3e-base | 0.41 | 768 | 512 | 57.79 | 67.52 | 47.68 | 63.99 | 59.54 | 56.91 | 50.47 |
| text-embedding-ada-002 (OpenAI) | - | 1536 | 8192 | 53.02 | 64.31 | 45.68 | 69.56 | 54.28 | 52.0 | 43.35 |

## Usage
Code example
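(the original `transformers` snippet is elided in this view; the block below is a minimal sketch of the usual GTE usage pattern, mean pooling over token states followed by cosine similarity, assuming the repo id `thenlper/gte-base-zh` and illustrative sentences rather than the card's exact code)

```python
# Minimal sketch (assumed, not the card's exact snippet): embed sentences with
# Hugging Face transformers, mean-pool token states, and compare by cosine similarity.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

model_id = "thenlper/gte-base-zh"  # assumed repo id for this model card
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

sentences = ["中国的首都是哪里", "中国的首都是北京", "今天天气不错"]  # illustrative inputs

# Tokenize, truncate to the 512-token limit listed in the table above, and encode.
batch = tokenizer(sentences, max_length=512, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    outputs = model(**batch)

# Mean-pool the last hidden state over non-padding tokens.
mask = batch["attention_mask"].unsqueeze(-1).float()
embeddings = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)

# Cosine similarity of the first sentence against the others, scaled to 0-100.
embeddings = F.normalize(embeddings, p=2, dim=1)
scores = (embeddings[:1] @ embeddings[1:].T) * 100
print(scores.tolist())
```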
Use with sentence-transformers:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim