izhx committed
Commit 124e8cc
1 Parent(s): 73f70c6

Update README.md

Files changed (1)
  1. README.md +10 -3

README.md CHANGED
@@ -2627,7 +2627,7 @@ We also present the [`gte-Qwen1.5-7B-instruct`](https://huggingface.co/Alibaba-N
 | Models | Language | Model Size (M) | Max Seq. Length | Dimension | MTEB-en | LoCo |
 |:-----: | :-----: |:-----: |:-----: |:-----: | :-----: | :-----: |
 |[`gte-Qwen1.5-7B-instruct`](https://huggingface.co/Alibaba-NLP/gte-Qwen1.5-7B-instruct)| English | 7720 | 32768 | 4096 | 67.34 | 87.57 |
-|[`gte-large-en-v1.5`](https://huggingface.co/Alibaba-NLP/gte-large-en-v1.5) | English | 409 | 8192 | 1024 | 65.39 | 86.71 |
+|[`gte-large-en-v1.5`](https://huggingface.co/Alibaba-NLP/gte-large-en-v1.5) | English | 434 | 8192 | 1024 | 65.39 | 86.71 |
 |[`gte-base-en-v1.5`](https://huggingface.co/Alibaba-NLP/gte-base-en-v1.5) | English | 137 | 8192 | 768 | 64.11 | 87.44 |
 
 
@@ -2673,7 +2673,7 @@ from sentence_transformers.util import cos_sim
 
 sentences = ['That is a happy person', 'That is a very happy person']
 
-model = SentenceTransformer('Alibaba-NLP/gte-large-en-v1.5')
+model = SentenceTransformer('Alibaba-NLP/gte-large-en-v1.5', trust_remote_code=True)
 embeddings = model.encode(sentences)
 print(cos_sim(embeddings[0], embeddings[1]))
 ```
@@ -2688,6 +2688,11 @@ print(cos_sim(embeddings[0], embeddings[1]))
 
 ### Training Procedure
 
+To enable the backbone model to support a context length of 8192, we adopted a multi-stage training strategy.
+The model first undergoes preliminary MLM pre-training at shorter sequence lengths.
+Then we resample the data, reducing the proportion of short texts, and continue MLM pre-training.
+
+The entire training process is as follows:
 - MLM-512: lr 2e-4, mlm_probability 0.3, batch_size 4096, num_steps 300000, rope_base 10000
 - MLM-2048: lr 5e-5, mlm_probability 0.3, batch_size 4096, num_steps 30000, rope_base 10000
 - MLM-8192: lr 5e-5, mlm_probability 0.3, batch_size 1024, num_steps 30000, rope_base 160000
@@ -2700,7 +2705,9 @@ print(cos_sim(embeddings[0], embeddings[1]))
 
 ### MTEB
 
-The gte results setting: `mteb==1.2.0, fp16 auto mix precision, max_length=8192`, and set ntk scaling factor to 2 (equivalent to rope_base * 2).
+The results of other models are retrieved from the [MTEB leaderboard](https://huggingface.co/spaces/mteb/leaderboard).
+
+The gte evaluation setting: `mteb==1.2.0`, fp16 auto mixed precision, `max_length=8192`, and NTK scaling factor set to 2 (equivalent to `rope_base * 2`).
 
 | Model Name | Param Size (M) | Dimension | Sequence Length | Average (56) | Class. (12) | Clust. (11) | Pair Class. (3) | Reran. (4) | Retr. (15) | STS (10) | Summ. (1) |
 |:----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
 
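Pieced together, the snippet touched by the second hunk forms a complete, runnable example. A minimal sketch, assuming the `SentenceTransformer` import that accompanies the `cos_sim` import shown in the hunk header:

```python
# Minimal sketch of the usage example this commit updates; the
# SentenceTransformer import is assumed from the surrounding README.
# trust_remote_code=True is required because the gte-*-v1.5 models
# ship custom modeling code on the Hugging Face Hub.
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

sentences = ['That is a happy person', 'That is a very happy person']

model = SentenceTransformer('Alibaba-NLP/gte-large-en-v1.5', trust_remote_code=True)
embeddings = model.encode(sentences)

# Cosine similarity between the two sentence embeddings.
print(cos_sim(embeddings[0], embeddings[1]))
```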
 
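The training-procedure lines added in the third hunk describe three successive MLM runs that progressively extend the context window, with `rope_base` raised sharply for the final 8192 stage. A hedged sketch of that schedule as plain config dicts (the `max_length` values are inferred from the stage names, and the field names are illustrative, not the repository's training code):

```python
# Hedged sketch: the staged MLM pre-training schedule from the README,
# expressed as plain config dicts. max_length is inferred from the stage
# names (MLM-512/2048/8192); field names are illustrative only.
stages = [
    dict(name="MLM-512",  max_length=512,  lr=2e-4, mlm_probability=0.3,
         batch_size=4096, num_steps=300_000, rope_base=10_000),
    dict(name="MLM-2048", max_length=2048, lr=5e-5, mlm_probability=0.3,
         batch_size=4096, num_steps=30_000,  rope_base=10_000),
    dict(name="MLM-8192", max_length=8192, lr=5e-5, mlm_probability=0.3,
         batch_size=1024, num_steps=30_000,  rope_base=160_000),
]

# Token budget per stage (an upper bound assuming full-length sequences),
# to show that most compute goes into the short-context stage.
for s in stages:
    tokens = s["num_steps"] * s["batch_size"] * s["max_length"]
    print(f"{s['name']}: ~{tokens / 1e9:.0f}B tokens, rope_base={s['rope_base']}")
```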
 
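On the "NTK scaling factor set to 2 (equivalent to `rope_base * 2`)" evaluation setting: under the standard RoPE formulation, applying this factor simply doubles the frequency base before the inverse frequencies are computed. A minimal sketch of that equivalence, with an illustrative head dimension of 64 (not taken from the model config):

```python
# Hedged sketch of "ntk scaling factor 2 (equivalent to rope_base * 2)".
# Standard RoPE inverse frequencies: inv_freq_i = base^(-2i / head_dim).
import torch

def rope_inv_freq(head_dim: int, base: float) -> torch.Tensor:
    # One rotary frequency per pair of hidden dimensions.
    return 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))

base = 160_000   # rope_base of the final MLM-8192 stage above
ntk_factor = 2   # the evaluation setting in this commit
head_dim = 64    # illustrative assumption, not from the model config

# Per the README, the NTK factor just scales the base, lowering every
# rotary frequency and stretching the usable context window.
inv_freq = rope_inv_freq(head_dim, base * ntk_factor)
print(inv_freq[:4])
```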