izhx committed
Commit 124e8cc
1 Parent(s): 73f70c6

Update README.md

Files changed (1)
  1. README.md +10 -3

README.md CHANGED
@@ -2627,7 +2627,7 @@ We also present the [`gte-Qwen1.5-7B-instruct`](https://huggingface.co/Alibaba-N
 | Models | Language | Model Size (M) | Max Seq. Length | Dimension | MTEB-en | LoCo |
 |:-----: | :-----: |:-----: |:-----: |:-----: | :-----: | :-----: |
 |[`gte-Qwen1.5-7B-instruct`](https://huggingface.co/Alibaba-NLP/gte-Qwen1.5-7B-instruct)| English | 7720 | 32768 | 4096 | 67.34 | 87.57 |
-|[`gte-large-en-v1.5`](https://huggingface.co/Alibaba-NLP/gte-large-en-v1.5) | English | 409 | 8192 | 1024 | 65.39 | 86.71 |
+|[`gte-large-en-v1.5`](https://huggingface.co/Alibaba-NLP/gte-large-en-v1.5) | English | 434 | 8192 | 1024 | 65.39 | 86.71 |
 |[`gte-base-en-v1.5`](https://huggingface.co/Alibaba-NLP/gte-base-en-v1.5) | English | 137 | 8192 | 768 | 64.11 | 87.44 |
 
 
@@ -2673,7 +2673,7 @@ from sentence_transformers.util import cos_sim
 
 sentences = ['That is a happy person', 'That is a very happy person']
 
-model = SentenceTransformer('Alibaba-NLP/gte-large-en-v1.5')
+model = SentenceTransformer('Alibaba-NLP/gte-large-en-v1.5', trust_remote_code=True)
 embeddings = model.encode(sentences)
 print(cos_sim(embeddings[0], embeddings[1]))
 ```
@@ -2688,6 +2688,11 @@ print(cos_sim(embeddings[0], embeddings[1]))
 
 ### Training Procedure
 
+To enable the backbone model to support a context length of 8192, we adopted a multi-stage training strategy.
+The model first undergoes preliminary MLM pre-training at shorter sequence lengths.
+Then we resample the data, reducing the proportion of short texts, and continue MLM pre-training.
+
+The entire training process is as follows:
 - MLM-512: lr 2e-4, mlm_probability 0.3, batch_size 4096, num_steps 300000, rope_base 10000
 - MLM-2048: lr 5e-5, mlm_probability 0.3, batch_size 4096, num_steps 30000, rope_base 10000
 - MLM-8192: lr 5e-5, mlm_probability 0.3, batch_size 1024, num_steps 30000, rope_base 160000
@@ -2700,7 +2705,9 @@ print(cos_sim(embeddings[0], embeddings[1]))
 
 ### MTEB
 
-The gte results setting: `mteb==1.2.0, fp16 auto mix precision, max_length=8192`, and set ntk scaling factor to 2 (equivalent to rope_base * 2).
+The results of other models are retrieved from the [MTEB leaderboard](https://huggingface.co/spaces/mteb/leaderboard).
+
+The gte evaluation setting: `mteb==1.2.0`, fp16 auto mixed precision, `max_length=8192`, and NTK scaling factor set to 2 (equivalent to `rope_base * 2`).
 
 | Model Name | Param Size (M) | Dimension | Sequence Length | Average (56) | Class. (12) | Clust. (11) | Pair Class. (3) | Reran. (4) | Retr. (15) | STS (10) | Summ. (1) |
 |:----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
 
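Pieced together, the snippet touched by the second hunk forms a complete, runnable example. A minimal sketch, assuming the `SentenceTransformer` import that accompanies the `cos_sim` import shown in the hunk header:

```python
# Minimal sketch of the usage example this commit updates; the
# SentenceTransformer import is assumed from the surrounding README.
# trust_remote_code=True is required because the gte-*-v1.5 models
# ship custom modeling code on the Hugging Face Hub.
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

sentences = ['That is a happy person', 'That is a very happy person']

model = SentenceTransformer('Alibaba-NLP/gte-large-en-v1.5', trust_remote_code=True)
embeddings = model.encode(sentences)

# Cosine similarity between the two sentence embeddings.
print(cos_sim(embeddings[0], embeddings[1]))
```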
 
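The training-procedure lines added in the third hunk describe three successive MLM runs that progressively extend the context window, with `rope_base` raised sharply for the final 8192 stage. A hedged sketch of that schedule as plain config dicts (the `max_length` values are inferred from the stage names, and the field names are illustrative, not the repository's training code):

```python
# Hedged sketch: the staged MLM pre-training schedule from the README,
# expressed as plain config dicts. max_length is inferred from the stage
# names (MLM-512/2048/8192); field names are illustrative only.
stages = [
    dict(name="MLM-512",  max_length=512,  lr=2e-4, mlm_probability=0.3,
         batch_size=4096, num_steps=300_000, rope_base=10_000),
    dict(name="MLM-2048", max_length=2048, lr=5e-5, mlm_probability=0.3,
         batch_size=4096, num_steps=30_000,  rope_base=10_000),
    dict(name="MLM-8192", max_length=8192, lr=5e-5, mlm_probability=0.3,
         batch_size=1024, num_steps=30_000,  rope_base=160_000),
]

# Token budget per stage (an upper bound assuming full-length sequences),
# to show that most compute goes into the short-context stage.
for s in stages:
    tokens = s["num_steps"] * s["batch_size"] * s["max_length"]
    print(f"{s['name']}: ~{tokens / 1e9:.0f}B tokens, rope_base={s['rope_base']}")
```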
 
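On the "NTK scaling factor set to 2 (equivalent to `rope_base * 2`)" evaluation setting: under the standard RoPE formulation, applying this factor simply doubles the frequency base before the inverse frequencies are computed. A minimal sketch of that equivalence, with an illustrative head dimension of 64 (not taken from the model config):

```python
# Hedged sketch of "ntk scaling factor 2 (equivalent to rope_base * 2)".
# Standard RoPE inverse frequencies: inv_freq_i = base^(-2i / head_dim).
import torch

def rope_inv_freq(head_dim: int, base: float) -> torch.Tensor:
    # One rotary frequency per pair of hidden dimensions.
    return 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))

base = 160_000   # rope_base of the final MLM-8192 stage above
ntk_factor = 2   # the evaluation setting in this commit
head_dim = 64    # illustrative assumption, not from the model config

# Per the README, the NTK factor just scales the base, lowering every
# rotary frequency and stretching the usable context window.
inv_freq = rope_inv_freq(head_dim, base * ntk_factor)
print(inv_freq[:4])
```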