Commit 5bf056c by thenlper (1 parent: e7ca9d5)

Update README.md

Files changed (1): README.md +29 -2
@@ -1,5 +1,10 @@
  ---
  license: apache-2.0
  ---

  ## gte-multilingual-reranker-base
@@ -23,8 +28,10 @@ Using Huggingface transformers (transformers>=4.36.0)
  import torch
  from transformers import AutoModelForSequenceClassification, AutoTokenizer

- tokenizer = AutoTokenizer.from_pretrained('Alibaba-NLP/gte-multilingual-reranker-base')
- model = AutoModelForSequenceClassification.from_pretrained('Alibaba-NLP/gte-multilingual-reranker-base', trust_remote_code=True)
  model.eval()

  pairs = [["中国的首都在哪儿","北京"], ["what is the capital of China?", "北京"], ["how to implement quick sort in python?","Introduction of quick sort"]]
@@ -32,11 +39,31 @@ with torch.no_grad():
      inputs = tokenizer(pairs, padding=True, truncation=True, return_tensors='pt', max_length=512)
      scores = model(**inputs, return_dict=True).logits.view(-1, ).float()
      print(scores)
  ```

  ### How to use it offline
  Refer to [Disable trust_remote_code](https://huggingface.co/Alibaba-NLP/new-impl/discussions/2#662b08d04d8c3d0a09c88fa3)

  ## Citation
  ```
  @misc{zhang2024mgtegeneralizedlongcontexttext,
 
  ---
  license: apache-2.0
+ pipeline_tag: text-classification
+ tags:
+ - transformers
+ - sentence-transformers
+ - text-embeddings-inference
  ---
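The added lines extend the YAML front matter, the block between the two `---` lines at the top of README.md, which the Hugging Face Hub reads as model-card metadata (here the pipeline tag and the tag list). As a minimal sketch, assuming only the flat `key: value` and `- item` shapes that appear in this file, the header can be read back as follows; `parse_front_matter` is a hypothetical helper, and a real tool should use a proper YAML parser instead.

```python
def parse_front_matter(readme: str) -> dict:
    """Read the `---`-delimited header at the top of a README.

    Hypothetical helper. Handles only the subset used in this model card:
    `key: value` pairs and a `key:` line followed by `- item` entries.
    It is NOT a general YAML parser.
    """
    lines = readme.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no front matter present
    meta: dict = {}
    list_key = None  # key whose `- item` entries are being collected
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":  # closing delimiter ends the header
            break
        if not stripped:
            continue
        if stripped.startswith("- ") and list_key is not None:
            meta[list_key].append(stripped[2:].strip())
            continue
        key, _, value = stripped.partition(":")
        key, value = key.strip(), value.strip()
        if value:  # scalar, e.g. `license: apache-2.0`
            meta[key] = value
            list_key = None
        else:  # bare `key:` starts a list, e.g. `tags:`
            meta[key] = []
            list_key = key
    return meta


README_HEADER = """---
license: apache-2.0
pipeline_tag: text-classification
tags:
- transformers
- sentence-transformers
- text-embeddings-inference
---

## gte-multilingual-reranker-base
"""

meta = parse_front_matter(README_HEADER)
print(meta["pipeline_tag"])  # text-classification
print(meta["tags"])
```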

  ## gte-multilingual-reranker-base
 
  import torch
  from transformers import AutoModelForSequenceClassification, AutoTokenizer

+ model_name_or_path = "Alibaba-NLP/gte-multilingual-reranker-base"
+
+ tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
+ model = AutoModelForSequenceClassification.from_pretrained(model_name_or_path, trust_remote_code=True)
  model.eval()

  pairs = [["中国的首都在哪儿","北京"], ["what is the capital of China?", "北京"], ["how to implement quick sort in python?","Introduction of quick sort"]]
 
      inputs = tokenizer(pairs, padding=True, truncation=True, return_tensors='pt', max_length=512)
      scores = model(**inputs, return_dict=True).logits.view(-1, ).float()
      print(scores)
+
+ # tensor([1.2315, 0.5923, 0.3041])
  ```
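The snippet above returns one logit per [query, passage] pair, and in the sample output the matching pair (中国的首都在哪儿, 北京) scores highest, so a higher value appears to mean a more relevant pair. A typical reranking step then sorts the candidate passages for a single query by that score. The following is a minimal sketch under stated assumptions: the `rerank` and `sigmoid` helpers, the example passages, and the logits are hypothetical, and mapping logits through a sigmoid is an optional normalization not described in this README.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    """Map a raw logit to (0, 1). An assumed, optional normalization."""
    return 1.0 / (1.0 + math.exp(-x))


def rerank(
    passages: Sequence[str], scores: Sequence[float], top_k: int = 3
) -> List[Tuple[str, float]]:
    """Return up to top_k (passage, score) pairs, highest score first.

    Hypothetical helper: scores[i] is assumed to be the reranker logit
    for passages[i], e.g. obtained via `scores.tolist()` from the snippet above.
    """
    if len(passages) != len(scores):
        raise ValueError("expected exactly one score per passage")
    ranked = sorted(zip(passages, scores), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


# Hypothetical query and candidate passages.
query = "what is the capital of China?"
passages = ["Introduction of quick sort", "北京", "Shanghai is a large city"]

# Same [query, passage] shape as `pairs` above; this list is what the
# tokenizer would receive in a real run.
pairs = [[query, p] for p in passages]

# Made-up logits standing in for model output (one per pair, same order).
scores = [0.30, 1.23, 0.59]

for passage, score in rerank(passages, scores, top_k=2):
    print(f"{sigmoid(score):.3f}  {passage}")
```

Because the sigmoid is monotonic, it changes only the scale of the scores, not the order that `rerank` returns.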

  ### How to use it offline
  Refer to [Disable trust_remote_code](https://huggingface.co/Alibaba-NLP/new-impl/discussions/2#662b08d04d8c3d0a09c88fa3)

+ ## Evaluation
+
+ Reranking results on multiple text retrieval datasets:
+
+ ![image](./images/mgte-reranker.png)
+
+ **More detailed experimental results can be found in the [paper](https://arxiv.org/pdf/2407.19669)**.
+
+ ## Cloud API Services
+
+ In addition to the open-source [GTE](https://huggingface.co/collections/Alibaba-NLP/gte-models-6680f0b13f885cb431e6d469) series models, GTE series models are also available as commercial API services on Alibaba Cloud.
+
+ - [Embedding Models](https://help.aliyun.com/zh/model-studio/developer-reference/general-text-embedding/): Three versions of the text embedding model are available: text-embedding-v1/v2/v3, with v3 being the latest API service.
+ - [ReRank Models](https://help.aliyun.com/zh/model-studio/developer-reference/general-text-sorting-model/): The gte-rerank model service is available.
+
+ Note that the models behind the commercial APIs are not entirely identical to the open-source models.
+
+
  ## Citation
  ```
  @misc{zhang2024mgtegeneralizedlongcontexttext,