Update README.md
README.md CHANGED
@@ -2,7 +2,7 @@
 license: apache-2.0
 ---
 
-## gte-multilingual-base
+## gte-multilingual-reranker-base
 
 The **gte-multilingual-reranker-base** model is the first reranker model in the [GTE](https://huggingface.co/collections/Alibaba-NLP/gte-models-6680f0b13f885cb431e6d469) family of models, featuring several key attributes:
 - **High Performance**: Achieves state-of-the-art (SOTA) results in multilingual retrieval tasks and multi-task representation model evaluations when compared to reranker models of similar size.
@@ -12,18 +12,13 @@ The **gte-multilingual-reranker-base** model is the first reranker model in the
 
 
 ## Model Information
-- Model Size:
+- Model Size: 306M
 - Max Input Tokens: 8192
 
-## Requirements
-```
-transformers>=4.39.2
-flash_attn>=2.5.6
-```
 
 ### Usage
 
-Using Huggingface transformers
+Using Huggingface transformers (transformers>=4.36.0)
 ```
 import torch
 from transformers import AutoModelForSequenceClassification, AutoTokenizer
@@ -37,4 +32,20 @@ with torch.no_grad():
     inputs = tokenizer(pairs, padding=True, truncation=True, return_tensors='pt', max_length=512)
     scores = model(**inputs, return_dict=True).logits.view(-1, ).float()
     print(scores)
+```
+
+### How to use it offline
+Refer to [Disable trust_remote_code](https://huggingface.co/Alibaba-NLP/new-impl/discussions/2#662b08d04d8c3d0a09c88fa3)
+
+## Citation
+```
+@misc{zhang2024mgtegeneralizedlongcontexttext,
+      title={mGTE: Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval},
+      author={Xin Zhang and Yanzhao Zhang and Dingkun Long and Wen Xie and Ziqi Dai and Jialong Tang and Huan Lin and Baosong Yang and Pengjun Xie and Fei Huang and Meishan Zhang and Wenjie Li and Min Zhang},
+      year={2024},
+      eprint={2407.19669},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL},
+      url={https://arxiv.org/abs/2407.19669},
+}
 ```
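The hunks above only show the edges of the usage snippet; the unchanged model-loading and input-pair lines between them are elided by the diff. For context, here is a minimal self-contained sketch of the same scoring pattern — the repo id `Alibaba-NLP/gte-multilingual-reranker-base`, the example pairs, and the `eval()` call are illustrative assumptions, not lines from this commit:

```
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Assumed Hub repo id for this model card.
model_name = 'Alibaba-NLP/gte-multilingual-reranker-base'

tokenizer = AutoTokenizer.from_pretrained(model_name)
# trust_remote_code=True because the model ships custom modeling code
# (see the "How to use it offline" section added in this commit).
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, trust_remote_code=True
)
model.eval()

# Illustrative (query, passage) pairs; any pair of strings works.
pairs = [
    ['what is the capital of China?', 'Beijing is the capital of China.'],
    ['what is the capital of China?', 'Quicksort runs in O(n log n) on average.'],
]

with torch.no_grad():
    inputs = tokenizer(pairs, padding=True, truncation=True,
                       return_tensors='pt', max_length=512)
    # One relevance logit per pair; higher means more relevant.
    scores = model(**inputs, return_dict=True).logits.view(-1, ).float()
    print(scores)
```

Each pair yields a single relevance logit; `torch.sigmoid(scores)` maps them into [0, 1] if normalized scores are preferred.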
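Offline use ultimately comes down to having the repo, including its remote code, on disk before network access goes away. The linked discussion describes how to drop `trust_remote_code` entirely; as a rough sketch of the simpler pre-download route — not taken from that thread, and the directory handling here is an assumption — standard `huggingface_hub` tooling suffices:

```
# While online: fetch the full repo, including the custom modeling code.
from huggingface_hub import snapshot_download

local_dir = snapshot_download('Alibaba-NLP/gte-multilingual-reranker-base')

# Later, offline (optionally set HF_HUB_OFFLINE=1 to forbid network calls):
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(local_dir)
model = AutoModelForSequenceClassification.from_pretrained(
    local_dir, trust_remote_code=True
)
```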