---
language:
- zh
pipeline_tag: sentence-similarity
tags:
- PEG
- feature-extraction
- sentence-similarity
- transformers
license: apache-2.0
library_name: transformers
---

<h1 align="center">PEG: Towards Robust Text Retrieval with Progressive Learning</h1>

## Model Details
We propose PEG (a Progressively Learned Textual Embedding), a model that progressively adjusts the weights of samples contributing to the loss within an extremely large batch, based on the difficulty levels of the negative samples.
We have amassed an extensive collection of over 110 million data samples spanning a wide range of fields, such as general knowledge, finance, tourism, medicine, and more.

Our technical report is available here: [Paper](https://arxiv.org/pdf/2311.11691.pdf).
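The exact weighting scheme is described in the paper; as a rough illustration only (not the authors' formulation), a difficulty-weighted in-batch contrastive loss of this general shape might look like the following sketch, where `neg_weights` stands in for whatever difficulty-based weights the method assigns:

```python
import torch

def weighted_infonce(q, d, neg_weights, temperature=0.05):
    """Toy InfoNCE-style loss with per-negative weights (illustrative only).

    q, d: (B, D) L2-normalized query/document embeddings; d[i] is the
    positive for q[i], and every d[j] with j != i is an in-batch negative.
    neg_weights: (B, B) hypothetical difficulty-based weights; how PEG
    actually derives them is defined in the paper, not here.
    """
    sim = q @ d.T / temperature                 # (B, B) pairwise similarities
    pos = torch.exp(sim.diagonal())             # positive pairs on the diagonal
    neg = torch.exp(sim) * neg_weights          # re-weight candidate negatives
    neg = neg - torch.diag(neg.diagonal())      # drop the positives from the negatives
    loss = -torch.log(pos / (pos + neg.sum(dim=1)))
    return loss.mean()
```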

## Usage (HuggingFace Transformers)

Install transformers:
```
pip install transformers
```

Then load the model and compute sentence embeddings:
```python
from transformers import AutoModel, AutoTokenizer
import torch

# Load the model and tokenizer from the HuggingFace Hub
tokenizer = AutoTokenizer.from_pretrained('TownsWu/PEG')
model = AutoModel.from_pretrained('TownsWu/PEG')

# Example sentences (Chinese): "How to change the bank card linked to Huabei" /
# "Huabei: change the linked bank card"
sentences = ['如何更换花呗绑定银行卡', '花呗更改绑定银行卡']

# Tokenize sentences
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

# Compute embeddings: the sentence embedding is the last hidden state
# of the first ([CLS]) token
with torch.no_grad():
    last_hidden_state = model(**inputs, return_dict=True).last_hidden_state
    embeddings = last_hidden_state[:, 0]
print("embeddings:")
print(embeddings)
```
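For a sentence-similarity task, the embeddings are typically compared with cosine similarity. The model card does not prescribe a scoring function, but as a small follow-up to the snippet above (reusing its `embeddings` variable):

```python
import torch.nn.functional as F

# L2-normalize, then the dot product equals cosine similarity
normalized = F.normalize(embeddings, p=2, dim=1)
similarity = normalized @ normalized.T
print("cosine similarity:", similarity[0, 1].item())
```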

## Contact
If you have any questions or suggestions related to this project, feel free to open an issue or a pull request.
You can also email Tong Wu (townswu@tencent.com).


## Citation

If you find our work helpful for your research, please consider citing the following BibTeX entry:

```
@article{wu2023towards,
  title={Towards Robust Text Retrieval with Progressive Learning},
  author={Wu, Tong and Qin, Yulei and Zhang, Enwei and Xu, Zihan and Gao, Yuting and Li, Ke and Sun, Xing},
  journal={arXiv preprint arXiv:2311.11691},
  year={2023}
}
```