thiagolaitz committed d8353c5 (parent e173c28): Create README.md (#1)

- Create README.md (297b08c02f8bb8a13c20e193717ff3b433c87054)

Added file: `README.md`
# InRanker-small (60M parameters)

InRanker is a version of monoT5 distilled from [monoT5-3B](https://huggingface.co/castorini/monot5-3b-msmarco-10k) with increased effectiveness in out-of-domain scenarios.
Our key insight was to use language models and rerankers to generate as much synthetic "in-domain" training data as possible,
i.e., data that closely resembles the data that will be seen at retrieval time. The training pipeline consists of
two distillation phases that require no additional user queries or manual annotations:
(1) training on the teacher's soft labels for existing supervised data, and (2) training on the teacher's soft labels
for synthetic queries generated by a large language model.

Further details can be found in the paper [here](). The code and library are available at
https://github.com/unicamp-dl/InRanker.
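Both phases train the student to match the teacher's soft labels. A minimal sketch of that objective follows. It assumes monoT5-style scoring, in which the relevance score is the softmax probability of the target token "true" over the pair ("true", "false"), and it assumes a soft-label cross-entropy between teacher and student probabilities. The function names, the `(logit_true, logit_false)` input format, and the choice of loss are illustrative assumptions rather than InRanker's training code; the paper and repository describe the objective actually used.

```python
import math
from typing import List, Tuple

# Assumption: each query-document pair is summarized by the logits a monoT5-style
# model assigns to the target tokens "true" and "false".
LogitPair = Tuple[float, float]


def log_probs(logit_true: float, logit_false: float) -> Tuple[float, float]:
    """Log-softmax over the two target tokens, computed stably via log-sum-exp."""
    m = max(logit_true, logit_false)
    lse = m + math.log(math.exp(logit_true - m) + math.exp(logit_false - m))
    return logit_true - lse, logit_false - lse


def relevance_score(logit_true: float, logit_false: float) -> float:
    """Probability of "true": a score in [0, 1], higher means more relevant."""
    return math.exp(log_probs(logit_true, logit_false)[0])


def soft_label_loss(teacher: List[LogitPair], student: List[LogitPair]) -> float:
    """Hypothetical distillation loss: mean cross-entropy H(teacher, student)
    over the true/false distributions of a batch of query-document pairs."""
    if not teacher or len(teacher) != len(student):
        raise ValueError("expected two non-empty lists of equal length")
    total = 0.0
    for t_pair, s_pair in zip(teacher, student):
        p_true = relevance_score(*t_pair)  # teacher soft label for "true"
        s_log_true, s_log_false = log_probs(*s_pair)  # student log-probabilities
        total -= p_true * s_log_true + (1.0 - p_true) * s_log_false
    return total / len(teacher)


if __name__ == "__main__":
    # Toy teacher logits: the first pair looks relevant, the second does not.
    teacher = [(3.0, -1.0), (-2.0, 2.0)]
    agreeing_student = [(2.5, -0.5), (-1.5, 1.0)]
    disagreeing_student = [(-3.0, 1.0), (2.0, -2.0)]
    print(soft_label_loss(teacher, agreeing_student))
    print(soft_label_loss(teacher, disagreeing_student))
```

The cross-entropy is smallest when the student's distribution equals the teacher's, so minimizing it pulls the student's scores toward the teacher's, whether the queries come from existing supervised data or from an LLM.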
## Usage

The library was tested with Python 3.10 and can be installed with:
```bash
pip install inranker
```
Inference can be run as follows:
```python
from inranker import T5Ranker

model = T5Ranker(model_name_or_path="unicamp-dl/InRanker-small")

docs = [
    "The capital of France is Paris",
    "Learn deep learning with InRanker and transformers"
]
scores = model.get_scores(
    query="What is the best way to learn deep learning?",
    docs=docs
)
# Each score lies in [0, 1]; higher means more relevant.
# Pair each score with its document and sort from most to least relevant.
sorted_scores = sorted(zip(scores, docs), key=lambda x: x[0], reverse=True)

""" InRanker-small:
sorted_scores = [
    (0.4844, 'Learn deep learning with InRanker and transformers'),
    (7.83e-06, 'The capital of France is Paris')
]
"""
```
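In a retrieve-then-rerank pipeline, a first-stage retriever (for example BM25) returns candidate documents and the reranker reorders them by score. The sketch below isolates that step. `rerank` and `keyword_overlap_score` are hypothetical helpers written for this illustration; the toy scorer only stands in for `model.get_scores(query=..., docs=...)` so that the snippet runs without the `inranker` package or a model download.

```python
from typing import Callable, List, Sequence, Tuple

# A scorer maps (query, docs) to one relevance score per document, in input order.
# Hypothetical: with InRanker, model.get_scores(query=..., docs=...) would fill this role.
Scorer = Callable[[str, Sequence[str]], Sequence[float]]


def rerank(query: str, candidates: Sequence[str], scorer: Scorer,
           top_k: int = 10) -> List[Tuple[float, str]]:
    """Score first-stage candidates and return the top_k (score, doc) pairs, best first."""
    scores = list(scorer(query, candidates))
    if len(scores) != len(candidates):
        raise ValueError("scorer must return one score per candidate")
    # sorted() is stable, so tied documents keep their first-stage order.
    ranked = sorted(zip(scores, candidates), key=lambda pair: pair[0], reverse=True)
    return ranked[:top_k]


def keyword_overlap_score(query: str, docs: Sequence[str]) -> List[float]:
    """Toy stand-in scorer: fraction of query terms that appear in each document."""
    terms = set(query.lower().split())
    results = []
    for doc in docs:
        doc_terms = set(doc.lower().split())
        results.append(len(terms & doc_terms) / len(terms) if terms else 0.0)
    return results


if __name__ == "__main__":
    candidates = [
        "The capital of France is Paris",
        "Learn deep learning with InRanker and transformers",
    ]
    for score, doc in rerank("learn deep learning", candidates, keyword_overlap_score, top_k=2):
        print(f"{score:.2f}  {doc}")
```

With the library installed, `lambda q, d: model.get_scores(query=q, docs=list(d))` could be passed as `scorer`, assuming, as the `zip` in the inference example implies, that `get_scores` returns one score per document in input order.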