How to use TakoData/chart-reranker with sentence-transformers:

    from sentence_transformers import CrossEncoder

    model = CrossEncoder("TakoData/chart-reranker", trust_remote_code=True)

    query = "Which planet is known as the Red Planet?"
    passages = [
        "Venus is often called Earth's twin because of its similar size and proximity.",
        "Mars, known for its reddish appearance, is often referred to as the Red Planet.",
        "Jupiter, the largest planet in our solar system, has a prominent red spot.",
        "Saturn, famous for its rings, is sometimes mistaken for the Red Planet.",
    ]

    scores = model.predict([(query, passage) for passage in passages])
    print(scores)
Upload fine-tuned chart reranker model
- README.md +44 -37
- eval/CrossEncoderCorrelationEvaluator_validation_results.csv +5 -5
- model.safetensors +1 -1
- training_info.txt +1 -1
README.md CHANGED
@@ -4,7 +4,7 @@ tags:
 - cross-encoder
 - reranker
 - generated_from_trainer
-- dataset_size:
+- dataset_size:7779
 - loss:BinaryCrossEntropyLoss
 base_model: Alibaba-NLP/gte-reranker-modernbert-base
 pipeline_tag: text-ranking
@@ -23,10 +23,10 @@ model-index:
 type: validation
 metrics:
 - type: pearson
-value: 0.
+value: 0.8888985992978667
 name: Pearson
 - type: spearman
-value: 0.
+value: 0.8845425048973017
 name: Spearman
 ---
 
@@ -70,11 +70,11 @@ from sentence_transformers import CrossEncoder
 model = CrossEncoder("cross_encoder_model_id")
 # Get scores for pairs of texts
 pairs = [
-['
-[
-['
-['
-['
+['Cohere funding history: amounts raised by round', 'Title: "Cohere Overview"\nCollections: Companies\nChart Type: company:private\nSources: S&P Global'],
+['villes sympa à voir entre turin et come', 'Title: "Turin F.C. Schedule"\nCollections: Soccer\nChart Type: schedule:soccer_team_v2'],
+['Current housing inventory in Chattanooga, TN', 'Title: "Tusculum, TN Inventory - House"\nCollections: Residential Real Estate\nDatasets: RegionalRealEstateIndicators\nChart Type: timeseries:eav_v2\nCanonical forms: "Inventory"="inventory_seasonally_unadjusted"\nSources: Redfin'],
+["What's Tesla's raw material inventory?", 'Title: "Tesla Overview"\nCollections: Companies\nChart Type: company:finance\nCanonical forms: "Tesla"="Tesla, Inc.", "Overview"="Stock Overview"\nSources: S&P Global'],
+['current weather in hong kong', 'Title: "Hong Kong Weather"\nCollections: Weather Forecasts\nChart Type: weather:international_forecast\nSources: OpenWeather'],
 ]
 scores = model.predict(pairs)
 print(scores.shape)
@@ -82,13 +82,13 @@ print(scores.shape)
 
 # Or rank different texts based on similarity to a single text
 ranks = model.rank(
-'
+'Cohere funding history: amounts raised by round',
 [
-'Title: "
-'Title: "
-'Title: "
-'Title: "
-'Title: "
+'Title: "Cohere Overview"\nCollections: Companies\nChart Type: company:private\nSources: S&P Global',
+'Title: "Turin F.C. Schedule"\nCollections: Soccer\nChart Type: schedule:soccer_team_v2',
+'Title: "Tusculum, TN Inventory - House"\nCollections: Residential Real Estate\nDatasets: RegionalRealEstateIndicators\nChart Type: timeseries:eav_v2\nCanonical forms: "Inventory"="inventory_seasonally_unadjusted"\nSources: Redfin',
+'Title: "Tesla Overview"\nCollections: Companies\nChart Type: company:finance\nCanonical forms: "Tesla"="Tesla, Inc.", "Overview"="Stock Overview"\nSources: S&P Global',
+'Title: "Hong Kong Weather"\nCollections: Weather Forecasts\nChart Type: weather:international_forecast\nSources: OpenWeather',
 ]
 )
 # [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
@@ -129,8 +129,8 @@ You can finetune this model on your own dataset.
 
 | Metric | Value |
 |:-------------|:-----------|
-| pearson | 0.
-| **spearman** | **0.
+| pearson | 0.8889 |
+| **spearman** | **0.8845** |
 
 <!--
 ## Bias, Risks and Limitations
@@ -150,19 +150,19 @@ You can finetune this model on your own dataset.
 
 #### Unnamed Dataset
 
-* Size:
+* Size: 7,779 training samples
 * Columns: <code>sentence_0</code>, <code>sentence_1</code>, and <code>label</code>
 * Approximate statistics based on the first 1000 samples:
-| | sentence_0
-|:--------|:----------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------|:---------------------------------------------------------------|
-| type | string
-| details | <ul><li>min:
+| | sentence_0 | sentence_1 | label |
+|:--------|:-----------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------|:---------------------------------------------------------------|
+| type | string | string | float |
+| details | <ul><li>min: 4 characters</li><li>mean: 44.22 characters</li><li>max: 116 characters</li></ul> | <ul><li>min: 75 characters</li><li>mean: 184.59 characters</li><li>max: 383 characters</li></ul> | <ul><li>min: 0.0</li><li>mean: 0.45</li><li>max: 1.0</li></ul> |
 * Samples:
-| sentence_0
-|:-------------------------------------------------------------
-| <code>
-| <code>
-| <code>
+| sentence_0 | sentence_1 | label |
+|:-------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------|
+| <code>Cohere funding history: amounts raised by round</code> | <code>Title: "Cohere Overview"<br>Collections: Companies<br>Chart Type: company:private<br>Sources: S&P Global</code> | <code>0.75</code> |
+| <code>villes sympa à voir entre turin et come</code> | <code>Title: "Turin F.C. Schedule"<br>Collections: Soccer<br>Chart Type: schedule:soccer_team_v2</code> | <code>0.0</code> |
+| <code>Current housing inventory in Chattanooga, TN</code> | <code>Title: "Tusculum, TN Inventory - House"<br>Collections: Residential Real Estate<br>Datasets: RegionalRealEstateIndicators<br>Chart Type: timeseries:eav_v2<br>Canonical forms: "Inventory"="inventory_seasonally_unadjusted"<br>Sources: Redfin</code> | <code>0.25</code> |
 * Loss: [<code>BinaryCrossEntropyLoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#binarycrossentropyloss) with these parameters:
 ```json
 {
@@ -305,17 +305,24 @@ You can finetune this model on your own dataset.
 </details>
 
 ### Training Logs
-| Epoch
-|:-----:|:----:|:-------------:|:-------------------:|
-| 0.
-|
-| 1.
-|
-|
-|
-|
-|
-|
+| Epoch | Step | Training Loss | validation_spearman |
+|:------:|:----:|:-------------:|:-------------------:|
+| 0.4098 | 100 | - | 0.8203 |
+| 0.8197 | 200 | - | 0.8565 |
+| 1.0 | 244 | - | 0.8587 |
+| 1.2295 | 300 | - | 0.8632 |
+| 1.6393 | 400 | - | 0.8772 |
+| 2.0 | 488 | - | 0.8714 |
+| 2.0492 | 500 | 0.4207 | 0.8776 |
+| 2.4590 | 600 | - | 0.8786 |
+| 2.8689 | 700 | - | 0.8761 |
+| 3.0 | 732 | - | 0.8824 |
+| 3.2787 | 800 | - | 0.8817 |
+| 3.6885 | 900 | - | 0.8838 |
+| 4.0 | 976 | - | 0.8835 |
+| 4.0984 | 1000 | 0.3261 | 0.8836 |
+| 4.5082 | 1100 | - | 0.8843 |
+| 4.9180 | 1200 | - | 0.8845 |
 
 
 ### Framework Versions
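The candidate texts in the README example share a line-oriented layout: Title, Collections, an optional Datasets line, Chart Type, optional Canonical forms, and optional Sources. The sketch below builds such a string from structured fields. `format_chart_passage` and its parameter names are hypothetical and not part of the repository; the field order is inferred from the samples in the diff, and joining multiple values with `", "` is an assumption.

```python
from typing import Mapping, Optional, Sequence


def format_chart_passage(
    title: str,
    collections: Sequence[str],
    chart_type: str,
    datasets: Optional[Sequence[str]] = None,
    canonical_forms: Optional[Mapping[str, str]] = None,
    sources: Optional[Sequence[str]] = None,
) -> str:
    """Hypothetical helper: build a newline-separated chart description.

    The layout mirrors the sample passages in the README diff; it is an
    inferred format, not a documented one.
    """
    lines = [f'Title: "{title}"', f"Collections: {', '.join(collections)}"]
    if datasets:
        lines.append(f"Datasets: {', '.join(datasets)}")
    lines.append(f"Chart Type: {chart_type}")
    if canonical_forms:
        # Each mapping entry is rendered as "surface form"="canonical form".
        forms = ", ".join(f'"{k}"="{v}"' for k, v in canonical_forms.items())
        lines.append(f"Canonical forms: {forms}")
    if sources:
        lines.append(f"Sources: {', '.join(sources)}")
    return "\n".join(lines)


passage = format_chart_passage(
    title="Cohere Overview",
    collections=["Companies"],
    chart_type="company:private",
    sources=["S&P Global"],
)
```

A `(query, passage)` pair built this way could then be passed to `CrossEncoder.predict` or `CrossEncoder.rank` as in the README snippet.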
eval/CrossEncoderCorrelationEvaluator_validation_results.csv CHANGED
@@ -1,6 +1,6 @@
 epoch,steps,Pearson_Correlation,Spearman_Correlation
-1.0,
-2.0,
-3.0,
-4.0,
-5.0,
+1.0,244,0.8620642924096914,0.8587166361363444
+2.0,488,0.8764832585164201,0.8713859435370955
+3.0,732,0.8867003524365638,0.8823857804088827
+4.0,976,0.8881431986959347,0.8835376105032559
+5.0,1220,0.8889602207955667,0.8845866499868097
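The per-epoch results in this CSV can be inspected with the standard library. The sketch below embeds the five new rows from the diff as a string and picks the epoch with the highest Spearman correlation. `best_checkpoint` is a hypothetical helper name, and selecting by Spearman is an assumption based on that metric being bolded in the README table.

```python
import csv
import io

# The five "+" rows from the diff above, copied verbatim.
CSV_TEXT = """epoch,steps,Pearson_Correlation,Spearman_Correlation
1.0,244,0.8620642924096914,0.8587166361363444
2.0,488,0.8764832585164201,0.8713859435370955
3.0,732,0.8867003524365638,0.8823857804088827
4.0,976,0.8881431986959347,0.8835376105032559
5.0,1220,0.8889602207955667,0.8845866499868097
"""


def best_checkpoint(csv_text: str, metric: str = "Spearman_Correlation"):
    """Hypothetical helper: return (epoch, steps, value) for the best row."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    best = max(rows, key=lambda row: float(row[metric]))
    return float(best["epoch"]), int(best["steps"]), float(best[metric])


epoch, steps, spearman = best_checkpoint(CSV_TEXT)
print(f"best epoch={epoch} steps={steps} spearman={spearman:.4f}")
```

For these rows both correlations increase with every epoch, so the last evaluation is also the best one.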
model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:71ca08ed8176f01a71eaa842d8135564d04d405af3ad33d2ba4c1f91e581b05d
 size 598436708
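The file above is a Git LFS pointer: three `key value` lines giving the spec version, the SHA-256 of the real file, and its size in bytes. A downloaded `model.safetensors` can be checked against it. The sketch below is a minimal illustration; `parse_lfs_pointer` and `matches_pointer` are hypothetical names, and hashing an in-memory `bytes` object is a simplification (a file of roughly 598 MB would normally be hashed in chunks).

```python
import hashlib

POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:71ca08ed8176f01a71eaa842d8135564d04d405af3ad33d2ba4c1f91e581b05d
size 598436708
"""


def parse_lfs_pointer(text: str) -> dict:
    """Hypothetical helper: split a pointer file into its fields."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(pointer: dict, data: bytes) -> bool:
    """Return True if data has the size and SHA-256 digest the pointer records."""
    return (
        pointer["algo"] == "sha256"
        and len(data) == pointer["size"]
        and hashlib.sha256(data).hexdigest() == pointer["digest"]
    )


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["algo"], pointer["size"])
```

Only the `oid` line changed in this commit while `size` stayed at 598436708, which is what one would expect when weights are retrained on the same architecture.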
training_info.txt CHANGED
@@ -1,5 +1,5 @@
 Base Model: Alibaba-NLP/gte-reranker-modernbert-base
-Training Samples: 
+Training Samples: 7779
 Epochs: 5
 Batch Size: 32
 Learning Rate: 2e-05
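These settings are consistent with the step counts in the training log and the evaluation CSV. The arithmetic below is a sketch under stated assumptions: a single device, no gradient accumulation, and the final partial batch kept rather than dropped. None of these is stated in the commit; they are inferred because they reproduce the logged numbers.

```python
import math

TRAINING_SAMPLES = 7779
BATCH_SIZE = 32
EPOCHS = 5

# 7779 / 32 = 243.09..., so the last batch of each epoch is partial.
steps_per_epoch = math.ceil(TRAINING_SAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS


def epoch_at(step: int) -> float:
    """Fractional epoch reached after a given optimizer step."""
    return round(step / steps_per_epoch, 4)


print(steps_per_epoch, total_steps, epoch_at(100), epoch_at(500))
```

Under these assumptions each epoch ends at a multiple of 244 steps (244, 488, 732, 976, 1220), matching the `steps` column of the evaluation CSV, and step 100 falls at epoch 0.4098 as shown in the README training log.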