---
pipeline_tag: sentence-similarity
tags:
- feature-extraction
- sentence-similarity
- setfit
- e5
license: mit
datasets:
- KnutJaegersberg/wikipedia_categories
- KnutJaegersberg/wikipedia_categories_labels
---
This English model (based on e5-large) predicts Wikipedia categories (roughly 37 labels). It was trained in a few-shot setting on the concatenated headlines of articles in lower-level categories: for each level-2 category, 8 subcategories were used, each represented by the concatenation of its article headlines.

Accuracy on the test split is 85%.

Note that this number only indicates that training worked; performance will differ in production settings, which is why this classifier is meant for corpus exploration.

Use the `wikipedia_categories_labels` dataset as the key for mapping predicted labels to category names.
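The few-shot training setup described above (8 subcategories per level-2 category, each turned into one text by concatenating its article headlines) can be sketched roughly as follows. This is a hypothetical reconstruction for illustration, not the author's preprocessing code: the function name `build_few_shot_examples`, its parameters, the random choice of subcategories and the toy data are all assumptions.

```python
import random
from collections import defaultdict


def build_few_shot_examples(headlines_by_subcategory, parent_of, shots=8, seed=0):
    """Hypothetical sketch: build (text, label) pairs for few-shot training.

    headlines_by_subcategory: subcategory name -> list of article headlines
    parent_of: subcategory name -> level-2 category (used as the label)
    """
    rng = random.Random(seed)
    subcats_by_parent = defaultdict(list)
    for subcat, parent in parent_of.items():
        subcats_by_parent[parent].append(subcat)

    texts, labels = [], []
    for parent in sorted(subcats_by_parent):
        subcats = sorted(subcats_by_parent[parent])
        # Assumption: pick up to `shots` subcategories at random per level-2 category
        for subcat in rng.sample(subcats, min(shots, len(subcats))):
            # One training text = all headlines of the subcategory, concatenated
            texts.append(" ".join(headlines_by_subcategory[subcat]))
            labels.append(parent)
    return texts, labels


# Toy data (made up): 10 physics subcategories, 3 music subcategories
headlines = {f"physics_sub{i}": [f"Physics headline {i}a", f"Physics headline {i}b"] for i in range(10)}
headlines.update({f"music_sub{i}": [f"Music headline {i}"] for i in range(3)})
parents = {name: ("Physics" if name.startswith("physics") else "Music") for name in headlines}

texts, labels = build_few_shot_examples(headlines, parents)
print(len(texts), labels.count("Physics"), labels.count("Music"))  # 11 8 3
```

Categories with fewer than 8 subcategories simply contribute all of them, which keeps the sketch from failing on small branches of the category tree.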
```python
from setfit import SetFitModel

# Download from Hub and run inference
model = SetFitModel.from_pretrained("KnutJaegersberg/wikipedia_categories_setfit")

# Run inference
preds = model(["i loved the spiderman movie!", "pineapple on pizza is the worst 🤮"])
```
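To turn predictions into readable category names, the labels dataset serves as the key. The sketch below assumes the model returns integer label IDs and that you have already built a dictionary from ID to category name out of `wikipedia_categories_labels`; the helper `to_category_names`, the dictionary `id_to_name` and the category names in it are hypothetical placeholders, since the dataset's exact columns are not described here.

```python
def to_category_names(preds, id_to_name):
    """Map predicted label IDs to category names (hypothetical helper).

    `id_to_name` is assumed to be built from the wikipedia_categories_labels
    dataset; its real column names are not specified in this card.
    """
    # int() also accepts 0-d numpy/torch scalars, so array or tensor outputs work
    return [id_to_name[int(p)] for p in preds]


# Toy key standing in for the real labels dataset (made-up names)
id_to_name = {0: "Physics", 1: "Music", 2: "Cooking"}
print(to_category_names([2, 0], id_to_name))  # ['Cooking', 'Physics']
```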