Tags: Sentence Similarity · setfit · PyTorch · bert · feature-extraction · e5
KnutJaegersberg committed
Commit 3a1c4d1
1 Parent(s): 12f15f6

Update README.md

Files changed (1)
  1. README.md +4 -4
README.md CHANGED
@@ -11,10 +11,10 @@ datasets:
 - KnutJaegersberg/wikipedia_categories_labels
 ---
 
-This English model predicts the top two levels of the Wikipedia categories (roughly 1,100 labels). It is trained on the concatenated headlines of the articles in the lower-level categories in a few-shot setting (i.e. 8 subcategories with their headline concatenations per level-2 category).
-Accuracy on the test split is 73% for the higher category level (37 labels) and 60% for level 2.
+This English model (based on e5-large) predicts Wikipedia categories (roughly 37 labels). It is trained on the concatenated headlines of the articles in the lower-level categories in a few-shot setting (i.e. 8 subcategories with their headline concatenations per level-2 category).
+Accuracy on the test split is 85%.
 Note that these numbers are just an indicator that training worked; performance will differ in production settings, which is why this classifier is meant for corpus exploration.
-Use the wikipedia_categories_labels dataset as the key.
+Use the wikipedia_categories_labels dataset as the key.
 
 
 
@@ -24,4 +24,4 @@ Download from Hub and run inference
 model = SetFitModel.from_pretrained("KnutJaegersberg/wikipedia_categories_setfit")
 
 Run inference
-preds = model(["Rachel Dolezal Faces Felony Charges For Welfare Fraud", "Elon Musk just got lucky", "The hype on AI is different from the hype on other tech topics"])
+preds = model(["i loved the spiderman movie!", "pineapple on pizza is the worst 🤮"])
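
For context, a minimal end-to-end sketch of the usage the updated README describes: load the SetFit classifier, predict on a few texts, and resolve the predictions with the wikipedia_categories_labels dataset that the README says to use as the key. The split name ("train") and the column names ("label", "category") are assumptions about that dataset's schema, not something stated in the commit; adjust them to the actual dataset.

# Minimal usage sketch, assuming the labels dataset maps integer class ids to category names
from datasets import load_dataset
from setfit import SetFitModel

# Download the fine-tuned classifier from the Hub
model = SetFitModel.from_pretrained("KnutJaegersberg/wikipedia_categories_setfit")

# Classify a batch of texts; the model returns one prediction per input
preds = model(["i loved the spiderman movie!", "pineapple on pizza is the worst 🤮"])

# Resolve predictions via the labels dataset the README points to.
# "train", "label", and "category" are assumed names, not confirmed by the commit.
key = load_dataset("KnutJaegersberg/wikipedia_categories_labels", split="train")
id2name = {int(row["label"]): row["category"] for row in key}
print([id2name.get(int(p), str(p)) for p in preds])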