KnutJaegersberg/wikipedia_categories_setfit


---
pipeline_tag: sentence-similarity
tags:
- feature-extraction
- sentence-similarity
- setfit
- e5
license: mit
datasets:
- KnutJaegersberg/wikipedia_categories
- KnutJaegersberg/wikipedia_categories_labels
---

This English-language model, built on e5-large, predicts Wikipedia categories (roughly 37 labels). It was trained in a few-shot setting: for each level-2 category, 8 subcategories were used, and each training text is the concatenation of the article headlines within one subcategory.
Accuracy on the test split is 85%.
This figure only indicates that training worked; performance will differ in production settings, which is why the classifier is intended for corpus exploration.
Use the wikipedia_categories_labels dataset as the key for mapping predictions to category names.
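
As a rough illustration of using the labels dataset as a key, the sketch below turns predicted label ids into category names. It is only a sketch under stated assumptions: the column names `label` and `category`, the `train` split, and the helper names `build_label_lookup` and `decode_predictions` are hypothetical and not taken from the dataset or from SetFit, and it assumes the model returns integer label ids. Inspect the dataset and adjust the column names before relying on it.

```python
from typing import Dict, Iterable, List, Mapping


def build_label_lookup(
    rows: Iterable[Mapping],
    id_column: str = "label",       # assumed column name, check the dataset
    name_column: str = "category",  # assumed column name, check the dataset
) -> Dict[int, str]:
    """Build an id -> category-name dictionary from the rows of the labels table."""
    return {int(row[id_column]): str(row[name_column]) for row in rows}


def decode_predictions(preds: Iterable, lookup: Mapping[int, str]) -> List[str]:
    """Translate predicted ids into names; ids missing from the key are flagged."""
    return [lookup.get(int(p), f"<unknown:{int(p)}>") for p in preds]


# Possible usage (requires the `datasets` package and network access;
# the split name "train" is an assumption):
# from datasets import load_dataset
# rows = load_dataset("KnutJaegersberg/wikipedia_categories_labels", split="train")
# lookup = build_label_lookup(rows)
# names = decode_predictions(preds, lookup)
```

Ids that are missing from the key are returned as `<unknown:ID>` rather than raising an error, so a mismatch between the model and the labels table stays visible during exploration.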



```python
from setfit import SetFitModel

# Download the model from the Hub
model = SetFitModel.from_pretrained("KnutJaegersberg/wikipedia_categories_setfit")

# Run inference
preds = model(["i loved the spiderman movie!", "pineapple on pizza is the worst 🤮"])
```
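
Since the classifier is meant for corpus exploration, one simple way to use the predictions is to tally how often each category is predicted across a corpus. The following is a minimal sketch, assuming the predictions are integer label ids (or values that `int()` can convert); the helper name `category_shares` is hypothetical and not part of SetFit.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def category_shares(preds: Iterable) -> List[Tuple[int, int, float]]:
    """Return (label_id, count, share) tuples, most frequent label first."""
    counts = Counter(int(p) for p in preds)
    total = sum(counts.values())
    # An empty input yields an empty list, so no division by zero occurs.
    return [(label, count, count / total) for label, count in counts.most_common()]


# Possible usage with the predictions from the model:
# for label, count, share in category_shares(preds):
#     print(f"label {label}: {count} documents ({share:.1%})")
```

Given the caveat about accuracy in production settings, the resulting shares are best read as a rough overview of a corpus rather than as exact counts.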