ilsilfverskiold
/

classify-news-category-iptc

@@ -15,43 +15,64 @@ model-index:
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
-# news_category_classification
-This model is a fine-tuned version of [KB/bert-base-swedish-cased](https://huggingface.co/KB/bert-base-swedish-cased) on the None dataset.
 It achieves the following results on the evaluation set:
 - Loss: 0.8030
 - Accuracy: 0.7431
 - F1: 0.7474
 - Precision: 0.7695
 - Recall: 0.7431
-- Accuracy Label Arts, culture, entertainment and media: 0.6842
-- Accuracy Label Conflict, war and peace: 0.7351
-- Accuracy Label Crime, law and justice: 0.8918
-- Accuracy Label Disaster, accident, and emergency incident: 0.8699
-- Accuracy Label Economy, business, and finance: 0.6893
-- Accuracy Label Environment: 0.4483
-- Accuracy Label Health: 0.7222
-- Accuracy Label Human interest: 0.3182
-- Accuracy Label Labour: 0.5
-- Accuracy Label Lifestyle and leisure: 0.5556
-- Accuracy Label Politics: 0.7909
-- Accuracy Label Religion: 0.0
-- Accuracy Label Science and technology: 0.4583
-- Accuracy Label Society: 0.3538
-- Accuracy Label Sport: 0.9615
-- Accuracy Label Weather: 0.0
 ## Model description
-More information needed
 ## Intended uses & limitations
-More information needed
 ## Training and evaluation data
-More information needed
 ## Training procedure

 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
+# News Category Classification for IPTC NewsCodes
+This model is a fine-tuned version of [KB/bert-base-swedish-cased](https://huggingface.co/KB/bert-base-swedish-cased) on a private dataset.
+Built from a limited set of English, Swedish and Norwegian titles to classify news content within 16 categories as specified by the IPTC NewsCodes.
+The model has been fine-tuned on a dataset that is greatly skewed, but has been slightly augmented to stabilize it.
+# Test examples
+**Input:** Mann siktet for drapsforsøk på Slovakias statsministeren
+**Output:** crime, law and justice
+**Input:** Tre døde i kioskbrann i Tyskland
+Output: disaster, accident, and emergency incident
+**Input:** Kultfilm får Netflix-oppfølger. Kultfilmen «Happy Gilmore» fra 1996 får en oppfølger på Netflix. Det røper strømmetjenesten selv på X, tidligere Twitter. –Happy Gilmore er tilbake!
+**Output:** arts, culture, entertainment and media
+# Performance
 It achieves the following results on the evaluation set:
 - Loss: 0.8030
 - Accuracy: 0.7431
 - F1: 0.7474
 - Precision: 0.7695
 - Recall: 0.7431
+See the performance (accuracy) for each label below:
+- Arts, culture, entertainment and media: 0.6842
+- Conflict, war and peace: 0.7351
+- Crime, law and justice: 0.8918
+- Disaster, accident, and emergency incident: 0.8699
+- Economy, business, and finance: 0.6893
+- Environment: 0.4483
+- Health: 0.7222
+- Human interest: 0.3182
+- Labour: 0.5
+- Lifestyle and leisure: 0.5556
+- Politics: 0.7909
+- Science and technology: 0.4583
+- Society: 0.3538
+- Sport: 0.9615
+- Weather: 1.0
+- Religion: 0.0
 ## Model description
+The model is intended to categorize Norwegian, Swedish and English news content within the specified 16 categories but is a test model for demonstration purposes.
+It needs more data within several categories to provide 100% value but it will outperform Claude Haiku and GPT-3.5 on this use case.
 ## Intended uses & limitations
+Use it to categorize news texts. Only set the category if the value is at least 60% for the label, otherwise the model is uncertain.
 ## Training and evaluation data
+Trained with the trainer, setting a learning rate of 2e-05 and batch size of 16 for 3 epochs.
 ## Training procedure