ilsilfverskiold committed on
Commit 3c122de
1 Parent(s): 45559ac
Update README.md
README.md
@@ -15,43 +15,64 @@ model-index:
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# News Category Classification for IPTC NewsCodes

This model is a fine-tuned version of [KB/bert-base-swedish-cased](https://huggingface.co/KB/bert-base-swedish-cased) on a private dataset.

It was trained on a limited set of English, Swedish and Norwegian news titles to classify news content into the 16 categories defined by the IPTC NewsCodes.

The fine-tuning dataset is heavily skewed across categories; it was slightly augmented to make it somewhat more balanced.

## Test examples

**Input:** Mann siktet for drapsforsøk på Slovakias statsministeren (English: "Man charged with attempted murder of Slovakia's prime minister")
**Output:** crime, law and justice

**Input:** Tre døde i kioskbrann i Tyskland (English: "Three dead in kiosk fire in Germany")
**Output:** disaster, accident, and emergency incident

**Input:** Kultfilm får Netflix-oppfølger. Kultfilmen «Happy Gilmore» fra 1996 får en oppfølger på Netflix. Det røper strømmetjenesten selv på X, tidligere Twitter. –Happy Gilmore er tilbake! (English: "Cult film gets Netflix sequel. The 1996 cult film «Happy Gilmore» is getting a sequel on Netflix. The streaming service itself revealed it on X, formerly Twitter. 'Happy Gilmore is back!'")
**Output:** arts, culture, entertainment and media

## Performance

It achieves the following results on the evaluation set:
- Loss: 0.8030
- Accuracy: 0.7431
- F1: 0.7474
- Precision: 0.7695
- Recall: 0.7431
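
The card does not say how Precision, Recall and F1 were averaged across the 16 labels. Recall being identical to Accuracy (0.7431) is consistent with support-weighted averaging, because weighting each label's recall by its share of the evaluation examples sums to the overall fraction of correct predictions. The plain-Python sketch below assumes weighted averaging; it illustrates the computation and is not the author's evaluation code, and `weighted_metrics` is a hypothetical helper name.

```python
from collections import Counter


def weighted_metrics(y_true: list[str], y_pred: list[str]) -> dict[str, float]:
    """Accuracy plus support-weighted precision, recall and F1 (assumed averaging)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    n = len(y_true)
    support = Counter(y_true)    # number of examples per true label
    predicted = Counter(y_pred)  # number of examples per predicted label
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct predictions per label

    precision = recall = f1 = 0.0
    for label, count in support.items():
        weight = count / n  # the label's share of the evaluation set
        p = tp[label] / predicted[label] if predicted[label] else 0.0
        r = tp[label] / count
        f = 2 * p * r / (p + r) if p + r else 0.0
        precision += weight * p
        recall += weight * r
        f1 += weight * f

    accuracy = sum(tp.values()) / n
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

For example, with `y_true = ["sport", "sport", "politics", "weather"]` and `y_pred = ["sport", "politics", "politics", "weather"]`, both accuracy and weighted recall come out as 0.75, while weighted precision is 0.875.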

Accuracy for each label:
- Arts, culture, entertainment and media: 0.6842
- Conflict, war and peace: 0.7351
- Crime, law and justice: 0.8918
- Disaster, accident, and emergency incident: 0.8699
- Economy, business, and finance: 0.6893
- Environment: 0.4483
- Health: 0.7222
- Human interest: 0.3182
- Labour: 0.5000
- Lifestyle and leisure: 0.5556
- Politics: 0.7909
- Science and technology: 0.4583
- Society: 0.3538
- Sport: 0.9615
- Weather: 1.0000
- Religion: 0.0000
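
The card does not define "accuracy for each label". A common reading, assumed here, is the fraction of evaluation examples with a given true label that the model classified correctly, which is the same as per-label recall. Under that reading, 0.0 for Religion would mean none of its evaluation examples was predicted as Religion. A minimal sketch, with `per_label_accuracy` as a hypothetical helper name:

```python
from collections import Counter


def per_label_accuracy(y_true: list[str], y_pred: list[str]) -> dict[str, float]:
    """Share of each true label's examples that were predicted correctly (assumed definition)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    total = Counter(y_true)  # examples per true label
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # hits per true label
    return {label: correct[label] / count for label, count in total.items()}
```

Labels that never occur as a true label in the evaluation data get no entry, since their accuracy is undefined under this definition.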

## Model description

The model is intended to categorize Norwegian, Swedish and English news content into the 16 categories listed above, but it is a test model built for demonstration purposes.
It needs more training data in several categories, such as those with low per-label accuracy (for example Religion, Human interest and Society), before it can be fully relied on. Even so, it will outperform Claude Haiku and GPT-3.5 on this use case.

## Intended uses & limitations

Use it to categorize news texts. Only assign the predicted category if its score for that label is at least 0.6 (60%); below that threshold the model is uncertain and the prediction should not be used.
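
The 60% rule can be applied on top of the per-label scores the model returns. When the `transformers` text-classification pipeline is called on a single text with `top_k=None`, it returns one `{"label": ..., "score": ...}` dict per label. The sketch below assumes that output shape; `pick_category` and `classify` are hypothetical helper names, and the classifier is passed in as an argument so the thresholding logic can be tried without downloading the model.

```python
from typing import Callable, Optional


def pick_category(scores: list[dict], threshold: float = 0.6) -> Optional[str]:
    """Return the highest-scoring label if it reaches the threshold, else None (uncertain)."""
    if not scores:
        return None
    best = max(scores, key=lambda s: s["score"])
    return best["label"] if best["score"] >= threshold else None


def classify(text: str, classifier: Callable[..., list[dict]], threshold: float = 0.6) -> Optional[str]:
    # Assumes `classifier` behaves like a transformers text-classification pipeline
    # called with top_k=None, i.e. it returns one {"label", "score"} dict per label.
    scores = classifier(text, top_k=None)
    return pick_category(scores, threshold)
```

For real inference, `classifier` would be `pipeline("text-classification", model=MODEL_ID)` from `transformers`, where `MODEL_ID` is a placeholder for this model's Hub repository ID, which is not shown here. A result of `None` means no label reached 60% and the text should be left uncategorized.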

## Training and evaluation data

Trained with the Hugging Face `Trainer`, using a learning rate of 2e-05, a batch size of 16 and 3 epochs.
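
The stated hyperparameters map onto `transformers.TrainingArguments` roughly as follows. This is a sketch, not the author's training script: it assumes single-device training, so that "batch size of 16" corresponds to `per_device_train_batch_size=16`, and the output directory name is a placeholder. `transformers` is imported inside the function so the hyperparameter dictionary can be inspected without the library installed.

```python
# Hyperparameters stated in the card. The per-device batch size assumes a single device.
HYPERPARAMS = {
    "learning_rate": 2e-05,
    "per_device_train_batch_size": 16,
    "num_train_epochs": 3,
}


def build_training_args(output_dir: str = "news-category-model"):  # placeholder directory name
    """Create TrainingArguments from the stated hyperparameters."""
    from transformers import TrainingArguments  # imported lazily; requires transformers

    return TrainingArguments(output_dir=output_dir, **HYPERPARAMS)
```

The resulting object would then be passed as `args` to `Trainer`, together with the model and the tokenized training and evaluation datasets.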

## Training procedure
