add results
README.md
@@ -211,25 +211,24 @@ Hyperparameter:
 
 <!-- This should link to a Data Card if possible. -->
 
-[More Information Needed]
-
-### Factors
+The evaluation data can be found [here](https://huggingface.co/datasets/and-effect/mdk_gov_data_titles_clf). The model was trained on revision 172e61bb1dd20e43903f4c51e5cbec61ec9ae6e6 of this dataset, so the evaluation metrics rely on the same revision.
 
-<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-
-[More Information Needed]
 
 ### Metrics
 
 <!-- These are the evaluation metrics being used, ideally with a description of why. -->
 
-
+Model performance is measured with four metrics: accuracy, precision, recall, and F1 score. Many classes were never predicted and therefore contribute a score of zero to the macro-averaged precision, recall, and F1. For these metrics, additional calculations were performed that exclude classes with fewer than two predictions on the level 'Bezeichnung' (see 'Bezeichnung II' in the results table). These results should still be interpreted with caution, because they do not represent all classes.
 
 ## Results
 
-| accuracy | precision_macro | recall_macro | f1_macro |
-
-| 0.7004405286343612 | 0.5717666948436179 | 0.6127063220180629 | 0.5805958812647776 |
+| accuracy | precision_macro | recall_macro | f1_macro | Task |
+|-----|-----|-----|-----|-----|
+| 0.7004405286343612 | 0.5717666948436179 | 0.6127063220180629 | 0.5805958812647776 | Test dataset Bezeichnung I |
+| 0.9162995594713657 | 0.9318954248366014 | 0.9122380952380952 | 0.8984289453766925 | Test dataset Thema I |
+| 0.7004405286343612 | 0.5730158730158731 | 0.8207602339181287 | 0.6515010351966873 | Test dataset Bezeichnung II |
+| 0.5445544554455446 | 0.41787439613526567 | 0.39929183135704877 | 0.4010173484686228 | Validation dataset Bezeichnung I |
+| 0.5445544554455446 | 0.6018518518518517 | 0.6278409090909091 | 0.6066776135741653 | Validation dataset Thema I |
 
 
 ### Summary
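The zero-score handling described in the added Metrics paragraph can be sketched as follows. This is an illustrative example with made-up toy labels, not the model card's actual evaluation code: a class that is never predicted has undefined precision, which is counted as 0 and pulls the macro average down.

```python
def macro_scores(y_true, y_pred, labels):
    """Macro-averaged precision, recall, and F1.

    Classes that are never predicted get a precision (and hence F1)
    of 0.0, mirroring the zero-setting described above.
    """
    per_class = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0  # never predicted -> 0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        per_class.append((prec, rec, f1))
    n = len(labels)
    return tuple(sum(m[i] for m in per_class) / n for i in range(3))

# Toy data: class "c" occurs in the gold labels but is never predicted.
y_true = ["a", "a", "b", "b", "c", "c"]
y_pred = ["a", "a", "b", "b", "a", "b"]

prec, rec, f1 = macro_scores(y_true, y_pred, labels=["a", "b", "c"])
```

Restricting `labels` to classes with at least two predictions reproduces the idea behind the 'Bezeichnung II' rows, which is why those macro scores can exceed the 'Bezeichnung I' ones at identical accuracy.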