Rahka committed on
Commit
b719141
1 Parent(s): 2c39ce6

add results

Files changed (1)
  1. README.md +9 -10
README.md CHANGED
@@ -211,25 +211,24 @@ Hyperparameter:
 
  <!-- This should link to a Data Card if possible. -->
 
- [More Information Needed]
-
- ### Factors
 
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-
- [More Information Needed]
 
  ### Metrics
 
  <!-- These are the evaluation metrics being used, ideally with a description of why. -->
 
- [More Information Needed]
 
  ## Results
 
- | accuracy | precision_macro | recall_macro | f1_macro |
- |-----|-----|-----|-----|
- | 0.7004405286343612 | 0.5717666948436179 | 0.6127063220180629 | 0.5805958812647776 |
 
 
  ### Summary
 
 
  <!-- This should link to a Data Card if possible. -->
 
+ The evaluation data can be found [here](https://huggingface.co/datasets/and-effect/mdk_gov_data_titles_clf). Since the model was trained on revision 172e61bb1dd20e43903f4c51e5cbec61ec9ae6e6, the evaluation metrics are based on the same revision.
 
 
  ### Metrics
 
  <!-- These are the evaluation metrics being used, ideally with a description of why. -->
 
+ The model performance is tested with four metrics: accuracy, precision, recall and F1 score. Many classes were never predicted, so their precision, recall and F1 score are set to zero in the calculation. For these metrics, additional calculations were performed excluding classes with fewer than two predictions for the level 'Bezeichnung' (see 'Bezeichnung II' in the results table). These additional results should be interpreted with caution, because they do not represent all classes.
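
The zero-filling and the class filter described above can be sketched in plain Python. This is a minimal illustration and not the authors' evaluation script. The function name `macro_scores` and the parameter `min_predictions` are hypothetical, and the sketch assumes that the filter only changes which classes enter the macro average, while accuracy is still computed over all samples.

```python
from collections import Counter


def macro_scores(y_true, y_pred, min_predictions=0):
    """Accuracy plus macro precision, recall and F1.

    Undefined per-class scores (for example a class that was never
    predicted) are set to zero. Classes predicted fewer than
    `min_predictions` times are left out of the macro average.
    `min_predictions` is a hypothetical parameter for this sketch.
    """
    labels = sorted(set(y_true) | set(y_pred))
    predicted = Counter(y_pred)   # how often each class was predicted
    actual = Counter(y_true)      # how often each class really occurs
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    kept = [c for c in labels if predicted[c] >= min_predictions]
    precisions, recalls, f1s = [], [], []
    for c in kept:
        p = hits[c] / predicted[c] if predicted[c] else 0.0
        r = hits[c] / actual[c] if actual[c] else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    n = len(kept)
    return {
        # Accuracy ignores the class filter (assumption of this sketch).
        "accuracy": sum(hits.values()) / len(y_true) if y_true else 0.0,
        "precision_macro": sum(precisions) / n if n else 0.0,
        "recall_macro": sum(recalls) / n if n else 0.0,
        "f1_macro": sum(f1s) / n if n else 0.0,
    }


# Toy labels, not taken from the evaluation data: class "b" is never predicted.
y_true = ["a", "a", "b", "c"]
y_pred = ["a", "a", "a", "c"]
all_classes = macro_scores(y_true, y_pred)
filtered = macro_scores(y_true, y_pred, min_predictions=2)
```

In the toy example the never-predicted class "b" contributes zeros to the first result and pulls the macro averages down. The filtered result averages only over class "a", so it looks better but covers fewer classes, which is why such numbers call for cautious interpretation.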
 
  ## Results
 
+ | accuracy | precision_macro | recall_macro | f1_macro | Task |
+ |-----|-----|-----|-----|-----|
+ | 0.7004405286343612 | 0.5717666948436179 | 0.6127063220180629 | 0.5805958812647776 | Test dataset Bezeichnung I |
+ | 0.9162995594713657 | 0.9318954248366014 | 0.9122380952380952 | 0.8984289453766925 | Test dataset Thema I |
+ | 0.7004405286343612 | 0.5730158730158731 | 0.8207602339181287 | 0.6515010351966873 | Test dataset Bezeichnung II |
+ | 0.5445544554455446 | 0.41787439613526567 | 0.39929183135704877 | 0.4010173484686228 | Validation dataset Bezeichnung I |
+ | 0.5445544554455446 | 0.6018518518518517 | 0.6278409090909091 | 0.6066776135741653 | Validation dataset Thema I |
 
 
  ### Summary