JohnnyBoy00 committed on
Commit
00eecff
1 Parent(s): d160911

Update README.md

Files changed (1)
  1. README.md +5 -5
README.md CHANGED
@@ -78,14 +78,14 @@ The following hyperparameters were utilized during training:
 
  ## Evaluation results
 
- The generated feedback was evaluated using the [SacreBLEU](https://huggingface.co/spaces/evaluate-metric/sacrebleu), [ROUGE](https://huggingface.co/spaces/evaluate-metric/rouge), [METEOR](https://huggingface.co/spaces/evaluate-metric/meteor), and [BERTScore](https://huggingface.co/spaces/evaluate-metric/bertscore) metrics from HuggingFace, while the [accuracy](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html) and [F1](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html) scores from scikit-learn were used to evaluate the labels.
+ The generated feedback was evaluated using the [SacreBLEU](https://huggingface.co/spaces/evaluate-metric/sacrebleu), [ROUGE-2](https://huggingface.co/spaces/evaluate-metric/rouge), [METEOR](https://huggingface.co/spaces/evaluate-metric/meteor), and [BERTScore](https://huggingface.co/spaces/evaluate-metric/bertscore) metrics from HuggingFace, while the [accuracy](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html) and [F1](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html) scores from scikit-learn were used to evaluate the labels.
 
  The following results were achieved.
 
- | Split                 | SacreBLEU | ROUGE | METEOR | BERTScore | Accuracy | Weighted F1 | Macro F1 |
- | --------------------- | :-------: | :---: | :----: | :-------: | :------: | :---------: | :------: |
- | test_unseen_answers   | 39.5      | 29.8  | 63.3   | 63.1      | 80.1     | 80.3        | 80.7     |
- | test_unseen_questions | 0.3       | 0.5   | 33.8   | 31.3      | 48.7     | 46.5        | 40.6     |
+ | Split                 | SacreBLEU | ROUGE-2 | METEOR | BERTScore | Accuracy | Weighted F1 | Macro F1 |
+ | --------------------- | :-------: | :-----: | :----: | :-------: | :------: | :---------: | :------: |
+ | test_unseen_answers   | 39.5      | 29.8    | 63.3   | 63.1      | 80.1     | 80.3        | 80.7     |
+ | test_unseen_questions | 0.3       | 0.5     | 33.8   | 31.3      | 48.7     | 46.5        | 40.6     |
 
 
  The script used to compute these metrics and perform evaluation can be found in the `evaluation.py` file in this repository.
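For reference, a minimal sketch of how these metrics can be computed with the `evaluate` and `scikit-learn` libraries is shown below. This is not the repository's `evaluation.py`; the predictions, references, and labels are placeholder data used only to illustrate the metric calls named in the paragraph above.

```python
# Illustrative sketch only: computes the generation metrics (SacreBLEU, ROUGE-2,
# METEOR, BERTScore) via the Hugging Face `evaluate` library and the label
# metrics (accuracy, weighted/macro F1) via scikit-learn, on placeholder data.
import evaluate
from sklearn.metrics import accuracy_score, f1_score

# Placeholder generated feedback and reference feedback.
predictions = ["The answer is partially correct but misses the key point."]
references = ["The response is partially correct; it omits the main point."]

# Load the text-generation metrics from the Hugging Face evaluate hub.
sacrebleu = evaluate.load("sacrebleu")
rouge = evaluate.load("rouge")
meteor = evaluate.load("meteor")
bertscore = evaluate.load("bertscore")

bleu = sacrebleu.compute(predictions=predictions,
                         references=[[r] for r in references])["score"]
rouge2 = rouge.compute(predictions=predictions, references=references)["rouge2"]
met = meteor.compute(predictions=predictions, references=references)["meteor"]
bert_f1 = bertscore.compute(predictions=predictions, references=references,
                            lang="en")["f1"]

# Placeholder verification labels (hypothetical label names).
true_labels = ["partially_correct"]
pred_labels = ["partially_correct"]

accuracy = accuracy_score(true_labels, pred_labels)
weighted_f1 = f1_score(true_labels, pred_labels, average="weighted")
macro_f1 = f1_score(true_labels, pred_labels, average="macro")

print("SacreBLEU:", bleu, "ROUGE-2:", rouge2, "METEOR:", met,
      "BERTScore F1:", sum(bert_f1) / len(bert_f1))
print("Accuracy:", accuracy, "Weighted F1:", weighted_f1, "Macro F1:", macro_f1)
```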