## Evaluation results

The generated feedback was evaluated by means of the [SacreBLEU](https://huggingface.co/spaces/evaluate-metric/sacrebleu), [ROUGE-2](https://huggingface.co/spaces/evaluate-metric/rouge), [METEOR](https://huggingface.co/spaces/evaluate-metric/meteor), and [BERTScore](https://huggingface.co/spaces/evaluate-metric/bertscore) metrics from Hugging Face, while the [accuracy](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html) and [F1](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html) scores from scikit-learn were used to evaluate the labels.

The following results were achieved:

| Split                 | SacreBLEU | ROUGE-2 | METEOR | BERTScore | Accuracy | Weighted F1 | Macro F1 |
| --------------------- | :-------: | :-----: | :----: | :-------: | :------: | :---------: | :------: |
| test_unseen_answers   |   43.6    |  45.3   |  57.4  |   55.0    |   81.0   |    79.4     |   71.3   |
| test_unseen_questions |    3.0    |   4.2   |  19.9  |   16.1    |   60.0   |    54.4     |   53.2   |

The script used to compute these metrics and perform evaluation can be found in the `evaluation.py` file in this repository.
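
For illustration, below is a minimal sketch of how these metrics can be computed with the Hugging Face `evaluate` library and scikit-learn. The prediction, reference, and label lists are placeholders, and the exact metric configuration behind the reported numbers is defined in `evaluation.py`, which remains the authoritative implementation.

```python
# Minimal sketch (not the repository's evaluation.py): computing the feedback and
# label metrics described above with Hugging Face `evaluate` and scikit-learn.
# All inputs below are placeholders.
import evaluate
from sklearn.metrics import accuracy_score, f1_score

# Placeholder generated feedback and gold references (one reference per prediction).
predictions = ["The answer is partially correct but misses the second condition."]
references = ["Partially correct: the second condition is missing."]

# Placeholder predicted and gold labels.
predicted_labels = ["Partially correct"]
gold_labels = ["Partially correct"]

# Text-generation metrics from the Hugging Face `evaluate` library.
sacrebleu = evaluate.load("sacrebleu")
rouge = evaluate.load("rouge")
meteor = evaluate.load("meteor")
bertscore = evaluate.load("bertscore")

results = {
    # SacreBLEU expects one list of references per prediction.
    "sacrebleu": sacrebleu.compute(
        predictions=predictions, references=[[ref] for ref in references]
    )["score"],
    "rouge2": rouge.compute(predictions=predictions, references=references)["rouge2"],
    "meteor": meteor.compute(predictions=predictions, references=references)["meteor"],
    # BERTScore returns per-example scores; report the mean F1 here.
    "bertscore_f1": sum(
        bertscore.compute(predictions=predictions, references=references, lang="en")["f1"]
    ) / len(predictions),
    # Label metrics from scikit-learn.
    "accuracy": accuracy_score(gold_labels, predicted_labels),
    "weighted_f1": f1_score(gold_labels, predicted_labels, average="weighted"),
    "macro_f1": f1_score(gold_labels, predicted_labels, average="macro"),
}

print(results)
```

Note that SacreBLEU is returned on a 0-100 scale, while the other metrics in this sketch are returned on a 0-1 scale, so they would need to be scaled by 100 to be reported as percentages.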