## Evaluation results

The generated feedback was evaluated by means of the [SacreBLEU](https://huggingface.co/spaces/evaluate-metric/sacrebleu), [ROUGE-2](https://huggingface.co/spaces/evaluate-metric/rouge), [METEOR](https://huggingface.co/spaces/evaluate-metric/meteor), and [BERTScore](https://huggingface.co/spaces/evaluate-metric/bertscore) metrics from Hugging Face, while the [accuracy](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html) and [F1](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html) scores from scikit-learn were used to evaluate the labels.

The following results were achieved:

| Split                 | SacreBLEU | ROUGE-2 | METEOR | BERTScore | Accuracy | Weighted F1 | Macro F1 |
| --------------------- | :-------: | :-----: | :----: | :-------: | :------: | :---------: | :------: |
| test_unseen_answers   |   43.6    |  45.3   |  57.4  |   55.0    |   81.0   |    79.4     |   71.3   |
| test_unseen_questions |    3.0    |   4.2   |  19.9  |   16.1    |   60.0   |    54.4     |   53.2   |

The script used to compute these metrics and perform evaluation can be found in the `evaluation.py` file in this repository.
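
For illustration, below is a minimal sketch of how these metrics can be computed with the Hugging Face `evaluate` library and scikit-learn. The prediction, reference, and label lists are placeholders, and the exact metric configuration behind the reported numbers is defined in `evaluation.py`, which remains the authoritative implementation.

```python
# Minimal sketch (not the repository's evaluation.py): computing the feedback and
# label metrics described above with Hugging Face `evaluate` and scikit-learn.
# All inputs below are placeholders.
import evaluate
from sklearn.metrics import accuracy_score, f1_score

# Placeholder generated feedback and gold references (one reference per prediction).
predictions = ["The answer is partially correct but misses the second condition."]
references = ["Partially correct: the second condition is missing."]

# Placeholder predicted and gold labels.
predicted_labels = ["Partially correct"]
gold_labels = ["Partially correct"]

# Text-generation metrics from the Hugging Face `evaluate` library.
sacrebleu = evaluate.load("sacrebleu")
rouge = evaluate.load("rouge")
meteor = evaluate.load("meteor")
bertscore = evaluate.load("bertscore")

results = {
    # SacreBLEU expects one list of references per prediction.
    "sacrebleu": sacrebleu.compute(
        predictions=predictions, references=[[ref] for ref in references]
    )["score"],
    "rouge2": rouge.compute(predictions=predictions, references=references)["rouge2"],
    "meteor": meteor.compute(predictions=predictions, references=references)["meteor"],
    # BERTScore returns per-example scores; report the mean F1 here.
    "bertscore_f1": sum(
        bertscore.compute(predictions=predictions, references=references, lang="en")["f1"]
    ) / len(predictions),
    # Label metrics from scikit-learn.
    "accuracy": accuracy_score(gold_labels, predicted_labels),
    "weighted_f1": f1_score(gold_labels, predicted_labels, average="weighted"),
    "macro_f1": f1_score(gold_labels, predicted_labels, average="macro"),
}

print(results)
```

Note that SacreBLEU is returned on a 0-100 scale, while the other metrics in this sketch are returned on a 0-1 scale, so they would need to be scaled by 100 to be reported as percentages.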