---
license: apache-2.0
datasets:
- ehovy/race
language:
- en
base_model:
- google-t5/t5-base
---
# Model Card for t5-base-rc-feedback (220M parameters)

## Description
This model was trained to respond to incorrect student answers in an interactive reading comprehension exercise setting. Incorrect student answers can become valuable learning opportunities, provided that the student understands where they went wrong and why. To this end, rather than being given the correct answer, students should receive elaborated feedback that helps them correct the mistake on their own. Because generating such feedback places complex demands on a model's ability to use its input, we proposed two extensions to the training pipeline. First, we employed a KL regularization term between a standard and an enriched input format to obtain more targeted input representations. Second, we added a preference optimization step to encourage feedback generation that adapts to the student's answer.
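The snippet below is a minimal sketch of the first extension, assuming a standard seq2seq cross-entropy objective: it regularizes the output distribution obtained from the standard input toward the one obtained from the enriched input. The input construction, the stop-gradient on the enriched pass, and the `kl_weight` hyperparameter are illustrative assumptions, not the exact implementation from the paper.

```python
# Sketch of a KL regularization term between a standard and an enriched
# input format. Details (inputs, gradient flow, weighting) are assumptions.
import torch
import torch.nn.functional as F
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google-t5/t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("google-t5/t5-base")

def kl_regularized_loss(standard_input: str, enriched_input: str,
                        feedback: str, kl_weight: float = 1.0) -> torch.Tensor:
    labels = tokenizer(feedback, return_tensors="pt").input_ids
    std = tokenizer(standard_input, return_tensors="pt")
    enr = tokenizer(enriched_input, return_tensors="pt")

    # Forward pass on the standard input; its cross-entropy is the base loss.
    std_out = model(**std, labels=labels)
    # Forward pass on the enriched input, used here as a fixed reference
    # distribution (whether it is detached is an assumption).
    with torch.no_grad():
        enr_out = model(**enr, labels=labels)

    # KL term pulling the standard-input distribution toward the
    # enriched-input one, computed over the decoder's token distributions.
    kl = F.kl_div(
        F.log_softmax(std_out.logits, dim=-1),
        F.log_softmax(enr_out.logits, dim=-1),
        log_target=True,
        reduction="batchmean",
    )
    return std_out.loss + kl_weight * kl
```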
## Automatic Evaluation Results
The final model was trained and evaluated on all feedback turns from the DIRECT and DIRECT-Feedback datasets, partially available at https://github.com/DIRECTDataset/DIRECTFeedback/blob/main/data/feedback_data_partial.csv
| BLEU | METEOR | ROUGE | BERTScore |
|---|---|---|---|
| 6.9 | 21.7 | 21.4 | 19.0 |
For additional details, we refer the reader to our paper.
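As a rough pointer for reproducing such scores, the sketch below computes the four metrics with the Hugging Face `evaluate` library. The specific metric variants reported above (e.g. which ROUGE score, how BERTScore is scaled) are assumptions here; consult the paper for the exact evaluation setup.

```python
# Sketch: scoring generated feedback against references with `evaluate`.
# Metric variants and configurations are assumptions, not the paper's setup.
import evaluate

predictions = ["Look again at the second paragraph of the passage."]
references = ["Re-read paragraph two before answering."]

bleu = evaluate.load("bleu").compute(predictions=predictions, references=references)
meteor = evaluate.load("meteor").compute(predictions=predictions, references=references)
rouge = evaluate.load("rouge").compute(predictions=predictions, references=references)
bertscore = evaluate.load("bertscore").compute(
    predictions=predictions, references=references, lang="en"
)

print(bleu["bleu"], meteor["meteor"], rouge["rougeL"], bertscore["f1"])
```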
## Manual Evaluation Results

We sampled 250 items from the combined DIRECT+DIRECT-F feedback set and had one of the authors of this paper manually evaluate the generated feedback.
| appropriate (verification, explanation, and hint feedback) | direct (correction feedback) | irrelevant or ambiguous | unfaithful (contradicting the passage or alluding to an incorrect answer) |
|---|---|---|---|
| 43.6% | 23.6% | 22.0% | 10.8% |
## Execution
Code and instructions on how to perform inference on the model are provided at https://github.com/DIRECTDataset/DIRECTFeedback
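For a quick start, a minimal inference sketch with `transformers` is shown below. The Hub id placeholder and the prompt layout are illustrative assumptions; the repository linked above defines the exact input format the model expects.

```python
# Minimal inference sketch; model id and prompt layout are placeholders.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "t5-base-rc-feedback"  # replace with this model's full Hub id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Hypothetical input layout: passage, question, and the student's answer.
prompt = (
    "passage: <reading passage> "
    "question: <comprehension question> "
    "student answer: <incorrect answer>"
)

inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```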
## Citation

Liermann, W., Huang, J., Lee, Y., & Lee, K. (2024, November). More Insightful Feedback for Tutoring: Enhancing Generation Mechanisms and Automatic Evaluation. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing.