JohnnyBoy00 committed on
Commit
256a288
1 Parent(s): c7c46c6

Update README.md

Files changed (1)
  1. README.md +107 -21
README.md CHANGED
@@ -1,50 +1,91 @@
1
  ---
2
- license: apache-2.0
 
 
 
3
  tags:
4
  - generated_from_trainer
5
- model-index:
6
- - name: bart-score-finetuned-saf-communication-networks
7
- results: []
8
  ---
9
 
10
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
11
- should probably proofread and complete it, then remove this comment. -->
12
-
13
  # bart-score-finetuned-saf-communication-networks
14
 
15
- This model is a fine-tuned version of [facebook/bart-large](https://huggingface.co/facebook/bart-large) on the None dataset.
16
 
17
  ## Model description
18
 
19
- More information needed
20
 
21
  ## Intended uses & limitations
22
 
23
- More information needed
 
 
24
 
25
  ## Training and evaluation data
26
 
27
- More information needed
28
 
29
  ## Training procedure
30
 
 
 
 
 
31
  ### Training hyperparameters
32
 
33
  The following hyperparameters were used during training:
 
 
34
  - learning_rate: 5e-05
 
35
  - train_batch_size: 1
36
- - eval_batch_size: 4
37
- - seed: 42
38
  - gradient_accumulation_steps: 4
39
- - total_train_batch_size: 4
40
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
41
- - lr_scheduler_type: linear
42
- - num_epochs: 8
43
  - mixed_precision_training: Native AMP
44
-
45
- ### Training results
46
-
47
-
48
 
49
  ### Framework versions
50
 
@@ -52,3 +93,48 @@ The following hyperparameters were used during training:
52
  - Pytorch 1.13.1+cu116
53
  - Datasets 2.9.0
54
  - Tokenizers 0.13.2
1
  ---
2
+ language:
3
+ - en
4
+ datasets:
5
+ - Short-Answer-Feedback/saf_communication_networks_english
6
  tags:
7
  - generated_from_trainer
8
+ widget:
9
+ - text: >-
10
+ Answer: In TCP there is a Sequence Number field to identify packets
11
+ individually for reliability. There is no Sequence Number in UDP. The UDP
12
+ header does not have an options field, while the TCP header does. In TCP
13
+ there is an Advertised Window field for the Sliding Window Protocol for Flow
14
+ Control. There is no Flow Control and therefore no Advertised Window field
15
+ in UDP. In TCP there there is only a Data Offset field that specifies the
16
+ header length. In UDP the whole Packet Length is transmitted. Reference:
17
+ Possible Differences : The UPD header (8 bytes) is much shorter than the TCP
18
+ header (20-60 bytes) The UDP header has a fixed length while the TCP header
19
+ has a variable length Fields contained in the TCP header and not the UDP
20
+ header : -Sequence number -Acknowledgment number -Reserved -Flags/Control
21
+ bits -Advertised window -Urgent Pointer -Options + Padding if the options
22
+ are UDP includes the packet length (data + header) while TCP has the header
23
+ length/data offset (just header) field instead The sender port field is
24
+ optional in UDP, while the source port in TCP is necessary to establish the
25
+ connection Question: State at least 4 of the differences shown in the
26
+ lecture between the UDP and TCP headers.
27
  ---
28
 
 
 
 
29
  # bart-score-finetuned-saf-communication-networks
30
 
31
+ This model is a fine-tuned version of [facebook/bart-large](https://huggingface.co/facebook/bart-large) on the [saf_communication_networks_english](https://huggingface.co/datasets/Short-Answer-Feedback/saf_communication_networks_english) dataset for Short Answer Feedback (SAF), as proposed in [Filighera et al., ACL 2022](https://aclanthology.org/2022.acl-long.587).
32
 
33
  ## Model description
34
 
35
+ This model was built on top of [BART](https://arxiv.org/abs/1910.13461), a sequence-to-sequence model pretrained with a denoising objective.
36
+
37
+ It expects inputs in the following format:
38
+ ```
39
+ Answer: [answer] Reference: [reference_answer] Question: [question]
40
+ ```
41
+
42
+ In the format above, `[answer]`, `[reference_answer]` and `[question]` should be replaced by the provided answer, the reference answer and the question to which they refer, respectively.
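The input format can be assembled with a small helper. This is a minimal sketch, not code from the repository: the function name `build_saf_input` is hypothetical, and collapsing runs of whitespace into single spaces is an assumption based on the single-line widget example.

```python
def build_saf_input(answer: str, reference_answer: str, question: str) -> str:
    """Assemble the 'Answer: ... Reference: ... Question: ...' input string."""

    def squash(text: str) -> str:
        # Assumption: newlines and repeated spaces carry no meaning for the model,
        # so each field is flattened to a single line.
        return " ".join(text.split())

    return (
        f"Answer: {squash(answer)} "
        f"Reference: {squash(reference_answer)} "
        f"Question: {squash(question)}"
    )


prompt = build_saf_input(
    answer="UDP has no sequence number.",
    reference_answer="TCP has a sequence number, UDP does not.",
    question="Name one difference between the UDP and TCP headers.",
)
```

The resulting `prompt` string is what would be passed to the tokenizer.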
43
+
44
+
45
+ The outputs are formatted as follows:
46
+ ```
47
+ [score] Feedback: [feedback]
48
+ ```
49
+
50
+ Here, `[score]` is a numeric value representing the score assigned to the provided answer, and `[feedback]` is the textual feedback the model generates for that answer.
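A decoded output in the `[score] Feedback: [feedback]` format can be split back into its two parts. The sketch below is illustrative only: `parse_saf_output` is a hypothetical helper, and returning `None` for the score when the output does not follow the format is an assumption about how malformed generations might be handled.

```python
import re
from typing import Optional, Tuple

# Matches "<score> Feedback: <feedback>", allowing surrounding whitespace.
_OUTPUT_PATTERN = re.compile(r"^\s*(\S+)\s+Feedback:\s*(.*)$", re.DOTALL)


def parse_saf_output(output: str) -> Tuple[Optional[float], str]:
    """Return (score, feedback); score is None if the output is malformed."""
    match = _OUTPUT_PATTERN.match(output)
    if match is None:
        return None, output.strip()
    try:
        score = float(match.group(1))
    except ValueError:
        # The first token was not numeric, so treat the whole text as feedback.
        return None, output.strip()
    return score, match.group(2).strip()


score, feedback = parse_saf_output("0.5 Feedback: Two of the four differences are missing.")
```

The example string passed to `parse_saf_output` is made up for illustration and is not a real model output.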
51
 
52
  ## Intended uses & limitations
53
 
54
+ This model is intended for Short Answer Feedback generation on college-level communication networks topics. It is therefore not expected to perform well on questions and answers outside this scope.
55
+
56
+ The model underperforms when a question that was not seen during training is given as input at inference time. In particular, it tends to classify most answers as correct and does not provide relevant feedback in such cases. This limitation could be partially overcome by extending the dataset with the desired question (and associated answers) and fine-tuning the model for a few epochs on the new data.
57
 
58
  ## Training and evaluation data
59
 
60
+ As mentioned previously, the model was trained on the [saf_communication_networks_english](https://huggingface.co/datasets/Short-Answer-Feedback/saf_communication_networks_english) dataset, which is divided into the following splits.
61
+
62
+ | Split | Number of examples |
63
+ | --------------------- | ------------------ |
64
+ | train | 1700 |
65
+ | validation | 427 |
66
+ | test_unseen_answers | 375 |
67
+ | test_unseen_questions | 479 |
68
+
69
+ Evaluation was performed on the `test_unseen_answers` and `test_unseen_questions` splits.
70
 
71
  ## Training procedure
72
 
73
+ The [Trainer API](https://huggingface.co/docs/transformers/main_classes/trainer#transformers.Seq2SeqTrainer) was used to fine-tune the model. The code used for pre-processing and training was mostly adapted from the [summarization script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/summarization) made available by HuggingFace.
74
+
75
+ Training was completed in a little under 1 hour on a GPU on Google Colab.
76
+
77
  ### Training hyperparameters
78
 
79
  The following hyperparameters were used during training:
80
+ - num_epochs: 8
81
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
82
  - learning_rate: 5e-05
83
+ - lr_scheduler_type: linear
84
  - train_batch_size: 1
 
 
85
  - gradient_accumulation_steps: 4
86
+ - eval_batch_size: 4
 
 
 
87
  - mixed_precision_training: Native AMP
88
+ - PyTorch seed: 42
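With `train_batch_size: 1` and `gradient_accumulation_steps: 4`, each optimizer update sees four examples, which matches the `total_train_batch_size: 4` line in the previous version of the card. The arithmetic is sketched below. The single-device setting and the step counts are assumptions derived from the 1700-example train split and 8 epochs, not values reported by the author.

```python
import math

train_batch_size = 1             # per-device batch size from the list above
gradient_accumulation_steps = 4  # from the list above
num_devices = 1                  # assumption: a single GPU
num_train_examples = 1700        # size of the train split
num_epochs = 8

# Number of examples contributing to each optimizer update.
effective_batch_size = train_batch_size * gradient_accumulation_steps * num_devices

# Approximate number of optimizer updates per epoch and in total.
updates_per_epoch = math.ceil(num_train_examples / effective_batch_size)
total_updates = updates_per_epoch * num_epochs

print(effective_batch_size, updates_per_epoch, total_updates)
```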
 
 
 
89
 
90
  ### Framework versions
91
 
 
93
  - Pytorch 1.13.1+cu116
94
  - Datasets 2.9.0
95
  - Tokenizers 0.13.2
96
+
97
+ ## Evaluation results
98
+
99
+ The generated feedback was evaluated using the [SacreBLEU](https://huggingface.co/spaces/evaluate-metric/sacrebleu), [ROUGE-2](https://huggingface.co/spaces/evaluate-metric/rouge), [METEOR](https://huggingface.co/spaces/evaluate-metric/meteor) and [BERTScore](https://huggingface.co/spaces/evaluate-metric/bertscore) metrics from HuggingFace, while the [Root Mean Squared Error](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mean_squared_error) from scikit-learn was used to evaluate the predicted scores against the gold label scores.
100
+
101
+ The following results were achieved.
102
+
103
+ | Split | SacreBLEU | ROUGE-2 | METEOR | BERTScore | RMSE |
104
+ | --------------------- | :-------: | :-----: | :----: | :-------: | :---: |
105
+ | test_unseen_answers | 30.5 | 46.4 | 58.2 | 68.0 | 0.373 |
106
+ | test_unseen_questions | 0.6 | 9.5 | 18.8 | 26.7 | 0.544 |
107
+
108
+
109
+ The script used to compute these metrics and perform evaluation can be found in the `evaluation.py` file in this repository.
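For reference, the RMSE over predicted and gold scores is the square root of the mean squared difference between the two. The following is a minimal pure-Python sketch of that formula. It is not the repository's `evaluation.py`, the function name `rmse` is hypothetical, and the sample scores are made up for illustration.

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], gold: Sequence[float]) -> float:
    """Root mean squared error between predicted and gold scores."""
    if len(predicted) != len(gold) or len(predicted) == 0:
        raise ValueError("predicted and gold must be non-empty and of equal length")
    squared_errors = [(p - g) ** 2 for p, g in zip(predicted, gold)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Illustrative values only; these are not taken from the dataset.
example_rmse = rmse([1.0, 0.5, 0.0], [1.0, 1.0, 0.0])
```

Lower values indicate predicted scores closer to the gold labels.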
110
+
111
+ ## Usage
112
+
113
+ The example below shows how the model can be applied to generate feedback for a given answer.
114
+
115
+ ```python
116
+ from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
117
+
118
+ model = AutoModelForSeq2SeqLM.from_pretrained('Short-Answer-Feedback/bart-score-finetuned-saf-communication-networks')
119
+ tokenizer = AutoTokenizer.from_pretrained('Short-Answer-Feedback/bart-score-finetuned-saf-communication-networks')
120
+
121
+ example_input = 'Answer: In TCP there is a Sequence Number field to identify packets individually for reliability. There is no Sequence Number in UDP. The UDP header does not have an options field, while the TCP header does. In TCP there is an Advertised Window field for the Sliding Window Protocol for Flow Control. There is no Flow Control and therefore no Advertised Window field in UDP. In TCP there there is only a Data Offset field that specifies the header length. In UDP the whole Packet Length is transmitted. Reference: Possible Differences : The UPD header (8 bytes) is much shorter than the TCP header (20-60 bytes) The UDP header has a fixed length while the TCP header has a variable length Fields contained in the TCP header and not the UDP header : -Sequence number -Acknowledgment number -Reserved -Flags/Control bits -Advertised window -Urgent Pointer -Options + Padding if the options are UDP includes the packet length (data + header) while TCP has the header length/data offset (just header) field instead The sender port field is optional in UDP, while the source port in TCP is necessary to establish the connection Question: State at least 4 of the differences shown in the lecture between the UDP and TCP headers.'
122
+ inputs = tokenizer(example_input, max_length=256, padding='max_length', truncation=True, return_tensors='pt')
123
+
124
+ generated_tokens = model.generate(
125
+ inputs['input_ids'],
126
+ attention_mask=inputs['attention_mask'],
127
+ max_length=128
128
+ )
129
+ output = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)[0]
130
+ ```
131
+
132
+ The output produced by the model then looks as follows:
133
+
134
+ ```
135
+ 1.0 Feedback: The response correctly identifies four differences between TCP and UDP headers.
136
+ ```
137
+
138
+ ## Related Work
139
+
140
+ [Filighera et al., ACL 2022](https://aclanthology.org/2022.acl-long.587) trained a [T5 model](https://huggingface.co/docs/transformers/model_doc/t5) on this dataset, providing a baseline for SAF generation. The entire code used to define and train the model can be found on [GitHub](https://github.com/SebOchs/SAF).