Add training notebook
- README.md +23 -14
- notebooks/evaluation.ipynb +0 -0
- notebooks/finetuning_commafixer_with_LoRa.ipynb +0 -0
- setup.py +3 -1
README.md
CHANGED
@@ -88,9 +88,11 @@ dataset:
 The results of our evaluation of the baseline model out of domain on the English wikitext-103-raw-v1 validation
 dataset are as follows:
 
-| precision | recall | F1 | support |
-|-----------|--------|------|---------|
-| 0.79 | 0.72 | 0.75 | 10079 |
+| Model    | precision | recall | F1   | support |
+|----------|-----------|--------|------|---------|
+| baseline | 0.79      | 0.72   | 0.75 | 10079   |
+| ours*    | 0.86      | 0.85   | 0.85 | 10079   |
+*details of the fine-tuning process in the next section.
 
 We treat each comma as one token instance, as opposed to the original paper, which NER-tags the whole multiple-token
 preceding words as comma class tokens.
@@ -100,17 +102,26 @@ In our approach, for each comma from the prediction text obtained from the model
 * If a comma from ground truth is not predicted, it counts as a false negative.
 
 ## Training
-
-since it preserves the sentence structure and only focuses on commas,
-with limited GPU resources, we doubt we could beat the baseline model with a similar approach.
-We could fine-tune the baseline on our data, focusing on commas, and see if it brings any improvement.
+The fine-tuned model can be found [here](https://huggingface.co/klasocki/roberta-large-lora-ner-comma-fixer).
 
-
+To compare with the baseline, we fine-tune the same model, RoBERTa large, on the wikitext English dataset.
+We use a similar approach, where we treat comma-fixing as a NER problem, and for each token predict whether a comma
+should be inserted after it.
+
+The biggest differences are the dataset, the fact that we focus on commas, and that we use [LoRa](https://arxiv.org/pdf/2106.09685.pdf)
+for parameter-efficient fine-tuning of the base model.
+
+The biggest advantage of this approach is that it preserves the input structure and only focuses on commas,
+ensuring that nothing else will be changed and that the model will not have to learn repeating the input back in case
+no commas should be inserted.
+
+
+We have also thought that trying out pre-trained text-to-text or decoder-only LLMs for this task using PEFT could be
 interesting, and wanted to check if we have enough resources for low-rank adaptation or prefix-tuning.
+While the model would have to learn to not change anything else than commas and the free-form could prove evaluation
+to be difficult, this approach has added flexibility in case we decide we want to fix other errors in the future, not
+just commas.
 
-We adapt the code from [this tutorial](https://www.youtube.com/watch?v=iYr1xZn26R8) in order to fine-tune a
-[bloom LLM](https://huggingface.co/bigscience/bloom-560m) to our task using
-[LoRa](https://arxiv.org/pdf/2106.09685.pdf).
 However, even with the smallest model from the family, we struggled with CUDA memory errors using the free Google
 colab GPU quotas, and could only train with a batch size of two.
 After a short training, it seems the loss keeps fluctuating and the model is only able to learn to repeat the
@@ -118,8 +129,6 @@ original phrase back.
 
 If time permits, we plan to experiment with seq2seq pre-trained models, increasing gradient accumulation steps, and the
 percentage of
-data with commas.
-The latter could help since wikitext contains highly diverse data, with many rows being empty strings,
-headers, or short paragraphs.
+data with commas, and trying out artificially inserting mistaken commas as opposed to removing them in preprocessing.
 
 
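The comma-level scoring the README describes (each comma counts as one instance, and a ground-truth comma with no matching prediction is a false negative) can be illustrated with a short sketch. This is not the code from notebooks/evaluation.ipynb; it assumes prediction and reference differ only in comma placement, and the helper names are ours.

```python
# Illustrative sketch of the comma-level metrics described in the README.
# Assumes prediction and reference texts differ only in comma placement,
# so words can be aligned by position; helper names are not from the repo.

def comma_positions(text: str) -> set[int]:
    """Indices of words that are followed by a comma."""
    return {i for i, word in enumerate(text.split()) if word.endswith(",")}

def comma_scores(prediction: str, reference: str) -> tuple[float, float, float]:
    """Precision, recall and F1, treating each comma as one token instance."""
    pred, ref = comma_positions(prediction), comma_positions(reference)
    tp = len(pred & ref)   # predicted commas that match the ground truth
    fp = len(pred - ref)   # predicted commas not present in the ground truth
    fn = len(ref - pred)   # ground-truth commas that were not predicted
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

print(comma_scores("We ate, slept and left.", "We ate, slept, and left."))
# (1.0, 0.5, 0.666...)
```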
notebooks/evaluation.ipynb
CHANGED
The diff for this file is too large to render. See raw diff.

notebooks/finetuning_commafixer_with_LoRa.ipynb
CHANGED
The diff for this file is too large to render. See raw diff.
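Since the fine-tuning notebook diff is not rendered here, the following is a minimal sketch of the setup the README and the new `peft==0.5.0` dependency point to: RoBERTa-large wrapped for token classification with a LoRA adapter. The label scheme, LoRA hyperparameters, and training arguments are illustrative assumptions, not values taken from the notebook.

```python
# Minimal sketch: LoRA fine-tuning of RoBERTa-large for comma prediction as
# NER-style token classification. Labels and hyperparameters are assumptions.
from peft import LoraConfig, TaskType, get_peft_model
from transformers import (AutoModelForTokenClassification, AutoTokenizer,
                          TrainingArguments)

labels = ["O", "B-COMMA"]  # assumed scheme: tag tokens that should be followed by a comma

tokenizer = AutoTokenizer.from_pretrained("roberta-large", add_prefix_space=True)
model = AutoModelForTokenClassification.from_pretrained(
    "roberta-large",
    num_labels=len(labels),
    id2label=dict(enumerate(labels)),
    label2id={label: i for i, label in enumerate(labels)},
)

# Only the small rank-decomposition matrices injected into the attention
# projections are trained; the frozen base model stays untouched.
lora_config = LoraConfig(
    task_type=TaskType.TOKEN_CLS,
    r=16,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["query", "value"],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # a small fraction of the ~355M base parameters

# Small per-device batches with gradient accumulation, matching the free Colab
# GPU memory constraints mentioned in the README; the exact values are guesses.
training_args = TrainingArguments(
    output_dir="roberta-large-lora-comma-fixer",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=1e-4,
    num_train_epochs=1,
)
```

A `Trainer` with a tokenized wikitext dataset and a seqeval-based `compute_metrics` would complete the loop; it is omitted here to keep the sketch short.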
setup.py
CHANGED
@@ -21,8 +21,10 @@ setup(
     extras_require={
         'training': [
             'datasets==2.14.4',
+            'notebook',
+            'peft==0.5.0',
             'seqeval',
-            '
+            'evaluate==0.4.0'
         ],
         'test': [
             'pytest',