## Model description

This model is a T5 Transformer ([t5-small](https://huggingface.co/t5-small)) fine-tuned on 29,007 Spanish and Nahuatl sentences: 12,890 samples collected from the web and 16,117 samples from the Axolotl dataset.

The dataset is normalized using the 'sep' normalization from [py-elotl](https://github.com/ElotlMX/py-elotl).

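A quick sanity check that the reported sample counts are consistent (all numbers taken from the description above):

```python
web_samples = 12890      # sentences collected from the web
axolotl_samples = 16117  # sentences from the Axolotl dataset

total = web_samples + axolotl_samples
print(total)  # 29007, matching the 29,007 sentences reported above
```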
## Usage

```python
from transformers import AutoModelForSeq2SeqLM
from transformers import AutoTokenizer

model = AutoModelForSeq2SeqLM.from_pretrained('milmor/t5-small-spanish-nahuatl')
tokenizer = AutoTokenizer.from_pretrained('milmor/t5-small-spanish-nahuatl')

model.eval()
sentence = 'muchas flores son blancas'

# Translate the sentence. The 'translate Spanish to Nahuatl: ' task prefix is
# an assumption based on T5's usual prompt convention; adjust it if the model
# expects a different format.
input_ids = tokenizer('translate Spanish to Nahuatl: ' + sentence,
                      return_tensors='pt').input_ids
outputs = model.generate(input_ids, max_length=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Evaluation

The model is evaluated on 400 validation sentences.

- Validation loss: 1.56
- BLEU: 0.13

_Note: Since the Axolotl corpus contains multiple misalignments, the actual BLEU and validation loss are slightly better than reported. These misalignments also introduce noise into training._
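For context on the reported score, BLEU is essentially a clipped n-gram precision combined with a brevity penalty. The sketch below is a minimal sentence-level illustration only, not the smoothed corpus-level BLEU used in real evaluations (which typically goes up to 4-grams), and the Nahuatl tokens in the example are purely illustrative:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    # All contiguous n-grams of a token list.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=2):
    # Sentence-level sketch: clipped n-gram precisions, geometric mean,
    # brevity penalty. Standard BLEU uses up to 4-grams with smoothing.
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        precisions.append(overlap / max(sum(cand_counts.values()), 1))
    if min(precisions) == 0:
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    brevity = min(1.0, math.exp(1 - len(ref) / len(cand)))
    return brevity * geo_mean

print(bleu('miak xochitl istak', 'miak xochitl istak'))  # identical sentences -> 1.0
```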
## References