Model Description
- Model type: Multi-class classifier on top of a transformer
- Language(s) (NLP): Spanish varieties: Argentinian (ar), Chilean (cl), Mexican (mx), Spanish from Spain (es), and the rest (mix)
- License: GPL-3.0
- Finetuned from: XLM-RoBERTa large
- Preprocessing and tokenisation: the same as XLM-RoBERTa
We provide models for three setups: a 3-class problem (es, mx, mix), a 4-class problem (cl, es, mx, mix) and a 5-class problem (ar, cl, es, mx, mix). For each setup, we include models trained with 3 different seeds, each in a version with one split and a version with two splits of the training documents. See the documentation of docTransformer for more detailed information.
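The three setups described above can be summarised as a small lookup table. This is only an illustrative sketch: the names `LABEL_SETS`, `labels_for` and `count_released_models` are hypothetical and not part of docTransformer, the label order simply follows the text and may not match the class indices used inside the models, and the total assumes that every combination of setup, seed and split version is released.

```python
# Label sets per setup, copied from the model description.
# NOTE: the order follows the text; it is NOT guaranteed to match
# the class-index order used inside the released models.
LABEL_SETS = {
    3: ["es", "mx", "mix"],
    4: ["cl", "es", "mx", "mix"],
    5: ["ar", "cl", "es", "mx", "mix"],
}

SEEDS_PER_SETUP = 3      # "models with 3 different seeds"
SPLIT_VERSIONS = (1, 2)  # one split or two splits of the training documents


def labels_for(num_classes):
    """Return the variety labels covered by the given setup."""
    if num_classes not in LABEL_SETS:
        raise ValueError(f"no {num_classes}-class setup; choose from {sorted(LABEL_SETS)}")
    return LABEL_SETS[num_classes]


def count_released_models():
    """Number of models, assuming every setup/seed/split combination exists."""
    return len(LABEL_SETS) * SEEDS_PER_SETUP * len(SPLIT_VERSIONS)


if __name__ == "__main__":
    for n in sorted(LABEL_SETS):
        print(n, labels_for(n))
    print("models (assumed full grid):", count_released_models())
```

Each larger setup extends the previous one with a single extra variety, so `mix` absorbs whatever varieties are not given their own class in that setup.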
Model Sources
- Repository: https://github.com/CEREAL-es/CEREAL
- Paper: Coming soon!
- Data: Coming soon!
Use
Use the CEREAL classification models with docTransformer.
Example Usage
Use these models for evaluation, classification, or explanation (the latter using integrated gradients):
Slurm
Evaluation (gold label available)
srun --ntasks 1 --gpus-per-task 1 python -u docClassifier.py --task evaluation -f trainedModel -o C4_cereal2splits_seed1.bin -b2 --sentence_batch_size 2 --split_documents True --test_dataset data/multivariant3all.test --plotConfusionFileName modelSplit2Seed3test.png
Classification (gold label unavailable)
srun --ntasks 1 --gpus-per-task 1 python -u docClassifier.py --task classification -f trainedModel -o C4_cereal2splits_seed1.bin -b1 --sentence_batch_size 2 --split_documents True --test_dataset ../es/es_meta_part_1.jsonl.unk
Explanation
srun --ntasks 1 --gpus-per-task 1 python -u docClassifier.py --task explanation -t data/testExample.mx -f trainedModel -o C4_cereal1split_seed1.bin -b1 --split_documents False --xai_threshold_percentile 90
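The three commands above share most of their arguments, so they can be assembled programmatically. The following is a minimal sketch, not part of docTransformer: `build_command` and `SRUN_PREFIX` are hypothetical names, and the only flags used are those that appear in the commands above. It only builds and prints the command line; it does not run anything.

```python
import shlex

# Slurm prefix used in the examples above.
SRUN_PREFIX = ["srun", "--ntasks", "1", "--gpus-per-task", "1"]
# Task names taken from the --task values shown above.
TASKS = {"evaluation", "classification", "explanation"}


def build_command(task, model_file, extra_args, use_slurm=True):
    """Assemble a docClassifier.py call as a list of arguments.

    extra_args holds the task-specific flags, passed through unchanged.
    """
    if task not in TASKS:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(TASKS)}")
    cmd = ["python", "-u", "docClassifier.py", "--task", task]
    if task == "explanation":
        # In the example above, the explanation call passes its flags
        # right after --task, so extra_args go before -f / -o here.
        cmd += list(extra_args) + ["-f", "trainedModel", "-o", model_file]
    else:
        cmd += ["-f", "trainedModel", "-o", model_file] + list(extra_args)
    return (SRUN_PREFIX + cmd) if use_slurm else cmd


# Rebuild the evaluation example from above.
eval_cmd = build_command(
    "evaluation",
    "C4_cereal2splits_seed1.bin",
    ["-b2", "--sentence_batch_size", "2", "--split_documents", "True",
     "--test_dataset", "data/multivariant3all.test",
     "--plotConfusionFileName", "modelSplit2Seed3test.png"],
)
print(shlex.join(eval_cmd))
```

The resulting list can be passed to `subprocess.run` if desired. Setting `use_slurm=False` simply drops the `srun` prefix; whether the script behaves the same outside Slurm is an assumption that should be checked against the docTransformer documentation.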
Citation
BibTeX:
@InProceedings{espana-bonet-barron-cedeno-2024,
  title = "Elote, Choclo and Mazorca: on the Varieties of Spanish",
  author = "Espa{\~n}a-Bonet, Cristina and Barr{\'o}n-Cede{\~n}o, Alberto",
  booktitle = "Proceedings of the 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics",
  month = jun,
  year = "2024",
  address = "Mexico City, Mexico",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/",
  pages = "--"
}
APA:
España-Bonet, C., & Barrón-Cedeño, A. (2024, June). Elote, Choclo and Mazorca: on the Varieties of Spanish. In Proceedings of the 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics: NAACL 2024 (pp. -).