---
license: apache-2.0
---

# mT5 small spanish es

This is a Spanish fine-tuned version of Google's [mT5-small](https://huggingface.co/google/mt5-small) model.

# Datasets

The following datasets and task prefixes were used for fine-tuning:

| Task | Prefix |
|------|--------|
| Multinli (English) | `multi nli premise:[Text] hypo:[Text]` |
| Multinli (Spanish) | `multi nli premise:[Text] hypo:[Text]` |
| Pawx (English) | `pawx sentence1:[Text] sentence2:[Text]` |
| Pawx (Spanish) | `pawx sentence1:[Text] sentence2:[Text]` |
| Squad (English) | `question:[Text] context:[Text]` |
| Squad (Spanish) | `question:[Text] context:[Text]` |
| Translations (English-Spanish) | `translate English to Spanish:[Text]` |
| Translations (Spanish-English) | `translate Spanish to English:[Text]` |

# Inference

The following code can be used to perform the different model tasks.

## Translations

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "HURIDOCS/mt5-small-spanish-es"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# The task prefix tells the model which task to perform
task = "translate Spanish to English:Esta frase es para probar el modelo"

# Tokenize the input, padding/truncating to 512 tokens
input_ids = tokenizer(
    [task],
    return_tensors="pt",
    padding="max_length",
    truncation=True,
    max_length=512
)["input_ids"]

# Generate with beam search and keep the first (best) sequence
output_ids = model.generate(
    input_ids=input_ids,
    max_length=84,
    no_repeat_ngram_size=2,
    num_beams=4
)[0]

result_text = tokenizer.decode(
    output_ids,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False
)

print(result_text)
```

## Question answering

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "HURIDOCS/mt5-small-spanish-es"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# The input follows the "question:[Text] context:[Text]" prefix format
task = '''question:En qué país se encuentra Normandía? context:Los normandos (normandos: Nourmann; Francés: Normandos; Normanni) fue el pueblo que en los siglos X y XI dio su nombre a Normandía, una región de Francia. 
Eran descendientes de invasores nórdicos ('normandos" viene de "Norseman") y piratas de Dinamarca, Islandia y Noruega que, bajo su líder Rollo, acordaron jurar lealtad al rey Carlos III de Francia Occidental. A través de generaciones de asimilación y mezcla con las poblaciones nativas francas y galas romanas, sus descendientes se fusionarían gradualmente con las culturas carolingias de Francia Occidental. La identidad cultural y étnica distintiva de los normandos surgió inicialmente en la primera mitad del siglo X, y continuó evolucionando durante los siglos siguientes.'''

# Tokenize the input, padding/truncating to 512 tokens
input_ids = tokenizer(
    [task],
    return_tensors="pt",
    padding="max_length",
    truncation=True,
    max_length=512
)["input_ids"]

# Generate with beam search and keep the first (best) sequence
output_ids = model.generate(
    input_ids=input_ids,
    max_length=84,
    no_repeat_ngram_size=2,
    num_beams=4
)[0]

result_text = tokenizer.decode(
    output_ids,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False
)

print(result_text)
```

# Fine-tuning

Check out the Transformers library examples: https://github.com/huggingface/transformers/tree/main/examples/pytorch/question-answering

# Performance

Spanish SQuAD v2, 512 tokens

| Rank | Model | Exact match | F1 |
|------|-------|-------------|----|
| rank 1 | mrm8488/distill-bert-base-spanish-wwm-cased | 50.43% | 71.45% |
| rank 2 | **mT5 small spanish es** | 48.35% | 62.03% |
| rank 3 | flan-t5-small | 41.44% | 56.48% |
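
The scores reported under Performance are exact match and F1. This card does not say which evaluation script produced them, so the following is only a rough sketch of how SQuAD-style metrics are commonly defined. It assumes a simplified normalization (lowercasing and punctuation removal), and the helper names `normalize`, `exact_match`, and `f1_score` are illustrative rather than taken from the model's repository.

```python
import string
from collections import Counter


def normalize(text: str) -> list:
    """Lowercase, drop punctuation, and split on whitespace.

    Simplified normalization (an assumption); the script behind the
    reported numbers may normalize answers differently.
    """
    punctuation = string.punctuation + "¿¡"
    cleaned = "".join(ch for ch in text.lower() if ch not in punctuation)
    return cleaned.split()


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the normalized answers are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Token-level F1 between a predicted and a reference answer."""
    pred_tokens = normalize(prediction)
    ref_tokens = normalize(reference)
    # SQuAD v2 has unanswerable questions: an empty answer only
    # scores when both prediction and reference are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts tokens shared by both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("Francia", "francia."))
print(f1_score("en Francia", "Francia"))
```

Dataset-level scores are usually the average of these per-question values, taking the best score over all reference answers when a question has more than one.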