
Model Card for mTk-AdversarialQA_en-SberQuAD_ru-1B

This model is a generative in-context few-shot learner specialized in Russian. It was trained on a combination of English AdversarialQA and Russian SberQuAD datasets.

You can find detailed information in the project GitHub repository and in the referenced paper.

Model Details

Model Description

  • Developed by: Michal Stefanik & Marek Kadlcik, Masaryk University
  • Model type: mt5
  • Language(s) (NLP): en, ru
  • License: MIT
  • Finetuned from model: google/mt5-large

Uses

This model is intended to be used in a few-shot in-context learning format in the target language (Russian) or in the source language (English; see below). It was evaluated on unseen task learning (with k=3 demonstrations) in Russian; see the referenced paper for details.

How to Get Started with the Model

Use the code below to get started with the model.

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model = AutoModelForSeq2SeqLM.from_pretrained("fewshot-goes-multilingual/mTk-AdversarialQA_en-SberQuAD_ru-1B")
tokenizer = AutoTokenizer.from_pretrained("fewshot-goes-multilingual/mTk-AdversarialQA_en-SberQuAD_ru-1B")

# For Russian few-shot prompts, use the keywords "Вопрос", "Контекст" and "Отвечать" instead
input_text = """
    Question: What is the customer's name?
    Context: Origin: Barrack Obama, Customer id: Bill Moe.
    Answer: Bill Moe,
    Question: What is the customer's name?
    Context: Customer id: Barrack Obama, if not deliverable, return to Bill Clinton.
    Answer:
"""
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs)

print("Answer:")
# generate() returns a batch of token ids; decode the first (and only) sequence
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
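
For Russian inputs, the same pattern applies with the keywords mentioned in the code comment above. The snippet below is a minimal sketch; the demonstration texts are invented for illustration, and the exact prompt formatting used in the paper's evaluation may differ.

# Minimal sketch of a Russian few-shot prompt using the keywords noted above;
# the demonstrations are invented examples, not taken from the training data.
russian_input = """
    Вопрос: Как зовут клиента?
    Контекст: Отправитель: Билл Мо, клиент: Анна Смирнова.
    Отвечать: Анна Смирнова,
    Вопрос: Как зовут клиента?
    Контекст: Клиент: Иван Петров, при невозможности доставки вернуть Петру Сидорову.
    Отвечать:
"""
inputs = tokenizer(russian_input, return_tensors="pt")
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))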

Training Details

The training of this model can be reproduced by running pip install -r requirements.txt && python train_mt5_qa_en_AQA+ru_info.py. See the referenced script for hyperparameters and other training configurations.
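
For reference, both training datasets are available on the Hugging Face Hub. Below is a minimal sketch of loading them with the datasets library; the hub identifiers and configuration name are assumptions based on the public dataset pages, not taken from the training script.

from datasets import load_dataset

# Hub identifiers below are assumptions; the training script may use
# different configurations or local copies of the data.
adversarial_qa = load_dataset("adversarial_qa", "adversarialQA")  # English
sberquad = load_dataset("sberquad")  # Russian; newer datasets versions may require trust_remote_code=True

# Both datasets follow the SQuAD schema (question / context / answers)
print(adversarial_qa["train"][0]["question"])
print(sberquad["train"][0]["question"])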

Citation

If you use our models or other resources in your research, please cite our work as follows.

BibTeX:

@inproceedings{stefanik2023resources,
    author = {\v{S}tef\'{a}nik, Michal and Kadl\v{c}\'{i}k, Marek and Gramacki, Piotr and Sojka, Petr},
    title = {Resources and Few-shot Learners for In-context Learning in Slavic Languages},
    booktitle = {Proceedings of the 9th Workshop on Slavic Natural Language Processing},
    publisher = {ACL},
    year = {2023},
    numpages = {9},
    url = {https://arxiv.org/abs/2304.01922},
}
