metadata
tags:
  - generation
language:
  - multilingual
  - cs
  - en
widget:
  - text: >-
      Otázka: Jaký je důvod dotazu zákazníka?

      Kontext: Dobrý den, Žádáme zaslání nové smlouvy kvůli řešení pojistné
      události. Zašlete na tento mail nebo přímo do systému. S pozdravem Petra
      Hladká | disponentka servisu.

      Odpověď: řešení pojistné události

      Otázka: Jaký je důvod dotazu zákazníka?

      Kontext: Dobrý den, chtěla bych Vás požádat o zaslání kopie technického
      průkazu z důvodu jeho ztráty. S pozdravem Milan Tvrdý.

      Odpověď:
    example_title: 'k-shot: Requests (cs)'
  - text: >-
      Otázka: Jaké schopnosti daly magické předměty Jurovi Jánošíkovi? 

      Kontext: Podle slovenského lidového podání byl Juro Jánošík obdařen
      magickými předměty (kouzelná valaška, čarovný opasek), které mu dodávaly
      nadpřirozené schopnosti. Okrádal především šlechtice, trestal panské dráby
      a ze svého lupu vyděloval část pro chudé, tedy bohatým bral a chudým
      dával. 

      Odpověď:
    example_title: '0-shot: Answering (cs)'
  - text: >-
      Question: What is the score of this review? 
       Context: I did not like the plot at all. Not recommended. 
       Answer: 1 
       Question: What is the score of this review? 
       Context: I loved the performance. Can’t believe they did not use CGI for the finale. I think it’s my new favourite movie. 
      Answer: 5 

      Question: Is the score of this review 1, 2, 3, 4 or 5? 

      Context: The beginning was awesome, but at the end it felt a little
      rushed. I enjoyed the movie, but probably won’t rewatch soon. 

      Answer:
    example_title: 'k-shot: Reviews (en)'
  - text: >-
      Question: What is the customer's name? 

      Context: Origin: Barrack Obama, Customer id: Bill Moe. 

      Answer: Bill Moe, 

      Question: What is the customer's name? 

      Context: Customer id: Barrack Obama, if not deliverable, return to Bill
      Clinton. 

      Answer:
    example_title: 'k-shot: Request (en)'

mT5-large for Few-shot Czech+English Generative Question Answering

This is the mt5-large model with an LM head for generating extractive answers, given a small set of 2-5 demonstrations (i.e. primes).

Few-shot (i.e. priming)

Note that this is primarily a few-shot model that expects a set of demonstrations of your task of interest, similarly to GPT-3. Rather than performing well on conventional question answering, it aims to learn to extrapolate the pattern of the given demonstrations to novel tasks, such as Named Entity Recognition or Keyword Extraction. However, it can also be used as a conventional QA model (see examples).

Data & Training

The reproducible training script is available for any use on our GitHub.

This model was trained on a combination of AdversarialQA and Czech SQAD 3.0 Question Answering datasets.

To train the model to use the demonstrations, we've clustered the samples by the question-word(s) in English AdversarialQA and by the category in the Czech SQAD and used the examples of the same cluster as the demonstrations of the task in training.

We find that the specific algorithm of selection of these demonstrations is crucial for the model's ability to extrapolate to new tasks. We'll share more details in the following article; stay tuned!
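
The exact selection algorithm is not described here, so the following is only a naive sketch of the general idea for the English data: group samples by their question word and draw demonstrations from the same group. The list of question words, the `question` field name, and the helper names `question_word`, `cluster_by_question_word` and `pick_demonstrations` are all assumptions made for illustration, not the authors' implementation. For Czech SQAD, the same grouping could be keyed on the dataset's category instead.

```python
import random
from collections import defaultdict

# Assumed list of English question words; not taken from the training script.
QUESTION_WORDS = ("what", "who", "whom", "whose", "when", "where", "which", "why", "how")


def question_word(question):
    """Return the first known question word found in the question, or 'other'."""
    for token in question.lower().split():
        token = token.strip("?,.:;!\"'")
        if token in QUESTION_WORDS:
            return token
    return "other"


def cluster_by_question_word(samples):
    """Group samples (dicts with a 'question' key) by their question word."""
    clusters = defaultdict(list)
    for sample in samples:
        clusters[question_word(sample["question"])].append(sample)
    return clusters


def pick_demonstrations(sample, clusters, k, rng):
    """Pick up to k other samples from the same cluster as `sample`."""
    pool = [s for s in clusters.get(question_word(sample["question"]), []) if s is not sample]
    return rng.sample(pool, min(k, len(pool)))
```

Passing an explicit `random.Random` instance keeps the selection reproducible across runs.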

For the Czech SQAD 3.0, the original contexts (whole Wikipedia pages) were limited to a maximum of 4000 characters per sequence of prime demonstrations. Pre-processing script for Czech SQAD is available here.

To train the model, we used the following patterns with 2-7 demonstrations; the same patterns are intended for inference:

For English samples:

input:

Question: {Q1} Context: {C1} Answer: {A1}, 
Question: {Q2} Context: {C2} Answer: {A2}, 
[...possibly more demonstrations...] 

Question: {Q} Context: {C} Answer:

=> target:

{A}

For Czech samples:

input:

Otázka: {Q1} Kontext: {C1} Odpověď: {A1}, 
Otázka: {Q2} Kontext: {C2} Odpověď: {A2}, 
[...possibly more demonstrations...] 

Otázka: {Q} Kontext: {C} Odpověď:

=> target:

{A}
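
As a minimal sketch, the two patterns above could be assembled programmatically as follows. The helper name `build_prompt`, the `lang` keys, and the choice of a single space as the separator between demonstrations are assumptions of this example; the field labels and the trailing comma after each demonstrated answer follow the patterns above.

```python
# Field labels taken from the patterns above; the "en"/"cs" keys are this sketch's own.
LABELS = {
    "en": ("Question", "Context", "Answer"),
    "cs": ("Otázka", "Kontext", "Odpověď"),
}


def build_prompt(demonstrations, question, context, lang="en"):
    """Build a priming prompt from (question, context, answer) demonstrations.

    The final question is left with an empty answer slot for the model to fill.
    """
    q_label, c_label, a_label = LABELS[lang]
    parts = [
        f"{q_label}: {q} {c_label}: {c} {a_label}: {a},"
        for q, c, a in demonstrations
    ]
    parts.append(f"{q_label}: {question} {c_label}: {context} {a_label}:")
    # Assumption: demonstrations are separated by a single space.
    return " ".join(parts)


input_text = build_prompt(
    [("What is the customer's name?",
      "Origin: Barrack Obama, Customer id: Bill Moe.",
      "Bill Moe")],
    "What is the customer's name?",
    "Customer id: Barrack Obama, if not deliverable, return to Bill Clinton.",
)
```

Passing an empty list of demonstrations yields a zero-shot prompt in the same format.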

The best checkpoint was selected to maximize the model's zero-shot performance on an unseen Named Entity Recognition task, with texts and labels from an out-of-distribution domain.

Intended uses & limitations

This model is intended for few-shot application to any text extraction task in English and Czech where the prompt can be stated as a natural question. For example, to extract customer names from text, prompt the model with demonstrations in the following format:

input_text = """
    Question: What is the customer's name? 
    Context: Origin: Barrack Obama, Customer id: Bill Moe. 
    Answer: Bill Moe, 
    Question: What is the customer's name? 
    Context: Customer id: Barrack Obama, if not deliverable, return to Bill Clinton. 
    Answer:"""

Usage

Here is how to use this model to answer a question about a given context using 🤗 Transformers in PyTorch:

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("gaussalgo/mt5-large-priming-QA_en-cs")
model = AutoModelForSeq2SeqLM.from_pretrained("gaussalgo/mt5-large-priming-QA_en-cs")

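# For the expected format of input_text, see "Intended uses & limitations" above
inputs = tokenizer(input_text, return_tensors="pt")

# generate() returns a batch of token-id sequences, one per input
outputs = model.generate(**inputs)

print("Answer:")
# Decode the first (and only) sequence of the batch, dropping special tokens
print(tokenizer.decode(outputs[0], skip_special_tokens=True))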