---
license: mit
language:
- it
---

This model is a fine-tuned version of [bart-it](https://huggingface.co/morenolq/bart-it) on a long-form question answering (LFQA) dataset built from pubmed_qa, webgpt_comparisons, sapere.it, stackexchange_titlebody_best_voted_answer_jsonl, and lfqa_preprocessed (partially translated).

### Usage

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "efederici/bart-lfqa-it"
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Load the tokenizer and model, then move the model to the selected device.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
model = model.to(device)

query = "<string>"

# Supporting passages the answer should be grounded in.
documents = [
    "<string>",
    "<string>",
    # ...
]

# Prefix every passage with "<p>" and append the context to the question.
docs = "<p> " + " <p> ".join(documents)
query_and_docs = "Q: {}\n\nC: {}".format(query, docs)

input_qc = tokenizer(query_and_docs, truncation=True, padding=True, return_tensors="pt")

# Beam search decoding; the sampling parameters are not used while do_sample=False.
generated_answers_encoded = model.generate(
    input_ids=input_qc["input_ids"].to(device),
    attention_mask=input_qc["attention_mask"].to(device),
    min_length=64,
    max_length=256,
    do_sample=False,
    early_stopping=True,
    num_beams=8,
    temperature=1.0,
    top_k=None,
    top_p=None,
    eos_token_id=tokenizer.eos_token_id,
    no_repeat_ngram_size=3,
    num_return_sequences=1
)

# Decode the single returned sequence into plain text.
output = tokenizer.batch_decode(generated_answers_encoded, skip_special_tokens=True, clean_up_tokenization_spaces=True)[0]
print(output)
```
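
The string passed to the tokenizer follows a fixed layout: the question after `Q:`, then the passages after `C:`, each prefixed with `<p>`. A minimal sketch of a helper that builds this string is shown below. The name `build_lfqa_input` is hypothetical, and stripping whitespace and skipping empty passages are assumptions added here, not part of the original snippet.

```python
from typing import Sequence


def build_lfqa_input(query: str, documents: Sequence[str]) -> str:
    """Format a question and its passages as 'Q: <query>\\n\\nC: <p> doc1 <p> doc2'."""
    # Assumption: empty or whitespace-only passages are dropped so that
    # no dangling "<p>" marker ends up in the context.
    passages = [d.strip() for d in documents if d and d.strip()]
    # Prefix every passage with the "<p>" marker, matching the usage snippet.
    context = " ".join("<p> " + p for p in passages)
    return "Q: {}\n\nC: {}".format(query.strip(), context)


example = build_lfqa_input(
    "Che cos'è la fotosintesi?",
    ["La fotosintesi è un processo chimico.", "Avviene nelle piante."],
)
print(example)
```

The returned string can be passed directly to the tokenizer in place of the hand-built one.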

### Author

- Edoardo Federici: [Twitter](https://twitter.com/edofederici) | [LinkedIn](https://www.linkedin.com/in/edoardo-federici-01341b1b6)