
Romanian paraphrase

v2.0

A fine-tuned t5-small-paraphrase-ro model for paraphrasing in Romanian. Since there was no Romanian dataset for paraphrasing, I created my own; it contains ~30k examples.

How to use

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("BlackKakapo/t5-small-paraphrase-ro-v2")
model = AutoModelForSeq2SeqLM.from_pretrained("BlackKakapo/t5-small-paraphrase-ro-v2")

Or

from transformers import T5ForConditionalGeneration, T5TokenizerFast 

model = T5ForConditionalGeneration.from_pretrained("BlackKakapo/t5-small-paraphrase-ro-v2")
tokenizer = T5TokenizerFast.from_pretrained("BlackKakapo/t5-small-paraphrase-ro-v2")
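
Optionally, you can move the model to a GPU before generating. This is a minimal sketch, assuming PyTorch with CUDA is available; it is not part of the original card:

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
# Input tensors must be moved to the same device before calling model.generate().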

Generate

text = "Am impresia că fac multe greșeli."

encoding = tokenizer(text, padding=True, return_tensors="pt")
input_ids, attention_masks = encoding["input_ids"], encoding["attention_mask"]

beam_outputs = model.generate(
    input_ids=input_ids, 
    attention_mask=attention_masks,
    do_sample=True,
    max_length=256,
    top_k=20,
    top_p=0.9,
    early_stopping=False,
    num_return_sequences=5
)

final_outputs = []

for beam_output in beam_outputs:
    text_para = tokenizer.decode(beam_output, skip_special_tokens=True, clean_up_tokenization_spaces=True)

    if text.lower() != text_para.lower() and text_para not in final_outputs:
        final_outputs.append(text_para)
        

print(final_outputs)      

Output

['Am impresia că fac multe erori.']
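
For repeated use, the generation steps above can be wrapped into a small helper. This is a minimal sketch; the paraphrase function name and its default arguments are illustrative, not part of the original card:

def paraphrase(text, num_return_sequences=5, max_length=256):
    # Tokenize the input sentence and build the attention mask.
    encoding = tokenizer(text, padding=True, return_tensors="pt")
    outputs = model.generate(
        input_ids=encoding["input_ids"],
        attention_mask=encoding["attention_mask"],
        do_sample=True,
        max_length=max_length,
        top_k=20,
        top_p=0.9,
        num_return_sequences=num_return_sequences,
    )
    candidates = [
        tokenizer.decode(o, skip_special_tokens=True, clean_up_tokenization_spaces=True)
        for o in outputs
    ]
    # Drop exact copies of the input and duplicate candidates, preserving order.
    results = []
    for c in candidates:
        if c.lower() != text.lower() and c not in results:
            results.append(c)
    return results

print(paraphrase("Am impresia că fac multe greșeli."))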