---
license: cc-by-4.0
datasets:
  - wikimedia/wikipedia
language:
  - mk
base_model:
  - google/mt5-base
---

An mT5-base model fine-tuned to restore punctuation and capitalization in Macedonian text. It was fine-tuned on a subset of the Macedonian portion of Wikipedia.
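The model takes lowercase text without punctuation, as in the usage example, and returns it with capitalization and punctuation restored. If you want to try it on text that is already punctuated (for example, to build test inputs), a minimal sketch of producing input in that form might look like the following. The helper names `normalize_for_recap` and `build_prompt` are hypothetical, and the exact normalization used during fine-tuning is not documented here, so treat this as an assumption.

```python
import re

# Task prefix taken from the usage example in this card.
PREFIX = "restore capitalization and punctuation: "


def normalize_for_recap(text: str) -> str:
    """Hypothetical helper: lowercase text and strip punctuation so it resembles
    the unpunctuated, lowercase input shown in the usage example."""
    lowered = text.lower()
    # Replace every character that is not a word character or whitespace with a space.
    # For str patterns, \w is Unicode-aware, so Cyrillic letters are preserved.
    no_punct = re.sub(r"[^\w\s]", " ", lowered)
    # Collapse runs of whitespace into single spaces.
    return " ".join(no_punct.split())


def build_prompt(text: str) -> str:
    """Hypothetical helper: prepend the task prefix to the normalized text."""
    return PREFIX + normalize_for_recap(text)


print(build_prompt("Скопје е главен град на Македонија."))
```

This replaces hyphens and apostrophes with spaces as well, which may not match how the training inputs were prepared.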

## Usage

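```python
# Requires: pip install torch transformers sentencepiece
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

# Run on GPU if one is available, otherwise fall back to CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

recap_model_name = "Macedonian-ASR/mt5-restore-capitalization-macedonian"
recap_tokenizer = T5Tokenizer.from_pretrained(recap_model_name)
recap_model = T5ForConditionalGeneration.from_pretrained(recap_model_name)
recap_model.to(device)

# Input: lowercase text without punctuation, prefixed with the task instruction.
sentence = "скопје е главен град на македонија"
inputs = recap_tokenizer(["restore capitalization and punctuation: " + sentence], return_tensors="pt", padding=True).to(device)

# Beam search decoding; squeeze(0) drops the batch dimension of the single output sequence.
outputs = recap_model.generate(**inputs, max_length=768, num_beams=5, early_stopping=True).squeeze(0)
recap_result = recap_tokenizer.decode(outputs, skip_special_tokens=True)
print(recap_result)
# -> Скопје е главен град на Македонија.
```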
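For long inputs such as full ASR transcripts, one option is to split the text into fixed-size word windows, restore each window separately, and join the results. The sketch below assumes that approach; the window size of 64 words and the helper names `chunk_words`, `restore_long_text`, and `make_restore_fn` are hypothetical choices, not something documented for this model. `make_restore_fn` only wraps the same tokenizer and `generate` calls used in the usage example.

```python
from typing import Callable, List


def chunk_words(text: str, max_words: int = 64) -> List[str]:
    """Split text into consecutive chunks of at most max_words words."""
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def restore_long_text(text: str, restore_fn: Callable[[str], str], max_words: int = 64) -> str:
    """Apply restore_fn to each word window and join the outputs with spaces."""
    return " ".join(restore_fn(chunk) for chunk in chunk_words(text, max_words))


def make_restore_fn(tokenizer, model, device,
                    prefix: str = "restore capitalization and punctuation: ") -> Callable[[str], str]:
    """Wrap the tokenizer/model calls from the usage example as a str -> str function."""
    def restore(sentence: str) -> str:
        inputs = tokenizer([prefix + sentence], return_tensors="pt", padding=True).to(device)
        outputs = model.generate(**inputs, max_length=768, num_beams=5, early_stopping=True)
        # generate returns a (batch, sequence) tensor; decode the only sequence.
        return tokenizer.decode(outputs[0], skip_special_tokens=True)
    return restore


# Stand-in restore function so the chunking logic can be checked without the model.
print(restore_long_text("a b c d e", str.upper, max_words=2))
```

With the model loaded, you would pass `make_restore_fn(recap_tokenizer, recap_model, device)` as `restore_fn`. Because windows ignore sentence boundaries, capitalization and punctuation near a window edge may be less reliable.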