metadata
language: en
tags:
  - Trinidad and Tobago English Parser
  - text2text-generation
  - Caribe
license: cc-by-nc-sa-4.0
datasets:
  - Custom dataset
  - Creolised JFLEG

Trinidad and Tobago Creole Parser

This model was trained as a parser for Trinidad and Tobago English Creole.


Model

This model builds on the pre-trained T5-base model. It was fine-tuned on a combination of a custom dataset and a creolised version of the JFLEG dataset. The JFLEG dataset was creolised using the file encoding feature of the Caribe library. For more on Caribbean dialects, check out the Caribe library.


Usage with Transformers


from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("KES/T5-TTParser")

model = AutoModelForSeq2SeqLM.from_pretrained("KES/T5-TTParser")

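# Input sentence; the misspellings are deliberate so the model has something to fix.
txt = "Ah have live with mi paremnts en London"
# The "grammar:" task prefix is prepended to the input before tokenising.
inputs = tokenizer("grammar:" + txt, truncation=True, return_tensors="pt")

# Generate with beam search, passing the attention mask alongside the input IDs.
output = model.generate(inputs["input_ids"], attention_mask=inputs["attention_mask"], num_beams=4, max_length=512, early_stopping=True)
correction = tokenizer.batch_decode(output, skip_special_tokens=True)
print("".join(correction))  # Correction: Ah live with meh parents in London.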
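
To run several sentences at once, the same steps can be wrapped in a small helper. This is a minimal sketch, not part of the model's own API: the names `parse_batch` and `PREFIX` are hypothetical, and it assumes the `tokenizer` and `model` objects loaded above are passed in. It adds `padding=True` so that sentences of different lengths fit in one batch; how well the model handles batched input is not stated in this card.

```python
from typing import Any, List

# Task prefix taken from the usage example above.
PREFIX = "grammar:"


def parse_batch(
    sentences: List[str],
    tokenizer: Any,
    model: Any,
    num_beams: int = 4,
    max_length: int = 512,
) -> List[str]:
    """Hypothetical helper: run a list of sentences through the model in one batch."""
    if not sentences:
        return []

    # Prepend the task prefix to every sentence, as in the single-sentence example.
    prefixed = [PREFIX + s for s in sentences]

    # padding=True pads shorter sentences so the batch forms one rectangular tensor.
    inputs = tokenizer(prefixed, padding=True, truncation=True, return_tensors="pt")

    # The attention mask tells the model which positions are padding.
    output = model.generate(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        num_beams=num_beams,
        max_length=max_length,
        early_stopping=True,
    )

    # One decoded string per input sentence, in the same order.
    return tokenizer.batch_decode(output, skip_special_tokens=True)
```

With the tokenizer and model loaded as shown earlier, `parse_batch([txt], tokenizer, model)` returns a list with one output string per input sentence.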