Keston Smith committed on
Commit
64ff8fd
1 Parent(s): cd4face

Adding Readme file

Files changed (1)
  1. README.md +45 -0
README.md ADDED
@@ -0,0 +1,45 @@
+ ---
+
+ language: en
+
+ tags:
+
+ - Trinidad and Tobago English Parser
+
+ - text2text-generation
+
+ - Caribe
+
+ license: cc-by-nc-sa-4.0
+
+ datasets:
+
+ - Custom dataset
+ - Creolised JFLEG
+
+ ---
+
+ # Model
+ This model builds on the pre-trained T5-base model. It was fine-tuned on a combination of a custom dataset and a creolised version of the [JFLEG](https://arxiv.org/abs/1702.04066) dataset. The JFLEG dataset was creolised using the file encoding feature of the Caribe library. For more on Caribbean dialects, check out the [Caribe](https://pypi.org/project/Caribe/) library.
+
+ ___
+
+
+ # Usage with Transformers
+
+ ```python
+
+ from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
+
+ tokenizer = AutoTokenizer.from_pretrained("KES/T5-TTParser")
+
+ model = AutoModelForSeq2SeqLM.from_pretrained("KES/T5-TTParser")
+
+ # Prepend the "grammar:" task prefix to the sentence before tokenizing
+ txt = "Ah have live with mi paremnts en London"
+ inputs = tokenizer("grammar:" + txt, truncation=True, return_tensors="pt")
+
+ # Generate with beam search, then decode the token IDs back to text
+ output = model.generate(inputs["input_ids"], num_beams=4, max_length=512, early_stopping=True)
+ correction = tokenizer.batch_decode(output, skip_special_tokens=True)
+ print("".join(correction))  # Correction: Ah live with meh parents in London.
+
+ ```
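
The usage example above corrects a single sentence. For several sentences at once, a batched call may be more convenient. The following is a minimal sketch, not part of the original README: `build_prompts` and `correct_batch` are hypothetical helper names, and the sketch assumes the model expects the same `grammar:` prefix (with no following space) used in the example above. The tokenizer and model are passed in as arguments, loaded exactly as shown earlier.

```python
from typing import List

# Task prefix taken from the README usage example (assumed to apply to every input)
PREFIX = "grammar:"


def build_prompts(sentences: List[str], prefix: str = PREFIX) -> List[str]:
    """Prepend the task prefix to each sentence, skipping blank entries."""
    return [prefix + s.strip() for s in sentences if s.strip()]


def correct_batch(sentences, tokenizer, model, num_beams=4, max_length=512):
    """Hypothetical helper: run several sentences through the model in one call."""
    prompts = build_prompts(sentences)
    if not prompts:
        return []
    # padding=True pads the batch to a common length; the attention mask
    # tells the model which positions are padding.
    inputs = tokenizer(prompts, padding=True, truncation=True, return_tensors="pt")
    output = model.generate(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        num_beams=num_beams,
        max_length=max_length,
        early_stopping=True,
    )
    # One decoded string per input prompt
    return tokenizer.batch_decode(output, skip_special_tokens=True)
```

With `tokenizer` and `model` loaded as in the usage example, `correct_batch(["Ah have live with mi paremnts en London"], tokenizer, model)` would return a list with one corrected string per input sentence.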