bhadresh-savani committed on
Commit
da77a54
1 Parent(s): 48c83e7

Create README.md

Files changed (1): README.md (+53 -0)
README.md ADDED
@@ -0,0 +1,53 @@
+ ---
+ datasets:
+ - wiki_split
+
+ widget:
+ - text: "Mary likes to play football in her freetime whenever she meets with her friends that are very nice people."
+
+ license: mit
+ ---
+ # T5 model for sentence splitting in English
+
+ Sentence splitting is the task of dividing a long sentence into multiple shorter sentences.
+ For example, the sentence
+ ```
+ Mary likes to play football in her freetime whenever she meets with her friends that are very nice people.
+ ```
+ could be split into
+ ```
+ Mary likes to play football in her freetime whenever she meets with her friends.
+ ```
+ ```
+ Her friends are very nice people.
+ ```
+
+ ## How to use it in your code:
+ ```python
+ from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
+
+ # Load the tokenizer and the fine-tuned seq2seq model from the Hub
+ tokenizer = AutoTokenizer.from_pretrained("flax-community/t5-v1_1-base-wikisplit")
+ model = AutoModelForSeq2SeqLM.from_pretrained("flax-community/t5-v1_1-base-wikisplit")
+
+ complex_sentence = "This comedy drama is produced by Tidy , the company she co-founded in 2008 with her husband David Peet , who is managing director ."
+ sample_tokenized = tokenizer(complex_sentence, return_tensors="pt")
+
+ # Generate the split sentences with beam search
+ answer = model.generate(
+     sample_tokenized["input_ids"],
+     attention_mask=sample_tokenized["attention_mask"],
+     max_length=256,
+     num_beams=5,
+ )
+ gene_sentence = tokenizer.decode(answer[0], skip_special_tokens=True)
+ print(gene_sentence)
+
+ """
+ Output:
+ This comedy drama is produced by Tidy. She co-founded Tidy in 2008 with her husband David Peet, who is managing director.
+ """
+ ```
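The model returns all split sentences as one decoded string, so downstream code may want them as a list. The following is a minimal sketch of one way to do that, assuming a naive boundary rule (sentence-ending punctuation, whitespace, then an uppercase letter). The helper name `split_generated` is hypothetical and not part of the model or of `transformers`, and the rule will mis-split abbreviations such as "Dr. Smith".

```python
import re
from typing import List

# Naive boundary (an assumption for illustration): sentence-ending punctuation,
# then whitespace, then an uppercase letter starting the next sentence.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def split_generated(text: str) -> List[str]:
    """Split a decoded model output string into individual sentences."""
    parts = _BOUNDARY.split(text.strip())
    return [part.strip() for part in parts if part.strip()]


# The decoded output from the usage example in this README
generated = (
    "This comedy drama is produced by Tidy. "
    "She co-founded Tidy in 2008 with her husband David Peet, who is managing director."
)
sentences = split_generated(generated)
for sentence in sentences:
    print(sentence)
```

A dedicated sentence tokenizer would likely be more robust for production use; the regex is only meant to show the shape of the post-processing step.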
+ ## Datasets:
+ [Wiki_Split](https://research.google/tools/datasets/wiki-split/)
+
+ ## Current Baseline from [paper](https://arxiv.org/abs/1907.12461)
+ ![baseline](./baseline.png)
+
+ ## Our Results:
+ | Model | Exact | SARI | BLEU |
+ | --- | --- | --- | --- |
+ | t5-base-wikisplit | 17.93 | 67.5438 | 76.9 |
+ | t5-v1_1-base-wikisplit | 16.84 | 66.38 | 76.32 |
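
To make the "Exact" column more concrete, here is a rough sketch of how an exact-match percentage could be computed. The README does not state which normalization or evaluation script produced the numbers above, so the whitespace normalization, the function name `exact_match`, and the toy data are all assumptions for illustration only.

```python
from typing import Sequence


def _normalize(text: str) -> str:
    """Collapse runs of whitespace so spacing differences do not count as mismatches."""
    return " ".join(text.split())


def exact_match(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Percentage of predictions identical to their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    hits = sum(_normalize(p) == _normalize(r) for p, r in zip(predictions, references))
    return 100.0 * hits / len(predictions)


# Toy data (hypothetical, not taken from WikiSplit)
preds = ["Mary plays football. Her friends are nice.", "It rains."]
refs = ["Mary plays football.  Her friends are nice.", "It is raining."]
score = exact_match(preds, refs)
print(score)
```

SARI and BLEU need n-gram statistics against the source and reference sentences, so an existing evaluation library is probably a better fit for those two columns than a hand-written function.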