---
datasets:
- wiki_split

widget:
- text: "Mary likes to play football in her freetime whenever she meets with her friends that are very nice people."    
---
# T5 model for sentence splitting in English

Sentence splitting is the task of dividing a long sentence into multiple shorter sentences.
For example:
```
Mary likes to play football in her freetime whenever she meets with her friends that are very nice people.
```
could be split into
```
Mary likes to play football in her freetime whenever she meets with her friends.
```
```
Her friends are very nice people.
```

## How to use it in your code:
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load the tokenizer and the fine-tuned model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("flax-community/t5-v1_1-base-wikisplit")
model = AutoModelForSeq2SeqLM.from_pretrained("flax-community/t5-v1_1-base-wikisplit")

complex_sentence = "This comedy drama is produced by Tidy , the company she co-founded in 2008 with her husband David Peet , who is managing director ."
sample_tokenized = tokenizer(complex_sentence, return_tensors="pt")

# Generate the split sentences with beam search
answer = model.generate(
    sample_tokenized["input_ids"],
    attention_mask=sample_tokenized["attention_mask"],
    max_length=256,
    num_beams=5,
)
gene_sentence = tokenizer.decode(answer[0], skip_special_tokens=True)
print(gene_sentence)

# Output:
# This comedy drama is produced by Tidy. She co-founded Tidy in 2008 with her husband David Peet, who is managing director.
```
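
The model returns all split sentences as one decoded string. If you need them as separate items, a minimal post-processing sketch could look like the following. The helper `split_into_sentences` is hypothetical (not part of this model or of `transformers`) and uses a naive rule: break after `.`, `!` or `?` when the next word starts with an uppercase letter. It will mis-split abbreviations such as "Dr. Smith", so treat it as an illustration only.

```python
import re
from typing import List


def split_into_sentences(generated: str) -> List[str]:
    """Naively split decoded model output into individual sentences.

    Hypothetical helper: breaks after '.', '!' or '?' when followed by
    whitespace and an uppercase letter. Abbreviations are not handled.
    """
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z])", generated.strip())
    return [part for part in parts if part]


# Example using the output string shown above
output = (
    "This comedy drama is produced by Tidy. She co-founded Tidy in 2008 "
    "with her husband David Peet, who is managing director."
)
for sentence in split_into_sentences(output):
    print(sentence)
```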
## Datasets:
[Wiki_Split](https://research.google/tools/datasets/wiki-split/)

## Current Baseline from [paper](https://arxiv.org/abs/1907.12461)
![baseline](./baseline.png)

## Our Results:
| Model | Exact | SARI | BLEU |
| --- | --- | --- | --- |
| t5-base-wikisplit |  17.93 | 67.5438 | 76.9 |
| t5-v1_1-base-wikisplit | 16.84 | 66.38 | 76.32 |
| byt5-base-wikisplit | 11.3582 | 67.2685 | 73.1682 |
| t5-large-wikisplit | 18.4896 | 67.9555 | 77.12 |
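
For readers unfamiliar with the "Exact" column, the sketch below shows one way such a score can be computed. It assumes that "Exact" means the percentage of generated outputs that are identical to the reference split after trimming surrounding whitespace. The function name `exact_match` and the normalization are assumptions for illustration, not the evaluation script used to produce the numbers above.

```python
from typing import Sequence


def exact_match(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Percentage of predictions that equal their reference exactly.

    Assumption: only leading/trailing whitespace is stripped before
    comparison; the actual evaluation may normalize differently.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    matches = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return 100.0 * matches / len(predictions)


# Toy example with made-up strings: one of two predictions matches
score = exact_match(
    ["Mary plays football. Her friends are nice.", "He left. She stayed."],
    ["Mary plays football. Her friends are nice.", "He left early. She stayed."],
)
print(score)
```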