bhadresh-savani committed on
Commit
7ac9816
2 Parent(s): 7b07637 219dea3

Merge branch 'main' of https://huggingface.co/flax-community/t5-large-wikisplit into main

Files changed (2)
  1. README.md +45 -0
  2. config.json +1 -0
README.md ADDED
@@ -0,0 +1,45 @@
+ ---
+ datasets:
+ - wiki_split
+ widget:
+ - text: "Mary likes to play football in her freetime whenever she meets with her friends that are very nice people."
+ ---
+ # T5 model for sentence splitting in English
+ Sentence Split is the task of dividing a long sentence into multiple sentences.
+ E.g.:
+ ```
+ Mary likes to play football in her freetime whenever she meets with her friends that are very nice people.
+ ```
+ could be split into
+ ```
+ Mary likes to play football in her freetime whenever she meets with her friends.
+ ```
+ ```
+ Her friends are very nice people.
+ ```
+ ## How to use it in your code:
+ ```python
+ from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
+ tokenizer = AutoTokenizer.from_pretrained("flax-community/t5-large-wikisplit")
+ model = AutoModelForSeq2SeqLM.from_pretrained("flax-community/t5-large-wikisplit")
+ complex_sentence = "This comedy drama is produced by Tidy , the company she co-founded in 2008 with her husband David Peet , who is managing director ."
+ sample_tokenized = tokenizer(complex_sentence, return_tensors="pt")
+ answer = model.generate(sample_tokenized["input_ids"], attention_mask=sample_tokenized["attention_mask"], max_length=256, num_beams=5)
+ gene_sentence = tokenizer.decode(answer[0], skip_special_tokens=True)
+ gene_sentence
+ """
+ Output:
+ This comedy drama is produced by Tidy. She co-founded Tidy in 2008 with her husband David Peet, who is managing director.
+ """
+ ```
+ ## Datasets:
+ [Wiki_Split](https://research.google/tools/datasets/wiki-split/)
+ ## Current Baseline from [paper](https://arxiv.org/abs/1907.12461)
+ ![baseline](./baseline.png)
+ ## Our Results:
+ | Model | Exact | SARI | BLEU |
+ | --- | --- | --- | --- |
+ | t5-base-wikisplit | 17.93 | 67.5438 | 76.9 |
+ | t5-v1_1-base-wikisplit | 16.84 | 66.38 | 76.32 |
+ | byt5-base-wikisplit | 11.3582 | 67.2685 | 73.1682 |
+ | t5-large-wikisplit | 18.4295 | 67.882 | 77.1122 |
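The Exact column in the results table reads like an exact-match rate. The commit does not say how it was computed, so the following is only a minimal sketch, assuming Exact is the percentage of predictions that equal their reference split after whitespace normalization. The `exact_match` helper and the toy sentences are hypothetical and are not taken from the authors' evaluation code.

```python
def exact_match(predictions, references):
    """Hypothetical helper: percentage of predictions identical to their reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0

    def normalize(text):
        # Assumption: collapse runs of whitespace so spacing differences are ignored.
        return " ".join(text.split())

    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return 100.0 * hits / len(predictions)


# Toy data for illustration only; these are not evaluation results.
preds = [
    "Mary likes to play football in her freetime whenever she meets with her friends. Her friends are very nice people.",
    "This comedy drama is produced by Tidy.",
]
refs = [
    "Mary likes to play football in her freetime whenever she meets with her friends.  Her friends are very nice people.",
    "This comedy drama is produced by Tidy. She co-founded Tidy in 2008.",
]
score = exact_match(preds, refs)
print(f"Exact: {score:.2f}")
```

Under this assumption, the first pair counts as a match despite the doubled space, and the second does not. SARI and BLEU need dedicated metric implementations and are not sketched here.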
config.json CHANGED
@@ -6,6 +6,7 @@
6
  "d_ff": 4096,
7
  "d_kv": 64,
8
  "d_model": 1024,
 
9
  "decoder_start_token_id": 0,
10
  "dropout_rate": 0.1,
11
  "eos_token_id": 1,
 
6
  "d_ff": 4096,
7
  "d_kv": 64,
8
  "d_model": 1024,
9
+ "max_length": 256,
10
  "decoder_start_token_id": 0,
11
  "dropout_rate": 0.1,
12
  "eos_token_id": 1,