mlkorra committed
Commit 3f6af0f
1 Parent(s): b3beee3

Update App with About Section

About/accomplishments.md ADDED
@@ -0,0 +1,5 @@
+ ## Accomplishments
+
+ * All of our models score better than the baseline models on two of the metrics (Exact match and SARI).
+ * Our t5-base-wikisplit and t5-v1_1-base-wikisplit models achieve comparable results with about half the model size (weights), which enables faster inference.
+ * We added the [wiki_split](https://huggingface.co/metrics/wiki_split) metric, which is freely available through Hugging Face Datasets, so the relevant scores for this task are now easy to compute.
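
The wiki_split metric mentioned above can be used to score predictions directly. A minimal sketch, assuming the `datasets` library's `load_metric` API; the argument names and returned keys follow the metric card and the example sentences are illustrative:

```python
# Sketch: scoring predictions with the wiki_split metric from Hugging Face Datasets.
# The argument names (sources, predictions, references) and the returned keys
# ("exact", "sari", "sacrebleu") are taken from the metric card and are assumptions here.
from datasets import load_metric

wiki_split_metric = load_metric("wiki_split")

sources = ["This comedy drama is produced by Tidy , the company she co-founded in 2008 with her husband David Peet , who is managing director ."]
predictions = ["This comedy drama is produced by Tidy. She co-founded Tidy in 2008 with her husband David Peet, who is managing director."]
references = [["This comedy drama is produced by Tidy. She co-founded Tidy in 2008 with her husband David Peet, who is managing director."]]

scores = wiki_split_metric.compute(sources=sources, predictions=predictions, references=references)
print(scores)
```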
About/applications.md ADDED
@@ -0,0 +1,5 @@
+ ## Applications
+ * Sentence Simplification
+ * Data Augmentation
+ * Sentence Rephrasing
+ * Tweet Splitter - split long tweets into sub-tweets that stay within the 140-character limit.
About/contributors.md ADDED
@@ -0,0 +1,3 @@
+ ## Contributors
+ * [Bhadresh Savani](https://www.linkedin.com/in/bhadreshsavani)
+ * [Rahul Dev](https://twitter.com/mlkorra)
About/credits.md ADDED
@@ -0,0 +1,2 @@
+ ## Credits
+ Huge thanks to Hugging Face 🤗 & the Google JAX/Flax team for such a wonderful community week, and especially for providing such massive computing resources. Big thanks to [Suraj Patil](https://huggingface.co/valhalla) & [Patrick von Platen](https://huggingface.co/patrickvonplaten) for solving our issues and mentoring us during the whole community week.
About/datasets.md ADDED
@@ -0,0 +1,3 @@
+ ## Datasets used
+ * [Wiki Split](https://research.google/tools/datasets/wiki-split/)
+ * [Web Split](https://github.com/shashiongithub/Split-and-Rephrase)
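
For reference, the Wiki Split corpus is also mirrored on the Hugging Face Hub, so it can presumably be loaded directly with `datasets`. A rough sketch; the dataset id `wiki_split` and the column names below are assumptions based on the Hub dataset card, not something this commit defines:

```python
# Sketch: loading the WikiSplit corpus from the Hugging Face Hub.
# The dataset id and the column names are assumptions.
from datasets import load_dataset

wiki_split = load_dataset("wiki_split")
sample = wiki_split["train"][0]
print(sample["complex_sentence"])    # long input sentence
print(sample["simple_sentence_1"])   # first target sentence
print(sample["simple_sentence_2"])   # second target sentence
```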
About/gitrepo.md ADDED
@@ -0,0 +1,2 @@
+ ## GitHub Repo
+ * [t5-sentence-split](https://github.com/bhadreshpsavani/t5-sentence-split)
About/results.md ADDED
@@ -0,0 +1,8 @@
+ ## Our Results
+
+ | Model | Exact match | SARI | BLEU |
+ | --- | --- | --- | --- |
+ | [t5-base-wikisplit](https://huggingface.co/flax-community/t5-base-wikisplit) | 17.93 | 67.5438 | 76.9 |
+ | [t5-v1_1-base-wikisplit](https://huggingface.co/flax-community/t5-v1_1-base-wikisplit) | 18.1207 | 67.4873 | 76.9478 |
+ | [byt5-base-wikisplit](https://huggingface.co/flax-community/byt5-base-wikisplit) | 11.3582 | 67.2685 | 73.1682 |
+ | [t5-large-wikisplit](https://huggingface.co/flax-community/t5-large-wikisplit) | 18.6632 | 68.0501 | 77.1881 |
app.py CHANGED
@@ -1,75 +1,13 @@
  import streamlit as st
- from transformers import AutoTokenizer,AutoModelForSeq2SeqLM
- import random
-
-
- @st.cache(show_spinner=False)
- def load_model(input_complex_sentence,model):
-
-     base_path = "flax-community/"
-     model_path = base_path + model
-     tokenizer = AutoTokenizer.from_pretrained(model_path)
-     model = AutoModelForSeq2SeqLM.from_pretrained(model_path)
-
-     tokenized_sentence = tokenizer(input_complex_sentence,return_tensors="pt")
-     result = model.generate(tokenized_sentence['input_ids'],attention_mask = tokenized_sentence['attention_mask'],max_length=256,num_beams=5)
-     generated_sentence = tokenizer.decode(result[0],skip_special_tokens=True)
-
-     return generated_sentence

  def main():
-
-     st.sidebar.title("🧠Sentence Simplifier")
-     st.title("Sentence Split in English using T5 Variants")
-     st.write("Sentence Split is the task of **dividing a long Complex Sentence into Simple Sentences**")
-
-     st.sidebar.write("## UI Options")
-     model = st.sidebar.selectbox(
-         "Please Choose the Model",
-         ("t5-base-wikisplit","t5-v1_1-base-wikisplit", "byt5-base-wikisplit","t5-large-wikisplit"))
-
-     change_example = st.sidebar.checkbox("Try Random Examples")
-
-     st.sidebar.write('''
-     ## Applications:
-     * Sentence Simplification
-     * Data Augmentation
-     * Sentence Rephrase
-     ''')
-
-
-     st.sidebar.write("[More Exploration](https://github.com/bhadreshpsavani/t5-sentence-split)")
-
-     examples = [
-         "Mary likes to play football in her freetime whenever she meets with her friends that are very nice people.",
-         "It broadcasts on AM frequency 1600 kHz and is under ownership of Multicultural Broadcasting with studios in Surrey , British Columbia .",
-         "On March 1 , the Blackhawks played in their 2nd outdoor game in franchise history at Soldier Field in part of the new NHL Stadium Series ",
-         "'' The Rain Song '' is a love ballad , over 7 minutes in length , and is considered by singer Robert Plant to be his best overall vocal performance .",
-         "The resulting knowledge about human kinesiology and sport nutrition combined with his distinctive posing styles makes Kamali a sought out bodybuilder for seminars and guest appearances and has been featured in many bodybuilding articles , as well as being on the cover of MUSCLEMAG magazine .",
-         "The East London Line closed on 22 December 2007 and reopened on 27 April 2010 , becoming part of the new London Overground system .",
-         "' Bandolier - Budgie ' , a free iTunes app for iPad , iPhone and iPod touch , released in December 2011 , tells the story of the making of Bandolier in the band 's own words - including an extensive audio interview with Burke Shelley .",
-         "' Eden Black ' was grown from seed in the late 1980s by Stephen Morley , under his conditions it produces pitchers that are almost completley black .",
-         "' Wilson should extend his stint on The Voice to renew public interest in the band ; given that they 're pulling out all the stops , they deserve all the acclaim that surrounded them for their first two albums .",
-         "'' '' New York Mining Disaster 1941 '' '' was the second EP released by the Bee Gees in 1967 on the Spin Records , like their first EP , it was released only in Australia .",
-         "'' ADAPTOGENS : Herbs for Strength , Stamina , and Stress Relief , '' Healing Arts Press , 2007 - contains a detailed monograph on Schisandra chinensis as well as highlights health benefits ."
-     ]
-
-     if change_example:
-         example = examples[random.randint(0, len(examples)-1)]
-         input_complex_sentence = st.text_area("Please type a Complex Sentence to split",example)
-         split = st.button('Change and Split✂️')
-     else:
-         example=examples[0]
-         input_complex_sentence = st.text_area("Please type a Complex Sentence to split",example)
-         split = st.button('Split✂️')
-
-     if split:
-         with st.spinner("Spliting Sentence...🧠"):
-             generated_sentence = load_model(input_complex_sentence, model)
-             sentence1, sentence2, _ = generated_sentence.split(".")
-             st.write("**Sentence1:** "+sentence1+".")
-             st.write("**Sentence2:** "+sentence2+".")
-

  if __name__ == "__main__":
-     main()
  import streamlit as st
+ from pages import inference,about
+ from navigate import Navigate

  def main():
+
+     app = Navigate()
+     app.add_app("Inference", inference.load_page)
+     app.add_app("About", about.load_page)
+     app.run()

  if __name__ == "__main__":
+     main()
images/baseline.png ADDED
navigate.py ADDED
@@ -0,0 +1,11 @@
+ import streamlit as st
+
+ class Navigate:
+     def __init__(self):
+         self.apps = []
+     def add_app(self, title, func):
+         self.apps.append({"title": title, "function": func})
+     def run(self):
+         #st.sidebar.header("Sections")
+         app = st.sidebar.radio("", self.apps, format_func=lambda app: app["title"])
+         app["function"]()
pages/__init__.py ADDED
File without changes
pages/__pycache__/__init__.cpython-38.pyc ADDED
Binary file (172 Bytes).
pages/__pycache__/about.cpython-38.pyc ADDED
Binary file (1.89 kB).
pages/__pycache__/inference.cpython-38.pyc ADDED
Binary file (3.85 kB).
pages/about.py ADDED
@@ -0,0 +1,43 @@
+ import streamlit as st
+ import os
+
+ def read_markdown(path, folder="./About/"):
+     with open(os.path.join(folder, path)) as f:
+         return f.read()
+
+ def load_page():
+
+     st.markdown(""" # T5 for Sentence Split in English """)
+     st.markdown(""" ### Sentence Split is the task of dividing a complex sentence into two simple sentences """)
+
+     st.markdown(""" ## Goal """)
+     st.markdown(""" To make the best sentence split model available to date """)
+
+     st.markdown(""" ## How to use the Model """)
+     st.markdown("""
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
+ tokenizer = AutoTokenizer.from_pretrained("flax-community/t5-base-wikisplit")
+ model = AutoModelForSeq2SeqLM.from_pretrained("flax-community/t5-base-wikisplit")
+
+ complex_sentence = "This comedy drama is produced by Tidy , the company she co-founded in 2008 with her husband David Peet , who is managing director ."
+ sample_tokenized = tokenizer(complex_sentence, return_tensors="pt")
+
+ answer = model.generate(sample_tokenized['input_ids'], attention_mask = sample_tokenized['attention_mask'], max_length=256, num_beams=5)
+ gene_sentence = tokenizer.decode(answer[0], skip_special_tokens=True)
+ gene_sentence
+
+ \"""
+ Output:
+ This comedy drama is produced by Tidy. She co-founded Tidy in 2008 with her husband David Peet, who is managing director.
+ \"""
+
+ ``` """)
+     st.markdown(read_markdown("datasets.md"))
+     st.markdown(read_markdown("applications.md"))
+     st.markdown(read_markdown("results.md"))
+     st.markdown(read_markdown("accomplishments.md"))
+     st.markdown(read_markdown("gitrepo.md"))
+     st.markdown(read_markdown("contributors.md"))
+     st.markdown(read_markdown("credits.md"))
pages/inference.py ADDED
@@ -0,0 +1,61 @@
+ import streamlit as st
+ from transformers import AutoTokenizer,AutoModelForSeq2SeqLM
+ import random
+
+
+ @st.cache(show_spinner=False)
+ def load_model(input_complex_sentence,model):
+
+     base_path = "flax-community/"
+     model_path = base_path + model
+     tokenizer = AutoTokenizer.from_pretrained(model_path)
+     model = AutoModelForSeq2SeqLM.from_pretrained(model_path)
+
+     tokenized_sentence = tokenizer(input_complex_sentence,return_tensors="pt")
+     result = model.generate(tokenized_sentence['input_ids'],attention_mask = tokenized_sentence['attention_mask'],max_length=256,num_beams=5)
+     generated_sentence = tokenizer.decode(result[0],skip_special_tokens=True)
+
+     return generated_sentence
+
+ def load_page():
+
+     st.sidebar.title("🧠Sentence Simplifier")
+     st.title("Sentence Split in English using T5 Variants")
+     st.write("Sentence Split is the task of **dividing a long Complex Sentence into Simple Sentences**")
+
+     st.sidebar.write("## UI Options")
+     model = st.sidebar.selectbox(
+         "Please Choose the Model",
+         ("t5-base-wikisplit","t5-v1_1-base-wikisplit", "byt5-base-wikisplit","t5-large-wikisplit"))
+
+     change_example = st.sidebar.checkbox("Try Random Examples")
+
+     examples = [
+         "Mary likes to play football in her freetime whenever she meets with her friends that are very nice people.",
+         "It broadcasts on AM frequency 1600 kHz and is under ownership of Multicultural Broadcasting with studios in Surrey , British Columbia .",
+         "On March 1 , the Blackhawks played in their 2nd outdoor game in franchise history at Soldier Field in part of the new NHL Stadium Series ",
+         "'' The Rain Song '' is a love ballad , over 7 minutes in length , and is considered by singer Robert Plant to be his best overall vocal performance .",
+         "The resulting knowledge about human kinesiology and sport nutrition combined with his distinctive posing styles makes Kamali a sought out bodybuilder for seminars and guest appearances and has been featured in many bodybuilding articles , as well as being on the cover of MUSCLEMAG magazine .",
+         "The East London Line closed on 22 December 2007 and reopened on 27 April 2010 , becoming part of the new London Overground system .",
+         "' Bandolier - Budgie ' , a free iTunes app for iPad , iPhone and iPod touch , released in December 2011 , tells the story of the making of Bandolier in the band 's own words - including an extensive audio interview with Burke Shelley .",
+         "' Eden Black ' was grown from seed in the late 1980s by Stephen Morley , under his conditions it produces pitchers that are almost completley black .",
+         "' Wilson should extend his stint on The Voice to renew public interest in the band ; given that they 're pulling out all the stops , they deserve all the acclaim that surrounded them for their first two albums .",
+         "'' '' New York Mining Disaster 1941 '' '' was the second EP released by the Bee Gees in 1967 on the Spin Records , like their first EP , it was released only in Australia .",
+         "'' ADAPTOGENS : Herbs for Strength , Stamina , and Stress Relief , '' Healing Arts Press , 2007 - contains a detailed monograph on Schisandra chinensis as well as highlights health benefits ."
+     ]
+
+     if change_example:
+         example = examples[random.randint(0, len(examples)-1)]
+         input_complex_sentence = st.text_area("Please type a Complex Sentence to split",example)
+         split = st.button('Change and Split✂️')
+     else:
+         example=examples[0]
+         input_complex_sentence = st.text_area("Please type a Complex Sentence to split",example)
+         split = st.button('Split✂️')
+
+     if split:
+         with st.spinner("Splitting Sentence...🧠"):
+             generated_sentence = load_model(input_complex_sentence, model)
+             sentence1, sentence2, _ = generated_sentence.split(".")
+             st.write("**Sentence1:** "+sentence1+".")
+             st.write("**Sentence2:** "+sentence2+".")