mlkorra committed
Commit 3f6af0f
1 Parent(s): b3beee3

Update App with About Section

About/accomplishments.md ADDED
@@ -0,0 +1,5 @@
+ ## Accomplishments
+
+ * All of our models score better than the baseline models on two of the metrics (Exact match and SARI).
+ * Our t5-base-wikisplit and t5-v1_1-base-wikisplit models achieve comparable results with about half the model size (weights), which enables faster inference.
+ * We added the [wiki_split](https://huggingface.co/metrics/wiki_split) metric, which is freely available through Hugging Face Datasets, so the relevant scores for this task are now easy to compute.
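
The wiki_split metric mentioned above can be used to score predictions directly. A minimal sketch, assuming the `datasets` library's `load_metric` API; the argument names and returned keys follow the metric card and the example sentences are illustrative:

```python
# Sketch: scoring predictions with the wiki_split metric from Hugging Face Datasets.
# The argument names (sources, predictions, references) and the returned keys
# ("exact", "sari", "sacrebleu") are taken from the metric card and are assumptions here.
from datasets import load_metric

wiki_split_metric = load_metric("wiki_split")

sources = ["This comedy drama is produced by Tidy , the company she co-founded in 2008 with her husband David Peet , who is managing director ."]
predictions = ["This comedy drama is produced by Tidy. She co-founded Tidy in 2008 with her husband David Peet, who is managing director."]
references = [["This comedy drama is produced by Tidy. She co-founded Tidy in 2008 with her husband David Peet, who is managing director."]]

scores = wiki_split_metric.compute(sources=sources, predictions=predictions, references=references)
print(scores)
```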
About/applications.md ADDED
@@ -0,0 +1,5 @@
+ ## Applications
+ * Sentence Simplification
+ * Data Augmentation
+ * Sentence Rephrasing
+ * Tweet Splitter - split long tweets into sub-tweets that stay within the 140-character limit.
About/contributors.md ADDED
@@ -0,0 +1,3 @@
+ ## Contributors
+ * [Bhadresh Savani](https://www.linkedin.com/in/bhadreshsavani)
+ * [Rahul Dev](https://twitter.com/mlkorra)
About/credits.md ADDED
@@ -0,0 +1,2 @@
+ ## Credits
+ Huge thanks to Hugging Face 🤗 & the Google JAX/Flax team for such a wonderful community week, and especially for providing such massive computing resources. Big thanks to [Suraj Patil](https://huggingface.co/valhalla) & [Patrick von Platen](https://huggingface.co/patrickvonplaten) for solving our issues and mentoring us during the whole community week.
About/datasets.md ADDED
@@ -0,0 +1,3 @@
+ ## Datasets used
+ * [Wiki Split](https://research.google/tools/datasets/wiki-split/)
+ * [Web Split](https://github.com/shashiongithub/Split-and-Rephrase)
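
For reference, the Wiki Split corpus is also mirrored on the Hugging Face Hub, so it can presumably be loaded directly with `datasets`. A rough sketch; the dataset id `wiki_split` and the column names below are assumptions based on the Hub dataset card, not something this commit defines:

```python
# Sketch: loading the WikiSplit corpus from the Hugging Face Hub.
# The dataset id and the column names are assumptions.
from datasets import load_dataset

wiki_split = load_dataset("wiki_split")
sample = wiki_split["train"][0]
print(sample["complex_sentence"])    # long input sentence
print(sample["simple_sentence_1"])   # first target sentence
print(sample["simple_sentence_2"])   # second target sentence
```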
About/gitrepo.md ADDED
@@ -0,0 +1,2 @@
+ ## GitHub Repo
+ * [t5-sentence-split](https://github.com/bhadreshpsavani/t5-sentence-split)
About/results.md ADDED
@@ -0,0 +1,8 @@
+ ## Our Results
+
+ | Model | Exact match | SARI | BLEU |
+ | --- | --- | --- | --- |
+ | [t5-base-wikisplit](https://huggingface.co/flax-community/t5-base-wikisplit) | 17.93 | 67.5438 | 76.9 |
+ | [t5-v1_1-base-wikisplit](https://huggingface.co/flax-community/t5-v1_1-base-wikisplit) | 18.1207 | 67.4873 | 76.9478 |
+ | [byt5-base-wikisplit](https://huggingface.co/flax-community/byt5-base-wikisplit) | 11.3582 | 67.2685 | 73.1682 |
+ | [t5-large-wikisplit](https://huggingface.co/flax-community/t5-large-wikisplit) | 18.6632 | 68.0501 | 77.1881 |
app.py CHANGED
@@ -1,75 +1,13 @@
  import streamlit as st
- from transformers import AutoTokenizer,AutoModelForSeq2SeqLM
- import random
-
-
- @st.cache(show_spinner=False)
- def load_model(input_complex_sentence,model):
-
-     base_path = "flax-community/"
-     model_path = base_path + model
-     tokenizer = AutoTokenizer.from_pretrained(model_path)
-     model = AutoModelForSeq2SeqLM.from_pretrained(model_path)
-
-     tokenized_sentence = tokenizer(input_complex_sentence,return_tensors="pt")
-     result = model.generate(tokenized_sentence['input_ids'],attention_mask = tokenized_sentence['attention_mask'],max_length=256,num_beams=5)
-     generated_sentence = tokenizer.decode(result[0],skip_special_tokens=True)
-
-     return generated_sentence

  def main():
-
-     st.sidebar.title("🧠Sentence Simplifier")
-     st.title("Sentence Split in English using T5 Variants")
-     st.write("Sentence Split is the task of **dividing a long Complex Sentence into Simple Sentences**")
-
-     st.sidebar.write("## UI Options")
-     model = st.sidebar.selectbox(
-         "Please Choose the Model",
-         ("t5-base-wikisplit","t5-v1_1-base-wikisplit", "byt5-base-wikisplit","t5-large-wikisplit"))
-
-     change_example = st.sidebar.checkbox("Try Random Examples")
-
-     st.sidebar.write('''
-     ## Applications:
-     * Sentence Simplification
-     * Data Augmentation
-     * Sentence Rephrase
-     ''')
-
-
-     st.sidebar.write("[More Exploration](https://github.com/bhadreshpsavani/t5-sentence-split)")
-
-     examples = [
-         "Mary likes to play football in her freetime whenever she meets with her friends that are very nice people.",
-         "It broadcasts on AM frequency 1600 kHz and is under ownership of Multicultural Broadcasting with studios in Surrey , British Columbia .",
-         "On March 1 , the Blackhawks played in their 2nd outdoor game in franchise history at Soldier Field in part of the new NHL Stadium Series ",
-         "'' The Rain Song '' is a love ballad , over 7 minutes in length , and is considered by singer Robert Plant to be his best overall vocal performance .",
-         "The resulting knowledge about human kinesiology and sport nutrition combined with his distinctive posing styles makes Kamali a sought out bodybuilder for seminars and guest appearances and has been featured in many bodybuilding articles , as well as being on the cover of MUSCLEMAG magazine .",
-         "The East London Line closed on 22 December 2007 and reopened on 27 April 2010 , becoming part of the new London Overground system .",
-         "' Bandolier - Budgie ' , a free iTunes app for iPad , iPhone and iPod touch , released in December 2011 , tells the story of the making of Bandolier in the band 's own words - including an extensive audio interview with Burke Shelley .",
-         "' Eden Black ' was grown from seed in the late 1980s by Stephen Morley , under his conditions it produces pitchers that are almost completley black .",
-         "' Wilson should extend his stint on The Voice to renew public interest in the band ; given that they 're pulling out all the stops , they deserve all the acclaim that surrounded them for their first two albums .",
-         "'' '' New York Mining Disaster 1941 '' '' was the second EP released by the Bee Gees in 1967 on the Spin Records , like their first EP , it was released only in Australia .",
-         "'' ADAPTOGENS : Herbs for Strength , Stamina , and Stress Relief , '' Healing Arts Press , 2007 - contains a detailed monograph on Schisandra chinensis as well as highlights health benefits ."
-     ]
-
-     if change_example:
-         example = examples[random.randint(0, len(examples)-1)]
-         input_complex_sentence = st.text_area("Please type a Complex Sentence to split",example)
-         split = st.button('Change and Split✂️')
-     else:
-         example=examples[0]
-         input_complex_sentence = st.text_area("Please type a Complex Sentence to split",example)
-         split = st.button('Split✂️')
-
-     if split:
-         with st.spinner("Spliting Sentence...🧠"):
-             generated_sentence = load_model(input_complex_sentence, model)
-             sentence1, sentence2, _ = generated_sentence.split(".")
-             st.write("**Sentence1:** "+sentence1+".")
-             st.write("**Sentence2:** "+sentence2+".")
-

  if __name__ == "__main__":
-     main()
  import streamlit as st
+ from pages import inference,about
+ from navigate import Navigate

  def main():
+
+     app = Navigate()
+     app.add_app("Inference", inference.load_page)
+     app.add_app("About", about.load_page)
+     app.run()

  if __name__ == "__main__":
+     main()
images/baseline.png ADDED
navigate.py ADDED
@@ -0,0 +1,11 @@
+ import streamlit as st
+
+ class Navigate:
+     def __init__(self):
+         self.apps = []
+     def add_app(self, title, func):
+         self.apps.append({"title": title, "function": func})
+     def run(self):
+         #st.sidebar.header("Sections")
+         app = st.sidebar.radio("", self.apps, format_func=lambda app: app["title"])
+         app["function"]()
pages/__init__.py ADDED
File without changes
pages/__pycache__/__init__.cpython-38.pyc ADDED
Binary file (172 Bytes).
pages/__pycache__/about.cpython-38.pyc ADDED
Binary file (1.89 kB).
pages/__pycache__/inference.cpython-38.pyc ADDED
Binary file (3.85 kB).
pages/about.py ADDED
@@ -0,0 +1,43 @@
+ import streamlit as st
+ import os
+
+ def read_markdown(path, folder="./About/"):
+     with open(os.path.join(folder, path)) as f:
+         return f.read()
+
+ def load_page():
+
+     st.markdown(""" # T5 for Sentence Split in English """)
+     st.markdown(""" ### Sentence Split is the task of dividing a complex sentence into two simple sentences """)
+
+     st.markdown(""" ## Goal """)
+     st.markdown(""" To make the best sentence split model available to date """)
+
+     st.markdown(""" ## How to use the Model """)
+     st.markdown("""
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
+ tokenizer = AutoTokenizer.from_pretrained("flax-community/t5-base-wikisplit")
+ model = AutoModelForSeq2SeqLM.from_pretrained("flax-community/t5-base-wikisplit")
+
+ complex_sentence = "This comedy drama is produced by Tidy , the company she co-founded in 2008 with her husband David Peet , who is managing director ."
+ sample_tokenized = tokenizer(complex_sentence, return_tensors="pt")
+
+ answer = model.generate(sample_tokenized['input_ids'], attention_mask = sample_tokenized['attention_mask'], max_length=256, num_beams=5)
+ gene_sentence = tokenizer.decode(answer[0], skip_special_tokens=True)
+ gene_sentence
+
+ \"""
+ Output:
+ This comedy drama is produced by Tidy. She co-founded Tidy in 2008 with her husband David Peet, who is managing director.
+ \"""
+
+ ``` """)
+     st.markdown(read_markdown("datasets.md"))
+     st.markdown(read_markdown("applications.md"))
+     st.markdown(read_markdown("results.md"))
+     st.markdown(read_markdown("accomplishments.md"))
+     st.markdown(read_markdown("gitrepo.md"))
+     st.markdown(read_markdown("contributors.md"))
+     st.markdown(read_markdown("credits.md"))
pages/inference.py ADDED
@@ -0,0 +1,61 @@
+ import streamlit as st
+ from transformers import AutoTokenizer,AutoModelForSeq2SeqLM
+ import random
+
+
+ @st.cache(show_spinner=False)
+ def load_model(input_complex_sentence,model):
+
+     base_path = "flax-community/"
+     model_path = base_path + model
+     tokenizer = AutoTokenizer.from_pretrained(model_path)
+     model = AutoModelForSeq2SeqLM.from_pretrained(model_path)
+
+     tokenized_sentence = tokenizer(input_complex_sentence,return_tensors="pt")
+     result = model.generate(tokenized_sentence['input_ids'],attention_mask = tokenized_sentence['attention_mask'],max_length=256,num_beams=5)
+     generated_sentence = tokenizer.decode(result[0],skip_special_tokens=True)
+
+     return generated_sentence
+
+ def load_page():
+
+     st.sidebar.title("🧠Sentence Simplifier")
+     st.title("Sentence Split in English using T5 Variants")
+     st.write("Sentence Split is the task of **dividing a long Complex Sentence into Simple Sentences**")
+
+     st.sidebar.write("## UI Options")
+     model = st.sidebar.selectbox(
+         "Please Choose the Model",
+         ("t5-base-wikisplit","t5-v1_1-base-wikisplit", "byt5-base-wikisplit","t5-large-wikisplit"))
+
+     change_example = st.sidebar.checkbox("Try Random Examples")
+
+     examples = [
+         "Mary likes to play football in her freetime whenever she meets with her friends that are very nice people.",
+         "It broadcasts on AM frequency 1600 kHz and is under ownership of Multicultural Broadcasting with studios in Surrey , British Columbia .",
+         "On March 1 , the Blackhawks played in their 2nd outdoor game in franchise history at Soldier Field in part of the new NHL Stadium Series ",
+         "'' The Rain Song '' is a love ballad , over 7 minutes in length , and is considered by singer Robert Plant to be his best overall vocal performance .",
+         "The resulting knowledge about human kinesiology and sport nutrition combined with his distinctive posing styles makes Kamali a sought out bodybuilder for seminars and guest appearances and has been featured in many bodybuilding articles , as well as being on the cover of MUSCLEMAG magazine .",
+         "The East London Line closed on 22 December 2007 and reopened on 27 April 2010 , becoming part of the new London Overground system .",
+         "' Bandolier - Budgie ' , a free iTunes app for iPad , iPhone and iPod touch , released in December 2011 , tells the story of the making of Bandolier in the band 's own words - including an extensive audio interview with Burke Shelley .",
+         "' Eden Black ' was grown from seed in the late 1980s by Stephen Morley , under his conditions it produces pitchers that are almost completley black .",
+         "' Wilson should extend his stint on The Voice to renew public interest in the band ; given that they 're pulling out all the stops , they deserve all the acclaim that surrounded them for their first two albums .",
+         "'' '' New York Mining Disaster 1941 '' '' was the second EP released by the Bee Gees in 1967 on the Spin Records , like their first EP , it was released only in Australia .",
+         "'' ADAPTOGENS : Herbs for Strength , Stamina , and Stress Relief , '' Healing Arts Press , 2007 - contains a detailed monograph on Schisandra chinensis as well as highlights health benefits ."
+     ]
+
+     if change_example:
+         example = examples[random.randint(0, len(examples)-1)]
+         input_complex_sentence = st.text_area("Please type a Complex Sentence to split",example)
+         split = st.button('Change and Split✂️')
+     else:
+         example=examples[0]
+         input_complex_sentence = st.text_area("Please type a Complex Sentence to split",example)
+         split = st.button('Split✂️')
+
+     if split:
+         with st.spinner("Splitting Sentence...🧠"):
+             generated_sentence = load_model(input_complex_sentence, model)
+             sentence1, sentence2, _ = generated_sentence.split(".")
+             st.write("**Sentence1:** "+sentence1+".")
+             st.write("**Sentence2:** "+sentence2+".")