---
license: bigscience-openrail-m
tags:
- split and rephrase
widget:
- text: >-
    Cystic Fibrosis (CF) is an autosomal recessive disorder that affects
    multiple organs, which is common in the Caucasian population,
    symptomatically affecting 1 in 2500 newborns in the UK, and more than
    80,000 individuals globally.
datasets:
- wiki_split
- web_split
language:
- en
---

# T5 model for splitting complex sentences into simple sentences in English

Split-and-rephrase is the task of splitting a complex input sentence into shorter sentences while preserving its meaning (Narayan et al., 2017).

For example, the sentence
```
Cystic Fibrosis (CF) is an autosomal recessive disorder that affects multiple organs, which is common in the Caucasian population, symptomatically affecting 1 in 2500 newborns in the UK, and more than 80,000 individuals globally.
```
could be split into
```
Cystic Fibrosis is an autosomal recessive disorder that affects multiple organs.
```
```
Cystic Fibrosis is common in the Caucasian population.
```
```
Cystic Fibrosis affects 1 in 2500 newborns in the UK.
```
```
Cystic Fibrosis affects more than 80,000 individuals globally.
```

## How to use it in your code:
```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

checkpoint = "unikei/t5-base-split-and-rephrase"
tokenizer = T5Tokenizer.from_pretrained(checkpoint)
model = T5ForConditionalGeneration.from_pretrained(checkpoint)

# Adjacent string literals are concatenated, so no stray indentation
# ends up inside the input sentence.
complex_sentence = (
    "Cystic Fibrosis (CF) is an autosomal recessive disorder that "
    "affects multiple organs, which is common in the Caucasian "
    "population, symptomatically affecting 1 in 2500 newborns in "
    "the UK, and more than 80,000 individuals globally."
)

# Tokenize the complex sentence, padding or truncating to 256 tokens.
complex_tokenized = tokenizer(complex_sentence,
                              padding="max_length",
                              truncation=True,
                              max_length=256,
                              return_tensors='pt')

# Generate the simplified text with beam search.
simple_tokenized = model.generate(complex_tokenized['input_ids'],
                                  attention_mask=complex_tokenized['attention_mask'],
                                  max_length=256,
                                  num_beams=5)

# Decode the generated token ids back into text.
simple_sentences = tokenizer.batch_decode(simple_tokenized, skip_special_tokens=True)
print(simple_sentences)

"""
Output:
Cystic Fibrosis is an autosomal recessive disorder that affects multiple organs. Cystic Fibrosis is common in the Caucasian population. Cystic Fibrosis affects 1 in 2500 newborns in the UK. Cystic Fibrosis affects more than 80,000 individuals globally.
"""
```
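
If you need each simple sentence as a separate item, the decoded text can be post-processed. The following is a minimal sketch, assuming each decoded output is a single string whose sentences are separated by whitespace; the helper name `split_output` is hypothetical and not part of the model or of `transformers`. The naive punctuation rule will mis-split text containing abbreviations such as "e.g. this".

```python
import re
from typing import List


def split_output(decoded: str) -> List[str]:
    """Split one decoded model output into individual sentences.

    Assumption: a sentence ends with '.', '!' or '?' followed by whitespace.
    """
    parts = re.split(r"(?<=[.!?])\s+", decoded.strip())
    # Drop empty fragments (e.g. when the input is an empty string).
    return [part for part in parts if part]


# Example input copied from the expected output shown in this model card.
decoded = (
    "Cystic Fibrosis is an autosomal recessive disorder that affects multiple organs. "
    "Cystic Fibrosis is common in the Caucasian population. "
    "Cystic Fibrosis affects 1 in 2500 newborns in the UK. "
    "Cystic Fibrosis affects more than 80,000 individuals globally."
)

for sentence in split_output(decoded):
    print(sentence)
```

With the snippet from the previous section, this would typically be applied to each element of `simple_sentences`, for example `split_output(simple_sentences[0])`.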