# <center> Demo for `pszemraj/flan-t5-large-grammar-synthesis` </center>


- this notebook runs on CPU by default. to run on GPU in Colab, switch the runtime to a GPU (Runtime > Change runtime type) and pass `device=0` when creating the pipeline
- some details on usage
    - this model was trained on several (1-8) sentences at a time.
    - by default, it **will not** work well for very short inputs (around 4 tokens) or for very long texts
    - I recommend feeding it chunks of 4-128 tokens at a time
    - an **extension** of this notebook for **batch inference** on longer texts is available [here](https://colab.research.google.com/gist/pszemraj/6e961b08970f98479511bb1e17cdb4f0/batch-grammar-check-correct-demo.ipynb)
- [link to model card](https://huggingface.co/pszemraj/flan-t5-large-grammar-synthesis)
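The chunking advice above can be sketched roughly as follows. This is a minimal illustration, not the method used in the linked batch-inference notebook: `split_into_batches` is a hypothetical helper, the sentence splitter is a naive regex, and the word count is only a rough stand-in for the tokenizer's real token count.

```python
import re
from typing import List


def split_into_batches(text: str, max_sentences: int = 4, max_words: int = 96) -> List[str]:
    """Group sentences into chunks, capped by sentence count and word count.

    Assumption: words are used as a cheap proxy for tokens; use the model's
    tokenizer if you need exact token counts.
    """
    # naive sentence split: break on whitespace that follows ., ! or ?
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    batches, current, n_words = [], [], 0
    for sentence in sentences:
        words = len(sentence.split())
        # close the current batch if this sentence would exceed either limit
        if current and (len(current) >= max_sentences or n_words + words > max_words):
            batches.append(" ".join(current))
            current, n_words = [], 0
        current.append(sentence)
        n_words += words
    if current:
        batches.append(" ".join(current))
    return batches
```

Each returned chunk could then be passed to the pipeline one at a time, e.g. `corrector(chunk, **params)`, and the corrected chunks joined back together.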


---



In [None]:
#@markdown add auto-Colab formatting with `IPython.display`
from IPython.display import HTML, display
# colab formatting
def set_css():
    display(
        HTML(
            """
  <style>
    pre {
        white-space: pre-wrap;
    }
  </style>
  """
        )
    )

get_ipython().events.register("pre_run_cell", set_css)

In [1]:
pip install -U -q transformers accelerate

In [2]:
from transformers import pipeline

corrector = pipeline(
    "text2text-generation",
    "pszemraj/flan-t5-large-grammar-synthesis",
)



In [3]:
raw_text = "Jamie 2the store yesturday to bye some food. I needd milk, bread, andafew otter things. The $$tore was reely crowed and I had a hard time finding everyting I needed. I finaly madeit t0 dacheck 0ut line and payed for my stuff." #@param {type:"string"}
print(f"input text:\n\t{raw_text}")

input text:
	Jamie 2the store yesturday to bye some food. I needd milk, bread, andafew otter things. The $$tore was reely crowed and I had a hard time finding everyting I needed. I finaly madeit t0 dacheck 0ut line and payed for my stuff.


In [4]:
params = {
    'max_length':64,
    'repetition_penalty':1.05,
    'early_stopping':True,
    'num_beams':4
}

In [5]:
%%time

results = corrector(raw_text, **params)
print(results[0]['generated_text'], "\n"*2)

Jamie went to the store yesterday to buy some food. I needed milk, bread, and a few other things. The store was very busy and I had a hard time finding everything I needed. I finally made it to the cashier and paid for my stuff. 


CPU times: user 26 s, sys: 1.95 ms, total: 26 s
Wall time: 28.6 s


It's much faster on GPU, and the inference parameters can be adjusted (fewer beams = faster, but lower quality).
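One way to pick up a GPU when it is available is sketched below. It assumes PyTorch is the backend and that a GPU runtime has been selected in Colab. `pick_device` and `load_corrector` are hypothetical helper names, not part of this notebook. The `device` argument of `pipeline` takes `-1` for CPU or a CUDA device index such as `0`.

```python
def pick_device() -> int:
    """Return 0 (first CUDA GPU) if PyTorch can see one, otherwise -1 (CPU)."""
    try:
        import torch
    except ImportError:
        # assumption: without PyTorch installed we simply fall back to CPU
        return -1
    return 0 if torch.cuda.is_available() else -1


def load_corrector(model_name: str = "pszemraj/flan-t5-large-grammar-synthesis"):
    """Build the same text2text pipeline as above, placed on the chosen device."""
    from transformers import pipeline

    return pipeline("text2text-generation", model_name, device=pick_device())
```

Calling `corrector = load_corrector()` would replace the pipeline cell above. To trade quality for speed, lower `num_beams` in `params` (for example from 4 to 2).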

There is now also an [XL model here](https://huggingface.co/pszemraj/flan-t5-xl-grammar-synthesis).