---
title: ERAV2S20 Tokenizer
emoji: 👁
colorFrom: pink
colorTo: green
sdk: gradio
sdk_version: 4.36.1
app_file: app.py
pinned: false
license: mit
---

# App to demo a tokenizer for Devanagari (Hindi) script

### The tokenizer uses a regex that separates Hindi words from numbers, punctuation, and spaces. The regex also handles English letters and numerals. The tokenizer is trained specifically on Devanagari text, and it would work just as well if trained on English text.

## Preview:

![image/png](https://cdn-uploads.huggingface.co/production/uploads/66234870c8920ec3516d01bc/ZiEMLp1eLhWjNUrlb4ugJ.png)