---
license: apache-2.0
language:
- bn
metrics:
- wer
- cer
tags:
- seq2seq
- ipa
- bengali
- byt5
widget:
- text: আমি সে বাবুর মামু বাড়ি গিছিলাম।
  example_title: Narail Text
- text: এখন এই কুলো তার শেষ অই কুলো তার শেষ।
  example_title: Rangpur Text
- text: খয়দে সিআরের এইল্লা কি অবস্থা!
  example_title: Chittagong Text
- text: আটাইশ করছিলাম দের কানি ক্ষেত, ইবার মাইর কাইছি।
  example_title: Kishoreganj Text
- text: তারা তো ওই খারাপ খেইলাই আসে না।
  example_title: Narsingdi Text
- text: আর সব থেকে ফানি কথা হইতেছে দেখ?
  example_title: Tangail Text
---

# Regional Bengali text to IPA transcription - byT5-small

This is a fine-tuned version of [google/byt5-small](https://huggingface.co/google/byt5-small) for generating IPA transcriptions from regional Bengali text. It was trained on the dataset of the Bengali.AI competition [“ভাষামূল: মুখের ভাষার খোঁজে”](https://www.kaggle.com/competitions/regipa/overview).

Model performance:

- **Word error rate (WER)**: 0.0124279344454407
- **Character error rate (CER)**: 0.00427635805681347

Supported district tokens:

- Kishoreganj
- Narail
- Narsingdi
- Chittagong
- Rangpur
- Tangail

---

## Loading & using the model

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("teamapocalypseml/ben2ipa-byt5small")
model = AutoModelForSeq2SeqLM.from_pretrained("teamapocalypseml/ben2ipa-byt5small")

"""
The format of the input text MUST BE:
"""
text = " bengali_text_here"
text_ids = tokenizer(text, return_tensors="pt").input_ids

# A seq2seq model needs decoder inputs, so use generate() rather than a bare forward pass
output_ids = model.generate(text_ids, max_length=1024)
ipa = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(ipa)
```

## Using the pipeline

```python
# Use a pipeline as a high-level helper
import torch
from transformers import pipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = pipeline("text2text-generation", model="teamapocalypseml/ben2ipa-byt5small", device=device)

"""
`texts` must be in the format of:
"""
texts = [" bengali_text_here"]  # a list of input strings
batch_size = 8  # illustrative value; adjust to available memory
outputs = pipe(texts, max_length=1024, batch_size=batch_size)
# Each output is a dict whose "generated_text" field holds the IPA transcription
print([out["generated_text"] for out in outputs])
```

## Credits

Done by [S M Jishanul Islam](https://github.com/S-M-J-I), [Sadia Ahmmed](https://huggingface.co/sadiaahmmed) and [Sahid Hossain Mustakim](https://huggingface.co/rhsm15).
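## Computing WER and CER

The WER and CER figures reported under "Model performance" are edit-distance metrics: WER is the word-level Levenshtein distance between a predicted and a reference transcription divided by the number of reference words, and CER is the same quantity at the character level. The sketch below is a minimal pure-Python illustration of these standard definitions, not the competition's scoring code; the function names are hypothetical, and how the reported figures were aggregated over the evaluation set is not stated in this card.

```python
def levenshtein(ref, hyp):
    """Minimum number of substitutions, insertions and deletions turning ref into hyp.

    Works on any sequences: strings (character level) or lists of words (word level).
    """
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r
                            curr[j - 1] + 1,      # insert h
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    return levenshtein(ref_words, hypothesis.split()) / max(len(ref_words), 1)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance / reference length."""
    return levenshtein(reference, hypothesis) / max(len(reference), 1)


# Made-up strings for illustration only, not real model output
reference = "ami ʃe babur"
prediction = "ami ʃɛ babur"
print(wer(reference, prediction))  # 1 of 3 words differs
print(cer(reference, prediction))  # 1 of 12 characters differs
```

A corpus-level score would typically sum the edit distances and the reference lengths over all pairs before dividing, rather than averaging per-pair rates; the two can differ when sentence lengths vary.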
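## Sequence lengths with byte-level tokenization

ByT5 operates on raw UTF-8 bytes rather than subword tokens, so length limits such as the `max_length=1024` used in the usage examples count bytes, not characters. Bengali code points (letters and vowel signs) occupy 3 bytes each in UTF-8, and many IPA symbols occupy 2. The helper below is a hypothetical sketch that mimics, to our understanding, the ByT5 tokenizer's id mapping (ids 0 to 2 reserved for padding, EOS and unknown, so each byte `b` becomes id `b + 3`, with EOS id `1` appended). It is only meant to estimate sequence length without loading the model.

```python
def byt5_input_ids(text: str) -> list[int]:
    """Hypothetical re-implementation of ByT5's byte-to-id mapping (assumed offset of 3)."""
    # Assumption: ids 0, 1, 2 are reserved for <pad>, </s> (EOS) and <unk>
    special_offset = 3
    eos_id = 1
    return [byte + special_offset for byte in text.encode("utf-8")] + [eos_id]


def fits_in_budget(text: str, max_length: int = 1024) -> bool:
    """True if the byte-level sequence, including EOS, is at most max_length ids."""
    return len(byt5_input_ids(text)) <= max_length


sample = "আমি"  # three Bengali code points: আ, ম, ি
print(len(byt5_input_ids(sample)))  # 9 UTF-8 bytes + 1 EOS = 10
```

If the tokenizer is already loaded, the assumed mapping can be checked against `tokenizer(sample).input_ids`.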