teamapocalypseml/regben2ipa-mt5-base

Text2Text Generation


smji committed on Apr 6

Commit 7d27c51 • 1 Parent(s): b8fcc76

Update README.md

Files changed (1)

README.md +73 -0


@@ -1,3 +1,76 @@
 ---
 license: apache-2.0
+language:
+- bn
+metrics:
+- wer
+- cer
+tags:
+- seq2seq
+- ipa
+- bengali
+- byt5
+widget:
+- text: <Narail> আমি সে বাবুর মামু বাড়ি গিছিলাম।
+  example_title: Narail Text
+- text: <Rangpur> এখন এই কুলো তার শেষ অই কুলো তার শেষ।
+  example_title: Rangpur Text
+- text: <Chittagong> খয়দে সিআরের এইল্লা কি অবস্থা!
+  example_title: Chittagong Text
+- text: <Kishoreganj> আটাইশ করছিলাম দের কানি ক্ষেত, ইবার মাইর কাইছি।
+  example_title: Kishoreganj Text
+- text: <Narsingdi> তারা তো ওই খারাপ খেইলাই আসে না।
+  example_title: Narsingdi Text
+- text: <Tangail> আর সব থেকে ফানি কথা হইতেছে দেখ?
+  example_title: Tangail Text
 ---
+# Regional Bengali text to IPA transcription - mt5-base
+This is a fine-tuned version of [google/mt5-base](https://huggingface.co/google/mt5-base) for generating IPA transcriptions from regional Bengali text.
+It was fine-tuned on the dataset of the competition [“ভাষামূল: মুখের ভাষার খোঁজে”](https://www.kaggle.com/competitions/regipa/overview) by Bengali.AI.
+Test scores achieved so far:
+- **Word error rate (WER)**: 0.27792885899543700
+- **Character error rate (CER)**: 0.05638457089662550
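
For readers unfamiliar with the two metrics, the sketch below shows the standard edit-distance definitions of WER and CER in plain Python. It is an illustration only: the helper names `edit_distance`, `wer`, and `cer` are hypothetical, and the competition's exact scoring procedure (for example, how scores are averaged across samples) is not stated here and may differ.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (each edit costs 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edits divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Toy strings, not real model output
print(wer("a b c d", "a x c d"))
print(cer("abcd", "abxd"))
```

Note that Python iterates strings by code point, so this `cer` counts IPA combining diacritics as separate characters; a grapheme-based count could give slightly different values.
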
+Supported district tokens:
+- Kishoreganj
+- Narail
+- Narsingdi
+- Chittagong
+- Rangpur
+- Tangail
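
Since the model expects each input to start with one of the district tokens listed above, a small helper can build and validate the input string before inference. This is a minimal sketch: `SUPPORTED_DISTRICTS` and `format_input` are hypothetical names that are not part of the model repository, and the `<District> text` layout is taken from the widget examples in the front matter.

```python
# Hypothetical helper, not shipped with the model.
SUPPORTED_DISTRICTS = (
    "Kishoreganj", "Narail", "Narsingdi", "Chittagong", "Rangpur", "Tangail",
)


def format_input(district: str, text: str) -> str:
    """Prefix Bengali text with its district token, e.g. '<Narail> ...'."""
    if district not in SUPPORTED_DISTRICTS:
        raise ValueError(f"Unsupported district: {district!r}")
    return f"<{district}> {text.strip()}"


# Sentence taken from the widget examples
print(format_input("Narail", "আমি সে বাবুর মামু বাড়ি গিছিলাম।"))
```

Rejecting unknown districts early avoids silently passing the model a tag it was never trained on.
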
+---
+## Loading & using the model
+```python
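+# Load the tokenizer and model directly
+from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
+
+model_id = "teamapocalypseml/regben2ipa-mt5-base"
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
+
+# The input text MUST have the format: <district> <bengali_text>
+text = "<Narail> আমি সে বাবুর মামু বাড়ি গিছিলাম।"
+inputs = tokenizer(text, return_tensors="pt")
+
+# Seq2seq models need generate() to decode an output sequence;
+# a bare forward pass fails without decoder inputs.
+output_ids = model.generate(**inputs, max_length=512)
+ipa = tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0]
+print(ipa)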
+```
+## Using the pipeline
+```python
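+# Use a pipeline as a high-level helper
+import torch
+from transformers import pipeline
+
+device = "cuda" if torch.cuda.is_available() else "cpu"
+pipe = pipeline("text2text-generation", model="teamapocalypseml/regben2ipa-mt5-base", device=device)
+
+# Each entry of `texts` must have the format: <district> <contents>
+texts = [
+    "<Narail> আমি সে বাবুর মামু বাড়ি গিছিলাম।",
+    "<Tangail> আর সব থেকে ফানি কথা হইতেছে দেখ?",
+]
+outputs = pipe(texts, max_length=512, batch_size=len(texts))
+print([out["generated_text"] for out in outputs])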
+```
+## Credits
+Done by [S M Jishanul Islam](https://huggingface.co/smji), [Sadia Ahmmed](https://huggingface.co/sadiaahmmed), [Sahid Hossain Mustakim](https://huggingface.co/rhsm15)