Bengali to English Word Aligner

A fine-tuned model for Bengali-to-English word alignment, built on bert-base-multilingual-cased.

Quick Start

Initialize the tokenizer and model to use them in your project:

from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("musfiqdehan/bengali-english-word-aligner")
model = AutoModel.from_pretrained("musfiqdehan/bengali-english-word-aligner")

Bengali-English Word Alignment

Install Dependencies

!pip install -U data-preprocessors
!pip install -U bangla-postagger

Import Necessary Libraries

from pprint import pprint
from data_preprocessors import text_preprocessor as tp
from bangla_postagger import (en_postaggers as ep,
                              bn_en_mapper as bem,
                              translators as trans)

Testing Word Mapping and Alignment

src = "আমি ভাত খাই না, রুটি খাই।"
tgt = "I do not eat rice, I eat bread."

# Add one space before and after each punctuation mark
# for easier tokenization
src = tp.space_punc(src)
tgt = tp.space_punc(tgt)

print("Word Mapping:")
mapping = bem.get_word_mapping(
    source=src, target=tgt, model_path="musfiqdehan/bengali-english-word-aligner")
pprint(mapping)

Output

Word Mapping:
['bn:(আমি) -> en:(I)',
 'bn:(ভাত) -> en:(rice)',
 'bn:(খাই) -> en:(do)',
 'bn:(খাই) -> en:(eat)',
 'bn:(না) -> en:(not)',
 'bn:(,) -> en:(,)',
 'bn:(রুটি) -> en:(bread)',
 'bn:(খাই) -> en:(eat)',
 'bn:(।) -> en:(.)']
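
The mapping comes back as a list of display strings, so downstream code may want it in a structured form. The following is a minimal sketch, not part of bangla-postagger: `parse_mapping` and `group_by_source` are hypothetical helper names, and the pattern assumes every entry has exactly the `bn:(...) -> en:(...)` shape shown in the output above.

```python
import re

# Assumption: each entry looks like 'bn:(আমি) -> en:(I)', as in the output above.
ENTRY = re.compile(r"^bn:\((.*)\) -> en:\((.*)\)$")


def parse_mapping(entries):
    """Convert mapping strings into (bengali, english) tuples (hypothetical helper)."""
    pairs = []
    for entry in entries:
        match = ENTRY.match(entry)
        if match is None:
            raise ValueError(f"Unexpected mapping entry: {entry!r}")
        pairs.append((match.group(1), match.group(2)))
    return pairs


def group_by_source(pairs):
    """Collect the distinct English words aligned to each Bengali word.

    Grouping is by word form, so repeated occurrences of the same
    Bengali word in a sentence are merged into one entry.
    """
    grouped = {}
    for bn, en in pairs:
        targets = grouped.setdefault(bn, [])
        if en not in targets:
            targets.append(en)
    return grouped


# A few entries copied from the output above.
sample = [
    "bn:(আমি) -> en:(I)",
    "bn:(খাই) -> en:(do)",
    "bn:(খাই) -> en:(eat)",
    "bn:(না) -> en:(not)",
]
pairs = parse_mapping(sample)
grouped = group_by_source(pairs)
print(grouped)
```

Because the strings carry no token positions, this grouping cannot tell the two occurrences of খাই in the example sentence apart. If positions matter, keep the ordered list of pairs instead.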