English-Chinese Translator with LSTM and Seq-to-Seq Models
Welcome to my project! Here, I've built a translator that converts English to Traditional Chinese and vice versa using advanced deep learning models. It's an exciting dive into the world of natural language processing (NLP) and machine translation.
Project Overview
This project includes:
Seq-to-Seq Model:
A sequence-to-sequence model with encoder-decoder architecture for language translation.
LSTM Model:
An LSTM-based translation model designed to handle sequential data and long-term dependencies.
Evaluation Metrics:
- BLEU Score: Measures the quality of machine translation by comparing model output with reference translations.
- ChrF Score: Focuses on character-level similarity, particularly useful for non-alphabetic languages like Chinese.
Dataset:
A bilingual dataset of 1000+ English-Traditional Chinese sentence pairs, split for training, validation, and testing.
How It Works
Data Preparation:
- Sentences are tokenized and padded to a uniform length so they can be batched.
- The dataset is split into training and validation sets.
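The preparation steps above can be sketched in plain Python. This is a minimal illustration, not the project's actual preprocessing: the helper names (`tokenize`, `build_vocab`, `encode_and_pad`, `train_val_split`), the whitespace tokenization for English, the character-level tokenization for Chinese, and the 80/20 split are all assumptions made for the example.

```python
import random

PAD, UNK = "<pad>", "<unk>"

def tokenize(sentence, lang):
    """Split English on whitespace; split Chinese into characters (an assumption)."""
    if lang == "en":
        return sentence.lower().split()
    return [ch for ch in sentence if not ch.isspace()]

def build_vocab(token_lists):
    """Map each token to an integer id; ids 0 and 1 are reserved for PAD and UNK."""
    vocab = {PAD: 0, UNK: 1}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab

def encode_and_pad(token_lists, vocab, max_len=None):
    """Convert tokens to ids and right-pad every sequence to the same length."""
    if max_len is None:
        max_len = max(len(t) for t in token_lists)
    padded = []
    for tokens in token_lists:
        ids = [vocab.get(tok, vocab[UNK]) for tok in tokens][:max_len]
        padded.append(ids + [vocab[PAD]] * (max_len - len(ids)))
    return padded

def train_val_split(pairs, val_fraction=0.2, seed=0):
    """Shuffle sentence pairs and hold out a fraction for validation."""
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)
    n_val = max(1, int(len(pairs) * val_fraction))
    return pairs[n_val:], pairs[:n_val]

# Toy sentence pairs, invented for illustration only.
pairs = [
    ("I love you", "我愛你"),
    ("Good morning", "早安"),
    ("Thank you very much", "非常感謝你"),
]
en_tokens = [tokenize(en, "en") for en, _ in pairs]
zh_tokens = [tokenize(zh, "zh") for _, zh in pairs]
en_vocab = build_vocab(en_tokens)
en_ids = encode_and_pad(en_tokens, en_vocab)
train_pairs, val_pairs = train_val_split(pairs)
print(en_ids)
```

Padding to the longest sentence keeps every row the same length, which is what lets the sequences be stacked into a single batch.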
Model Training:
- Two models are trained:
  - LSTM Model for baseline performance.
  - Seq-to-Seq Model with an attention mechanism for enhanced results.
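To give a feel for what an attention mechanism computes at each decoder step, here is a small dependency-free sketch. It assumes dot-product scoring between the decoder state and each encoder state; the README does not say which attention variant the notebooks use, so treat `dot_product_attention` and the toy vectors as illustrative only.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot_product_attention(decoder_state, encoder_states):
    """Return (context, weights) for a single decoder step.

    decoder_state: list of floats with length H
    encoder_states: list of T lists, each with length H
    """
    # Score each source position by its similarity to the decoder state.
    scores = [
        sum(d * e for d, e in zip(decoder_state, enc))
        for enc in encoder_states
    ]
    # Normalize the scores into weights that sum to 1.
    weights = softmax(scores)
    # The context vector is the weighted average of the encoder states.
    hidden = len(decoder_state)
    context = [
        sum(w * enc[i] for w, enc in zip(weights, encoder_states))
        for i in range(hidden)
    ]
    return context, weights

# Toy values: three source positions, hidden size 2.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [2.0, 0.0]
context, weights = dot_product_attention(decoder_state, encoder_states)
print(weights)
```

In this toy case the first and third source positions receive equal, larger weights because they align with the decoder state, while the second receives the smallest weight. A plain LSTM encoder-decoder without attention would instead compress the whole source sentence into one fixed vector.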
Translation Process:
- Input a sentence in English or Chinese.
- The model generates a translation in the target language.
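Generation with an encoder-decoder model is usually a loop that emits one target token at a time. The sketch below shows greedy decoding under that assumption. `toy_step_fn` is a hypothetical stand-in for a trained decoder (a word-for-word lookup, invented for this example), and the `<sos>`/`<eos>` markers are likewise assumed rather than taken from the notebooks.

```python
SOS, EOS = "<sos>", "<eos>"

def greedy_decode(step_fn, source_tokens, max_len=20):
    """Generate target tokens one at a time, always taking the top-scoring token."""
    generated = [SOS]
    for _ in range(max_len):
        scores = step_fn(source_tokens, generated)
        next_token = max(scores, key=scores.get)
        if next_token == EOS:
            break
        generated.append(next_token)
    return generated[1:]  # drop the start marker

def toy_step_fn(source_tokens, generated):
    """Hypothetical stand-in for a trained decoder: word-for-word dictionary lookup."""
    lexicon = {"i": "我", "love": "愛", "you": "你"}
    position = len(generated) - 1  # number of target tokens produced so far
    if position < len(source_tokens):
        word = lexicon.get(source_tokens[position], "<unk>")
        return {word: 1.0, EOS: 0.1}
    return {EOS: 1.0}

translation = greedy_decode(toy_step_fn, ["i", "love", "you"])
print("".join(translation))
```

The `max_len` cap matters in practice: a model that never predicts the end marker would otherwise loop forever.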
Evaluation:
- Use BLEU and ChrF scores to validate model performance.
- Plot training and validation loss curves to monitor learning.
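To make the character-level idea behind ChrF concrete, here is a simplified sentence-level sketch. It averages character n-gram precision and recall over n = 1 to 6 and combines them into an F-score with beta = 2. The function names are made up for this example, and its values are not guaranteed to match an established implementation such as sacreBLEU, which is the better choice for reported numbers.

```python
from collections import Counter

def char_ngrams(text, n):
    """Count character n-grams, ignoring whitespace."""
    chars = [ch for ch in text if not ch.isspace()]
    return Counter(tuple(chars[i:i + n]) for i in range(len(chars) - n + 1))

def simple_chrf(hypothesis, reference, max_n=6, beta=2.0):
    """Simplified ChrF-style score in [0, 1] for one sentence pair."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip orders longer than either string
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # F-beta with beta = 2 weights recall more heavily than precision.
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)

exact = simple_chrf("我愛你", "我愛你")
partial = simple_chrf("我喜歡你", "我愛你")
print(exact, partial)
```

Because the score works on characters rather than words, it needs no word segmentation, which is convenient for Chinese text where words are not separated by spaces.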
What I Learned
- Designing and implementing Seq-to-Seq and LSTM models for translation.
- Working with bilingual datasets and tokenizing for both English and Chinese.
- Understanding and applying BLEU and ChrF scores to evaluate translation quality.
- Managing challenges of long sequences and context switching in language models.
Results
- BLEU Score: [Add the result here]
- ChrF Score: [Add the result here]
The scores show promising results, with potential for further optimization.
Future Work
Experimenting with Transformers:
Implementing models like BERT or GPT to enhance translation quality.
Expanding Dataset:
Adding more sentence pairs to improve fluency and context handling.
Multi-language Translation:
Extending support to other languages like Spanish or French.
Usage
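- Clone this repository:
git clone <repository-link>
- Install required libraries:
pip install -r requirements.txt
- Run the training notebooks. They are Jupyter notebooks, so they cannot be run directly with python. Open them in Jupyter, or, assuming Jupyter is installed, execute them from the command line:
jupyter nbconvert --to notebook --execute Seq_to_seq_code.ipynb
jupyter nbconvert --to notebook --execute LSTM_code.ipynb
- Input your test sentences and generate translations.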
Thanks for exploring my project! Feel free to fork the repo, try it out, and share your feedback.