English-Chinese Translator with LSTM and Seq-to-Seq Models

Welcome to my project! 🎉 Here, I've built a translator that converts English to Traditional Chinese and vice versa, using an LSTM baseline and a sequence-to-sequence model with attention. It's an exciting dive into the world of natural language processing (NLP) and machine translation.

Project Overview

This project includes:

Seq-to-Seq Model:
A sequence-to-sequence model with encoder-decoder architecture for language translation.
LSTM Model:
An LSTM-based translation model designed to handle sequential data and long-term dependencies.
Evaluation Metrics:
- BLEU Score: Measures the quality of machine translation by comparing model output with reference translations.
- ChrF Score: Measures character n-gram overlap between model output and reference translations, which is particularly useful for languages such as Chinese that are written without spaces between words.
Dataset:
A bilingual dataset of 1000+ English-Traditional Chinese sentence pairs, split for training, validation, and testing.
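The three-way split can be sketched as follows. This is a minimal illustration, not the project's actual code: the 80/10/10 ratios, the fixed seed, the `split_pairs` helper, and the toy sentence pairs are all assumptions made for the example.

```python
import random

def split_pairs(pairs, train_frac=0.8, val_frac=0.1, seed=42):
    """Shuffle sentence pairs and split them into train/validation/test lists."""
    shuffled = list(pairs)                   # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)    # fixed seed keeps the split reproducible
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]        # whatever remains goes to the test set
    return train, val, test

# Toy stand-in for the real corpus (hypothetical data).
pairs = [(f"sentence {i}", f"句子 {i}") for i in range(1000)]
train, val, test = split_pairs(pairs)
print(len(train), len(val), len(test))
```

Shuffling before slicing avoids a split that follows the order in which the pairs were collected.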

How It Works

Data Preparation:
- Sentences are tokenized and padded to maintain uniformity.
- The dataset is split into training, validation, and test sets.
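A minimal sketch of tokenizing and padding, in plain Python. It assumes word-level tokens for English and character-level tokens for Chinese, with id 0 reserved for padding; the notebooks may use a different tokenizer, and the helper names here are hypothetical.

```python
PAD, UNK = "<pad>", "<unk>"

def tokenize(sentence, lang):
    """Word-level tokens for English, character-level tokens for Chinese."""
    if lang == "en":
        return sentence.lower().split()
    return [ch for ch in sentence if not ch.isspace()]

def build_vocab(token_lists):
    """Map each token to an integer id; 0 is reserved for padding, 1 for unknown."""
    vocab = {PAD: 0, UNK: 1}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab

def encode_and_pad(token_lists, vocab, max_len):
    """Convert tokens to ids, truncate to max_len, and right-pad with the PAD id."""
    batch = []
    for tokens in token_lists:
        ids = [vocab.get(tok, vocab[UNK]) for tok in tokens][:max_len]
        batch.append(ids + [vocab[PAD]] * (max_len - len(ids)))
    return batch

zh_tokens = [tokenize(s, "zh") for s in ["我愛你", "謝謝"]]
zh_vocab = build_vocab(zh_tokens)
padded = encode_and_pad(zh_tokens, zh_vocab, max_len=4)
print(padded)  # [[2, 3, 4, 0], [5, 5, 0, 0]]
```

Padding every sequence to the same length lets a batch be stored as one rectangular array, which is what recurrent layers generally expect.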
Model Training:
- Two models are trained:
  - LSTM Model for baseline performance.
  - Seq-to-Seq Model with an attention mechanism for enhanced results.
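To illustrate what the attention mechanism adds, here is a small sketch of one attention step in plain Python. It assumes dot-product scoring, which is only one common variant; the README does not say which form the notebook uses, and the vectors below are made-up toy values.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1 (shifted by the max for stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot_attention(decoder_state, encoder_states):
    """Return (weights, context) for a single decoder step."""
    # Score each encoder state by its dot product with the decoder state.
    scores = [sum(d * e for d, e in zip(decoder_state, enc)) for enc in encoder_states]
    weights = softmax(scores)
    # The context vector is the attention-weighted sum of the encoder states.
    dim = len(decoder_state)
    context = [sum(w * enc[i] for w, enc in zip(weights, encoder_states)) for i in range(dim)]
    return weights, context

# Toy values: three source positions, hidden size 2 (hypothetical numbers).
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [0.0, 2.0]
weights, context = dot_attention(decoder_state, encoder_states)
print(weights, context)
```

Instead of compressing the whole source sentence into one fixed vector, the decoder recomputes this context at every step, which tends to help on longer sentences.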
Translation Process:
- Input a sentence in English or Chinese.
- The model generates a translation in the target language.
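Generation can be sketched as a greedy decoding loop. This is an assumption about the decoding strategy (the notebooks may use something else, such as beam search), and `step_fn` is a hypothetical stand-in for the trained decoder, not a function from this repository.

```python
SOS, EOS = "<sos>", "<eos>"

def greedy_decode(step_fn, max_len=20):
    """Generate tokens one at a time, always taking the most probable next token.

    step_fn(prefix) stands in for the trained decoder: it receives the tokens
    generated so far and returns a dict {token: probability}.
    """
    output = [SOS]
    for _ in range(max_len):
        probs = step_fn(output)
        next_token = max(probs, key=probs.get)
        if next_token == EOS:
            break                      # the model signalled end of sentence
        output.append(next_token)
    return output[1:]                  # drop the start marker

# Toy decoder that always "translates" to one fixed sentence.
TARGET = ["我", "愛", "你", EOS]

def toy_step(prefix):
    want = TARGET[len(prefix) - 1]
    return {tok: (0.9 if tok == want else 0.1 / 3) for tok in set(TARGET)}

print("".join(greedy_decode(toy_step)))  # 我愛你
```

The `max_len` cap guards against a model that never emits the end-of-sentence token.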
Evaluation:
- Use BLEU and ChrF scores to validate model performance.
- Plot training and validation loss curves to monitor learning.
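To make the character-level metric concrete, here is a simplified ChrF-style score in plain Python. It is a sketch under stated assumptions (sentence-level, character n-grams up to order 6, beta = 2, spaces ignored) and will not exactly match the numbers produced by a reference implementation such as sacreBLEU, which should be preferred for reported results.

```python
from collections import Counter

def char_ngrams(text, n):
    """Count character n-grams, ignoring spaces."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))

def simple_chrf(hypothesis, reference, max_n=6, beta=2.0):
    """Simplified sentence-level ChrF: F-beta over averaged n-gram precision and recall."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue                   # skip orders longer than either string
        overlap = sum((hyp & ref).values())   # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)

print(simple_chrf("我愛你", "我愛你"))  # 1.0
print(simple_chrf("我愛你", "我愛他"))  # partial overlap, strictly between 0 and 1
```

Because it works on characters, the score needs no word segmentation, which is why it is convenient for Chinese output.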

What I Learned

Designing and implementing Seq-to-Seq and LSTM models for translation.
Working with bilingual datasets and tokenizing for both English and Chinese.
Understanding and applying BLEU and ChrF scores to evaluate translation quality.
Managing challenges of long sequences and context switching in language models.

Results

BLEU Score: [Add the result here]
ChrF Score: [Add the result here]

The scores show promising results, with potential for further optimization.

Future Work

Experimenting with Transformers:
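Implementing Transformer-based models to enhance translation quality. Encoder-decoder architectures such as T5 or mBART are the most direct fit; BERT (encoder-only) and GPT (decoder-only) would need adaptation for translation.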
Expanding Dataset:
Adding more sentence pairs to improve fluency and context handling.
Multi-language Translation:
Extending support to other languages like Spanish or French.

Usage

Clone this repository:
```
git clone <repository-link>
```
Install required libraries:
```
pip install -r requirements.txt
```

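Run the training notebooks. These are Jupyter notebooks, so open them in Jupyter or, assuming Jupyter is installed, execute them with nbconvert rather than passing them to python:

jupyter nbconvert --to notebook --execute Seq_to_seq_code.ipynb
jupyter nbconvert --to notebook --execute LSTM_code.ipynb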

Input your test sentences and generate translations.

Thanks for exploring my project! 🌟 Feel free to fork the repo, try it out, and share your feedback. 😊
