# Translator Project using LSTM and Seq2Seq Models

## Table of Contents

- Project Overview
- Dataset
- Model Architectures
  1. LSTM-based Model
  2. Seq2Seq Model
- Evaluation Metrics
- Results
  - Training Curves
  - BLEU and CHRF Scores
- Installation and Setup
- How to Run
- File Structure
- Future Enhancements
- Acknowledgments

## Project Overview

This project builds models to translate text between English and Assamese using two different neural network architectures:
- LSTM-based model
- Seq2Seq model (without attention)

The primary objective is to train models that can translate between the two languages and evaluate their performance using metrics like BLEU and CHRF scores.
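As a quick orientation, the attention-free encoder-decoder setup of the second architecture can be sketched in PyTorch. This is a minimal sketch only: the vocabulary sizes, dimensions, and batch shapes below are illustrative placeholders, not the values used in the project's notebooks.

```python
import torch
import torch.nn as nn

# Illustrative placeholder sizes; real values come from the tokenized datasets.
SRC_VOCAB, TGT_VOCAB, EMB_DIM, HID_DIM = 100, 120, 32, 64

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embedding = nn.Embedding(SRC_VOCAB, EMB_DIM)
        self.lstm = nn.LSTM(EMB_DIM, HID_DIM, batch_first=True)

    def forward(self, src):
        # src: (batch, src_len); the final hidden/cell states summarize the source
        _, (hidden, cell) = self.lstm(self.embedding(src))
        return hidden, cell

class Decoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embedding = nn.Embedding(TGT_VOCAB, EMB_DIM)
        self.lstm = nn.LSTM(EMB_DIM, HID_DIM, batch_first=True)
        self.fc = nn.Linear(HID_DIM, TGT_VOCAB)

    def forward(self, tgt, hidden, cell):
        # Decoding is conditioned only on the encoder's final state (no attention)
        out, _ = self.lstm(self.embedding(tgt), (hidden, cell))
        return self.fc(out)  # (batch, tgt_len, TGT_VOCAB)

class Seq2Seq(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder, self.decoder = Encoder(), Decoder()

    def forward(self, src, tgt):
        hidden, cell = self.encoder(src)
        return self.decoder(tgt, hidden, cell)

model = Seq2Seq()
src = torch.randint(0, SRC_VOCAB, (4, 10))  # batch of 4 source sequences
tgt = torch.randint(0, TGT_VOCAB, (4, 12))  # teacher-forced decoder inputs
logits = model(src, tgt)
print(logits.shape)  # torch.Size([4, 12, 120])

# Training pairs logits with targets via CrossEntropyLoss, as described below.
# (In real training the target is shifted one token relative to the decoder input.)
loss = nn.CrossEntropyLoss()(logits.reshape(-1, TGT_VOCAB), tgt.reshape(-1))
```

The LSTM-based model follows the same pattern but uses a single stacked LSTM instead of separate encoder and decoder modules.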
## Dataset

The project uses two datasets:

- English dataset (`alpaca_cleaned.json`)
- Assamese dataset (`Assamese.json`)

The datasets contain parallel text data with three fields: `instruction`, `input`, and `output`. The `input` field is used as the source sentence and the `output` field as the target sentence.

## Model Architectures

### 1. LSTM-based Model

The LSTM model uses:

- An embedding layer for token representations.
- A stacked LSTM layer to capture sequential dependencies.
- A fully connected layer to generate token predictions.

The model was trained using CrossEntropyLoss and the Adam optimizer.

### 2. Seq2Seq Model

The Seq2Seq model is implemented with:

- An embedding layer.
- An encoder-decoder LSTM architecture without attention.

The encoder processes the source sequence, and the decoder generates the target sequence. This model is also trained using CrossEntropyLoss with the Adam optimizer.

## Evaluation Metrics

The models are evaluated using:
- **BLEU score**: measures the n-gram overlap between predicted and reference translations.
- **CHRF score**: evaluates character-level matches between predictions and references, which is useful for morphologically rich languages.

## Results

### Training Curves

The training and validation loss curves for both models are plotted to monitor convergence.
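Plotting the two curves is straightforward with matplotlib. The loss values below are hypothetical placeholders; in the notebooks they would be collected per epoch during training.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed in Colab/Jupyter
import matplotlib.pyplot as plt

# Hypothetical per-epoch losses, standing in for the recorded training history.
train_losses = [4.1, 3.2, 2.6, 2.2, 2.0]
val_losses = [4.3, 3.5, 3.0, 2.8, 2.7]

fig, ax = plt.subplots()
epochs = range(1, len(train_losses) + 1)
ax.plot(epochs, train_losses, label="train loss")
ax.plot(epochs, val_losses, label="validation loss")
ax.set_xlabel("epoch")
ax.set_ylabel("cross-entropy loss")
ax.legend()
fig.savefig("loss_curve.png")
```

A widening gap between the two curves is the usual signal to stop training or add regularization.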
### BLEU and CHRF Scores

The models were evaluated using at least 1000 data points for sentence-level BLEU and CHRF scores. The scores are saved into CSV files:

- `bleu_scores_lstm.csv`
- `bleu_scores_seq2seq.csv`
- `chrf_scores_lstm.csv`
- `chrf_scores_seq2seq.csv`

Sample results:

| Model      | Average BLEU Score | Average CHRF Score |
|------------|--------------------|--------------------|
| LSTM-based | 0.45               | 0.67               |
| Seq2Seq    | 0.52               | 0.70               |

## Installation and Setup

### Prerequisites

Make sure you have the following installed:
- Python 3.x
- Google Colab or Jupyter Notebook
- Libraries: `torch`, `transformers`, `evaluate`, `pandas`, `matplotlib`

### Installation

To install the required packages, run:
```bash
pip install torch transformers evaluate matplotlib pandas
```

## How to Run

**Clone the repository:**
```bash
git clone
cd
```

**Upload data:** Ensure the `Assamese.json` and `alpaca_cleaned.json` files are in the appropriate directory.
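Once the files are in place, the instruction/input/output records can be loaded with the standard library and paired up as source/target sentences. The miniature records below are hypothetical examples mirroring the documented schema; the real data lives in `alpaca_cleaned.json` and `Assamese.json`.

```python
import json

# Hypothetical records illustrating the instruction/input/output schema.
sample = [
    {"instruction": "Translate.", "input": "Hello", "output": "নমস্কাৰ"},
    {"instruction": "Translate.", "input": "Thank you", "output": "ধন্যবাদ"},
]
with open("sample.json", "w", encoding="utf-8") as f:
    json.dump(sample, f, ensure_ascii=False)

# Load the dataset back and build (source, target) sentence pairs:
# the "input" field is the source sentence, "output" the target.
with open("sample.json", encoding="utf-8") as f:
    records = json.load(f)

pairs = [(r["input"], r["output"]) for r in records]
print(pairs[0])  # ('Hello', 'নমস্কাৰ')
```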
**Run the notebooks:**
Use the provided code in Google Colab or Jupyter Notebook:

- LSTM-based model: `lstm_model.ipynb`
- Seq2Seq model: `seq2seq_model.ipynb`

**Generate BLEU and CHRF scores:**
The script generates predictions and saves the scores in CSV files.

## File Structure

```
project-root/
├── Assamese.json
├── alpaca_cleaned.json
├── lstm_model.ipynb
├── seq2seq_model.ipynb
├── bleu_scores_lstm.csv
├── bleu_scores_seq2seq.csv
├── chrf_scores_lstm.csv
├── chrf_scores_seq2seq.csv
└── README.md
```

## Future Enhancements

- Implement attention mechanisms to improve translation quality.
- Experiment with transformer models for better performance.
- Optimize the models for faster inference using techniques like quantization.

## Acknowledgments

- Hugging Face for providing easy-to-use NLP evaluation metrics.
- University of New Haven for guidance and support throughout the project.
- The creators of the datasets used for training and evaluation.