LSTM Model for Next Token Prediction (English-Hausa Translation)

Overview

This is an LSTM-based neural network trained for next-token prediction on English-to-Hausa translation data. It was built for an assignment on building language models with LSTM architectures.

Assignment Context

The course assignment focused on using an LSTM to predict the next token in bilingual English-Hausa translation data. The task was to:

  1. Train an LSTM model.
  2. Evaluate the model with BLEU and ChrF scores.
  3. Push the model to Hugging Face Hub.

Model Architecture

  • Type: LSTM
  • Hidden Dimensions: 256
  • Layers: 2
  • Embedding Layer: Sized to the tokenizer vocabulary
  • Loss Function: Cross-Entropy Loss with padding token ignored
  • Optimizer: Adam with learning rate of 0.001
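
The configuration above could be expressed in PyTorch roughly as follows. This is a minimal sketch and not the author's training code: the class name NextTokenLSTM, the padding id 0, the placeholder vocabulary size of 1000, and the embedding width of 256 are all assumptions, since the card gives no values for them. Only the hidden size, layer count, loss, and optimizer settings come from the list above.

```python
import torch
import torch.nn as nn

PAD_ID = 0          # assumed padding token id
VOCAB_SIZE = 1000   # placeholder; the real value would come from the tokenizer
EMBED_DIM = 256     # assumption: the card gives no embedding width


class NextTokenLSTM(nn.Module):
    """Embedding -> 2-layer LSTM -> linear projection onto the vocabulary."""

    def __init__(self, vocab_size, embed_dim=EMBED_DIM, hidden_dim=256, num_layers=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=PAD_ID)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers=num_layers, batch_first=True)
        self.proj = nn.Linear(hidden_dim, vocab_size)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) -> logits: (batch, seq_len, vocab_size)
        hidden_states, _ = self.lstm(self.embedding(token_ids))
        return self.proj(hidden_states)


model = NextTokenLSTM(VOCAB_SIZE)
criterion = nn.CrossEntropyLoss(ignore_index=PAD_ID)  # padding positions add no loss
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# One illustrative optimisation step on random token ids (not real data).
batch = torch.randint(1, VOCAB_SIZE, (32, 12))
inputs, targets = batch[:, :-1], batch[:, 1:]  # predict token t+1 from tokens up to t
logits = model(inputs)
loss = criterion(logits.reshape(-1, VOCAB_SIZE), targets.reshape(-1))
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Setting ignore_index to the padding id is what "padding token ignored" amounts to in practice: padded positions are skipped when the loss is averaged.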

Training Details

  • Dataset: opus100 English-Hausa dataset
  • Epochs: 10
  • Batch Size: 32
  • Training Loss Progress: Decreasing steadily across epochs
  • Validation Loss: Monitored to prevent overfitting
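
For next-token prediction, each training batch pairs an input sequence with the same sequence shifted left by one position, padded to a common length. The sketch below illustrates that step in plain Python. The helper name make_batch, the padding id 0, and the token ids are hypothetical and are not taken from the original training script.

```python
PAD_ID = 0  # assumed padding token id


def make_batch(sequences, pad_id=PAD_ID):
    """Build padded (inputs, targets) lists for next-token prediction.

    Each sequence must contain at least two token ids.
    """
    max_len = max(len(seq) for seq in sequences) - 1
    inputs, targets = [], []
    for seq in sequences:
        x, y = seq[:-1], seq[1:]            # target is the input shifted left by one
        pad = [pad_id] * (max_len - len(x))  # right-pad to the longest sequence
        inputs.append(x + pad)
        targets.append(y + pad)
    return inputs, targets


inputs, targets = make_batch([[5, 8, 2, 9], [7, 3, 4]])
print(inputs)   # [[5, 8, 2], [7, 3, 0]]
print(targets)  # [[8, 2, 9], [3, 4, 0]]
```

With a batch size of 32, make_batch would receive 32 tokenized sequences at a time. The padded target positions are the ones a loss with the padding token ignored would skip.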

Evaluation Metrics

Epoch   BLEU Score   ChrF Score
1       0.0271       15.35
10      0.0998       32.03

The best reported performance came after 10 epochs: a BLEU score of 0.0998 and a ChrF score of 32.03.
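
To give a sense of what a ChrF score measures, here is a simplified character n-gram F-score in plain Python. It is an illustrative sketch only: it is not the evaluation code used for the numbers above, and it will not necessarily reproduce the values a standard tool such as sacreBLEU reports. The function names and the example strings are hypothetical.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count character n-grams after removing spaces."""
    text = text.replace(" ", "")
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def chrf_sketch(hypothesis, reference, max_n=6, beta=2.0):
    """Simplified chrF: average n-gram precision and recall, then F-beta, scaled to 0-100."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # text too short for this n-gram order
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return 100 * (1 + beta**2) * p * r / (beta**2 * p + r)


print(chrf_sketch("ina kwana", "ina kwana"))  # identical strings score 100.0
print(chrf_sketch("abc", "xyz"))              # no shared characters score 0.0
```

Because it works on characters and not whole words, a metric of this kind gives partial credit for near-miss word forms, which is one reason it is often reported alongside BLEU.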

Usage

Load the model and tokenizer with the Hugging Face Transformers library:

from transformers import AutoModel, AutoTokenizer

# Hugging Face Hub repository that hosts the tokenizer and model
repo_id = "AppalanaiduSaketi/LSTM-model-based-translator"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModel.from_pretrained(repo_id)