LSTM Model for Next Token Prediction (English-Hausa Translation)

Overview

This is an LSTM-based neural network trained for next-token prediction on English-to-Hausa translation data. It was built for a course assignment to demonstrate proficiency in building language models with LSTM architectures.

Assignment Context

The assignment focused on using an LSTM to predict the next token in bilingual English-Hausa translation data. The tasks were to:

Train an LSTM model.
Evaluate the model with BLEU and ChrF scores.
Push the model to Hugging Face Hub.

Model Architecture

Type: LSTM
Hidden Dimensions: 256
Layers: 2
Embedding Layer: Number of embeddings set by the tokenizer vocabulary size
Loss Function: Cross-Entropy Loss with padding token ignored
Optimizer: Adam with learning rate of 0.001
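
The configuration above can be sketched in PyTorch roughly as follows. This is a minimal illustration, not the author's actual code: the class name NextTokenLSTM, the vocabulary size of 1000, the embedding width of 128, and the padding id of 0 are assumptions made for the example. Only the hidden size of 256, the 2 layers, the cross-entropy loss with padding ignored, and Adam with a learning rate of 0.001 come from the list above.

```python
import torch
import torch.nn as nn

PAD_ID = 0          # assumed padding token id
VOCAB_SIZE = 1000   # assumed; in practice this would come from the tokenizer
EMBED_DIM = 128     # assumed; the card does not state an embedding width


class NextTokenLSTM(nn.Module):
    """Hypothetical LSTM language model: embedding -> 2-layer LSTM -> vocabulary logits."""

    def __init__(self, vocab_size, embed_dim, hidden_dim=256, num_layers=2, pad_id=PAD_ID):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=pad_id)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers=num_layers, batch_first=True)
        self.proj = nn.Linear(hidden_dim, vocab_size)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) -> logits: (batch, seq_len, vocab_size)
        hidden_states, _ = self.lstm(self.embedding(token_ids))
        return self.proj(hidden_states)


model = NextTokenLSTM(VOCAB_SIZE, EMBED_DIM)
criterion = nn.CrossEntropyLoss(ignore_index=PAD_ID)   # padding positions add no loss
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# Dummy batch of 32 sequences, 12 tokens each, with two trailing padding tokens.
batch = torch.randint(1, VOCAB_SIZE, (32, 12))
batch[:, -2:] = PAD_ID

# Next-token prediction: the target at position t is the input token at position t + 1.
inputs, targets = batch[:, :-1], batch[:, 1:]
logits = model(inputs)
loss = criterion(logits.reshape(-1, VOCAB_SIZE), targets.reshape(-1))
```

Shifting the sequence by one position is what turns an ordinary sequence model into a next-token predictor, and ignore_index keeps padded positions from contributing to the loss.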

Training Details

Dataset: opus100 English-Hausa dataset
Epochs: 10
Batch Size: 32
Training Loss Progress: Decreasing steadily across epochs
Validation Loss: Monitored to prevent overfitting
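
A training loop matching these settings might look like the sketch below. It is illustrative only: random token tensors stand in for the opus100 data, the names TinyLSTM and run_epoch are hypothetical, the vocabulary size and embedding width are assumed, and keeping the weights from the epoch with the lowest validation loss is just one possible reading of "monitored to prevent overfitting".

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
PAD_ID, VOCAB_SIZE = 0, 200   # assumed values for this sketch


class TinyLSTM(nn.Module):
    """Hypothetical stand-in model (hidden size 256, 2 layers)."""

    def __init__(self):
        super().__init__()
        self.embedding = nn.Embedding(VOCAB_SIZE, 32, padding_idx=PAD_ID)
        self.lstm = nn.LSTM(32, 256, num_layers=2, batch_first=True)
        self.proj = nn.Linear(256, VOCAB_SIZE)

    def forward(self, token_ids):
        out, _ = self.lstm(self.embedding(token_ids))
        return self.proj(out)


def run_epoch(model, loader, criterion, optimizer=None):
    """Return the mean batch loss; weights are updated only when an optimizer is given."""
    training = optimizer is not None
    model.train(training)
    total, batches = 0.0, 0
    with torch.set_grad_enabled(training):
        for (batch,) in loader:
            inputs, targets = batch[:, :-1], batch[:, 1:]
            logits = model(inputs)
            loss = criterion(logits.reshape(-1, VOCAB_SIZE), targets.reshape(-1))
            if training:
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            total += loss.item()
            batches += 1
    return total / batches


# Random stand-in data; a real run would use tokenized opus100 sentence pairs.
train_loader = DataLoader(TensorDataset(torch.randint(1, VOCAB_SIZE, (128, 10))),
                          batch_size=32, shuffle=True)
val_loader = DataLoader(TensorDataset(torch.randint(1, VOCAB_SIZE, (64, 10))),
                        batch_size=32)

model = TinyLSTM()
criterion = nn.CrossEntropyLoss(ignore_index=PAD_ID)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

history = []
best_val, best_state = float("inf"), None
for epoch in range(1, 11):                      # 10 epochs
    train_loss = run_epoch(model, train_loader, criterion, optimizer)
    val_loss = run_epoch(model, val_loader, criterion)
    history.append((epoch, train_loss, val_loss))
    if val_loss < best_val:                     # keep the best checkpoint seen so far
        best_val = val_loss
        best_state = {k: v.clone() for k, v in model.state_dict().items()}
```

Comparing the training and validation columns in history shows overfitting as the point where training loss keeps falling while validation loss stops improving.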

Evaluation Metrics

Epoch	BLEU Score	ChrF Score
1	0.0271	15.35
10	0.0998	32.03

The model achieved its best performance with a BLEU score of 0.0998 and a ChrF score of 32.03 after 10 epochs.
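
To illustrate what a character-level score such as ChrF measures, here is a simplified sentence-level character n-gram F-score in plain Python. It is a sketch for intuition only: the function names are hypothetical, whitespace handling and averaging are simplified, and it is not the implementation used to produce the scores above, so its values should not be expected to match any evaluation library exactly.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count character n-grams, ignoring spaces (a simplification)."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis, reference, max_n=6, beta=2.0):
    """Simplified character n-gram F-score on a 0-100 scale (recall weighted by beta)."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip n-gram orders longer than either string
        overlap = sum((hyp & ref).values())   # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return 100 * (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


# Illustrative strings only; these are not taken from the evaluation set.
exact = simple_chrf("ina kwana", "ina kwana")
partial = simple_chrf("ina kwana", "ina kwana lafiya")
disjoint = simple_chrf("abc", "xyz")
```

Because matches are counted on characters rather than whole words, a partially correct word still earns credit, which is one reason character-level scores can be more forgiving than word-level BLEU for morphologically rich output.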

Usage

To use the model, you can load it directly with the Hugging Face Transformers library:

from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("AppalanaiduSaketi/LSTM-model-based-translator")
model = AutoModel.from_pretrained("AppalanaiduSaketi/LSTM-model-based-translator")
