LSTM Model for Next Token Prediction (English-Hausa Translation)
Overview
This is an LSTM-based neural network trained for next-token prediction on English-to-Hausa translation data. It was built to demonstrate proficiency in building language models with LSTM architectures.
Assignment Context
This model was created for a course assignment on using an LSTM to predict the next token in bilingual English-Hausa translation data. The task was to:
- Train an LSTM model.
- Evaluate the model with BLEU and ChrF scores.
- Push the model to Hugging Face Hub.
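Next-token prediction on translation pairs is usually framed by shifting a token sequence by one position. The sketch below shows one way to do that, assuming the English and Hausa token ids are joined with a separator token. The function name `make_next_token_pairs`, the separator id, and the sample ids are hypothetical and not taken from the assignment code.

```python
def make_next_token_pairs(src_ids, tgt_ids, sep_id):
    """Join source and target ids, then shift by one position."""
    sequence = list(src_ids) + [sep_id] + list(tgt_ids)
    inputs = sequence[:-1]   # the model sees every token except the last
    targets = sequence[1:]   # and is asked to predict the token that follows each one
    return inputs, targets


# Hypothetical token ids, for illustration only.
inputs, targets = make_next_token_pairs([5, 6, 7], [8, 9], sep_id=2)
print(inputs)   # [5, 6, 7, 2, 8]
print(targets)  # [6, 7, 2, 8, 9]
```

At each position the target is simply the next id in the joined sequence, so the same loss can be applied to every time step.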
Model Architecture
- Type: LSTM
- Hidden Dimensions: 256
- Layers: 2
- Embedding Layer: Number of embeddings set by the tokenizer vocabulary size (embedding dimension not stated)
- Loss Function: Cross-Entropy Loss with padding token ignored
- Optimizer: Adam with learning rate of 0.001
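The configuration above could be expressed in PyTorch roughly as follows. This is a minimal sketch, not the author's implementation: the framework, the class name `NextTokenLSTM`, the vocabulary size, the padding id, and the embedding width of 256 are all assumptions. Only the hidden size, layer count, loss, and optimizer settings come from the list above.

```python
import torch
import torch.nn as nn


class NextTokenLSTM(nn.Module):
    """Embedding -> 2-layer LSTM -> linear projection onto the vocabulary."""

    def __init__(self, vocab_size, embed_dim=256, hidden_dim=256,
                 num_layers=2, pad_id=0):
        super().__init__()
        # embed_dim is an assumed value; the model card does not state it.
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=pad_id)
        self.lstm = nn.LSTM(embed_dim, hidden_dim,
                            num_layers=num_layers, batch_first=True)
        self.head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)   # (batch, seq, embed_dim)
        outputs, _ = self.lstm(embedded)       # (batch, seq, hidden_dim)
        return self.head(outputs)              # (batch, seq, vocab_size)


vocab_size, pad_id = 1000, 0                   # hypothetical values
model = NextTokenLSTM(vocab_size, pad_id=pad_id)
criterion = nn.CrossEntropyLoss(ignore_index=pad_id)   # padding is ignored
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# One forward pass on random ids: predict token t+1 from tokens up to t.
token_ids = torch.randint(1, vocab_size, (4, 12))
logits = model(token_ids[:, :-1])
loss = criterion(logits.reshape(-1, vocab_size), token_ids[:, 1:].reshape(-1))
print(logits.shape, loss.item())
```

Passing `ignore_index=pad_id` makes padded positions contribute nothing to the loss, which matches the "padding token ignored" setting.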
Training Details
- Dataset: opus100 English-Hausa dataset
- Epochs: 10
- Batch Size: 32
- Training Loss Progress: Decreasing steadily across epochs
- Validation Loss: Monitored to prevent overfitting
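A training loop with these settings might look like the sketch below, which records training and validation loss once per epoch. It assumes PyTorch, and the helper names `run_epoch` and `fit` are hypothetical. To keep the example short it uses random token ids and a small stand-in model without an LSTM, not the opus100 data or the actual model.

```python
import torch
import torch.nn as nn


def run_epoch(model, batches, criterion, optimizer=None):
    """Return the mean loss; weights are updated only if an optimizer is given."""
    training = optimizer is not None
    model.train(training)
    total = 0.0
    with torch.set_grad_enabled(training):
        for inputs, targets in batches:
            logits = model(inputs)
            loss = criterion(logits.reshape(-1, logits.size(-1)),
                             targets.reshape(-1))
            if training:
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            total += loss.item()
    return total / len(batches)


def fit(model, train_batches, val_batches, epochs=10, lr=0.001, pad_id=0):
    criterion = nn.CrossEntropyLoss(ignore_index=pad_id)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    history = []
    for epoch in range(1, epochs + 1):
        train_loss = run_epoch(model, train_batches, criterion, optimizer)
        val_loss = run_epoch(model, val_batches, criterion)  # no weight updates
        history.append((epoch, train_loss, val_loss))
    return history


def random_batches(n_batches, batch_size=32, seq_len=8, vocab_size=50):
    """Random (inputs, targets) pairs standing in for tokenized sentences."""
    batches = []
    for _ in range(n_batches):
        ids = torch.randint(1, vocab_size, (batch_size, seq_len + 1))
        batches.append((ids[:, :-1], ids[:, 1:]))
    return batches


# Stand-in model (no LSTM) so the example stays small.
toy_model = nn.Sequential(nn.Embedding(50, 16), nn.Linear(16, 50))
history = fit(toy_model, random_batches(3), random_batches(1), epochs=10)
for epoch, train_loss, val_loss in history:
    print(f"epoch {epoch}: train={train_loss:.4f} val={val_loss:.4f}")
```

A validation loss that rises while the training loss keeps falling is the usual sign of overfitting that this kind of per-epoch logging is meant to expose.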
Evaluation Metrics
| Epoch | BLEU Score | ChrF Score |
|---|---|---|
| 1 | 0.0271 | 15.35 |
| 10 | 0.0998 | 32.03 |
The model achieved its best performance with a BLEU score of 0.0998 and a ChrF score of 32.03 after 10 epochs.
Usage
To use the model, you can load it directly with the Hugging Face Transformers library:
```python
from transformers import AutoModel, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("AppalanaiduSaketi/LSTM-model-based-translator")
model = AutoModel.from_pretrained("AppalanaiduSaketi/LSTM-model-based-translator")
```