Seq2Seq Model for Next Token Prediction (English-Hausa Translation)
Overview
This is a Seq2Seq model trained for next token prediction on an English-to-Hausa translation task. It was built as part of a course assignment to explore and implement the Seq2Seq architecture.
Assignment Context
This model was developed as part of an assignment that required:
- Training a Seq2Seq model for next token prediction.
- Comparing performance with an LSTM model using BLEU and ChrF scores.
- Publishing the model on Hugging Face Hub.
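To make "next token prediction" concrete, here is a minimal sketch of greedy decoding, where a translation is produced one token at a time. The names `greedy_decode`, `toy_model`, `BOS_ID`, and `EOS_ID` are hypothetical and do not come from this repository; `toy_model` is only a stand-in for a trained encoder-decoder.

```python
from typing import Callable, List, Sequence

BOS_ID, EOS_ID = 1, 2  # hypothetical special-token ids, not taken from the real tokenizer


def greedy_decode(
    next_token_scores: Callable[[Sequence[int], Sequence[int]], Sequence[float]],
    source_ids: Sequence[int],
    max_len: int = 20,
) -> List[int]:
    """Generate target ids one at a time, always taking the top-scoring token."""
    output = [BOS_ID]
    for _ in range(max_len):
        # Score every vocabulary entry given the source and the prefix so far.
        scores = next_token_scores(source_ids, output)
        next_id = max(range(len(scores)), key=scores.__getitem__)
        output.append(next_id)
        if next_id == EOS_ID:
            break
    return output[1:]  # drop the leading BOS token


def toy_model(source_ids, prefix):
    """Stand-in for a trained model: copies the source, then emits EOS."""
    vocab_size = 10
    step = len(prefix) - 1
    target = source_ids[step] if step < len(source_ids) else EOS_ID
    return [1.0 if i == target else 0.0 for i in range(vocab_size)]


print(greedy_decode(toy_model, [5, 7, 3]))  # prints [5, 7, 3, 2]
```

A real model would replace `toy_model` with a forward pass through the encoder and decoder; the loop itself stays the same.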
Model Architecture
- Type: Seq2Seq (Encoder-Decoder)
- Hidden Dimensions: 256
- Layers: 2
- Embedding Layer: Input size set by the tokenizer vocabulary size
- Loss Function: Cross-Entropy Loss with padding token ignored
- Optimizer: Adam with learning rate of 0.001
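The "padding token ignored" part of the loss can be illustrated in plain Python. This is a sketch of the idea, not the training code from this repository: `PAD_ID` and `masked_cross_entropy` are hypothetical names, and the README does not state the actual padding id. If the model was trained in PyTorch (an assumption), `torch.nn.CrossEntropyLoss(ignore_index=...)` gives the same behavior.

```python
import math

PAD_ID = 0  # hypothetical padding token id


def masked_cross_entropy(logits, targets, pad_id=PAD_ID):
    """Mean cross-entropy over non-padding positions.

    logits:  list of per-position score lists (one score per vocabulary entry)
    targets: list of gold token ids, padded with pad_id
    """
    total, count = 0.0, 0
    for scores, target in zip(logits, targets):
        if target == pad_id:
            continue  # padded positions contribute nothing to the loss
        # Numerically stable log-sum-exp for the softmax normalizer.
        m = max(scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_z - scores[target]
        count += 1
    return total / count if count else 0.0


# Two positions; the second target is padding, so only the first is scored.
loss = masked_cross_entropy([[0.0, 0.0, 0.0, 0.0], [5.0, 1.0, 1.0, 1.0]], [2, PAD_ID])
print(round(loss, 4))  # prints 1.3863, i.e. log(4) for a uniform guess over 4 tokens
```

Averaging only over real tokens keeps short sentences in a padded batch from looking artificially easy.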
Training Details
- Dataset: opus100 English-Hausa dataset
- Epochs: 10
- Batch Size: 32
- Training Loss: Converged to near zero over the 10 epochs
- Validation Loss: Changed minimally across epochs, attributed to the Seq2Seq architecture
Evaluation Metrics
| Epoch | BLEU Score | ChrF Score |
|---|---|---|
| 1 | 0.0998 | 32.03 |
| 10 | 0.0998 | 32.03 |
The Seq2Seq model's scores were identical at epoch 1 and epoch 10: a BLEU score of 0.0998 and a ChrF score of 32.03.
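For readers unfamiliar with ChrF, the sketch below shows the idea behind it: an F-score over character n-grams, weighted toward recall. It is a simplified sentence-level variant written for illustration, not the tool used to produce the scores above (the README does not say which one was used), and `simple_chrf` is a hypothetical name. The example strings are illustrative only.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams after removing whitespace."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis: str, reference: str, max_order: int = 6, beta: float = 2.0) -> float:
    """Simplified sentence-level chrF on a 0-100 scale."""
    precisions, recalls = [], []
    for n in range(1, max_order + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip orders longer than either string
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # F-beta score: beta = 2 weights recall more heavily than precision.
    return 100.0 * (1 + beta**2) * p * r / (beta**2 * p + r)


print(simple_chrf("ina kwana", "ina kwana"))  # prints 100.0 for an exact match
```

Because it works on characters rather than whole words, this kind of metric gives partial credit for near-miss word forms, which is one reason it is often reported alongside BLEU.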
Usage
To load and use the model in your project, use the following code:
from transformers import AutoModel, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("AppalanaiduSaketi/LSTM-model-based-translator")
model = AutoModel.from_pretrained("AppalanaiduSaketi/LSTM-model-based-translator")