Neural Network-Based Language Model for Next Token Prediction
Overview
This project implements a neural network-based language model for next token prediction in English and Divehi. The model uses an LSTM architecture to predict the next word in a sequence. The dataset consists of 20,000 samples of English and Divehi text.
Training uses checkpointing so that it can be resumed, and the trained model can generate text in both languages.
Model Architecture
- Model Type: LSTM (Long Short-Term Memory)
- Embedding Size: 128
- Hidden Units: 256
- Number of Layers: 2
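The hyperparameters above can be sketched as a PyTorch module. This is a minimal illustration, not the class shipped in the repository: the layer names (`embedding`, `lstm`, `fc`), the `batch_first` layout, and the vocabulary size of 1,000 used in the shape check are all assumptions.

```python
import torch
import torch.nn as nn


class LSTMLanguageModel(nn.Module):
    """Embedding -> stacked LSTM -> linear projection onto the vocabulary."""

    def __init__(self, vocab_size, embedding_dim=128, hidden_dim=256, num_layers=2):
        super().__init__()
        # Layer names are illustrative; the repository's class may use others.
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim,
                            num_layers=num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, token_ids, hidden=None):
        # token_ids: (batch, seq_len) tensor of integer token ids
        embedded = self.embedding(token_ids)          # (batch, seq_len, embedding_dim)
        output, hidden = self.lstm(embedded, hidden)  # (batch, seq_len, hidden_dim)
        logits = self.fc(output)                      # (batch, seq_len, vocab_size)
        return logits, hidden


# Shape check with a hypothetical vocabulary of 1,000 tokens
model = LSTMLanguageModel(vocab_size=1000)
dummy_batch = torch.randint(0, 1000, (4, 12))
logits, (h_n, c_n) = model(dummy_batch)
print(logits.shape, h_n.shape)
```

Returning the hidden state alongside the logits lets a generation loop feed one token at a time without re-encoding the whole prefix.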
Files in the Repository
- Datasets: Datasets for the model
- checkpoint.pth: Checkpoint saved during training
- main.py: Script used for training and inference
- README.md: This file
Training
The model was trained for 20 epochs, with training and validation losses monitored throughout. Checkpoints were saved during training so that it can be resumed from the most recent checkpoint after an interruption.
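Saving and resuming from a checkpoint might look roughly like the sketch below. The layout of the repository's `checkpoint.pth` is not documented here, so the dictionary keys, the helper names `save_checkpoint` and `load_checkpoint`, and the stand-in `nn.Linear` model are all assumptions made for illustration.

```python
import os
import tempfile

import torch
import torch.nn as nn


def save_checkpoint(path, model, optimizer, epoch, train_loss, val_loss):
    """Store everything needed to resume training (hypothetical key names)."""
    torch.save({
        "epoch": epoch,
        "model_state_dict": model.state_dict(),
        "optimizer_state_dict": optimizer.state_dict(),
        "train_loss": train_loss,
        "val_loss": val_loss,
    }, path)


def load_checkpoint(path, model, optimizer):
    """Restore model and optimizer state; return the epoch to resume from."""
    checkpoint = torch.load(path, map_location="cpu")
    model.load_state_dict(checkpoint["model_state_dict"])
    optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
    return checkpoint["epoch"] + 1


# Demo with a small stand-in model instead of the real LSTM
model = nn.Linear(4, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
path = os.path.join(tempfile.mkdtemp(), "checkpoint.pth")
save_checkpoint(path, model, optimizer, epoch=5, train_loss=2.1, val_loss=2.4)

# A fresh model and optimizer pick up where the saved ones left off
resumed_model = nn.Linear(4, 2)
resumed_optimizer = torch.optim.Adam(resumed_model.parameters(), lr=1e-3)
start_epoch = load_checkpoint(path, resumed_model, resumed_optimizer)
print("resume from epoch", start_epoch)
```

Storing the optimizer state alongside the weights matters for optimizers such as Adam, whose running statistics would otherwise restart from zero on resume.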
How to Use
You can load the model and generate text using the following code:
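import torch
from model import LSTMLanguageModel  # Import your model class (adjust the module name to where it is defined)
# Hyperparameters from the Model Architecture section
embedding_dim = 128
hidden_dim = 256
num_layers = 2
# vocab_size is not listed in this README; it must match the vocabulary used during training
# Load the model
model = LSTMLanguageModel(vocab_size, embedding_dim, hidden_dim, num_layers)
# Adjust the path if your weights are stored elsewhere (the file list names checkpoint.pth)
model.load_state_dict(torch.load('model.pth', map_location='cpu'))
model.eval()
# Generate text (generate_text and tokenizer must come from your own code)
start_text = "Once upon a time"
generated_text = generate_text(model, tokenizer, start_text)
print(generated_text)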
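The snippet above calls `generate_text` with a `tokenizer` but defines neither. One possible shape for them is sketched below, assuming a word-level tokenizer with `encode`/`decode` methods and a model whose forward pass returns `(logits, hidden)`. `WhitespaceTokenizer`, `TinyLSTM`, and the `max_new_tokens` and `temperature` parameters are hypothetical and are not taken from the repository.

```python
import torch
import torch.nn as nn


class WhitespaceTokenizer:
    """Hypothetical word-level tokenizer; the repository's may differ."""

    def __init__(self, corpus):
        self.itos = ["<unk>"] + sorted(set(corpus.split()))
        self.stoi = {word: i for i, word in enumerate(self.itos)}

    def encode(self, text):
        return [self.stoi.get(word, 0) for word in text.split()]

    def decode(self, ids):
        return " ".join(self.itos[i] for i in ids)


class TinyLSTM(nn.Module):
    """Small untrained stand-in model, used only to run the demo."""

    def __init__(self, vocab_size):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, 16)
        self.lstm = nn.LSTM(16, 32, batch_first=True)
        self.fc = nn.Linear(32, vocab_size)

    def forward(self, token_ids, hidden=None):
        output, hidden = self.lstm(self.embedding(token_ids), hidden)
        return self.fc(output), hidden


@torch.no_grad()
def generate_text(model, tokenizer, start_text, max_new_tokens=20, temperature=1.0):
    """Sample tokens one at a time, carrying the LSTM state forward."""
    model.eval()
    ids = tokenizer.encode(start_text)
    if not ids:
        raise ValueError("start_text must contain at least one token")
    input_ids = torch.tensor([ids], dtype=torch.long)
    hidden = None
    for _ in range(max_new_tokens):
        logits, hidden = model(input_ids, hidden)
        # Only the last position predicts the next token
        probs = torch.softmax(logits[0, -1] / temperature, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1).item()
        ids.append(next_id)
        # Feed just the new token; the hidden state holds the history
        input_ids = torch.tensor([[next_id]], dtype=torch.long)
    return tokenizer.decode(ids)


tokenizer = WhitespaceTokenizer("once upon a time there was a model")
demo_model = TinyLSTM(vocab_size=len(tokenizer.itos))
text = generate_text(demo_model, tokenizer, "once upon a time", max_new_tokens=5)
print(text)
```

Because the demo model is untrained, the continuation is random; with trained weights, lowering `temperature` makes the sampling more conservative.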