Neural Network-Based Language Model for Next Token Prediction

Overview

This project implements a neural language model for next-token prediction in English and Divehi. The model uses an LSTM architecture to predict the next word in a sequence. The dataset consists of 20,000 samples of English and Divehi text.
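To make next-token prediction concrete, the sketch below turns one tokenized sentence into (context, target) training pairs. The whitespace tokenization, the `build_vocab` and `make_pairs` helpers, the `<unk>` token, and the context length are illustrative assumptions, not the project's actual preprocessing.

```python
def build_vocab(sentences):
    """Map each distinct whitespace-separated token to an integer id."""
    vocab = {"<unk>": 0}  # id 0 is reserved for unknown tokens (assumption)
    for sentence in sentences:
        for token in sentence.split():
            vocab.setdefault(token, len(vocab))
    return vocab


def make_pairs(sentence, vocab, context_len=3):
    """Return (context_ids, target_id) pairs for next-token prediction."""
    ids = [vocab.get(tok, vocab["<unk>"]) for tok in sentence.split()]
    pairs = []
    for i in range(1, len(ids)):
        # The context is the (at most) context_len tokens before position i.
        context = ids[max(0, i - context_len):i]
        pairs.append((context, ids[i]))
    return pairs


sentences = ["the cat sat on the mat"]
vocab = build_vocab(sentences)
pairs = make_pairs(sentences[0], vocab)
```

Each pair asks the model to predict one token from the tokens that precede it. The same splitting works on Divehi text, since `str.split()` handles Unicode.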

Training saves checkpoints so that it can be resumed, and the trained model generates text in both languages.

Model Architecture

Model Type: LSTM (Long Short-Term Memory)
Embedding Size: 128
Hidden Units: 256
Number of Layers: 2
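
A minimal PyTorch sketch of a model with these hyperparameters might look as follows. The layer names, the `batch_first` layout, the output projection, and the placeholder vocabulary size of 1000 are assumptions; the actual class in this repository may differ.

```python
import torch
import torch.nn as nn


class LSTMLanguageModel(nn.Module):
    """Embedding -> stacked LSTM -> linear projection onto the vocabulary."""

    def __init__(self, vocab_size, embedding_dim=128, hidden_dim=256, num_layers=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim,
                            num_layers=num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, token_ids, hidden=None):
        embedded = self.embedding(token_ids)          # (batch, seq, embedding_dim)
        output, hidden = self.lstm(embedded, hidden)  # (batch, seq, hidden_dim)
        logits = self.fc(output)                      # (batch, seq, vocab_size)
        return logits, hidden


# Placeholder vocabulary size; the real value depends on the dataset.
model = LSTMLanguageModel(vocab_size=1000)
logits, _ = model(torch.randint(0, 1000, (4, 10)))
```

The logits at each position score every vocabulary entry as a candidate for the following token.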

Files in the Repository

Datasets: English and Divehi text data used to train the model
checkpoint.pth: Checkpoint saved during training
main.py: Script used for training and inference
README.md: This file

Training

The model was trained for 20 epochs, with training and validation losses monitored throughout. Checkpoints were saved so that training can be resumed from the last saved state.
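
Saving and resuming from a checkpoint can be sketched as below. The dictionary keys, the `save_checkpoint` and `load_checkpoint` helpers, the Adam optimizer, and the stand-in `nn.Linear` model are assumptions for illustration; the layout of this repository's checkpoint.pth is not documented here and may differ.

```python
import os
import tempfile

import torch
import torch.nn as nn


def save_checkpoint(path, model, optimizer, epoch, val_loss):
    """Store everything needed to resume training after `epoch`."""
    torch.save({
        "epoch": epoch,
        "model_state_dict": model.state_dict(),
        "optimizer_state_dict": optimizer.state_dict(),
        "val_loss": val_loss,
    }, path)


def load_checkpoint(path, model, optimizer):
    """Restore model and optimizer state; return the epoch to resume from."""
    checkpoint = torch.load(path, map_location="cpu")
    model.load_state_dict(checkpoint["model_state_dict"])
    optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
    return checkpoint["epoch"] + 1


model = nn.Linear(4, 2)  # stand-in for the language model
optimizer = torch.optim.Adam(model.parameters())
path = os.path.join(tempfile.mkdtemp(), "checkpoint.pth")

save_checkpoint(path, model, optimizer, epoch=4, val_loss=1.23)
start_epoch = load_checkpoint(path, model, optimizer)
```

Storing the optimizer state alongside the weights lets a resumed run continue with the same update statistics instead of restarting them.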

How to Use

You can load the model and generate text using the following code:

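import torch
# Adjust this import to the module that defines the class (this repository lists main.py)
from model import LSTMLanguageModel

# Hyperparameters from the Model Architecture section
embedding_dim = 128
hidden_dim = 256
num_layers = 2
# vocab_size, tokenizer, and generate_text are not defined in this snippet;
# they must come from the same setup that was used for training.

# Load the model
model = LSTMLanguageModel(vocab_size, embedding_dim, hidden_dim, num_layers)
# Path to the saved weights (this repository lists checkpoint.pth)
model.load_state_dict(torch.load('model.pth', map_location='cpu'))
model.eval()

# Generate text
start_text = "Once upon a time"
generated_text = generate_text(model, tokenizer, start_text)
print(generated_text)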
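
The snippet above calls `generate_text` without showing it. One possible greedy-decoding version is sketched below. It assumes a tokenizer with `encode` and `decode` methods and a model whose forward pass returns `(logits, hidden)`; `TinyTokenizer` and `TinyLM` are hypothetical stand-ins so that the example runs on its own, and the project's own function may work differently.

```python
import torch
import torch.nn as nn


class TinyTokenizer:
    """Hypothetical whitespace tokenizer with encode/decode methods."""

    def __init__(self, text):
        self.itos = sorted(set(text.split()))
        self.stoi = {tok: i for i, tok in enumerate(self.itos)}

    def encode(self, text):
        # Tokens outside the vocabulary are skipped.
        return [self.stoi[tok] for tok in text.split() if tok in self.stoi]

    def decode(self, ids):
        return " ".join(self.itos[i] for i in ids)


class TinyLM(nn.Module):
    """Small untrained stand-in that returns (logits, hidden)."""

    def __init__(self, vocab_size):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, 8)
        self.lstm = nn.LSTM(8, 16, batch_first=True)
        self.fc = nn.Linear(16, vocab_size)

    def forward(self, token_ids, hidden=None):
        out, hidden = self.lstm(self.embedding(token_ids), hidden)
        return self.fc(out), hidden


def generate_text(model, tokenizer, start_text, max_new_tokens=20):
    """Greedy decoding: repeatedly append the highest-scoring next token."""
    model.eval()
    ids = tokenizer.encode(start_text)
    if not ids:
        raise ValueError("start_text contains no tokens known to the tokenizer")
    input_ids = torch.tensor([ids])
    hidden = None
    with torch.no_grad():
        for _ in range(max_new_tokens):
            logits, hidden = model(input_ids, hidden)
            next_id = int(logits[0, -1].argmax())
            ids.append(next_id)
            # Feed only the new token; the LSTM state carries the history.
            input_ids = torch.tensor([[next_id]])
    return tokenizer.decode(ids)


tokenizer = TinyTokenizer("once upon a time there was a model")
model = TinyLM(vocab_size=len(tokenizer.itos))
text = generate_text(model, tokenizer, "once upon", max_new_tokens=5)
```

Greedy decoding is deterministic and often repetitive; sampling from the softmax of the logits is a common alternative when more varied output is wanted.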
