# Neural Network-Based Language Model for Next Token Prediction

## Overview

This project implements a neural network-based language model for next-token prediction in two languages: English and Icelandic. The model is built without transformer or encoder-decoder architectures, relying instead on recurrent neural network techniques.
|

## Table of Contents

- [Installation](#installation)
- [Usage](#usage)
- [Model Architecture](#model-architecture)
- [Training](#training)
- [Text Generation](#text-generation)
- [Results](#results)
- [License](#license)
|

## Installation

To run this project, you need Python installed along with the following libraries:

    pip install torch numpy pandas huggingface_hub
|

## Usage

1. Upload or open the notebook in Google Colab.
2. Run all cells sequentially to load the models, configure the text generation process, and view the outputs.
3. Modify the seed text to generate different text sequences; you can provide your own input to see how the model generates text in response.
|

## Model Architecture

The model in this notebook is based on recurrent neural networks, specifically Long Short-Term Memory (LSTM) or Gated Recurrent Unit (GRU) layers, which are commonly used for sequence prediction tasks such as text generation. The architecture consists of:

- Embedding layer: converts input tokens into dense vectors of fixed size.
- LSTM/GRU layers: process the sequential data and maintain long-range dependencies between words.
- Dense output layer: produces a probability distribution over the vocabulary for the next word in the sequence.

This architecture lets the model learn from the preceding words and predict the next one in the sequence effectively.
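
A minimal sketch of such an architecture in PyTorch. The class name, layer sizes, and other hyperparameters below are illustrative placeholders, not values taken from the notebook:

```python
import torch
import torch.nn as nn

class NextTokenLSTM(nn.Module):
    """Embedding -> LSTM -> dense head for next-token prediction (illustrative sizes)."""

    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256, num_layers=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, token_ids, hidden=None):
        # token_ids: (batch, seq_len) integer word indices
        embedded = self.embedding(token_ids)          # (batch, seq_len, embed_dim)
        output, hidden = self.lstm(embedded, hidden)  # (batch, seq_len, hidden_dim)
        logits = self.fc(output)                      # (batch, seq_len, vocab_size)
        return logits, hidden
```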
|

## Training

The model used in this notebook is pre-trained: it has already been trained on a large dataset for both English and Icelandic text generation.

If you wish to re-train the model or fine-tune it on your own data, you can do so by adding a training loop to the notebook. Make sure you have a dataset prepared and adjust the training parameters (such as batch size, number of epochs, and learning rate). A basic outline of how the training could be set up (a code sketch follows the list):

1. Preprocess your text data into input/target sequences.
2. Split the data into training and validation sets.
3. Train the model on the sequences, optimizing the loss function.
4. Save the model after training for future use.
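
A minimal training-loop sketch under the assumptions above. It reuses the illustrative `NextTokenLSTM` class from the previous sketch and assumes a `train_loader` that yields `(inputs, targets)` batches of token indices; the vocabulary size and all hyperparameters are placeholders, not the notebook's actual values:

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

vocab_size = 10_000  # placeholder; use the size of your actual vocabulary
model = NextTokenLSTM(vocab_size).to(device)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(10):  # number of epochs is a placeholder
    model.train()
    total_loss = 0.0
    for inputs, targets in train_loader:  # train_loader is assumed to exist
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad()
        logits, _ = model(inputs)
        # Flatten (batch, seq_len, vocab) -> (batch * seq_len, vocab) for cross-entropy
        loss = criterion(logits.view(-1, logits.size(-1)), targets.view(-1))
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    print(f"epoch {epoch + 1}: train loss {total_loss / len(train_loader):.4f}")

# Save the trained weights for later use
torch.save(model.state_dict(), "next_token_model.pt")
```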
|

## Text Generation

In this notebook, the model is used for text generation: it takes an initial seed text (a starting sequence) and repeatedly predicts the next word to build up a longer sequence.

Steps for text generation (a code sketch follows the list):

1. Provide a seed text in English or Icelandic.
2. Run the code cell to generate text based on the provided input.
3. The output is displayed as a continuation of the seed text.
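
A minimal sketch of such a generation loop, assuming the model from the earlier sketch plus word-level `encode`/`decode` tokenizer functions; those names and the greedy decoding strategy are illustrative assumptions, not the notebook's exact implementation:

```python
import torch

def generate(model, encode, decode, seed_text, max_new_tokens=20, device="cpu"):
    """Greedy next-word generation from a seed text (illustrative helper)."""
    model.eval()
    token_ids = encode(seed_text)  # list of int word indices for the seed text
    with torch.no_grad():
        for _ in range(max_new_tokens):
            inputs = torch.tensor([token_ids], device=device)
            logits, _ = model(inputs)
            # Take the distribution for the last position and pick the most likely word
            next_id = int(logits[0, -1].argmax())
            token_ids.append(next_id)
    return decode(token_ids)

# Example usage (hypothetical tokenizer functions):
# print(generate(model, encode, decode, "Today is a good day"))
```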
|

Example:

- English seed text: "Today is a good day"
  Generated output: "Today is a good day to explore the new opportunities available."
- Icelandic seed text: "þetta mun auka" ("this will increase")
  Generated output: "þetta mun auka áberandi í utan eins og vieigandi..."
|

## License

This notebook is available for educational purposes. Feel free to modify and use it as needed for your own experiments or projects. However, the pre-trained models and certain dependencies may have their own licenses, so ensure you comply with their usage policies.
|

## Results

The training curves for both training loss and validation loss are provided in the submission. The model's performance is evaluated based on the quality of the generated text and the perplexity score during training.
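
For reference, perplexity is the exponential of the average per-token cross-entropy loss (in nats); this is the standard relationship, and the loss value below is only a placeholder:

```python
import math

avg_cross_entropy = 4.2  # placeholder: average per-token cross-entropy loss in nats
perplexity = math.exp(avg_cross_entropy)
print(f"perplexity: {perplexity:.2f}")
```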