---
title: GPT From Scratch
emoji: 
colorFrom: indigo
colorTo: pink
sdk: gradio
sdk_version: 4.4.0
app_file: app.py
pinned: false
license: mit
---

# GPT from scratch

This repo contains code to train a GPT from scratch. The dataset is drawn from the RedPajama 1T corpus; only a subset of its samples is used for training. The transformer implementation is similar to LitGPT.
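
The data pipeline is not reproduced here, but a minimal sketch of how a small subset of RedPajama could be streamed and tokenized looks roughly like this (the dataset id, the `text` field, the GPT-2 tokenizer, and the sample cap are illustrative assumptions, not necessarily what this repo uses):

```python
# Hedged sketch: stream a small sample of RedPajama and tokenize it for training.
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Streaming avoids downloading the full corpus; only a capped number of samples is read.
stream = load_dataset(
    "togethercomputer/RedPajama-Data-1T-Sample",  # assumed sample subset on the HF Hub
    split="train",
    streaming=True,
)

token_buffer = []
for i, example in enumerate(stream):
    ids = tokenizer(example["text"], truncation=True, max_length=1024)["input_ids"]
    token_buffer.extend(ids)
    if i >= 1000:  # illustrative cap on the number of samples used
        break
```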

The trained model has about 160M parameters, and the final training loss was 3.2154.
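
Parameter counts like this can be verified with a generic PyTorch snippet (the `Linear` module below is only a placeholder for demonstration; the real model is the GPT described above):

```python
import torch

def count_parameters(model: torch.nn.Module) -> int:
    """Return the total number of trainable parameters in a model."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Placeholder module to show the call; swap in the actual GPT model instead.
demo = torch.nn.Linear(768, 50257)
print(f"{count_parameters(demo):,} trainable parameters")
```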

The training details can be found in the attached notebooks. The initial training was stopped when the loss was around 4.

Training was then resumed from the checkpoint and stopped once the loss dropped below 3.5.
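
Resuming like this generally means restoring both the model and optimizer state from a saved checkpoint. A hedged PyTorch sketch (the small `Linear` model, optimizer, and file name are placeholders, not this repo's actual training code):

```python
import torch

# Placeholders standing in for the repo's GPT, its optimizer, and the current step.
model = torch.nn.Linear(16, 16)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
step = 1000

# Save a checkpoint partway through training.
torch.save(
    {"model": model.state_dict(), "optimizer": optimizer.state_dict(), "step": step},
    "checkpoint.pt",
)

# Later: reload the states and continue the training loop from the same point.
ckpt = torch.load("checkpoint.pt", map_location="cpu")
model.load_state_dict(ckpt["model"])
optimizer.load_state_dict(ckpt["optimizer"])
step = ckpt["step"]
# ...resume training until the loss drops below the target (e.g. 3.5)...
```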

GitHub link: https://github.com/mkthoma/gpt_from_scratch