---
tags:
  - generated_from_trainer
model-index:
  - name: mpt-mini-shakespeare
    results: []
---

# mpt-mini-shakespeare

This model was trained from scratch on https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt.

## Model description

The configuration and code are adapted from mosaicml/mpt-7b-storywriter, with the configuration parameters changed to make it a very tiny model.
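
A minimal sketch of how such a shrunken configuration can be derived; the dimension values below are illustrative assumptions, not necessarily the ones used for this checkpoint:

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Start from the storywriter configuration (custom MPT code, hence
# trust_remote_code) and shrink it. The sizes below are placeholders,
# not this checkpoint's actual values.
config = AutoConfig.from_pretrained(
    "mosaicml/mpt-7b-storywriter",
    trust_remote_code=True,
    d_model=64,       # hidden size (assumed tiny value)
    n_heads=4,        # attention heads (assumed)
    n_layers=2,       # transformer blocks (assumed)
    max_seq_len=256,  # context length (assumed)
)
model = AutoModelForCausalLM.from_config(config, trust_remote_code=True)
```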

## Intended uses & limitations

This model is intended only to aid debugging of a GGML port of mpt-7b-storywriter.
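
For that debugging use case, the model can be loaded to produce reference logits for comparison against the GGML port's output. The repo id and prompt below are assumptions:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "jploski/mpt-mini-shakespeare"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True)

inputs = tokenizer("ROMEO:", return_tensors="pt")
with torch.no_grad():
    # Compare these logits against the GGML port for the same prompt.
    logits = model(**inputs).logits
print(logits.shape)
```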

## Training and evaluation data

The single tinyshakespeare text file linked above serves as both the training and the validation set; see the training procedure below.

## Training procedure

The text file is split into paragraphs, and the resulting examples are used as both the training and the validation set.
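
A minimal sketch of that preprocessing, assuming paragraphs are delimited by blank lines (the actual splitting logic is not documented here):

```python
import requests
from datasets import Dataset

URL = ("https://raw.githubusercontent.com/karpathy/char-rnn/"
       "master/data/tinyshakespeare/input.txt")
text = requests.get(URL).text

# Split on blank lines into paragraphs (assumed delimiter); the same
# dataset is reused for both training and validation.
paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
dataset = Dataset.from_dict({"text": paragraphs})
train_ds = eval_ds = dataset
```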

### Training hyperparameters

The following hyperparameters were used during training (a `TrainingArguments` sketch reproducing them follows the list):

- learning_rate: 0.0005
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 256
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 1
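
A sketch mapping these values onto `transformers.TrainingArguments`; the `output_dir` is a placeholder, and the listed Adam betas and epsilon are the Transformers defaults, so they are not set explicitly:

```python
from transformers import TrainingArguments

# Direct mapping of the listed hyperparameters.
# total_train_batch_size (256) is derived: 32 per device * 8 accumulation steps.
args = TrainingArguments(
    output_dir="mpt-mini-shakespeare",  # placeholder
    learning_rate=5e-4,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    seed=42,
    gradient_accumulation_steps=8,
    lr_scheduler_type="cosine",
    warmup_steps=10,
    num_train_epochs=1,
)
```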

### Training results

Mediocre, as expected for such a tiny model.

### Framework versions

- Transformers 4.28.0
- Pytorch 2.0.1+cu117
- Datasets 2.12.0
- Tokenizers 0.13.3