
UPDATE (2023-09-23):

This model is obsolete. Thanks to quantization, you can now run AI Dungeon 2 Classic (a 1.5B model) on equivalent hardware. See here.


AID-Neo-125M

Model description

This model was inspired by, and fine-tuned on the same dataset as, KoboldAI's GPT-Neo-125M-AID (Mia) model: the AI Dungeon dataset (text_adventures.txt). The goal was to fix a possible oversight in the original model, which was trained with an unfortunate bug. You could technically consider it a "retraining" of the same model using different software.

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 3.0
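
For reference, the hyperparameters above map onto Hugging Face `transformers` `TrainingArguments` roughly as follows. This is a sketch, not the exact training script: the output directory is a placeholder, and the dataset/model wiring is omitted.

```python
from transformers import TrainingArguments

# Sketch: the card's listed hyperparameters expressed as TrainingArguments.
# "out" is a placeholder output directory, not from the original run.
args = TrainingArguments(
    output_dir="out",               # placeholder
    learning_rate=5e-5,
    per_device_train_batch_size=1,  # train_batch_size: 1
    per_device_eval_batch_size=1,   # eval_batch_size: 1
    seed=42,
    adam_beta1=0.9,                 # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=3.0,
)
```

These arguments would then be passed to a `Trainer` along with the model and the tokenized text_adventures.txt dataset.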
Model details

  • Format: Safetensors
  • Model size: 176M params
  • Tensor types: F32, U8