metadata
license:
  - creativeml-openrail-m
language:
  - en
tags:
  - generated_from_trainer
  - text generation
  - pytorch
  - causal-lm
metrics:
  - accuracy
model-index:
  - name: openchatgpt-neox-r1
    results: []

openchatgpt-neox-r1

This model is a fine-tuned version of EleutherAI/pythia-125m-deduped on the openchatgpt safe-r1 dataset. It achieves the following results on the evaluation set:

  • Loss: 1.3585
  • Accuracy: 0.9169

Model description

Fine-tune based on the inner workings of ChatGPT. I won't elaborate on that. You need at least a rough idea of how the prompt is constructed for the model to be effective.

This is effectively a crazy idea that saw the light of day: practically a collaboration of three students in a virtual shed.

BTW, Pythia is so much better omg.

Intended uses & limitations

Intended uses and limitations are in line with OpenAI's. The dataset consists of safe texts (i.e. no highly sexual or erotic content). An NSFW version of the dataset is not planned at the moment.

Keep in mind that this is the 125M-parameter version of GPT-NeoX (Pythia). My GTX 1050 Ti Mobile couldn't even handle that without gradient tricks, and 8-bit Adam was also used. If anyone knows how to effectively fine-tune larger models on free Colab instances, feel free to let me know. The Pile tokenizer also has one downside compared to the native GPT-2/3 tokenizer: "Assistant" is encoded as 2 tokens rather than 1.
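The hyperparameters list gradient_accumulation_steps: 2 with train_batch_size: 1, which is one common way to train on a small GPU. The sketch below is a toy illustration of that idea, not the actual training code: it uses a hypothetical one-parameter model and shows that two micro-batches of size 1, each scaled by 1/accum_steps and summed, produce the same update as a real batch of 2.

```python
# Toy gradient-accumulation sketch on a one-parameter model y ~ w * x (squared error).
# Hypothetical example; it only illustrates the arithmetic, not the real training loop.


def grad_sq_error(w: float, x: float, y: float) -> float:
    """Gradient of (w*x - y)**2 with respect to w."""
    return 2.0 * (w * x - y) * x


def step_full_batch(w, batch, lr):
    """One SGD step using the mean gradient over the whole batch."""
    g = sum(grad_sq_error(w, x, y) for x, y in batch) / len(batch)
    return w - lr * g


def step_accumulated(w, batch, lr, accum_steps):
    """One SGD step assembled from `accum_steps` equally sized micro-batches.

    Each micro-batch's mean gradient is divided by `accum_steps` before being
    summed, so the result equals the full-batch mean gradient.
    """
    micro = len(batch) // accum_steps
    g_acc = 0.0
    for i in range(accum_steps):
        mb = batch[i * micro:(i + 1) * micro]
        g_acc += sum(grad_sq_error(w, x, y) for x, y in mb) / len(mb) / accum_steps
    return w - lr * g_acc


if __name__ == "__main__":
    data = [(1.0, 2.0), (2.0, 1.0)]
    # Micro-batch size 1 with 2 accumulation steps vs. a real batch of 2.
    print(step_full_batch(0.5, data, lr=0.1))
    print(step_accumulated(0.5, data, lr=0.1, accum_steps=2))
```

Only the peak memory differs: the accumulated version needs activations for one example at a time.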

Training and evaluation data

Data was split 95%/5% (training/evaluation). Preprocessing included removing mentions of OpenAI wherever they were not deemed appropriate (a mention in connection with GPT-2 is one of the appropriate ones). The whole dataset consists of just shy of 3k input-output pairs. One input can have multiple outputs (read as: one message has several variants of an answer). Far less than 1% of lines (3 in total) are curated, i.e. a serious mistake was spotted and corrected. At least 3 lines (far less than 1% of the line count, but a larger share of the byte count) are broken.
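A minimal sketch of how inputs with several answer variants could be flattened into input-output pairs and split 95%/5%. The dictionary layout, the function names, and splitting at the pair level with seed 42 are all assumptions for illustration; the card does not describe the actual preprocessing script.

```python
import random


def expand_pairs(conversations):
    """Flatten a hypothetical {input: [output, ...]} mapping into (input, output) pairs."""
    return [(inp, out) for inp, outputs in conversations.items() for out in outputs]


def train_eval_split(pairs, eval_fraction=0.05, seed=42):
    """Shuffle deterministically and hold out `eval_fraction` of the pairs."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_eval = max(1, round(len(shuffled) * eval_fraction))
    return shuffled[n_eval:], shuffled[:n_eval]


if __name__ == "__main__":
    # Toy data: 40 hypothetical messages, one of which has two answer variants.
    toy = {f"question {i}": [f"answer {i}"] for i in range(39)}
    toy["question 39"] = ["variant A", "variant B"]
    pairs = expand_pairs(toy)
    train, evaluation = train_eval_split(pairs)
    print(len(pairs), len(train), len(evaluation))
```

Splitting at the pair level means two variants of the same message can end up on different sides of the split; grouping by input before splitting would avoid that, at the cost of a less exact 95/5 ratio.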

The data is heavily biased toward IT topics.

Training procedure

Input and output were simply concatenated, mirroring how ChatGPT works.

An EOS token was appended after the final separator.
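A sketch of the concatenation described above. The separator is a hypothetical placeholder because the card deliberately does not document the prompt format, and `<|endoftext|>` is assumed to be the EOS string of the Pythia (GPT-NeoX) tokenizer.

```python
SEPARATOR = "\n"             # hypothetical placeholder; the real separator is not documented
EOS_TOKEN = "<|endoftext|>"  # assumed EOS string of the Pythia / GPT-NeoX tokenizer


def build_training_text(model_input: str, model_output: str,
                        sep: str = SEPARATOR, eos: str = EOS_TOKEN) -> str:
    """Concatenate input and output, each closed by the separator, then append EOS."""
    return model_input + sep + model_output + sep + eos


if __name__ == "__main__":
    print(repr(build_training_text("What is Pythia?", "A suite of language models.")))
```

Placing EOS after the last separator teaches the model to stop once an answer is closed, rather than continuing with another turn.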

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 2
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 3.0
  • mixed_precision_training: Native AMP
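The numbers above fit together as follows: on a single device the effective batch size is train_batch_size × gradient_accumulation_steps = 2, and 1377 optimizer steps per epoch (from the training results table) over 3 epochs give 4131 steps. The sketch assumes zero warmup steps, since none are listed, and writes out a linear decay in the same form as the linear-with-warmup schedule used by Transformers.

```python
# Hyperparameters copied from the list above; STEPS_PER_EPOCH comes from the results table.
LEARNING_RATE = 5e-5
TRAIN_BATCH_SIZE = 1
GRAD_ACCUM_STEPS = 2
NUM_EPOCHS = 3
STEPS_PER_EPOCH = 1377
WARMUP_STEPS = 0  # assumption: no warmup is listed in the card

total_train_batch_size = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS  # single device assumed
total_steps = STEPS_PER_EPOCH * NUM_EPOCHS


def linear_lr(step: int, base_lr: float = LEARNING_RATE,
              num_training_steps: int = total_steps,
              warmup_steps: int = WARMUP_STEPS) -> float:
    """Linear warmup (if any), then linear decay to 0 at `num_training_steps`."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = num_training_steps - step
    return base_lr * max(0.0, remaining / max(1, num_training_steps - warmup_steps))


if __name__ == "__main__":
    print("effective batch size:", total_train_batch_size)
    print("total optimizer steps:", total_steps)
    for epoch in range(NUM_EPOCHS + 1):
        step = epoch * STEPS_PER_EPOCH
        print(f"step {step}: lr = {linear_lr(step):.3e}")
```

With 1377 steps per epoch at 2 sequences per step, the training split holds roughly 2.75k sequences, consistent with 95% of "just shy of 3k" pairs.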

Training results

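| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|---------------|-------|------|-----------------|----------|
| 1.1311        | 1.0   | 1377 | 1.3116          | 0.9127   |
| 0.6691        | 2.0   | 2754 | 1.2978          | 0.9160   |
| 0.3463        | 3.0   | 4131 | 1.3585          | 0.9169   |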
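Assuming the reported loss is the mean per-token cross-entropy in nats (the usual loss for causal language models in Transformers), it converts to perplexity via exp(loss). The sketch below does that conversion for the validation losses in the table and picks out the epoch with the lowest validation loss.

```python
import math

# (epoch -> validation loss) copied from the training results table
VALIDATION_LOSS = {1: 1.3116, 2: 1.2978, 3: 1.3585}


def perplexity(mean_ce_loss: float) -> float:
    """Perplexity = exp(mean per-token cross-entropy in nats)."""
    return math.exp(mean_ce_loss)


if __name__ == "__main__":
    for epoch, loss in VALIDATION_LOSS.items():
        print(f"epoch {epoch}: val loss {loss:.4f} -> perplexity {perplexity(loss):.2f}")
    best_epoch = min(VALIDATION_LOSS, key=VALIDATION_LOSS.get)
    print("lowest validation loss at epoch", best_epoch)
```

Validation loss is lowest after epoch 2 and rises in epoch 3 while training loss keeps falling, so the reported epoch-3 results come from a checkpoint that has started to overfit the small training set.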

Framework versions

  • Transformers 4.25.1
  • Pytorch 1.13.1+cu116
  • Datasets 2.8.0
  • Tokenizers 0.13.2