---
license: apache-2.0
tags:
- generated_from_trainer
- text-generation
- opt
- non-commercial
- dialogue
- chatbot
widget:
- text: "If you could live anywhere, where would it be? peter szemraj:"
  example_title: "live anywhere"
- text: "What would you sing at Karaoke night? peter szemraj:"
  example_title: "Karaoke"
- text: "If you could hire someone to help you, would it be with cleaning, cooking, or yard work? peter szemraj:"
  example_title: "help"
- text: "What form of public transportation do you prefer? (air, boat, train, bus, car, etc.) peter szemraj:"
  example_title: "transportation"
- text: "What's your favorite zoo animal? peter szemraj:"
  example_title: "animal"
- text: "Do you like or dislike surprises? Why or why not? peter szemraj:"
  example_title: "surprises"
- text: "What celebrity would you like to meet at Starbucks for a cup of coffee? peter szemraj:"
  example_title: "celebrity"
inference:
  parameters:
    min_length: 2
    max_length: 64
    length_penalty: 0.7
    temperature: 0.3
    no_repeat_ngram_size: 2
    top_k: 20
    do_sample: True
    repetition_penalty: 4.5
---

# pszemraj/opt-peter-1.3B

This model is a fine-tuned version of [pszemraj/opt-peter-1.3B-1E](https://huggingface.co/pszemraj/opt-peter-1.3B-1E) on 80k of my own WhatsApp/iMessage messages.
It achieves the following results on the evaluation set after training for 1 epoch (_on top of the 1E checkpoint linked above_):

- eval_loss: 3.4220
- eval_runtime: 954.9678
- eval_samples_per_second: 9.114
- eval_steps_per_second: 2.279
- epoch: 1.0
- step: 1235

## Model description

- Exploring how OPT does in dialogue/conversational applications :)
- Seems to do a lot better than GPT-Neo with similar training parameters

## Intended uses & limitations

- OPT has a license that does not allow commercial use; see original for details
- **any statements or claims made by this model do not reflect actual claims/statements by me**

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 3e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 16
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.01
- num_epochs: 2

### Framework versions

- Transformers 4.19.2
- Pytorch 1.11.0+cu113
- Tokenizers 0.12.1
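
## Example usage

The widget examples in this card's metadata all end the prompt with the speaker tag `peter szemraj:` and share one set of generation parameters. Below is a minimal sketch of how that setup might be reproduced locally with the `transformers` text-generation pipeline. The helper names (`build_prompt`, `extract_reply`, `chat`) are hypothetical, the single space between the message and the speaker tag is an assumption taken from the widget text as shown, and keeping only the first line of the continuation is a convenience choice rather than anything this card specifies.

```python
# Generation settings copied from the `inference.parameters` block of this card.
GENERATION_KWARGS = {
    "min_length": 2,
    "max_length": 64,
    "length_penalty": 0.7,
    "temperature": 0.3,
    "no_repeat_ngram_size": 2,
    "top_k": 20,
    "do_sample": True,
    "repetition_penalty": 4.5,
}

# Speaker label that ends every widget example prompt.
SPEAKER_TAG = "peter szemraj:"


def build_prompt(message: str) -> str:
    """Append the speaker tag so the model continues as that speaker.

    Assumption: message and tag are separated by a single space.
    """
    return f"{message.strip()} {SPEAKER_TAG}"


def extract_reply(generated_text: str, prompt: str) -> str:
    """Drop the echoed prompt and keep the first line of the continuation."""
    if generated_text.startswith(prompt):
        continuation = generated_text[len(prompt):]
    else:
        continuation = generated_text
    lines = continuation.strip().splitlines()
    return lines[0].strip() if lines else ""


def chat(message: str) -> str:
    """Generate one reply (hypothetical helper, not part of the model repo)."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import pipeline

    generator = pipeline("text-generation", model="pszemraj/opt-peter-1.3B")
    prompt = build_prompt(message)
    # The pipeline returns the prompt plus the continuation by default.
    result = generator(prompt, **GENERATION_KWARGS)
    return extract_reply(result[0]["generated_text"], prompt)
```

Calling `chat("What's your favorite zoo animal?")` downloads the weights on first use and returns a sampled reply, so the output varies between runs. Note that `max_length: 64` counts the prompt tokens as well, so long prompts leave little room for the reply.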