---
license: apache-2.0
tags:
- generated_from_trainer
- text-generation
- opt
- non-commercial
widget:
- text: "If you could live anywhere, where would it be? peter szemraj:"
  example_title: "live anywhere"
- text: "What would you sing at Karaoke night? peter szemraj:"
  example_title: "Karaoke"
- text: "If you could hire someone to help you, would it be with cleaning, cooking, or yard work? peter szemraj:"
  example_title: "help"
- text: "What form of public transportation do you prefer? (air, boat, train, bus, car, etc.) peter szemraj:"
  example_title: "transportation"
- text: "What's your favorite zoo animal? peter szemraj:"
  example_title: "animal"
- text: "Do you like or dislike surprises? Why or why not? peter szemraj:"
  example_title: "surprises"
- text: "What celebrity would you like to meet at Starbucks for a cup of coffee? peter szemraj:"
  example_title: "celebrity "
inference:
  parameters:
    min_length: 2
    max_length: 64
    length_penalty: 0.7
    temperature: 0.65
    no_repeat_ngram_size: 2
    top_k: 20
    do_sample: True
    repetition_penalty: 4.5
---

# pszemraj/opt-peter-2.7B

This model is a fine-tuned version of [facebook/opt-2.7b](https://huggingface.co/facebook/opt-2.7b) on about 80k WhatsApp/text messages (mine). Please use responsibly :)

## Model description

- Exploring to see how OPT does in terms of dialogue/conversational applications :)
- Seems to do a lot better than GPT-Neo with similar training parameters

## Intended uses & limitations

> The base model has a custom license which propagates to this one. Most importantly, it cannot be used commercially. Read more here: [facebook/opt-2.7b](https://huggingface.co/facebook/opt-2.7b)

- The model is probably too large to use via the API here. Use it in Python with more than 12 GB of GPU RAM or CPU RAM.
- Alternatively, you can message [a bot on Telegram](http://t.me/GPTPeter_bot) where I test LLMs for dialogue generation.
- **Any statements or claims made by this model do not reflect actual claims/statements by me.** Keep in mind it is a _fine-tuned_ version of the model on my data, so things from pre-training are also present in outputs.

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 4e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.01
- num_epochs: 3

### Training results

### Framework versions

- Transformers 4.19.2
- Pytorch 1.10.0+cu113
- Datasets 2.2.2
- Tokenizers 0.12.1
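
## Example usage

The card says to run the model in Python and lists sampling settings for the hosted widget, so here is a rough sketch of how the two might fit together. The sampling values and the `peter szemraj:` speaker tag are copied from the front matter of this card; the names `GENERATION_KWARGS`, `MODEL_ID`, `build_prompt`, and `generate_reply` are made up for illustration, and the sketch assumes the Hub repository id matches the card title. It has not been tuned, so treat it as a starting point rather than the recommended way to call the model.

```python
"""Minimal sketch: sample a reply using the widget's generation settings."""

# Sampling settings copied from `inference.parameters` in this card's front matter.
GENERATION_KWARGS = {
    "min_length": 2,
    "max_length": 64,
    "length_penalty": 0.7,
    "temperature": 0.65,
    "no_repeat_ngram_size": 2,
    "top_k": 20,
    "do_sample": True,
    "repetition_penalty": 4.5,
}

# Assumption: the Hub repository id is the same as the card title.
MODEL_ID = "pszemraj/opt-peter-2.7B"

# Speaker tag that ends every widget example prompt in the front matter.
SPEAKER = "peter szemraj:"


def build_prompt(question: str, speaker: str = SPEAKER) -> str:
    """Hypothetical helper: append the speaker tag the way the widget examples do."""
    return f"{question.strip()} {speaker}"


def generate_reply(question: str) -> str:
    """Hypothetical helper: load the model and sample one reply.

    Calling this downloads the full checkpoint and, per the card,
    needs more than 12 GB of GPU or CPU RAM.
    """
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import pipeline

    generator = pipeline("text-generation", model=MODEL_ID)
    prompt = build_prompt(question)
    # return_full_text=False drops the prompt and keeps only the continuation.
    result = generator(prompt, return_full_text=False, **GENERATION_KWARGS)
    return result[0]["generated_text"].strip()
```

Calling `generate_reply("What's your favorite zoo animal?")` should return one sampled reply. Because `do_sample` is enabled, the output will differ from run to run.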