---
license: apache-2.0
tags:
- generated_from_trainer
- text-generation
- opt
- non-commercial
- dialogue
- chatbot
inference: false
---

# pszemraj/opt-peter-2.7B

This model is a fine-tuned version of [facebook/opt-2.7b](https://huggingface.co/facebook/opt-2.7b), trained on about 80k of my own WhatsApp/text messages. Please use it responsibly :)

Test it out on Google Colab [here](https://colab.research.google.com/gist/pszemraj/26a69775c9d012051396ab5ae980f5c1/example-text-gen-pszemraj-opt-peter-2-7b.ipynb)!

![chatdemo](https://i.imgur.com/1EgQYat.png)

## Model description

- An exploration of how OPT performs in dialogue/conversational applications
- It seems to do a lot better than GPT-Neo fine-tuned with similar training parameters

## Intended uses & limitations

> The base model has a custom license that propagates to this one. Most importantly, it cannot be used commercially. Read more here: [facebook/opt-2.7b](https://huggingface.co/facebook/opt-2.7b)

- The model is probably too large to use via the Inference API here. Use it in Python with more than 12 GB of GPU RAM / CPU RAM, or use the Colab notebook linked above.
- Alternatively, you can message [a bot on Telegram](http://t.me/GPTPeter_bot) where I test LLMs for dialogue generation.
- **Any statements or claims made by this model do not reflect actual claims/statements by me.** Keep in mind that it is a _fine-tuned_ version of the model on my data, so things learned during pre-training also show up in outputs.
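The 12 GB guideline lines up with a back-of-the-envelope estimate of weight memory. The sketch below is only an approximation: it assumes the nominal 2.7 billion parameter count (the exact count differs slightly) and counts the weights alone, not activations, the generation cache, or framework overhead.

```python
# Rough weight-memory estimate for a ~2.7B-parameter model.
# Assumption: the nominal "2.7B" parameter count; the real count differs slightly.
NUM_PARAMS = 2.7e9

# Bytes needed to store one parameter in each dtype.
BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: ~{weight_memory_gb(NUM_PARAMS, dtype):.1f} GB")
```

In float32 the weights alone come to about 10.8 GB, which leaves little headroom under 12 GB once runtime overhead is added. Half precision roughly halves the weight footprint.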
## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

**SESSION ONE**

The following hyperparameters were used during training:
- learning_rate: 4e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.01
- num_epochs: 3

**SESSION TWO**

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 4
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 4

### Framework versions

- Transformers 4.19.2
- Pytorch 1.10.0+cu113
- Datasets 2.2.2
- Tokenizers 0.12.1
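The two training sessions listed under Training hyperparameters can be sanity-checked with a little arithmetic. In both sessions the reported total_train_batch_size equals train_batch_size × gradient_accumulation_steps (8 × 16 = 128, 16 × 4 = 64), so the sketch below treats the number of devices as 1. That is an inference from the numbers, not something this card states. The learning-rate function is a minimal sketch of the "linear warmup, then cosine decay to zero" shape used by Transformers' `get_cosine_schedule_with_warmup`. The step count `HYPOTHETICAL_TOTAL_STEPS` is made up for illustration, because the card does not report how many optimizer steps each session ran.

```python
import math

# Values copied from the two training sessions listed in this card.
SESSIONS = {
    "one": {"per_device": 8, "grad_accum": 16, "base_lr": 4e-5, "warmup_ratio": 0.01},
    "two": {"per_device": 16, "grad_accum": 4, "base_lr": 1e-5, "warmup_ratio": 0.05},
}

# Hypothetical: the card does not report the number of optimizer steps.
HYPOTHETICAL_TOTAL_STEPS = 1000


def effective_batch_size(per_device: int, grad_accum: int, num_devices: int = 1) -> int:
    """Samples contributing to one optimizer step.

    num_devices defaults to 1, inferred from the reported totals (an assumption).
    """
    return per_device * grad_accum * num_devices


def cosine_with_warmup_lr(step: int, base_lr: float, total_steps: int, warmup_ratio: float) -> float:
    """Linear warmup from 0 to base_lr, then cosine decay from base_lr to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup phase: the learning rate grows linearly with the step index.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: progress runs from 0 to 1 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for name, s in SESSIONS.items():
        bs = effective_batch_size(s["per_device"], s["grad_accum"])
        lrs = [
            cosine_with_warmup_lr(step, s["base_lr"], HYPOTHETICAL_TOTAL_STEPS, s["warmup_ratio"])
            for step in (0, 250, 500, 1000)
        ]
        print(f"session {name}: effective batch {bs}, lr at steps 0/250/500/1000: {lrs}")
```

A larger warmup ratio (0.05 in session two versus 0.01 in session one) simply stretches the linear ramp over a bigger share of training before the cosine decay begins.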