---
license:
- other
- apache-2.0
library_name: transformers
tags:
- generated_from_trainer
- text-generation
- OPT
- non-commercial
- dialogue
- chatbot
- ai-msgbot
pipeline_tag: text-generation
widget:
- text: 'If you could live anywhere, where would it be? peter szemraj:'
  example_title: live anywhere
- text: 'What would you sing at Karaoke night? peter szemraj:'
  example_title: Karaoke
- text: 'If you could hire someone to help you, would it be with cleaning, cooking, or yard work? peter szemraj:'
  example_title: help
- text: 'What form of public transportation do you prefer? (air, boat, train, bus, car, etc.) peter szemraj:'
  example_title: transportation
- text: 'What''s your favorite zoo animal? peter szemraj:'
  example_title: animal
- text: 'Do you like or dislike surprises? Why or why not? peter szemraj:'
  example_title: surprises
- text: 'What celebrity would you like to meet at Starbucks for a cup of coffee? peter szemraj:'
  example_title: celebrity
base_model: facebook/opt-2.7b
---

# pszemraj/opt-peter-2.7B

Open In Colab

This model is a fine-tuned version of [facebook/opt-2.7b](https://huggingface.co/facebook/opt-2.7b) on about 80k of my own WhatsApp/text messages. Please use it responsibly :)

Test it out on Google Colab by clicking the button above.

![chatdemo](https://i.imgur.com/1EgQYat.png)

## Model description

- Exploring how OPT performs in dialogue/conversational applications.
- It seems to do a lot better than GPT-Neo fine-tuned with similar training parameters.
- You can create your own digital clone and deploy it using [this repository I am working on](https://github.com/pszemraj/ai-msgbot).

### Sharded checkpoint

Because this model file is 10+ GB, it can be a problem for runtimes with limited RAM and/or slow download speeds. To help with this, a sharded checkpoint of this model is available [here](https://huggingface.co/pszemraj/opt-peter-2.7B-sharded).
The `pszemraj/opt-peter-2.7B-sharded` model can be used as a drop-in replacement for this one in all use cases.

## Intended uses & limitations

> The base model has a custom license that propagates to this one. **Most importantly, it cannot be used commercially**. Read more here: [facebook/opt-2.7b](https://huggingface.co/facebook/opt-2.7b)

- The model is probably too large to use via the hosted inference API. Use it in Python with more than 12 GB of GPU RAM or CPU RAM, for example in the Colab notebook linked above.
- Alternatively, you can message [a bot on Telegram](http://t.me/GPTPeter_bot) where I test LLMs for dialogue generation.
- **Any statements or claims made by this model do not reflect actual claims/statements by me.** Keep in mind that it is a _fine-tuned_ version of the model on my data, so content from pre-training also shows up in its outputs.

## Training and evaluation data

WhatsApp & iMessage data were parsed using [ai-msgbot](https://github.com/pszemraj/ai-msgbot) and then fed as a text dataset to the Hugging Face `Trainer`.

## Training procedure

### Training hyperparameters

**SESSION ONE**

The following hyperparameters were used during training:

- learning_rate: 4e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.01
- num_epochs: 3

**SESSION TWO**

The following hyperparameters were used during training:

- learning_rate: 1e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 4
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 4

### Framework versions

- Transformers 4.19.2
- Pytorch 1.10.0+cu113
- Datasets 2.2.2
- Tokenizers 0.12.1