---
license: apache-2.0
tags:
- generated_from_trainer
- text-generation
- opt
- non-commercial
- dialogue
- chatbot

inference: false
---

# pszemraj/opt-peter-2.7B

This model is a fine-tuned version of [facebook/opt-2.7b](https://huggingface.co/facebook/opt-2.7b) on about 80k of my own WhatsApp/text messages. Please use responsibly :)

Test it out on Google Colab [here](https://colab.research.google.com/gist/pszemraj/26a69775c9d012051396ab5ae980f5c1/example-text-gen-pszemraj-opt-peter-2-7b.ipynb)!

![chatdemo](https://i.imgur.com/1EgQYat.png)

## Model description

- An exploration of how well OPT handles dialogue/conversational applications
- Appears to perform considerably better than GPT-Neo trained with similar parameters

## Intended uses & limitations

> The base model has a custom license which propagates to this one. Most importantly, it cannot be used commercially. Read more here: [facebook/opt-2.7b](https://huggingface.co/facebook/opt-2.7b)

- The model is probably too large to run via the hosted inference API here. Use it in Python on a machine with more than 12 GB of GPU/CPU RAM, or via the Colab notebook linked above (a minimal local-usage sketch is also given below).
  - Alternatively, you can message [a bot on Telegram](http://t.me/GPTPeter_bot) where I test LLMs for dialogue generation.
- **Any statements or claims generated by this model do not reflect actual statements or claims by me.** Keep in mind that it is a _fine-tuned_ version of the base model on my data, so content from pre-training is also present in its outputs.
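
For local use, a minimal generation sketch with `transformers` might look like the following. The prompt and sampling settings are illustrative assumptions, not tuned values from this card:

```python
# Minimal local-usage sketch (prompt and sampling values are illustrative assumptions).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "pszemraj/opt-peter-2.7B"
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=False)  # slow tokenizer, as recommended for early OPT releases

device = "cuda" if torch.cuda.is_available() else "cpu"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    # fp16 keeps the 2.7B model within a single ~12 GB GPU; fall back to fp32 on CPU
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
).to(device)

prompt = "How was your day?\n"
inputs = tokenizer(prompt, return_tensors="pt").to(device)
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=True,
    top_k=50,
    top_p=0.95,
    no_repeat_ngram_size=3,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```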

## Training and evaluation data

Fine-tuned on approximately 80k of my own WhatsApp/text messages (see the summary above); details of any held-out evaluation split are not documented here.

## Training procedure

### Training hyperparameters

**SESSION ONE**

The following hyperparameters were used during training:
- learning_rate: 4e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.01
- num_epochs: 3

**SESSION TWO**

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 4
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 4
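
The training script itself is not reproduced here. Purely as an illustration, the SESSION TWO settings correspond roughly to the `transformers.TrainingArguments` sketch below; the output directory and fp16 flag are assumptions, and the dataset loading and `Trainer` wiring are omitted:

```python
# Illustration only: SESSION TWO hyperparameters expressed as a TrainingArguments
# configuration. output_dir and fp16 are assumptions, not taken from this card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="opt-peter-2.7B-session2",  # hypothetical path
    learning_rate=1e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=4,         # 16 x 4 matches the reported total train batch size of 64
    seed=42,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    num_train_epochs=4,
    fp16=True,                             # assumption; common for large-model fine-tuning
    # Adam betas=(0.9, 0.999) and epsilon=1e-08 listed above are the defaults, so not set explicitly.
)
```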


### Framework versions

- Transformers 4.19.2
- Pytorch 1.10.0+cu113
- Datasets 2.2.2
- Tokenizers 0.12.1