---
license: bsd-3-clause
base_model: pszemraj/long-t5-tglobal-base-16384-book-summary
tags:
- generated_from_trainer
model-index:
- name: output
results: []
---
# Model description
This model is a fine-tuned version of [pszemraj/long-t5-tglobal-base-16384-book-summary](https://huggingface.co/pszemraj/long-t5-tglobal-base-16384-book-summary) on a small custom dataset.
The dataset was built by feeding [kmfoda/booksum](https://huggingface.co/datasets/kmfoda/booksum) through GPT-3.5-turbo with a carefully tuned prompt that turns each passage into a high-quality Stable Diffusion prompt.
The resulting dataset (generated for less than $10 of OpenAI credits) contains roughly 15k entries and was intended as a proof of concept.
The goal was to build a text-summarization model that produces Stable Diffusion prompts comparable to those written by a human or a high-end LLM such as GPT-4.
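A minimal sketch of how such a dataset can be produced, assuming the current `openai` Python client and the public booksum split; the system prompt shown is only a placeholder, since the actual hand-tuned prompt used for this model is not published:
```python
from datasets import load_dataset
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# kmfoda/booksum provides long chapter texts to condense into prompts
books = load_dataset("kmfoda/booksum", split="train")

def to_sd_prompt(chapter_text: str) -> str:
    """Ask GPT-3.5-turbo to turn a book passage into a Stable Diffusion prompt."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            # Placeholder instruction; the real prompt was tuned by hand and is not included here.
            {"role": "system", "content": "Rewrite the passage as a comma-separated Stable Diffusion prompt."},
            {"role": "user", "content": chapter_text[:8000]},  # truncate to stay within the context window
        ],
    )
    return response.choices[0].message.content
```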
Example generations from an excerpt of Hemingway:
```
this model: village in late summer, river and plain, mountains, pebbled boulders, blue water, troops marching, dusty trees, soldiers marching along road, crops rich with fruit trees, battle in the mountains, artillery flashes, cool nights, highly detailed, dramatic lighting
gpt-4: desert landscape with camel caravan at sunset, nomad tents, sand dunes, oasis, traditional clothing, dramatic lighting, 8k UHD, highly detailed, masterpiece, digital painting, global illumination
```
This is a VERY rough proof-of-concept model that could be greatly improved with a higher-quality dataset and possibly different hyperparameters.
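A minimal inference sketch, assuming the checkpoint is used through the standard `transformers` summarization pipeline (replace the model path with this repository's Hub id or a local directory):
```python
from transformers import pipeline

# Replace with this repository's Hub id or a local path to the fine-tuned checkpoint
summarizer = pipeline("summarization", model="path/to/this-checkpoint")

excerpt = (
    "In the late summer of that year we lived in a house in a village that looked "
    "across the river and the plain to the mountains."
)
result = summarizer(excerpt, max_length=96, min_length=16, no_repeat_ngram_size=3)
print(result[0]["summary_text"])  # e.g. "village in late summer, river and plain, mountains, ..."
```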
## Training procedure
Training ran for 7 epochs using a modified version of the Hugging Face `run_summarization.py` training script.
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 6
- total_train_batch_size: 48
- total_eval_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 7.0
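For illustration, the hyperparameters above map onto `Seq2SeqTrainingArguments` roughly as follows (a sketch of equivalent settings, not the exact invocation that was used):
```python
from transformers import Seq2SeqTrainingArguments

# Equivalent settings to the list above; with 2 GPUs, 4 * 2 * 6 = 48 effective train batch size
training_args = Seq2SeqTrainingArguments(
    output_dir="output",
    learning_rate=2e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=6,
    num_train_epochs=7.0,
    lr_scheduler_type="linear",
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```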
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 2.453 | 0.28 | 30 | 2.0444 |
| 2.2692 | 0.56 | 60 | 1.8970 |
| 2.1485 | 0.84 | 90 | 1.8373 |
| 2.0469 | 1.12 | 120 | 1.8033 |
| 1.9954 | 1.4 | 150 | 1.7762 |
| 1.9778 | 1.68 | 180 | 1.7593 |
| 1.9536 | 1.96 | 210 | 1.7472 |
| 1.8524 | 2.24 | 240 | 1.7306 |
| 1.8438 | 2.52 | 270 | 1.7255 |
| 1.8436 | 2.8 | 300 | 1.7140 |
| 1.7765 | 3.08 | 330 | 1.7049 |
| 1.7537 | 3.36 | 360 | 1.7057 |
| 1.7328 | 3.64 | 390 | 1.6977 |
| 1.723 | 3.92 | 420 | 1.6973 |
| 1.6592 | 4.2 | 450 | 1.7058 |
| 1.6563 | 4.48 | 480 | 1.7034 |
| 1.6443 | 4.76 | 510 | 1.6969 |
| 1.5782 | 5.04 | 540 | 1.6953 |
| 1.509 | 5.32 | 570 | 1.7136 |
| 1.5516 | 5.6 | 600 | 1.7064 |
| 1.558 | 5.88 | 630 | 1.7045 |
| 1.5016 | 6.16 | 660 | 1.7182 |
| 1.5288 | 6.44 | 690 | 1.7111 |
| 1.4665 | 6.72 | 720 | 1.7030 |
### Framework versions
- Transformers 4.36.0.dev0
- Pytorch 2.1.1+cu118
- Datasets 2.15.0
- Tokenizers 0.15.0