---
license: bsd-3-clause
base_model: pszemraj/long-t5-tglobal-base-16384-book-summary
tags:
- generated_from_trainer
model-index:
- name: output
  results: []
---

# Model description

This model is a fine-tuned version of [pszemraj/long-t5-tglobal-base-16384-book-summary](https://huggingface.co/pszemraj/long-t5-tglobal-base-16384-book-summary) on a small custom dataset. The dataset was built by feeding [kmfoda/booksum](https://huggingface.co/datasets/kmfoda/booksum) into GPT-3.5-turbo with a carefully tuned prompt to produce high-quality Stable Diffusion prompts. As a proof of concept, the dataset is small: roughly 15k entries, generated for less than $10 of OpenAI credits.

The goal of this model was to create a text summarization model that produces decent Stable Diffusion prompts, comparable to those written by a human or a high-end LLM such as GPT-4.

Example generations from an excerpt of Hemingway:

```
this model:
village in late summer, river and plain, mountains, pebbled boulders, blue water, troops marching, dusty trees, soldiers marching along road, crops rich with fruit trees, battle in the mountains, artillery flashes, cool nights, highly detailed, dramatic lighting

gpt-4:
desert landscape with camel caravan at sunset, nomad tents, sand dunes, oasis, traditional clothing, dramatic lighting, 8k UHD, highly detailed, masterpiece, digital painting, global illumination
```

This is a VERY rough proof-of-concept model that could be greatly improved by a higher-quality dataset and possibly different hyperparameters.

## Training procedure

Training ran for 7 epochs with a modified version of the Hugging Face `run_summarization.py` training script.

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 0.0002
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 6
- total_train_batch_size: 48
- total_eval_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 7.0

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 2.453         | 0.28  | 30   | 2.0444          |
| 2.2692        | 0.56  | 60   | 1.8970          |
| 2.1485        | 0.84  | 90   | 1.8373          |
| 2.0469        | 1.12  | 120  | 1.8033          |
| 1.9954        | 1.4   | 150  | 1.7762          |
| 1.9778        | 1.68  | 180  | 1.7593          |
| 1.9536        | 1.96  | 210  | 1.7472          |
| 1.8524        | 2.24  | 240  | 1.7306          |
| 1.8438        | 2.52  | 270  | 1.7255          |
| 1.8436        | 2.8   | 300  | 1.7140          |
| 1.7765        | 3.08  | 330  | 1.7049          |
| 1.7537        | 3.36  | 360  | 1.7057          |
| 1.7328        | 3.64  | 390  | 1.6977          |
| 1.723         | 3.92  | 420  | 1.6973          |
| 1.6592        | 4.2   | 450  | 1.7058          |
| 1.6563        | 4.48  | 480  | 1.7034          |
| 1.6443        | 4.76  | 510  | 1.6969          |
| 1.5782        | 5.04  | 540  | 1.6953          |
| 1.509         | 5.32  | 570  | 1.7136          |
| 1.5516        | 5.6   | 600  | 1.7064          |
| 1.558         | 5.88  | 630  | 1.7045          |
| 1.5016        | 6.16  | 660  | 1.7182          |
| 1.5288        | 6.44  | 690  | 1.7111          |
| 1.4665        | 6.72  | 720  | 1.7030          |

### Framework versions

- Transformers 4.36.0.dev0
- Pytorch 2.1.1+cu118
- Datasets 2.15.0
- Tokenizers 0.15.0
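
### Reading the numbers

The totals in the hyperparameter list and the best evaluation point in the results table can be checked with a few lines of Python. This is a minimal sketch, not part of the original training script: it assumes `train_batch_size` and `eval_batch_size` are per-device values (which is how the listed totals work out), and the variable names are illustrative only.

```python
# Values copied from the "Training hyperparameters" list.
# Assumption: the two batch sizes are per-device values.
per_device_train_batch_size = 4
per_device_eval_batch_size = 4
num_devices = 2
gradient_accumulation_steps = 6

# Gradient accumulation only affects training, so it is left out of the eval total.
total_train_batch_size = (
    per_device_train_batch_size * num_devices * gradient_accumulation_steps
)
total_eval_batch_size = per_device_eval_batch_size * num_devices

# (step, validation_loss) pairs copied from the "Training results" table.
eval_log = [
    (30, 2.0444), (60, 1.8970), (90, 1.8373), (120, 1.8033),
    (150, 1.7762), (180, 1.7593), (210, 1.7472), (240, 1.7306),
    (270, 1.7255), (300, 1.7140), (330, 1.7049), (360, 1.7057),
    (390, 1.6977), (420, 1.6973), (450, 1.7058), (480, 1.7034),
    (510, 1.6969), (540, 1.6953), (570, 1.7136), (600, 1.7064),
    (630, 1.7045), (660, 1.7182), (690, 1.7111), (720, 1.7030),
]

# Evaluation point with the lowest validation loss.
best_step, best_loss = min(eval_log, key=lambda row: row[1])

print(total_train_batch_size)  # 48
print(total_eval_batch_size)   # 8
print(best_step, best_loss)    # 540 1.6953
```

The lowest validation loss in the table is 1.6953 at step 540 (epoch 5.04). Training loss keeps falling after that point while validation loss stays around 1.70 to 1.72, which may suggest that the later epochs add little on this dataset.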