pszemraj committed
Commit 2b6e005 · 1 Parent(s): b2a458e

enable use_cache


tl;dr: this makes generation a lot faster
- read the GPT-4 explanation below
- in my tests, it does not decrease summarization quality

## GPT-4 explanation

The `use_cache` configuration in transformer models caches the past key/value states of the transformer layers during decoding, which speeds up generation. This is especially useful when generating sequences auto-regressively (i.e., producing each new token conditioned on all previously generated tokens), which is how models like T5 decode.
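As a concrete illustration, here is a minimal sketch of generation with the cache enabled, assuming the hub id from the removed `_name_or_path` field and a placeholder input document:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# assumed checkpoint path (taken from the old _name_or_path); adjust to the actual repo id
model_id = "pszemraj/long-t5-tglobal-base-16384-booksum-V10"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

document = "..."  # placeholder for a long input document
inputs = tokenizer(document, return_tensors="pt", truncation=True)

# with use_cache=True, each decoding step reuses the cached states from
# previous steps instead of re-running the decoder over the whole prefix
summary_ids = model.generate(**inputs, max_new_tokens=256, use_cache=True)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```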

When `use_cache` is set to `False`, the model cannot reuse past states and must recompute them at every decoding step, which slows generation down. So the observation that models with `use_cache` set to `False` are slower makes sense.

As for changing this value, it is generally safe to set `use_cache` to `True`. However, be aware that it increases memory usage, since the model now keeps past states in memory.

Do note that if you change this configuration, you should ideally retrain or at least fine-tune your model. While the model might still work without retraining, the change could potentially affect the model's learned behavior. The model was trained with `use_cache` set to `False`, and during training, it might have learned to rely on recomputing states rather than using cached ones.

That being said, if retraining is not an option, you could try changing `use_cache` to `True` and test your model thoroughly to ensure it still performs as expected.
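If editing `config.json` is not convenient, the same toggle can be applied at load time; a minimal sketch, again assuming the repo id above:

```python
from transformers import AutoConfig, AutoModelForSeq2SeqLM

model_id = "pszemraj/long-t5-tglobal-base-16384-booksum-V10"  # assumed repo id

config = AutoConfig.from_pretrained(model_id)
config.use_cache = True  # same effect as this commit's config.json change

model = AutoModelForSeq2SeqLM.from_pretrained(model_id, config=config)
# subsequent model.generate(...) calls will now use the KV cache by default
```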

Files changed (1)
  1. config.json +1 -2
config.json CHANGED
@@ -1,5 +1,4 @@
 {
-  "_name_or_path": "pszemraj/long-t5-tglobal-base-16384-booksum-V10",
   "architectures": [
     "LongT5ForConditionalGeneration"
   ],
@@ -38,6 +37,6 @@
   "tie_word_embeddings": false,
   "torch_dtype": "float32",
   "transformers_version": "4.20.1",
-  "use_cache": false,
+  "use_cache": true,
   "vocab_size": 32128
 }