license: mit
datasets:
- cc100
language:
- en
pipeline_tag: text-generation
GPT-2 Medium Multi-Exit
A pre-trained language model with the same parameters as gpt2-medium, plus additional language modeling heads ("exits") attached to different layers of the model.
These 12 additional heads (in layers 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24) were trained on the English portion of CC-100 while keeping the original pre-trained model parameters frozen.
The model can be used for the Autocontrastive Decoding text generation approach described in Gera et al. 2023, for early-exiting approaches, or for other algorithms that consider the next-token predictions of different model layers.
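To make the idea of contrasting next-token predictions from two layers concrete, here is a minimal pure-Python sketch. It is a simplified illustration and not the exact scoring function from Gera et al. 2023 or the library's implementation: the function names, the `alpha` plausibility cutoff, and the toy logits are all assumptions made for this example.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def contrast_scores(final_logits, early_logits, alpha=0.1):
    """Toy contrast between two exits (hypothetical helper, not the library API).

    A token scores highly when the later exit assigns it much more
    probability than the earlier exit. Tokens whose later-exit probability
    is below alpha * (top later-exit probability) are masked out.
    """
    final_lp = log_softmax(final_logits)
    early_lp = log_softmax(early_logits)
    cutoff = max(final_lp) + math.log(alpha)
    return [
        f - e if f >= cutoff else float("-inf")
        for f, e in zip(final_lp, early_lp)
    ]

# Made-up logits for a 4-token vocabulary (illustrative values only).
vocab = ["wall", "the", "a", "zebra"]
final_logits = [3.0, 2.5, 1.0, -4.0]   # stand-in for the layer-24 exit
early_logits = [0.5, 2.5, 1.0, -6.0]   # stand-in for the layer-12 exit

scores = contrast_scores(final_logits, early_logits)
best = vocab[max(range(len(vocab)), key=scores.__getitem__)]
print(best)
```

In this toy case the token that gains the most probability between the earlier and the later exit is selected, while a token that is implausible under the later exit is excluded regardless of its ratio.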
Usage
Harnessing the additional language modeling heads requires loading the model using the auto-contrastive-generation library (pip install autocontrastive-gen).
In a nutshell, the user creates a MultiExitConfiguration that determines model behavior at training and inference, and then loads the model using the dedicated AutoMultiExitModel class. After that, the model can be used with the transformers API like any other model. See the library's GitHub repository for detailed usage instructions.
For example, the code below initializes the model to use Autocontrastive Decoding, and then performs text generation in this chosen setting:
from transformers import AutoTokenizer
from autocontrastive_gen.modeling.configuration import MultiExitConfiguration
from autocontrastive_gen.modeling.auto_model import AutoMultiExitModel
# initialize a pre-trained multi-exit model to use auto-contrast between layer 24 and layer 12
multi_exit_config = MultiExitConfiguration(use_original_head=False, contrast_layer_indices=(24, 12))
model = AutoMultiExitModel.from_pretrained("IBM/gpt2-medium-multiexit", multi_exit_config=multi_exit_config)
# perform text generation as usual
tokenizer = AutoTokenizer.from_pretrained("IBM/gpt2-medium-multiexit")
prompt = tokenizer("humpty dumpty sat on", return_tensors='pt')
generated_ids = model.generate(**prompt, max_new_tokens=15)
print(tokenizer.batch_decode(generated_ids))
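Early exiting, mentioned above as another use of the per-layer heads, can be sketched in a similar spirit. The snippet below is a toy illustration in pure Python and does not call the autocontrastive-gen library: the `early_exit` helper, the 0.9 confidence threshold, and the synthetic logits are all assumptions made for this example. Only the list of exit layers comes from this card.

```python
import math

# Exit layers as listed in this model card: 2, 4, ..., 24.
EXIT_LAYERS = list(range(2, 25, 2))

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def early_exit(logits_by_layer, threshold=0.9):
    """Hypothetical helper: return (layer, token_index) for the first exit
    whose top probability reaches the threshold, falling back to the
    last exit if none does."""
    for layer in EXIT_LAYERS:
        probs = softmax(logits_by_layer[layer])
        top = max(range(len(probs)), key=probs.__getitem__)
        if probs[top] >= threshold or layer == EXIT_LAYERS[-1]:
            return layer, top

# Synthetic logits over a 3-token vocabulary; confidence grows with depth.
toy_logits = {layer: [0.25 * layer, 0.0, 0.0] for layer in EXIT_LAYERS}

layer, token = early_exit(toy_logits)
print(layer, token)
```

With these synthetic values the first exit to reach the threshold is layer 12, so the remaining layers would not need to be evaluated for this token.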
Citation
Ariel Gera, Roni Friedman, Ofir Arviv, Chulaka Gunasekara, Benjamin Sznajder, Noam Slonim and Eyal Shnarch. The Benefits of Bad Advice: Autocontrastive Decoding across Model Layers. ACL 2023.
@inproceedings{gera2023autocontrastive,
title={The Benefits of Bad Advice: Autocontrastive Decoding across Model Layers},
author={Gera, Ariel and Friedman, Roni and Arviv, Ofir and Gunasekara, Chulaka and Sznajder, Benjamin and Slonim, Noam and Shnarch, Eyal},
booktitle={Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
month={july},
address={Toronto, Canada},
year={2023}
}