---
license: mit
datasets:
- cc100
language:
- en
pipeline_tag: text-generation
---

# GPT-2 Medium Multi-Exit

Pre-trained language model with identical parameters to [gpt2-medium](https://huggingface.co/gpt2-medium), but with additional language modeling heads ("exits") connected to different layers of the model.

These 12 additional heads (in layers 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24) were trained on the English portion of [CC-100](https://huggingface.co/datasets/cc100) while keeping the original pre-trained model parameters frozen.

The model can be used for the _Autocontrastive Decoding_ text generation approach described in [Gera et al. 2023](https://arxiv.org/abs/2305.01628), for _early-exiting_ approaches, or for other algorithms that consider the next-token predictions of different model layers.

## Usage

Harnessing the additional language modeling heads requires loading the model using the [auto-contrastive-generation library](https://github.com/IBM/auto-contrastive-generation) (`pip install autocontrastive-gen`).

In a nutshell, the user creates a `MultiExitConfiguration` that determines model behavior at training and inference, and then loads the model using the dedicated `AutoMultiExitModel` class. After that, the model can be used with the `transformers` API like any other model. See the [GitHub repository](https://github.com/IBM/auto-contrastive-generation) for detailed usage instructions.
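
To make the idea of comparing next-token predictions across exits more concrete, here is a toy, library-free sketch of contrasting an upper exit against a lower one. It is only an illustration of the general contrastive idea: the function name `contrast_scores`, the `alpha` plausibility cutoff, and the four-token logits are assumptions made up for this example, and the scoring actually used by the library is defined in the paper and the repository and may differ.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def contrast_scores(upper_logits, lower_logits, alpha=0.1):
    """Score tokens by how much more the upper exit prefers them than the lower exit.

    Hypothetical helper, not part of autocontrastive-gen. Tokens whose
    upper-exit probability is below alpha * (max upper-exit probability)
    are masked out with -inf, so that unlikely tokens cannot win merely
    because the lower exit dislikes them even more.
    """
    upper = log_softmax(upper_logits)
    lower = log_softmax(lower_logits)
    cutoff = math.log(alpha) + max(upper)
    return [
        u - l if u >= cutoff else float("-inf")
        for u, l in zip(upper, lower)
    ]


# Made-up next-token logits over a 4-token vocabulary.
upper_logits = [2.0, 1.9, 0.1, -3.0]  # e.g. the exit at layer 24
lower_logits = [2.0, 0.5, 0.1, 1.0]   # e.g. the exit at layer 12

scores = contrast_scores(upper_logits, lower_logits)
best = max(range(len(scores)), key=scores.__getitem__)
upper_best = max(range(len(upper_logits)), key=upper_logits.__getitem__)
print(upper_best, best)
```

In this toy case the upper exit on its own would pick token 0, while the contrast picks token 1, the token whose probability grows the most between the lower and the upper exit.
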

For example, the code below initializes the model to use _Autocontrastive Decoding_, and then performs text generation in this chosen setting:

```python
from transformers import AutoTokenizer

from autocontrastive_gen.modeling.configuration import MultiExitConfiguration
from autocontrastive_gen.modeling.auto_model import AutoMultiExitModel

# initialize a pre-trained multi-exit model to use auto-contrast between layer 24 and layer 12
multi_exit_config = MultiExitConfiguration(use_original_head=False, contrast_layer_indices=(24, 12))
model = AutoMultiExitModel.from_pretrained("IBM/gpt2-medium-multiexit", multi_exit_config=multi_exit_config)

# perform text generation as usual
tokenizer = AutoTokenizer.from_pretrained("IBM/gpt2-medium-multiexit")
prompt = tokenizer("humpty dumpty sat on", return_tensors='pt')
generated_ids = model.generate(**prompt, max_new_tokens=15)
print(tokenizer.batch_decode(generated_ids))
```

## Citation

Ariel Gera, Roni Friedman, Ofir Arviv, Chulaka Gunasekara, Benjamin Sznajder, Noam Slonim and Eyal Shnarch. [The Benefits of Bad Advice: Autocontrastive Decoding across Model Layers](https://arxiv.org/abs/2305.01628). ACL 2023.

```bibtex
@inproceedings{gera2023autocontrastive,
  title={The Benefits of Bad Advice: Autocontrastive Decoding across Model Layers},
  author={Gera, Ariel and Friedman, Roni and Arviv, Ofir and Gunasekara, Chulaka and Sznajder, Benjamin and Slonim, Noam and Shnarch, Eyal},
  booktitle={Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  month={july},
  address={Toronto, Canada},
  year={2023}
}
```