---
license: mit
datasets:
- cc100
language:
- en
pipeline_tag: text-generation
---

# GPT-Neo-125M Multi-Exit

Pre-trained language model with parameters identical to [gpt-neo-125m](https://huggingface.co/EleutherAI/gpt-neo-125m), but with additional language modeling heads ("exits") connected to different layers of the model.

These 6 additional heads (at layers 2, 4, 6, 8, 10, 12) were trained on the English portion of [CC-100](https://huggingface.co/datasets/cc100) while keeping the original pre-trained model parameters frozen.

The model can be used for the _Autocontrastive Decoding_ text generation approach described in [Gera et al. 2023](https://arxiv.org/abs/2305.01628), for _early-exiting_ approaches, or for any other algorithm that considers the next-token predictions of different model layers.

## Usage

Harnessing the additional language modeling heads requires loading the model with the [auto-contrastive-generation library](https://github.com/IBM/auto-contrastive-generation) (`pip install autocontrastive-gen`).

In a nutshell, you create a `MultiExitConfiguration` that determines model behavior at training and inference, and then load the model using the dedicated `AutoMultiExitModel` class. After that, the model can be used with the `transformers` API like any other model. See the [GitHub repository](https://github.com/IBM/auto-contrastive-generation) for detailed usage instructions.

For example, the code below initializes the model to use _Autocontrastive Decoding_, and then performs text generation in this chosen setting. Note that this model has 12 layers, so the contrast is between the exit head at layer 12 (the top layer) and the one at layer 6:

```python
from transformers import AutoTokenizer
from autocontrastive_gen.modeling.configuration import MultiExitConfiguration
from autocontrastive_gen.modeling.auto_model import AutoMultiExitModel

# initialize a pre-trained multi-exit model to use auto-contrast between layer 12 and layer 6
multi_exit_config = MultiExitConfiguration(use_original_head=False, contrast_layer_indices=(12, 6))
model = AutoMultiExitModel.from_pretrained("IBM/gpt-neo-125m-multiexit", multi_exit_config=multi_exit_config)

# perform text generation as usual
tokenizer = AutoTokenizer.from_pretrained("IBM/gpt-neo-125m-multiexit")
prompt = tokenizer("humpty dumpty sat on", return_tensors='pt')
generated_ids = model.generate(**prompt, max_new_tokens=15)
print(tokenizer.batch_decode(generated_ids))
```

## Citation

Ariel Gera, Roni Friedman, Ofir Arviv, Chulaka Gunasekara, Benjamin Sznajder, Noam Slonim and Eyal Shnarch. [The Benefits of Bad Advice: Autocontrastive Decoding across Model Layers](https://arxiv.org/abs/2305.01628). ACL 2023.

```bibtex
@inproceedings{gera2023autocontrastive,
  title={The Benefits of Bad Advice: Autocontrastive Decoding across Model Layers},
  author={Gera, Ariel and Friedman, Roni and Arviv, Ofir and Gunasekara, Chulaka and Sznajder, Benjamin and Slonim, Noam and Shnarch, Eyal},
  booktitle={Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  month={jul},
  year={2023},
  address={Toronto, Canada}
}
```
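## Appendix: Illustrative Sketches

For reference, below is a minimal sketch of the training setup described above: linear language modeling heads are attached to the hidden states of selected layers and only those heads are trained, with the base model frozen. This is not the library's actual training code; details such as the head architecture, the loss aggregation, and the variable names (e.g. `exit_layers`) are illustrative assumptions.

```python
import torch
from torch import nn
from transformers import AutoModelForCausalLM, AutoTokenizer

# freeze the original pre-trained parameters; only the new exit heads will train
base = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-125m")
for p in base.parameters():
    p.requires_grad = False

exit_layers = [2, 4, 6, 8, 10, 12]  # layers that receive an extra LM head
hidden, vocab = base.config.hidden_size, base.config.vocab_size
heads = nn.ModuleList(nn.Linear(hidden, vocab, bias=False) for _ in exit_layers)

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-125m")
batch = tokenizer("humpty dumpty sat on a wall", return_tensors="pt")

out = base(**batch, output_hidden_states=True)
labels = batch["input_ids"][:, 1:]  # next-token targets
loss = 0.0
for head, layer in zip(heads, exit_layers):
    # hidden_states[0] holds the embeddings, so index `layer` is that layer's output
    logits = head(out.hidden_states[layer])[:, :-1]
    loss = loss + nn.functional.cross_entropy(
        logits.reshape(-1, vocab), labels.reshape(-1)
    )
loss.backward()  # gradients flow only into the exit heads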
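At inference time, the contrast operates on the next-token distributions of two exits. The snippet below is a rough illustration of that idea, not the library's exact implementation: it rewards tokens the top head prefers over the lower head, with a plausibility cutoff in the style of contrastive decoding; the function name and the `alpha` parameter are illustrative assumptions.

```python
import torch

def autocontrast(logits_top: torch.Tensor, logits_lower: torch.Tensor,
                 alpha: float = 0.1) -> torch.Tensor:
    """Contrast the next-token logits of a mature (top) exit with a premature
    (lower) exit, restricted to tokens the top head itself finds plausible."""
    log_p_top = torch.log_softmax(logits_top, dim=-1)
    log_p_low = torch.log_softmax(logits_lower, dim=-1)
    # plausibility mask: keep tokens within a factor `alpha` of the top head's best token
    cutoff = log_p_top.max(dim=-1, keepdim=True).values + torch.log(torch.tensor(alpha))
    scores = log_p_top - log_p_low
    return scores.masked_fill(log_p_top < cutoff, float("-inf"))

# greedy choice of the next token from the contrasted scores, with dummy logits
next_token = autocontrast(torch.randn(1, 50257), torch.randn(1, 50257)).argmax(dim=-1)
```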