Transformers documentation

BERTology

There is a growing field of study concerned with investigating the inner workings of large-scale transformers like BERT, which some call “BERTology”. Some good examples of this field are:

To help this new field develop, we have included a few additional features in the BERT/GPT/GPT-2 models that give access to their inner representations, mainly adapted from the great work of Paul Michel (https://arxiv.org/abs/1905.10650):

  • accessing all the hidden-states of BERT/GPT/GPT-2,
  • accessing all the attention weights for each head of BERT/GPT/GPT-2,
  • retrieving head output values and gradients in order to compute head importance scores and prune heads, as explained in https://arxiv.org/abs/1905.10650.

To help you understand and use these features, we have added a specific example script, bertology.py, which extracts information from and prunes a model pre-trained on GLUE.