---
license: apache-2.0
library_name: tfhub
language: en
tags:
- text
- bert
- tensorflow
datasets:
- bookcorpus
- wikipedia
---

## Model name: bert_en_cased_L-12_H-768_A-12

## Description adapted from [TFHub](https://tfhub.dev/tensorflow/bert_en_cased_L-12_H-768_A-12/4)

# Overview

BERT (Bidirectional Encoder Representations from Transformers) provides dense vector representations for natural language by using a deep, pre-trained neural network with the Transformer architecture. It was originally published by Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova: ["BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding"](https://arxiv.org/abs/1810.04805), 2018.

This model uses the implementation of BERT from the TensorFlow Models repository on GitHub at [tensorflow/models/official/legacy/bert](https://github.com/tensorflow/models/tree/master/official/legacy/bert). It uses L=12 hidden layers (i.e., Transformer blocks), a hidden size of H=768, and A=12 attention heads. For other model sizes, see the [BERT](https://tfhub.dev/google/collections/bert/1) collection.

The weights of this model are those released by the original BERT authors. It has been pre-trained for English on Wikipedia and BooksCorpus. Text inputs have been normalized the "cased" way, meaning that the distinction between lower and upper case as well as accent markers has been preserved. For training, random input masking has been applied independently to word pieces (as in the original BERT paper).

All parameters in the module are trainable, and fine-tuning all parameters is the recommended practice.

# Usage

This SavedModel implements the encoder API for [text embeddings with transformer encoders](https://www.tensorflow.org/hub/common_saved_model_apis/text#transformer-encoders). It expects a dict with three int32 Tensors as input: `input_word_ids`, `input_mask`, and `input_type_ids`. The separate **preprocessor** SavedModel at [https://huggingface.co/Dimitre/bert_en_cased_preprocess](https://huggingface.co/Dimitre/bert_en_cased_preprocess) transforms plain text inputs into this format, which its documentation describes in greater detail.

## Basic usage

The simplest way to use this model in the [Keras functional API](https://www.tensorflow.org/guide/keras/functional) is:

### Using TF Hub and HF Hub

```
import tensorflow as tf
from huggingface_hub import snapshot_download
from tensorflow_hub import KerasLayer

preprocessor_path = snapshot_download(repo_id="Dimitre/bert_en_cased_preprocess")
preprocessor = KerasLayer(handle=preprocessor_path)

text_input = tf.keras.layers.Input(shape=(), dtype=tf.string)
encoder_inputs = preprocessor(text_input)

model_path = snapshot_download(repo_id="Dimitre/bert_en_cased_L-12_H-768_A-12")
encoder = KerasLayer(handle=model_path, trainable=True)

outputs = encoder(encoder_inputs)
pooled_output = outputs["pooled_output"]      # [batch_size, 768].
sequence_output = outputs["sequence_output"]  # [batch_size, seq_length, 768].
```

### Using [TF Hub fork](https://github.com/dimitreOliveira/hub)

```
# `pull_from_hub` is provided by the TF Hub fork linked above.
preprocessor = pull_from_hub(repo_id="Dimitre/bert_en_cased_preprocess")
text_input = tf.keras.layers.Input(shape=(), dtype=tf.string)
encoder_inputs = preprocessor(text_input)

encoder = pull_from_hub(repo_id="Dimitre/bert_en_cased_L-12_H-768_A-12", trainable=True)
outputs = encoder(encoder_inputs)
pooled_output = outputs["pooled_output"]      # [batch_size, 768].
sequence_output = outputs["sequence_output"]  # [batch_size, seq_length, 768].
```

The encoder's outputs are the `pooled_output` to represent each input sequence as a whole, and the `sequence_output` to represent each input token in context. Either of those can be used as input to further model building. To print `pooled_output` for inspection, the following code can be used:

```
embedding_model = tf.keras.Model(text_input, pooled_output)
sentences = tf.constant(["(your text here)"])
print(embedding_model(sentences))
```
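As an example of further model building, a small classification head can be stacked on `pooled_output` and the whole encoder fine-tuned end to end. The sketch below continues from the `text_input` and `pooled_output` defined in the basic-usage snippet above; the dropout rate, number of classes, optimizer, and learning rate are illustrative assumptions, not values prescribed by this card.

```
# Minimal sketch of a sentence classifier on top of pooled_output
# (continues from the basic-usage snippet; hyperparameters are illustrative).
dropout = tf.keras.layers.Dropout(0.1)(pooled_output)
logits = tf.keras.layers.Dense(2)(dropout)  # e.g. a 2-class task

classifier = tf.keras.Model(text_input, logits)
classifier.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=2e-5),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
# classifier.fit(train_ds, validation_data=val_ds, epochs=3)
# train_ds / val_ds: hypothetical tf.data.Dataset objects of (string, label) pairs.
```

Because the encoder was loaded with `trainable=True`, fitting this model fine-tunes all BERT parameters together with the new head.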
## Advanced topics

The [preprocessor documentation](https://tfhub.dev/tensorflow/bert_en_cased_preprocess/3) explains how to input segment pairs and how to control `seq_length`.

The intermediate activations of all L=12 Transformer blocks (hidden layers) are returned as a Python list: `outputs["encoder_outputs"][i]` is a Tensor of shape `[batch_size, seq_length, 768]` with the outputs of the i-th Transformer block, for `0 <= i < L`. The last value of the list is equal to `sequence_output`.

The preprocessor can be run from inside a callable passed to `tf.data.Dataset.map()` while this encoder stays a part of a larger model that gets trained on that dataset. The Keras input objects for running on preprocessed inputs are

```
encoder_inputs = dict(
    input_word_ids=tf.keras.layers.Input(shape=(seq_length,), dtype=tf.int32),
    input_mask=tf.keras.layers.Input(shape=(seq_length,), dtype=tf.int32),
    input_type_ids=tf.keras.layers.Input(shape=(seq_length,), dtype=tf.int32),
)
```

# Masked Language Model

This SavedModel provides a trainable `.mlm` subobject with predictions for the Masked Language Model task it was originally trained with. This allows advanced users to continue MLM training for fine-tuning to a downstream task. It extends the encoder interface above with a zero-padded tensor of positions in the input sequence for which the `input_word_ids` have been randomly masked or altered. (See the [preprocessor model page](https://tfhub.dev/tensorflow/bert_en_cased_preprocess/3) for how to get the id of the mask token and more.)

```
import tensorflow_hub as hub

mlm_inputs = dict(
    input_word_ids=tf.keras.layers.Input(shape=(seq_length,), dtype=tf.int32),
    input_mask=tf.keras.layers.Input(shape=(seq_length,), dtype=tf.int32),
    input_type_ids=tf.keras.layers.Input(shape=(seq_length,), dtype=tf.int32),
    masked_lm_positions=tf.keras.layers.Input(shape=(num_predict,), dtype=tf.int32),
)

encoder = pull_from_hub(repo_id="Dimitre/bert_en_cased_L-12_H-768_A-12")
mlm = hub.KerasLayer(encoder.mlm, trainable=True)
mlm_outputs = mlm(mlm_inputs)
mlm_logits = mlm_outputs["mlm_logits"]  # [batch_size, num_predict, vocab_size]
# ...plus pooled_output, sequence_output and encoder_outputs as above.
```
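To continue MLM training, `mlm_logits` can be scored against the original ids of the masked tokens. The sketch below shows one way to compute such a loss; `masked_lm_ids` and `masked_lm_weights` are hypothetical label tensors you would build yourself when applying random masking (see the preprocessor page for the mask token id), not outputs of this SavedModel.

```
def masked_lm_loss(mlm_logits, masked_lm_ids, masked_lm_weights):
    """Mean cross-entropy over masked positions, ignoring zero-padded ones.

    mlm_logits:        [batch_size, num_predict, vocab_size] float32, from the .mlm subobject.
    masked_lm_ids:     [batch_size, num_predict] int32, true ids of the masked tokens (hypothetical labels).
    masked_lm_weights: [batch_size, num_predict] float32, 1.0 for real predictions, 0.0 for padding.
    """
    per_position = tf.keras.losses.sparse_categorical_crossentropy(
        masked_lm_ids, mlm_logits, from_logits=True)  # [batch_size, num_predict]
    return tf.reduce_sum(per_position * masked_lm_weights) / (
        tf.reduce_sum(masked_lm_weights) + 1e-5)
```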