# Quickstart

## Quick tour: Usage

Here are two quick-start examples showcasing a few `Bert` and `GPT2` classes and pre-trained models. See the package reference for examples with each model class.

### BERT example

First let's prepare a tokenized input from a text string using `BertTokenizer`:

```python
import torch
from pytorch_transformers import BertTokenizer, BertModel, BertForMaskedLM

# OPTIONAL: if you want to have more information on what's happening under the hood, activate the logger as follows
import logging
logging.basicConfig(level=logging.INFO)

# Load pre-trained model tokenizer (vocabulary)
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

# Tokenize input
text = "[CLS] Who was Jim Henson ? [SEP] Jim Henson was a puppeteer [SEP]"
tokenized_text = tokenizer.tokenize(text)

# Mask a token that we will try to predict back with `BertForMaskedLM`
masked_index = 8
tokenized_text[masked_index] = '[MASK]'
assert tokenized_text == ['[CLS]', 'who', 'was', 'jim', 'henson', '?', '[SEP]', 'jim', '[MASK]', 'was', 'a', 'puppet', '##eer', '[SEP]']

# Convert tokens to vocabulary indices
indexed_tokens = tokenizer.convert_tokens_to_ids(tokenized_text)
# Define sentence A and B indices associated with the 1st and 2nd sentences (see paper)
segments_ids = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1]

# Convert inputs to PyTorch tensors
tokens_tensor = torch.tensor([indexed_tokens])
segments_tensors = torch.tensor([segments_ids])
```

Let's see how we can use `BertModel` to encode our inputs in hidden states:

```python
# Load pre-trained model (weights)
model = BertModel.from_pretrained('bert-base-uncased')

# Set the model in evaluation mode to deactivate the DropOut modules
# This is IMPORTANT to have reproducible results during evaluation!
model.eval()

# If you have a GPU, put everything on cuda
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
tokens_tensor = tokens_tensor.to(device)
segments_tensors = segments_tensors.to(device)
model.to(device)

# Predict hidden states features for each layer
with torch.no_grad():
    # See the models' docstrings for the detail of the inputs
    outputs = model(tokens_tensor, token_type_ids=segments_tensors)
    # PyTorch-Transformers models always output tuples.
    # See the models' docstrings for the detail of all the outputs.
    # In our case, the first element is the hidden state of the last layer of the Bert model
    encoded_layers = outputs[0]

# We have encoded our input sequence in a FloatTensor of shape (batch size, sequence length, model hidden dimension)
assert tuple(encoded_layers.shape) == (1, len(indexed_tokens), model.config.hidden_size)
```

And how to use `BertForMaskedLM` to predict a masked token:

```python
# Load pre-trained model (weights)
model = BertForMaskedLM.from_pretrained('bert-base-uncased')
model.eval()

# If you have a GPU, put everything on cuda
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
tokens_tensor = tokens_tensor.to(device)
segments_tensors = segments_tensors.to(device)
model.to(device)

# Predict all tokens
with torch.no_grad():
    outputs = model(tokens_tensor, token_type_ids=segments_tensors)
    predictions = outputs[0]

# Confirm we were able to predict 'henson'
predicted_index = torch.argmax(predictions[0, masked_index]).item()
predicted_token = tokenizer.convert_ids_to_tokens([predicted_index])[0]
assert predicted_token == 'henson'
```
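If you want more than the single best prediction, you can also inspect the model's top candidates for the masked position. The following is a minimal sketch, not part of the original example: it reuses the `predictions`, `masked_index` and `tokenizer` variables defined above and ranks candidates with `torch.topk`:

```python
# A minimal sketch: inspect the top-5 candidates for the masked position
# (reuses `predictions`, `masked_index` and `tokenizer` from the example above)
top_k = 5
probs = torch.nn.functional.softmax(predictions[0, masked_index], dim=-1)
top_probs, top_indices = torch.topk(probs, top_k)
for prob, index in zip(top_probs.tolist(), top_indices.tolist()):
    token = tokenizer.convert_ids_to_tokens([index])[0]
    print("{}: {:.3f}".format(token, prob))
# 'henson' should come out on top
```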
### OpenAI GPT-2

Here is a quick-start example using the `GPT2Tokenizer` and `GPT2LMHeadModel` classes with OpenAI's pre-trained model to predict the next token from a text prompt.

First let's prepare a tokenized input from our text string using `GPT2Tokenizer`:

```python
import torch
from pytorch_transformers import GPT2Tokenizer, GPT2LMHeadModel

# OPTIONAL: if you want to have more information on what's happening, activate the logger as follows
import logging
logging.basicConfig(level=logging.INFO)

# Load pre-trained model tokenizer (vocabulary)
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')

# Encode a text input
text = "Who was Jim Henson ? Jim Henson was a"
indexed_tokens = tokenizer.encode(text)

# Convert indexed tokens into a PyTorch tensor
tokens_tensor = torch.tensor([indexed_tokens])
```

Let's see how to use `GPT2LMHeadModel` to generate the next token following our text:

```python
# Load pre-trained model (weights)
model = GPT2LMHeadModel.from_pretrained('gpt2')

# Set the model in evaluation mode to deactivate the DropOut modules
# This is IMPORTANT to have reproducible results during evaluation!
model.eval()

# If you have a GPU, put everything on cuda
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
tokens_tensor = tokens_tensor.to(device)
model.to(device)

# Predict all tokens
with torch.no_grad():
    outputs = model(tokens_tensor)
    predictions = outputs[0]

# Get the predicted next sub-word (in our case, the word 'man')
predicted_index = torch.argmax(predictions[0, -1, :]).item()
predicted_text = tokenizer.decode(indexed_tokens + [predicted_index])
assert predicted_text == 'Who was Jim Henson? Jim Henson was a man'
```

Examples for each model class of each model architecture (Bert, GPT, GPT-2, Transformer-XL, XLNet and XLM) can be found in the [documentation](#documentation).
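As a closing note, the single-step GPT-2 example above extends naturally to generating several tokens. The following is a minimal greedy-decoding sketch, not part of the original quickstart: it simply loops the forward pass, reusing the `model`, `tokenizer` and `device` variables defined above:

```python
# A minimal greedy-decoding sketch: repeatedly feed the growing sequence back in
# and append the most likely next token
# (reuses `model`, `tokenizer` and `device` from the example above)
generated = tokenizer.encode("Who was Jim Henson ? Jim Henson was a")
with torch.no_grad():
    for _ in range(10):
        input_tensor = torch.tensor([generated]).to(device)
        outputs = model(input_tensor)
        # outputs[0] holds the logits of shape (batch size, sequence length, vocabulary size)
        next_token = torch.argmax(outputs[0][0, -1, :]).item()
        generated.append(next_token)

print(tokenizer.decode(generated))
```

Greedy argmax decoding is the simplest strategy; sampling from the softmax distribution (optionally with temperature or top-k filtering) generally produces more varied continuations.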