---
license: mit
language:
- de
- en
pipeline_tag: translation
tags:
- transformers
- PyTorch
- kaggle-dataset
- Multi30K
---
# Model card for Transformer_de_en_multi30K

## Model Description

This project contains my work on building a transformer from scratch for German-to-English translation.
This project builds on the pytorch-original-transformer work to understand the inner workings of the transformer and how to build one from scratch. Alongside the implementation, we refer to the original paper to study transformers.

## Model Details

This model takes the following arguments, named as in the paper.

```
'dk': key dimensions -> 32,
'dv': value dimensions -> 32,
'h': Number of parallel attention heads -> 8,
'src_vocab_size': source vocabulary size (German) -> 8500,
'target_vocab_size': target vocabulary size (English) -> 6500,
'src_pad_idx': Source pad index -> 2,
'target_pad_idx': Target pad index -> 2,
'num_encoders': Number of encoder modules -> 3,
'num_decoders': Number of decoder modules -> 3,
'dim_multiplier': Dimension multiplier for inner dimensions in pointwise FFN (dff = dk*h*dim_multiplier) -> 4,
'pdropout': Dropout probability in the network -> 0.1,
'lr': learning rate used to train the model -> 0.0003,
'N_EPOCHS': Number of Epochs -> 50,
'CLIP': 1,
'patience': 5
```

We train the model with the Adam optimizer and CrossEntropyLoss.
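
The dimension settings listed above imply the overall model width and the inner size of the pointwise feed-forward network. The short sketch below works them out in plain Python. It assumes the model width equals `dk * h` (the concatenated attention heads, as in the original paper); `derived_dims` is a hypothetical helper written for illustration and is not part of the repository.

```python
# Hyperparameters copied from the argument list above.
config = {"dk": 32, "dv": 32, "h": 8, "dim_multiplier": 4}


def derived_dims(cfg):
    """Return (d_model, dff) implied by the per-head settings."""
    # Assumption: the model width is the concatenation of all heads.
    d_model = cfg["dk"] * cfg["h"]
    # From the argument list: dff = dk * h * dim_multiplier.
    dff = d_model * cfg["dim_multiplier"]
    return d_model, dff


d_model, dff = derived_dims(config)
print(f"d_model={d_model}, dff={dff}")
```

With the values listed above, this gives a model width of 256 and an inner feed-forward dimension of 1024.
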

We evaluated the model on 1,000 held-out test examples and observed a BLEU score of 30.8.

## Usage

Clone the repo, then use the following code snippet to load the transformer model.

```python
import json

# torch packages
import torch

from model.transformer import Transformer

if __name__ == "__main__":
    # The following parameters are for the Multi30K dataset.
    # Load config containing model input parameters
    with open('params.json') as json_data:
        config = json.load(json_data)
    print(config)

    # Use the GPU when one is available, otherwise fall back to the CPU
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

    # Instantiate model
    model = Transformer(
        config["dk"],
        config["dv"],
        config["h"],
        config["src_vocab_size"],
        config["target_vocab_size"],
        config["num_encoders"],
        config["num_decoders"],
        config["dim_multiplier"],
        config["pdropout"],
        device=device)

    # Load model weights
    model.load_state_dict(torch.load('pytorch_transformer_model.pt',
                                     map_location=device))
    print(model)
```

### Source code

The source code used to train the model is available on [GitHub](https://github.com/m-np/pytorch-transformer).

## Resources

The code is derived from pytorch-original-transformer:

```
@misc{Gordić2020PyTorchOriginalTransformer,
  author = {Gordić, Aleksa},
  title = {pytorch-original-transformer},
  year = {2020},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/gordicaleksa/pytorch-original-transformer}},
}
```

It also draws on the following [blog post](https://medium.com/@hunter-j-phillips/putting-it-all-together-the-implemented-transformer-bfb11ac1ddfe).