---
license: mit
language:
- de
- en
pipeline_tag: translation
tags:
- transformers
- PyTorch
- kaggle-dataset
- Multi30K
---
# Model card for Transformer_de_en_multi30K

## Model Description
This project contains my work on building a transformer from scratch for German-to-English translation. <br>
It builds on the <a href = "https://github.com/gordicaleksa/pytorch-original-transformer/tree/main">pytorch-original-transformer</a> 
project to understand the inner workings of the transformer and how to build it from scratch. 
Alongside the implementation, we refer to the <a href = "https://arxiv.org/abs/1706.03762">original paper</a> to study transformers.


## Model Details

The model takes the following arguments, named after the notation used in the paper. The value after each arrow is the one used for this model.

```
'dk': key dimensions -> 32,
'dv': value dimensions -> 32,
'h': Number of parallel attention heads -> 8,
'src_vocab_size': source vocabulary size (German) -> 8500,
'target_vocab_size': target vocabulary size (English) -> 6500,
'src_pad_idx': Source pad index -> 2,
'target_pad_idx': Target pad index -> 2,
'num_encoders': Number of encoder modules -> 3,
'num_decoders': Number of decoder modules -> 3,
'dim_multiplier': Dimension multiplier for inner dimensions in pointwise FFN (dff = dk*h*dim_multiplier) -> 4,
'pdropout': Dropout probability in the network -> 0.1,
'lr': learning rate used to train the model -> 0.0003,
'N_EPOCHS': Number of Epochs -> 50,
'CLIP': 1,
'patience': 5
```
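
The formula for `dff` above implies a model width of `dk * h`, which matches the paper's convention that the model dimension equals the number of heads times the key dimension. The sketch below works out the resulting sizes. The function name `derived_dimensions` and the assumption that the model width is exactly `dk * h` are illustrative and not taken from the repository.

```python
def derived_dimensions(dk, h, dim_multiplier):
    """Return (d_model, d_ff) for the given head size, head count and multiplier."""
    d_model = dk * h                  # assumed model width: heads * key dimension
    d_ff = d_model * dim_multiplier   # inner width of the pointwise FFN
    return d_model, d_ff


d_model, d_ff = derived_dimensions(dk=32, h=8, dim_multiplier=4)
print(d_model, d_ff)  # 256 1024
```

With the values listed above, each of the 8 heads works on 32-dimensional keys, and the feed-forward layers expand the 256-dimensional representation to 1024 dimensions.
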
We use the Adam optimizer along with CrossEntropyLoss to train the model.
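
A single training step with this setup might look like the sketch below. It makes several assumptions that the card does not confirm: padding positions are excluded from the loss via `ignore_index`, `CLIP` is used as a maximum gradient norm, and a tiny embedding-plus-linear network stands in for the actual Transformer.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

PAD_IDX = 2        # 'target_pad_idx' from the list above
VOCAB_SIZE = 6500  # 'target_vocab_size'
LR = 0.0003        # 'lr'
CLIP = 1.0         # 'CLIP', assumed here to be a maximum gradient norm

# Hypothetical stand-in for the Transformer: maps token ids to vocabulary logits.
model = nn.Sequential(nn.Embedding(VOCAB_SIZE, 16), nn.Linear(16, VOCAB_SIZE))

optimizer = torch.optim.Adam(model.parameters(), lr=LR)
# Assumption: padded target positions do not contribute to the loss.
criterion = nn.CrossEntropyLoss(ignore_index=PAD_IDX)


def train_step(inputs, targets):
    """Run one optimisation step and return the scalar loss."""
    model.train()
    optimizer.zero_grad()
    logits = model(inputs)  # shape: (batch, seq_len, vocab)
    loss = criterion(logits.reshape(-1, VOCAB_SIZE), targets.reshape(-1))
    loss.backward()
    # Rescale gradients so their total norm does not exceed CLIP.
    torch.nn.utils.clip_grad_norm_(model.parameters(), CLIP)
    optimizer.step()
    return loss.item()


# Random token ids in place of real Multi30K batches.
inputs = torch.randint(3, VOCAB_SIZE, (4, 10))
targets = torch.randint(3, VOCAB_SIZE, (4, 10))
targets[:, -2:] = PAD_IDX  # simulate padding at the end of each sequence
loss_value = train_step(inputs, targets)
```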

We evaluated the model on 1,000 held-out test examples and observed a BLEU score of 30.8.
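
For readers unfamiliar with the metric, the following is a minimal sketch of corpus-level BLEU with one reference per sentence. The card does not state which BLEU implementation or tokenization produced the score above, so this illustrates the metric and is not the evaluation code. The helper names `ngram_counts` and `corpus_bleu` are hypothetical.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus BLEU (0-100) over tokenized sentences, one reference each."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_ngrams = ngram_counts(hyp, n)
            ref_ngrams = ngram_counts(ref, n)
            # Clipped matches: an n-gram counts at most as often as in the reference.
            matches[n - 1] += sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
            totals[n - 1] += sum(hyp_ngrams.values())
    if min(matches) == 0:
        return 0.0  # no smoothing in this sketch
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty for hypotheses shorter than their references.
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity * math.exp(log_precision)


hyp = ["a man is riding a bike".split()]
ref = ["a man is riding a bicycle".split()]
score = corpus_bleu(hyp, ref)
```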

## Usage

Make sure to clone the repo, then use the following code snippet to load the transformer model:

```python
# torch packages
import torch
from model.transformer import Transformer
import json

if __name__ == "__main__":
    # The following parameters are for the Multi30K dataset
    # Load config containing model input parameters
    with open('params.json') as json_data:
        config = json.load(json_data)
    print(config)

    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    # Instantiate model
    model = Transformer(
                    config["dk"], 
                    config["dv"], 
                    config["h"],
                    config["src_vocab_size"],
                    config["target_vocab_size"],
                    config["num_encoders"],
                    config["num_decoders"],
                    config["dim_multiplier"], 
                    config["pdropout"],
                    device = device)
    # Load model weights
    model.load_state_dict(torch.load('pytorch_transformer_model.pt', 
                                     map_location=device))
    print(model)
    
```
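
The loading snippet reads its hyperparameters from `params.json`. As a rough sketch, such a file could be produced and checked as shown below. The values are copied from the Model Details section, while the flat JSON layout, the temporary path, and the helper `missing_keys` are assumptions for illustration; the file shipped with the repository may differ.

```python
import json
import os
import tempfile

# Values copied from the Model Details list; the flat layout is an assumption.
params = {
    "dk": 32,
    "dv": 32,
    "h": 8,
    "src_vocab_size": 8500,
    "target_vocab_size": 6500,
    "src_pad_idx": 2,
    "target_pad_idx": 2,
    "num_encoders": 3,
    "num_decoders": 3,
    "dim_multiplier": 4,
    "pdropout": 0.1,
    "lr": 0.0003,
    "N_EPOCHS": 50,
    "CLIP": 1,
    "patience": 5,
}

# Keys that the loading snippet above reads from the config.
REQUIRED_KEYS = [
    "dk", "dv", "h", "src_vocab_size", "target_vocab_size",
    "num_encoders", "num_decoders", "dim_multiplier", "pdropout",
]


def missing_keys(config):
    """Return the required keys that are absent from the config dict."""
    return [key for key in REQUIRED_KEYS if key not in config]


# Write to a temporary directory so an existing params.json is not overwritten.
path = os.path.join(tempfile.mkdtemp(), "params.json")
with open(path, "w") as f:
    json.dump(params, f, indent=2)

with open(path) as f:
    config = json.load(f)

missing = missing_keys(config)
```

Checking for missing keys before building the model turns a `KeyError` deep inside the constructor call into an explicit, readable list.
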
### Source code

The source code used to train the model is available in this [GitHub repository](https://github.com/m-np/pytorch-transformer).

## Resources

The code is derived from the pytorch-original-transformer repository:
```
@misc{Gordić2020PyTorchOriginalTransformer,
  author = {Gordić, Aleksa},
  title = {pytorch-original-transformer},
  year = {2020},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/gordicaleksa/pytorch-original-transformer}},
}
```

It also draws on the following [blog post](https://medium.com/@hunter-j-phillips/putting-it-all-together-the-implemented-transformer-bfb11ac1ddfe).