haoranxu committed
Commit 924fb7e
1 Parent(s): 5abb78d

Update README.md

Files changed (1)
  1. README.md +14 -7
README.md CHANGED
@@ -5,17 +5,24 @@ language:
  ---
  Our bibert-ende is a bilingual English-German Language Model. Please check out our EMNLP 2021 paper "[BERT, mBERT, or BiBERT? A Study on Contextualized Embeddings for Neural Machine Translation](https://arxiv.org/abs/2109.04588)" for more details.
  ```
- @misc{xu2021bert,
-       title={BERT, mBERT, or BiBERT? A Study on Contextualized Embeddings for Neural Machine Translation},
-       author={Haoran Xu and Benjamin Van Durme and Kenton Murray},
-       year={2021},
-       eprint={2109.04588},
-       archivePrefix={arXiv},
-       primaryClass={cs.CL}
+ @inproceedings{xu-etal-2021-bert,
+     title = "{BERT}, m{BERT}, or {B}i{BERT}? A Study on Contextualized Embeddings for Neural Machine Translation",
+     author = "Xu, Haoran and
+       Van Durme, Benjamin and
+       Murray, Kenton",
+     booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
+     month = nov,
+     year = "2021",
+     address = "Online and Punta Cana, Dominican Republic",
+     publisher = "Association for Computational Linguistics",
+     url = "https://aclanthology.org/2021.emnlp-main.534",
+     pages = "6663--6675",
+     abstract = "The success of bidirectional encoders using masked language models, such as BERT, on numerous natural language processing tasks has prompted researchers to attempt to incorporate these pre-trained models into neural machine translation (NMT) systems. However, proposed methods for incorporating pre-trained models are non-trivial and mainly focus on BERT, which lacks a comparison of the impact that other pre-trained models may have on translation performance. In this paper, we demonstrate that simply using the output (contextualized embeddings) of a tailored and suitable bilingual pre-trained language model (dubbed BiBERT) as the input of the NMT encoder achieves state-of-the-art translation performance. Moreover, we also propose a stochastic layer selection approach and a concept of a dual-directional translation model to ensure the sufficient utilization of contextualized embeddings. In the case of without using back translation, our best models achieve BLEU scores of 30.45 for En→De and 38.61 for De→En on the IWSLT{'}14 dataset, and 31.26 for En→De and 34.94 for De→En on the WMT{'}14 dataset, which exceeds all published numbers.",
  }
  ```
  # Download
  
+ Note that tokenizer package is `BertTokenizer` not `AutoTokenizer`.
  ```
  from transformers import BertTokenizer, AutoModel
  tokenizer = BertTokenizer.from_pretrained("jhu-clsp/bibert-ende")
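
The README snippet in the diff shows only the tokenizer line. Below is a minimal usage sketch built on that snippet; the `AutoModel.from_pretrained` call, the example sentence, and the variable names are illustrative assumptions and are not part of the commit.

```
# Minimal usage sketch (assumption: the README's snippet continues by loading
# the model weights with AutoModel; the example sentence is illustrative only).
import torch
from transformers import BertTokenizer, AutoModel

# Per the README note, use BertTokenizer rather than AutoTokenizer.
tokenizer = BertTokenizer.from_pretrained("jhu-clsp/bibert-ende")
model = AutoModel.from_pretrained("jhu-clsp/bibert-ende")

# Encode a sentence and read the contextualized embeddings from the encoder output.
inputs = tokenizer("Hello world", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
embeddings = outputs.last_hidden_state  # shape: (batch, sequence_length, hidden_size)
print(embeddings.shape)
```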