Spaces:
Runtime error
Runtime error
# Cross-lingual Retrieval for Iterative Self-Supervised Training | |
https://arxiv.org/pdf/2006.09526.pdf | |
## Introduction | |
CRISS is a multilingual sequence-to-sequnce pretraining method where mining and training processes are applied iteratively, improving cross-lingual alignment and translation ability at the same time. | |
## Requirements: | |
* faiss: https://github.com/facebookresearch/faiss | |
* mosesdecoder: https://github.com/moses-smt/mosesdecoder | |
* flores: https://github.com/facebookresearch/flores | |
* LASER: https://github.com/facebookresearch/LASER | |
## Unsupervised Machine Translation | |
##### 1. Download and decompress CRISS checkpoints | |
``` | |
cd examples/criss | |
wget https://dl.fbaipublicfiles.com/criss/criss_3rd_checkpoints.tar.gz | |
tar -xf criss_checkpoints.tar.gz | |
``` | |
##### 2. Download and preprocess Flores test dataset | |
Make sure to run all scripts from examples/criss directory | |
``` | |
bash download_and_preprocess_flores_test.sh | |
``` | |
##### 3. Run Evaluation on Sinhala-English | |
``` | |
bash unsupervised_mt/eval.sh | |
``` | |
## Sentence Retrieval | |
##### 1. Download and preprocess Tatoeba dataset | |
``` | |
bash download_and_preprocess_tatoeba.sh | |
``` | |
##### 2. Run Sentence Retrieval on Tatoeba Kazakh-English | |
``` | |
bash sentence_retrieval/sentence_retrieval_tatoeba.sh | |
``` | |
## Mining | |
##### 1. Install faiss | |
Follow instructions on https://github.com/facebookresearch/faiss/blob/master/INSTALL.md | |
##### 2. Mine pseudo-parallel data between Kazakh and English | |
``` | |
bash mining/mine_example.sh | |
``` | |
## Citation | |
```bibtex | |
@article{tran2020cross, | |
title={Cross-lingual retrieval for iterative self-supervised training}, | |
author={Tran, Chau and Tang, Yuqing and Li, Xian and Gu, Jiatao}, | |
journal={arXiv preprint arXiv:2006.09526}, | |
year={2020} | |
} | |
``` | |