## Instructions to run on Google Cloud TPUs
Before starting these steps, make sure to prepare the dataset (normalization -> BPE -> ... -> binarization) following the steps in the IndicTrans workflow, or run those steps on a CPU instance before launching the TPU instance (to save time and costs).
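As a rough reference, the final binarization step is typically done with `fairseq-preprocess`. The sketch below is only an illustration: the `final_data` file prefixes are hypothetical, and the `SRC`/`TGT` language codes match the training command later in this section; follow the IndicTrans workflow for the exact commands.
```bash
# minimal binarization sketch; file prefixes are hypothetical,
# use the actual outputs of the IndicTrans preprocessing steps
fairseq-preprocess \
  --source-lang SRC --target-lang TGT \
  --trainpref final_data/train \
  --validpref final_data/dev \
  --testpref final_data/test \
  --destdir final_bin \
  --workers 8
```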
### Creating a TPU instance
- Create a CPU instance on GCP with the `torch-xla` image, for example:
```bash
gcloud compute --project=${PROJECT_ID} instances create <name for your instance> \
  --zone=<zone> \
  --machine-type=n1-standard-16 \
  --image-family=torch-xla \
  --image-project=ml-images \
  --boot-disk-size=200GB \
  --scopes=https://www.googleapis.com/auth/cloud-platform
```
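The remaining steps in this section are run from inside this VM; you can SSH into it with:
```bash
# connect to the CPU instance created above
gcloud compute ssh <name for your instance> --zone=<zone>
```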
- Once the instance is created, launch a Cloud TPU from your CPU VM instance using the following command (you can change the `--accelerator-type` according to your needs):
```bash
gcloud compute tpus create <name for your TPU> \
  --zone=<zone> \
  --network=default \
  --version=pytorch-1.7 \
  --accelerator-type=v3-8
```
(or)
Create a new TPU using the GUI at https://console.cloud.google.com/compute/tpus and make sure to select `version` as `pytorch-1.7`.
- Once the TPU is launched, identify its IP address:
```bash
# run this from the CPU instance and note down the IP address
# listed under the NETWORK_ENDPOINTS column
gcloud compute tpus list --zone=<zone>
```
(or)
Go to https://console.cloud.google.com/compute/tpus and note down the IP address of the created TPU from the `Internal IP` column.
### Installing fairseq and getting data on the CPU instance
- Activate the `torch-xla-1.7` conda environment and install the necessary libraries for IndicTrans (**excluding fairseq**):
```bash
conda activate torch-xla-1.7
pip install sacremoses pandas mock sacrebleu tensorboardX pyarrow
```
- Configure the environment variables for the TPU:
```bash
export TPU_IP_ADDRESS=<tpu-ip-address>
export XRT_TPU_CONFIG="tpu_worker;0;$TPU_IP_ADDRESS:8470"
```
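Before moving on, it can help to verify that the VM can actually reach the TPU. A quick check (assuming the `torch-xla-1.7` environment is active) is to ask torch-xla for a device:
```bash
# should print an XLA device such as "xla:1" if the TPU is
# reachable through the XRT_TPU_CONFIG set above
python -c "import torch_xla.core.xla_model as xm; print(xm.xla_device())"
```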
- Download the prepared binarized data for fairseq; an example copy command is shown below.
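Where the binarized data lives depends on how you prepared it. If it is stored in a GCS bucket (a common setup on GCP), the copy might look like this; the bucket path here is hypothetical:
```bash
# hypothetical bucket path; replace with wherever your binarized data is stored
gsutil -m cp -r gs://<your-bucket>/exp2_m2o_baseline/final_bin {expdir}/exp2_m2o_baseline/
```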
- Clone fairseq (the latest version supports TPUs) and install it from source. There is an [issue](https://github.com/pytorch/fairseq/issues/3259) with the latest commit, so we install from a different commit instead (this may have been fixed in the latest master, but we have not tested it):
```bash
git clone https://github.com/pytorch/fairseq.git
cd fairseq
git checkout da9eaba12d82b9bfc1442f0e2c6fc1b895f4d35d
pip install --editable ./
```
- Start TPU training:
```bash
# this uses all TPU cores (8 on a v3-8)
export MKL_SERVICE_FORCE_INTEL=1
fairseq-train {expdir}/exp2_m2o_baseline/final_bin \
  --max-source-positions=200 \
  --max-target-positions=200 \
  --max-update=1000000 \
  --save-interval=5 \
  --arch=transformer \
  --attention-dropout=0.1 \
  --criterion=label_smoothed_cross_entropy \
  --source-lang=SRC \
  --lr-scheduler=inverse_sqrt \
  --skip-invalid-size-inputs-valid-test \
  --target-lang=TGT \
  --label-smoothing=0.1 \
  --update-freq=1 \
  --optimizer adam \
  --adam-betas '(0.9, 0.98)' \
  --warmup-init-lr 1e-07 \
  --lr 0.0005 \
  --warmup-updates 4000 \
  --dropout 0.2 \
  --weight-decay 0.0 \
  --tpu \
  --distributed-world-size 8 \
  --max-tokens 8192 \
  --num-batch-buckets 8 \
  --tensorboard-logdir {expdir}/exp2_m2o_baseline/tensorboard \
  --save-dir {expdir}/exp2_m2o_baseline/model \
  --keep-last-epochs 5 \
  --patience 5
```
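Since the command above writes logs via `--tensorboard-logdir`, you can monitor training curves with TensorBoard (assuming the `tensorboard` package is installed, e.g. `pip install tensorboard`):
```bash
# point TensorBoard at the log directory used by fairseq-train
tensorboard --logdir {expdir}/exp2_m2o_baseline/tensorboard --port 6006
```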
**Note:** While training, we noticed that training on TPUs was slower than on multiple GPUs. We have documented some issues and [filed an issue](https://github.com/pytorch/fairseq/issues/3317) on the fairseq repo for advice. We'll update this section as we learn more about efficient training on TPUs. Also, feel free to open an issue or pull request if you find a bug or know an efficient method to make the code train faster on TPUs.