# Train JoeyNMT from Google Drive

Run this notebook to train a JoeyNMT model on train, dev, and test data that is already stored in a Google Drive folder. This lets you train a model after lengthy data preprocessing (such as fuzzywuzzy matching) has already been run.

## Initial Configuration

In [0]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [0]:
import os

# EDIT THESE:
source_language = "en"
target_language = "st"
tag = "baseline"

os.environ["src"] = source_language
os.environ["tgt"] = target_language
os.environ["tag"] = tag

# assumes the gdrive path was created by the masakhane starter notebook
os.environ["gdrive_path"] = "/content/drive/My Drive/masakhane/%s-%s-%s" % (source_language, target_language, tag)

In [0]:
!echo $gdrive_path

/content/drive/My Drive/masakhane/en-st-baseline
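
Before copying, it can help to confirm that the Drive folder really contains the six files the next cells expect. The helper below is a minimal sketch and not part of the starter notebook: `missing_data_files` is a hypothetical name, and the `train`/`dev`/`test` file naming is assumed from the `cp` commands in the next section. The demo runs against a temporary folder, so it can be tried without Drive mounted.

```python
import tempfile
from pathlib import Path


def missing_data_files(folder, src, tgt, splits=("train", "dev", "test")):
    """Return the expected parallel-data files that are absent from `folder`."""
    folder = Path(folder)
    # Assumed naming scheme: <split>.<language code>, e.g. train.en, train.st
    expected = [f"{split}.{lang}" for split in splits for lang in (src, tgt)]
    return [name for name in expected if not (folder / name).is_file()]


# Demo on a throwaway folder that only holds part of the data.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("train.en", "train.st", "dev.en"):
        (Path(tmp) / name).write_text("example\n", encoding="utf-8")
    print(missing_data_files(tmp, "en", "st"))
```

In this notebook the call would look like `missing_data_files(os.environ["gdrive_path"], source_language, target_language)`; an empty list means every file is in place.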


## Loading Data
Copy data from our drive to Colab.

In [0]:
# copy train files
! cp "$gdrive_path/train.$src" ./
! cp "$gdrive_path/train.$tgt" ./

# copy dev files
! cp "$gdrive_path/dev.$src" ./
! cp "$gdrive_path/dev.$tgt" ./

# copy test files
! cp "$gdrive_path/test.$src" ./
! cp "$gdrive_path/test.$tgt" ./

! ls

dev.en	dev.st	drive sample_data test.en test.st train.en	train.st


In [0]:
! head -n 5 train.$src

Little did I realize that in future years I would spend the major portion of my life continuing the work of these pilgrims by serving as a traveling overseer of Jehovah’s Witnesses .
These were things I never imagined I would be able to get rid of . ”
Ruins of the theater at Ephesus
We were arrested and taken to the police station .
Today , a more far - reaching destruction is looming , one that will bring an end to this entire system of things .


## Install JoeyNMT

In [0]:
# Install JoeyNMT
! git clone https://github.com/joeynmt/joeynmt.git
! cd joeynmt; pip3 install .

Cloning into 'joeynmt'...
remote: Enumerating objects: 149, done.

## Preprocessing the Data into Subword BPE Tokens

- One of the most effective improvements for agglutinative languages (most Bantu languages are agglutinative) is BPE tokenization [(Sennrich, 2015)](https://arxiv.org/abs/1508.07909).

- Optimizing the number of BPE codes has also been shown to significantly improve results for low-resourced languages [(Sennrich, 2019)](https://www.aclweb.org/anthology/P19-1021) [(Martinus, 2019)](https://arxiv.org/abs/1906.05685).

- Below are the scripts for BPE tokenization of our data. [(Sennrich, 2019)](https://www.aclweb.org/anthology/P19-1021) recommends 4000 codes; note that the cell below sets `nb_codes = 40000`, which is the value behind the outputs shown in this notebook. Edit `nb_codes` if you want the recommended setting; otherwise you can simply run the cell as is.
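
To make the idea behind BPE concrete, here is a toy sketch of the merge-learning loop: count adjacent symbol pairs, merge the most frequent pair, and repeat. It is only an illustration under simplifying assumptions (a tiny hand-written word-count dictionary, `</w>` as an end-of-word marker, ties broken by first occurrence) and is not how `subword-nmt` is implemented internally. The function names are made up for this example.

```python
import collections
import re


def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = collections.Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return pairs


def merge_pair(pair, vocab):
    """Replace every standalone occurrence of `pair` with the merged symbol."""
    bigram = re.escape(" ".join(pair))
    pattern = re.compile(r"(?<!\S)" + bigram + r"(?!\S)")
    merged = "".join(pair)
    return {pattern.sub(merged, word): freq for word, freq in vocab.items()}


def learn_bpe(vocab, num_merges):
    """Greedily learn `num_merges` merge operations from a word-count dict."""
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent pair
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges, vocab


# Words are pre-split into characters; the numbers are made-up frequencies.
toy_vocab = {
    "l o w </w>": 5,
    "l o w e r </w>": 2,
    "n e w e s t </w>": 6,
    "w i d e s t </w>": 3,
}
merges, segmented = learn_bpe(toy_vocab, num_merges=3)
print(merges)
print(segmented)
```

The number of merges plays the role of `nb_codes` in the cell below: more merges mean longer subword units and a larger vocabulary.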

In [0]:
# One of the huge boosts in NMT performance was to use a different method of tokenizing. 
# Usually, NMT would tokenize by words. However, using a method called BPE gave amazing boosts to performance

# Do subword NMT
from os import path

# set number of bpe codes to use
nb_codes = 40000
os.environ["codes"] = str(nb_codes)

# Directory inside the JoeyNMT checkout where the prepared data will live.
os.environ["data_path"] = path.join("joeynmt", "data", source_language + target_language)

# Learn BPEs on the training data.
! subword-nmt learn-joint-bpe-and-vocab --input train.$src train.$tgt -s $codes -o bpe.codes.$codes --write-vocabulary vocab.$src vocab.$tgt

# Apply BPE splits to the training, development and test data.
! subword-nmt apply-bpe -c bpe.codes.$codes --vocabulary vocab.$src < train.$src > train.bpe.$src
! subword-nmt apply-bpe -c bpe.codes.$codes --vocabulary vocab.$tgt < train.$tgt > train.bpe.$tgt

! subword-nmt apply-bpe -c bpe.codes.$codes --vocabulary vocab.$src < dev.$src > dev.bpe.$src
! subword-nmt apply-bpe -c bpe.codes.$codes --vocabulary vocab.$tgt < dev.$tgt > dev.bpe.$tgt
! subword-nmt apply-bpe -c bpe.codes.$codes --vocabulary vocab.$src < test.$src > test.bpe.$src
! subword-nmt apply-bpe -c bpe.codes.$codes --vocabulary vocab.$tgt < test.$tgt > test.bpe.$tgt

# Create directory, copy everything we care about to the correct location
! mkdir -p $data_path
! cp train.* $data_path
! cp test.* $data_path
! cp dev.* $data_path
! cp bpe.codes.$codes $data_path
! ls $data_path

# Also copy the dev files and BPE codes back to the mounted Google Drive folder at gdrive_path (relevant if running in Colab)
! cp dev.* "$gdrive_path"
! cp bpe.codes.$codes "$gdrive_path"
! ls "$gdrive_path"

# Create the joint vocab using build_vocab (the script only needs to be executable)
! chmod +x joeynmt/scripts/build_vocab.py
! joeynmt/scripts/build_vocab.py joeynmt/data/$src$tgt/train.bpe.$src joeynmt/data/$src$tgt/train.bpe.$tgt --output_path joeynmt/data/$src$tgt/vocab.txt

# Some output
! echo "BPE $tgt Sentences"
! tail -n 5 test.bpe.$tgt
! echo "Combined BPE Vocab"
! tail -n 10 joeynmt/data/$src$tgt/vocab.txt

bpe.codes.40000 dev.en test.bpe.st train.bpe.en train.st
dev.bpe.en	 dev.st test.en	 train.bpe.st
dev.bpe.st	 test.bpe.en test.st	 train.en
bpe.codes.4000	 dev.en test.bpe.en	 test.st train.st
bpe.codes.40000 dev.st test.bpe.st	 train.bpe.en
dev.bpe.en	 dev.xh test.en	 train.bpe.st
dev.bpe.st	 models test.en-any.en train.en
BPE st Sentences
Ka lebaka leo , ke ile ka tloaela ho se tšepahale .
Ka mor’a hore ke ithute Bibele , ke ile ka tlohela mosebetsi oo , le hoja ke ne ke pata@@ loa hantle .
Ke behetse bara ba ka ba babeli mohlala o motle , ’ me ke khona le ho sebeletsa ka phuthehong .
Ho basebeletsi ba lekhetho le ho batho ba bang bao ke sebetsang le bona , ke tsebahala ke le motho ea tšepahalang . ”
Ruthe o ile a fallela Iseraele moo a neng a tla rapela Molimo oa ’ nete .
Combined BPE Vocab
vern@@
fatsing
zim
itamins
claim@@
leoat@@
Jerusale@@
them@@
elings
parishi@@
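
The `@@` markers in the output above show where a word was split into subword units. A small sketch of how such a line can be turned back into plain words is given below; the regular expression follows the `sed` pattern suggested in the `subword-nmt` README, while `merge_bpe` itself is a hypothetical helper name used only for this example.

```python
import re

# "@@ " inside a line joins two subwords; a trailing "@@" can appear at line end.
_BPE_MARKER = re.compile(r"(@@ )|(@@ ?$)")


def merge_bpe(line):
    """Undo BPE segmentation by removing the '@@' continuation markers."""
    return _BPE_MARKER.sub("", line)


print(merge_bpe("le hoja ke ne ke pata@@ loa hantle ."))
print(merge_bpe("Jerusale@@"))
```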


In [0]:
# This creates the config file for our JoeyNMT system. It might seem overwhelming so we've provided a couple of useful parameters you'll need to update
# (You can of course play with all the parameters if you'd like!)

name = '%s%s' % (source_language, target_language)
gdrive_path = os.environ["gdrive_path"]

# Create the config
config = """
name: "{name}_transformer"

data:
    src: "{source_language}"
    trg: "{target_language}"
    train: "data/{name}/train.bpe"
    dev: "data/{name}/dev.bpe"
    test: "data/{name}/test.bpe"
    level: "bpe"
    lowercase: False
    max_sent_length: 100
    src_vocab: "data/{name}/vocab.txt"
    trg_vocab: "data/{name}/vocab.txt"

testing:
    beam_size: 5
    alpha: 1.0

training:
    #load_model: "{gdrive_path}/models/{name}_transformer/1.ckpt" # if uncommented, load a pre-trained model from this checkpoint
    random_seed: 42
    optimizer: "adam"
    normalization: "tokens"
    adam_betas: [0.9, 0.999]
    scheduling: "plateau" # TODO: try switching from plateau to Noam scheduling
    patience: 5 # For plateau: decrease learning rate by decrease_factor if validation score has not improved for this many validation rounds.
    learning_rate_factor: 0.5 # factor for Noam scheduler (used with Transformer)
    learning_rate_warmup: 1000 # warmup steps for Noam scheduler (used with Transformer)
    decrease_factor: 0.7
    loss: "crossentropy"
    learning_rate: 0.0003
    learning_rate_min: 0.00000001
    weight_decay: 0.0
    label_smoothing: 0.1
    batch_size: 4096
    batch_type: "token"
    eval_batch_size: 3600
    eval_batch_type: "token"
    batch_multiplier: 1
    early_stopping_metric: "ppl"
    epochs: 14 # TODO: Decrease when playing around and checking that it works. Around 30 is sufficient to check if it's working at all
    validation_freq: 1000 # TODO: Set to at least once per epoch.
    logging_freq: 100
    eval_metric: "bleu"
    model_dir: "models/{name}_transformer"
    overwrite: False # TODO: Set to True if you want to overwrite possibly existing models.
    shuffle: True
    use_cuda: True
    max_output_length: 100
    print_valid_sents: [0, 1, 2, 3]
    keep_last_ckpts: 3

model:
    initializer: "xavier"
    bias_initializer: "zeros"
    init_gain: 1.0
    embed_initializer: "xavier"
    embed_init_gain: 1.0
    tied_embeddings: True
    tied_softmax: True
    encoder:
        type: "transformer"
        num_layers: 6
        num_heads: 4 # TODO: Increase to 8 for larger data.
        embeddings:
            embedding_dim: 256 # TODO: Increase to 512 for larger data.
            scale: True
            dropout: 0.2
        # typically ff_size = 4 x hidden_size
        hidden_size: 256 # TODO: Increase to 512 for larger data.
        ff_size: 1024 # TODO: Increase to 2048 for larger data.
        dropout: 0.3
    decoder:
        type: "transformer"
        num_layers: 6
        num_heads: 4 # TODO: Increase to 8 for larger data.
        embeddings:
            embedding_dim: 256 # TODO: Increase to 512 for larger data.
            scale: True
            dropout: 0.2
        # typically ff_size = 4 x hidden_size
        hidden_size: 256 # TODO: Increase to 512 for larger data.
        ff_size: 1024 # TODO: Increase to 2048 for larger data.
        dropout: 0.3
""".format(name=name, gdrive_path=os.environ["gdrive_path"], source_language=source_language, target_language=target_language)
with open("joeynmt/configs/transformer_{name}.yaml".format(name=name),'w') as f:
    f.write(config)
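
The `plateau` scheduling configured above lowers the learning rate when the validation score stops improving. The sketch below simulates that rule in plain Python so the interaction of `patience` and `decrease_factor` is easier to see. It assumes the behaviour of PyTorch's `ReduceLROnPlateau` (reduce once the number of non-improving validations exceeds `patience`) and a lower-is-better metric such as `ppl`; whether JoeyNMT applies exactly this rule is an assumption here, and `plateau_schedule` is a hypothetical helper, not a JoeyNMT function.

```python
def plateau_schedule(scores, lr=0.0003, patience=5, decrease_factor=0.7, lr_min=1e-8):
    """Return the learning rate in effect after each validation round.

    `scores` are validation scores where lower is better (e.g. perplexity).
    """
    best = float("inf")
    bad_rounds = 0
    history = []
    for score in scores:
        if score < best:
            best = score
            bad_rounds = 0
        else:
            bad_rounds += 1
        # Assumed rule: reduce once more than `patience` rounds brought no improvement.
        if bad_rounds > patience:
            lr = max(lr * decrease_factor, lr_min)
            bad_rounds = 0
        history.append(lr)
    return history


# Two improving validations followed by six without improvement.
history = plateau_schedule([10.0, 9.0] + [9.0] * 6)
print(history)
```

With the values from the config, the learning rate stays at 0.0003 until the sixth consecutive validation without improvement and is then multiplied by 0.7.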

In [0]:
# Train the model
# You can press Ctrl-C to stop. And then run the next cell to save your checkpoints! 
!cd joeynmt; python3 -m joeynmt train configs/transformer_$src$tgt.yaml

2020-02-13 12:28:37,724 Hello! This is Joey-NMT.
2020-02-13 12:28:38,902 Total params: 21245696
2020-02-13 12:28:38,904 Trainable parameters: ['decoder.layer_norm.bias', 'decoder.layer_norm.weight', 'decoder.layers.0.dec_layer_norm.bias', 'decoder.layers.0.dec_layer_norm.weight', 'decoder.layers.0.feed_forward.layer_norm.bias', 'decoder.layers.0.feed_forward.layer_norm.weight', 'decoder.layers.0.feed_forward.pwff_layer.0.bias', 'decoder.layers.0.feed_forward.pwff_layer.0.weight', 'decoder.layers.0.feed_forward.pwff_layer.3.bias', 'decoder.layers.0.feed_forward.pwff_layer.3.weight', 'decoder.layers.0.src_trg_att.k_layer.bias', 'decoder.layers.0.src_trg_att.k_layer.weight', 'decoder.layers.0.src_trg_att.output_layer.bias', 'decoder.layers.0.src_trg_att.output_layer.weight', 'decoder.layers.0.src_trg_att.q_layer.bias', 'decoder.layers.0.src_trg_att.q_layer.weight', 'decoder.layers.0.src_trg_att.v_layer.bias', 'decoder.layers.0.src_trg_att.v_layer.weight', 'decoder.layers.0.trg_trg_att.k_l

In [0]:
# Copy the created models from the notebook storage to Google Drive for persistent storage
!mkdir "$gdrive_path/models/${src}${tgt}_transformer"
!cp -r joeynmt/models/${src}${tgt}_transformer/* "$gdrive_path/models/${src}${tgt}_transformer/"
!cp joeynmt/models/${src}${tgt}_transformer/best.ckpt "$gdrive_path/models/${src}${tgt}_transformer/"

mkdir: cannot create directory ‘/content/drive/My Drive/masakhane/en-st-baseline/models/enst_transformer’: File exists
cp: cannot stat 'joeynmt/models/enst_transformer/*': No such file or directory
cp: cannot stat 'joeynmt/models/enst_transformer/best.ckpt': No such file or directory


In [0]:
# Output our validation scores (loss, perplexity and BLEU per validation step)
! cat "$gdrive_path/models/${src}${tgt}_transformer/validations.txt"

Steps: 1000	Loss: 108547.92969	PPL: 70.99663	bleu: 0.96149	LR: 0.00030000	*
Steps: 2000	Loss: 88209.55469	PPL: 31.94299	bleu: 1.72850	LR: 0.00030000	*
Steps: 3000	Loss: 76930.92188	PPL: 20.51273	bleu: 4.70271	LR: 0.00030000	*
Steps: 4000	Loss: 68434.78125	PPL: 14.69351	bleu: 9.21711	LR: 0.00030000	*
Steps: 5000	Loss: 61544.32812	PPL: 11.21016	bleu: 13.85765	LR: 0.00030000	*
Steps: 6000	Loss: 56446.87500	PPL: 9.17650	bleu: 17.98712	LR: 0.00030000	*
Steps: 7000	Loss: 52346.01562	PPL: 7.81157	bleu: 21.53790	LR: 0.00030000	*
Steps: 8000	Loss: 49132.64062	PPL: 6.88551	bleu: 25.71689	LR: 0.00030000	*
Steps: 9000	Loss: 46748.56250	PPL: 6.27013	bleu: 27.71073	LR: 0.00030000	*
Steps: 10000	Loss: 45489.00391	PPL: 5.96754	bleu: 29.18093	LR: 0.00030000	*
Steps: 11000	Loss: 42909.72656	PPL: 5.39271	bleu: 30.87249	LR: 0.00030000	*
Steps: 12000	Loss: 41262.10156	PPL: 5.05484	bleu: 32.44968	LR: 0.00030000	*
Steps: 13000	Loss: 40603.58984	PPL: 4.92580	bleu: 33.14003	LR: 0.00030000	*
Steps: 14000	Loss: 
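
To pick out the best validation step without reading the log by eye, the lines can be parsed into dictionaries. This is a rough sketch that assumes the `key: value` layout visible in the output above; `parse_validation_line` and `best_by_bleu` are hypothetical helper names, and incomplete lines (such as a truncated last line) are simply skipped when looking for the best BLEU.

```python
import re

# Matches "Name: 123.45" pairs such as "Steps: 1000" or "bleu: 0.96149".
_FIELD = re.compile(r"(\w+):\s*(\d+(?:\.\d+)?)")


def parse_validation_line(line):
    """Turn one validations.txt line into a dict with lower-cased keys."""
    return {key.lower(): float(value) for key, value in _FIELD.findall(line)}


def best_by_bleu(lines):
    """Return the parsed record with the highest BLEU, or None if there is none."""
    records = [parse_validation_line(line) for line in lines]
    complete = [record for record in records if "bleu" in record]
    return max(complete, key=lambda record: record["bleu"]) if complete else None


sample = [
    "Steps: 12000\tLoss: 41262.10156\tPPL: 5.05484\tbleu: 32.44968\tLR: 0.00030000\t*",
    "Steps: 13000\tLoss: 40603.58984\tPPL: 4.92580\tbleu: 33.14003\tLR: 0.00030000\t*",
    "Steps: 14000\tLoss: ",
]
best = best_by_bleu(sample)
print(best["steps"], best["bleu"])
```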

## Testing from gdrive

In [0]:
! mkdir -p joeynmt/models/${src}${tgt}_transformer/
! cp "$gdrive_path/models/${src}${tgt}_transformer/best.ckpt" "joeynmt/models/${src}${tgt}_transformer/best.ckpt"

In [0]:
# copy test files
! cp "$gdrive_path/test.$src" ./
! cp "$gdrive_path/test.$tgt" ./

In [0]:
! subword-nmt apply-bpe -c bpe.codes.$codes --vocabulary vocab.$src < test.$src > test.bpe.$src
! subword-nmt apply-bpe -c bpe.codes.$codes --vocabulary vocab.$tgt < test.$tgt > test.bpe.$tgt
! cp test.* $data_path

In [0]:
# Test our model
! cd joeynmt; python3 -m joeynmt test "$gdrive_path/models/${src}${tgt}_transformer/config.yaml"

2020-02-16 15:42:59,652 Hello! This is Joey-NMT.
2020-02-16 15:44:07,041 dev bleu: 46.15 [Beam search decoding with beam size = 5 and alpha = 1.0]
2020-02-16 15:44:51,075 test bleu: 41.23 [Beam search decoding with beam size = 5 and alpha = 1.0]


## Testing from Autshumato

In [0]:
! wget https://raw.githubusercontent.com/jasonrobwebster/autshumato-eval-bleu/master/data/processed/translator1.$src.txt
! wget https://raw.githubusercontent.com/jasonrobwebster/autshumato-eval-bleu/master/data/processed/translator1.$tgt.txt
! wget https://raw.githubusercontent.com/jasonrobwebster/autshumato-eval-bleu/master/data/processed/translator2.$tgt.txt
! wget https://raw.githubusercontent.com/jasonrobwebster/autshumato-eval-bleu/master/data/processed/translator3.$tgt.txt
! wget https://raw.githubusercontent.com/jasonrobwebster/autshumato-eval-bleu/master/data/processed/translator4.$tgt.txt

--2020-02-16 17:44:43-- https://raw.githubusercontent.com/jasonrobwebster/autshumato-eval-bleu/master/data/processed/translator1.en.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 58665 (57K) [text/plain]
Saving to: ‘translator1.en.txt’


2020-02-16 17:44:43 (7.17 MB/s) - ‘translator1.en.txt’ saved [58665/58665]

--2020-02-16 17:44:48-- https://raw.githubusercontent.com/jasonrobwebster/autshumato-eval-bleu/master/data/processed/translator1.st.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 65568 (64K) [text/plain]
Saving to: ‘transl

In [0]:
# Note: this overwrites test.bpe.$src from the previous section with the Autshumato source text
! subword-nmt apply-bpe -c bpe.codes.$codes --vocabulary vocab.$src < ./translator1.$src.txt > test.bpe.$src
! subword-nmt apply-bpe -c bpe.codes.$codes --vocabulary vocab.$tgt < ./translator1.$tgt.txt > translator1.bpe.$tgt
! subword-nmt apply-bpe -c bpe.codes.$codes --vocabulary vocab.$tgt < ./translator2.$tgt.txt > translator2.bpe.$tgt
! subword-nmt apply-bpe -c bpe.codes.$codes --vocabulary vocab.$tgt < ./translator3.$tgt.txt > translator3.bpe.$tgt
! subword-nmt apply-bpe -c bpe.codes.$codes --vocabulary vocab.$tgt < ./translator4.$tgt.txt > translator4.bpe.$tgt

! head -n 5 test.bpe.$src
! echo ""
! head -n 5 translator1.bpe.$tgt

South African Social Security Agency
Custo@@ mer Care Char@@ ter
Welcome to our Cli@@ ent Char@@ ter
We want you , our cli@@ ent , to judge us according to the standard of service we set out in this char@@ ter .
We will treat you with respect , and give you good service in accordance with the BA@@ THO PE@@ LE principles .

Bo@@ e@@ me@@ di ba Afrika Bor@@ wa ya T@@ shi@@ rel@@ etso ba Set@@ j@@ ha@@ ba
Tumellano ya Tlhokomelo ya Ba@@ reki
Re a o amohela ho Tumellano ya rona ya Ba@@ reki
Re batla hore wena , moreki wa rona , o re hlahlobe ho ya ka maemo a ts@@ he@@ bel@@ etso ao re a hlahisitseng ka hara tumellano ena .
Re tla o sebeletsa ka tlhompho le ho o fa ts@@ he@@ bel@@ etso e lokileng ho k@@ ge@@ ma le maano a BATHO PELE .


In [0]:
! cd joeynmt; python3 -m joeynmt translate "$gdrive_path/models/${src}${tgt}_transformer/config.yaml" < ../test.bpe.$src > ../model.$tgt

2020-02-16 17:46:37,878 Hello! This is Joey-NMT.


In [0]:
! echo "==> Source <=="
! head -n 10 translator1.$src.txt
! echo ""
! head -n 10 *.$tgt.txt
! echo ""
! echo "==> Model <=="
! head -n 10 model.$tgt

==> Source <==
South African Social Security Agency
Customer Care Charter
Welcome to our Client Charter
We want you , our client , to judge us according to the standard of service we set out in this charter .
We will treat you with respect , and give you good service in accordance with the BATHO PELE principles .
As part of our responsibility , we promise to deliver a world class service , and to give you accurate information , advice and assistance for all our services .
Aim of the Charter
This charter tells you what standard of service you can expect from the South African Social Security Agency ( SASSA ) .
SASSA is an extension of a government delivery branch that administers the delivery of social grants to the citizens of South Africa .
We promise

==> translator1.st.txt <==
Boemedi ba Afrika Borwa ya Tshireletso ba Setjhaba
Tumellano ya Tlhokomelo ya Bareki
Re a o amohela ho Tumellano ya rona ya Bareki
Re batla hore wena , moreki wa rona , o re hlahlobe ho ya ka maemo a tshebelet

In [0]:
import re
import codecs

def load_all_translations(lang, translators=['translator1', 'translator2', 'translator3', 'translator4'], proc_dir='.'):
    """Load all autshumato evaluation translations into a dictionary.

    Params
    ------

    lang (str):
        The ISO code language to load.

    Returns
    -------

    out (dict):
        A dictionary containing all translated lines from the Autshumato evaluation set
        for the given language. The key corresponds to a translator, and the value is a list
        containing the translation.
    """
    out = {}
    for translator in translators:
        fp = f"{translator}.bpe.{lang}"
        fp = os.path.join(proc_dir, fp)
        with codecs.open(fp, 'r', encoding='utf-8') as f:
            lines = f.readlines()
        # strip the translation of any escape chars or whitespace
        out[translator] = list(map(lambda x: x.strip(), lines))
        out[translator] = [string.replace("@@ ", "") for string in out[translator]]
    return out

In [0]:
all_translations = load_all_translations(target_language)
all_translations['translator1'][0:5]

['Boemedi ba Afrika Borwa ya Tshireletso ba Setjhaba',
 'Tumellano ya Tlhokomelo ya Bareki',
 'Re a o amohela ho Tumellano ya rona ya Bareki',
 'Re batla hore wena , moreki wa rona , o re hlahlobe ho ya ka maemo a tshebeletso ao re a hlahisitseng ka hara tumellano ena .',
 'Re tla o sebeletsa ka tlhompho le ho o fa tshebeletso e lokileng ho kgema le maano a BATHO PELE .']

In [0]:
print(len(all_translations['translator1']) == 500)
print(len(all_translations['translator2']) == 500)
print(len(all_translations['translator3']) == 500)
print(len(all_translations['translator4']) == 500)

True
True
True
True


In [0]:
refs = list(all_translations.values())
with codecs.open(f'model.{target_language}', 'r', 'utf-8') as f:
    sys = f.readlines()
    sys = list(map(lambda x: x.strip(), sys))

print(sys[0:5])
print(refs[0][0:5])

['Mokhatlo oa Tšireletso ea Sechaba oa Afrika Boroa', 'Moreki oa Tlhokomelo', 'Amohela Khakanyo ea Rōna e Tiileng', 'Re batla hore uena , motho eo re mo batlang , u re ahlole ho ea ka tekanyetso ea tšebeletso eo re e behileng tlhokomelong ena ea molao .', 'Re tla u tšoara ka tlhompho , ’ me re u fe tšebeletso e molemo tumellanong le melao - motheo ea BATHO .']
['Boemedi ba Afrika Borwa ya Tshireletso ba Setjhaba', 'Tumellano ya Tlhokomelo ya Bareki', 'Re a o amohela ho Tumellano ya rona ya Bareki', 'Re batla hore wena , moreki wa rona , o re hlahlobe ho ya ka maemo a tshebeletso ao re a hlahisitseng ka hara tumellano ena .', 'Re tla o sebeletsa ka tlhompho le ho o fa tshebeletso e lokileng ho kgema le maano a BATHO PELE .']


In [0]:
import sacrebleu

score = sacrebleu.corpus_bleu(sys, refs).score
print(f"Autshumato Test BLEU: {score}")



Autshumato Test BLEU: 12.182730696079144
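
With four reference translations per sentence, BLEU counts an n-gram in the model output as correct if it appears in any reference, clipped to the highest count seen in a single reference. The sketch below illustrates only that clipping step in plain Python; it is not sacrebleu's implementation (no brevity penalty, no combination of n-gram orders, whitespace tokenization only), and the function names are made up for this example.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(hypothesis, references, n=1):
    """Modified n-gram precision of one hypothesis against several references."""
    hyp_counts = ngrams(hypothesis.split(), n)
    # For each n-gram, keep the largest count found in any single reference.
    max_ref_counts = Counter()
    for reference in references:
        for gram, count in ngrams(reference.split(), n).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    matched = sum(min(count, max_ref_counts[gram]) for gram, count in hyp_counts.items())
    total = sum(hyp_counts.values())
    return matched / total if total else 0.0


precision = clipped_precision(
    "the the the the the the the",
    ["the cat is on the mat", "there is a cat on the mat"],
)
print(precision)
```

Clipping is why repeating a common word does not inflate the score: the hypothesis above contains "the" seven times, but only two of them count because no single reference has more than two.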
