# NLP demo software by HyperbeeAI

Copyrights © 2023 Hyperbee.AI Inc. All rights reserved. hello@hyperbee.ai 

### Evaluation

This notebook evaluates the model on the test set: it translates a chosen example and computes the BLEU score over the full set. Evaluation runs on a simulation of the ai85 chip implemented in PyTorch. See the imported .py modules for details.

In [1]:
import random
import torch
import torch.nn as nn
# torchtext.legacy exists only in torchtext 0.9 through 0.11
from torchtext.legacy.datasets import TranslationDataset
from torchtext.legacy.data import Field, BucketIterator
from utils import (tokenize_es, tokenize_en, tokenizer_es, tokenizer_en, TRG_PAD_IDX,
                   translate_sentence, calculate_bleu)
from models import encoder, decoder, seq2seq
from dataloader import NewsDataset

imported utils.py
NLP demo software by HyperbeeAI. Copyrights © 2023 Hyperbee.AI Inc. All rights reserved. hello@hyperbee.ai

imported layers.py
imported functions.py
imported models.py
imported dataloader.py



In [2]:
SEED = 1234
random.seed(SEED)
torch.manual_seed(SEED)
torch.cuda.manual_seed(SEED)
torch.backends.cudnn.deterministic = True
BATCH_SIZE = 48

In [3]:
SRC = Field(tokenize = tokenize_es, 
 init_token = tokenizer_es.token_to_id(""), 
 eos_token = tokenizer_es.token_to_id(""), 
 pad_token = tokenizer_es.token_to_id(""),
 unk_token = tokenizer_es.token_to_id(""),
 use_vocab = False,
 batch_first = True)

TRG = Field(tokenize = tokenize_en, 
 init_token = tokenizer_en.token_to_id(""), 
 eos_token = tokenizer_en.token_to_id(""), 
 pad_token = tokenizer_en.token_to_id(""),
 unk_token = tokenizer_en.token_to_id(""),
 use_vocab = False,
 batch_first = True)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
#device = 'cpu'
print("Working with device:", device)

Working with device: cuda


In [4]:
train_data, valid_data, test_data = NewsDataset.splits(exts=('.es', '.en'), fields=(SRC, TRG))
_, _, test_iterator = BucketIterator.splits(
 (train_data, valid_data, test_data),
 batch_size = BATCH_SIZE,
 device = device)

In [5]:
enc = encoder(device)
dec = decoder(device, TRG_PAD_IDX)
model = seq2seq(enc, dec)

In [6]:
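trained_checkpoint = "assets/es2en_hw_cp6.pt"
# strict=False skips missing or unexpected keys instead of raising; res records which keys were skipped
res = model.load_state_dict(torch.load(trained_checkpoint, map_location=device), strict=False);
model.to(device);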

In [7]:
print("Example from test data:")
example_idx = 800
src = vars(test_data.examples[example_idx])['src']
trg = tokenizer_en.decode(vars(test_data.examples[example_idx])['trg'], skip_special_tokens=False)
print(f'trg = {trg}')
print("")
translation = translate_sentence(src, SRC, TRG, model, device)
print(f'predicted trg = {translation}')
print("")
src = tokenizer_es.decode(src, skip_special_tokens=False)
print(f'src = {src}')
print("")

Example from test data:
trg = for a relatively poor country like china , real unions could help balance employers ’ power , bringing quality - of - life benefits that outweigh the growth costs .

predicted trg = for a relatively poor country as china , the existence of real unions could help balance employers ’ power , generating higher life benefits than the costs for growth .

src = para un país relativamente pobre como es china , la existencia de sindicatos reales podría ayudar a equilibrar el poder de los empleadores , generando beneficios de calidad de vida mayores que los costes para el crecimiento .
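
To get a feel for what BLEU measures on this single example, the sketch below computes clipped n-gram precision between the predicted and reference sentences printed above. It is only an illustration: it splits on whitespace instead of using the notebook's tokenizers, and the helper names `ngrams` and `clipped_precision` are hypothetical, not part of `utils.py`.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def clipped_precision(hypothesis, reference, n):
    """Clipped n-gram precision of a hypothesis against a single reference."""
    hyp_counts = ngrams(hypothesis, n)
    ref_counts = ngrams(reference, n)
    total = sum(hyp_counts.values())
    if total == 0:
        return 0.0
    # Each hypothesis n-gram is credited at most as often as it occurs in the reference.
    matched = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
    return matched / total

# Sentences copied from the cell output above; whitespace splitting is an assumption.
reference = ("for a relatively poor country like china , real unions could help "
             "balance employers ’ power , bringing quality - of - life benefits "
             "that outweigh the growth costs .").split()
hypothesis = ("for a relatively poor country as china , the existence of real unions "
              "could help balance employers ’ power , generating higher life benefits "
              "than the costs for growth .").split()

for n in range(1, 5):
    print(f"{n}-gram precision: {clipped_precision(hypothesis, reference, n):.3f}")
```

Precision drops as n grows because longer word sequences are harder to match exactly, which is why a fluent paraphrase can still score well below 100.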



In [8]:
b_score = calculate_bleu(test_data, SRC, TRG, model, device)
print('BLEU score:')
print(b_score)

1it [00:00, 5.08it/s]

Evaluate on bleu:


3998it [14:55, 4.47it/s]
That's 100 lines that end in a tokenized period ('.')
It looks like you forgot to detokenize your test data, which may hurt your score.
If you insist your data is detokenized, or don't care, you can suppress this message with '--force'.
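
The warning above is triggered by text in which punctuation is separated from words by spaces. Whether detokenizing is appropriate here depends on how `calculate_bleu` prepares its inputs, which this notebook does not show. As a rough sketch only, a heuristic like the following removes the space before common punctuation marks; `simple_detokenize` is a hypothetical helper, not part of the demo code.

```python
import re

def simple_detokenize(text):
    """Remove whitespace before common punctuation marks (a rough heuristic)."""
    # Assumption: a space before these marks is always a tokenization artifact.
    text = re.sub(r"\s+([.,;:!?’])", r"\1", text)
    return text.strip()

print(simple_detokenize("real unions could help balance employers ’ power , bringing benefits ."))
# prints: real unions could help balance employers’ power, bringing benefits.
```

This heuristic deliberately leaves spaced hyphens such as `quality - of - life` untouched, since a regular expression cannot tell a compound word from a dash used as punctuation.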


BLEU score:
{'score': 28.35048236992193, 'counts': [57540, 32851, 20648, 13309], 'totals': [100210, 96590, 92970, 89354], 'precisions': [57.41941921963876, 34.01076716016151, 22.209314832741743, 14.894688542202923], 'bp': 1.0, 'sys_len': 100210, 'ref_len': 91115}
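
The fields of the result dictionary fit together as follows: each precision is `counts[i] / totals[i]` in percent, the brevity penalty `bp` is 1 because the system output (`sys_len`) is longer than the reference (`ref_len`), and the score is `bp` times the geometric mean of the four precisions. The sketch below recomputes the score from the reported counts using the standard BLEU formula; it is an independent check, not the code inside `calculate_bleu`.

```python
import math

# Values copied from the result dictionary above.
counts = [57540, 32851, 20648, 13309]
totals = [100210, 96590, 92970, 89354]
sys_len, ref_len = 100210, 91115

# Modified n-gram precisions in percent, for n = 1..4.
precisions = [100.0 * c / t for c, t in zip(counts, totals)]

# Brevity penalty: 1 when the system output is at least as long as the reference.
bp = 1.0 if sys_len >= ref_len else math.exp(1.0 - ref_len / sys_len)

# BLEU is the brevity penalty times the geometric mean of the precisions.
score = bp * math.exp(sum(math.log(p) for p in precisions) / len(precisions))

print(f"precisions = {[round(p, 2) for p in precisions]}")
print(f"bp = {bp}, BLEU = {score:.2f}")
```

Because `sys_len` exceeds `ref_len` by about 10%, the translations are on average longer than the references, so the score is limited by n-gram precision and not by the brevity penalty.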
