Bilingual Translation Evaluation Script (EN ↔ KK)
This repository provides an evaluation pipeline for English↔Kazakh and Russian↔Kazakh translation models built on the Gemma3ForCausalLM architecture from Hugging Face Transformers.
Overview
The script:
- Loads a fine-tuned model and tokenizer
- Performs inference on a FLORES-style test set (.jsonl)
- Computes BLEU score using NLTK
- Saves predictions and evaluation results into a JSON file
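The load, score and save steps above can be sketched with the standard library alone. This is a minimal sketch under stated assumptions, not the script's own code. The JSONL layout (one object per line keyed by language code, e.g. {"en": ..., "kk": ...}) is an assumption, the translate argument stands in for model inference, and the output layout written by the hypothetical evaluate helper is illustrative only. corpus_bleu here is a hand-written implementation of the standard corpus-level BLEU formula (clipped n-gram precisions up to 4-grams, geometric mean, brevity penalty, whitespace tokenization, no smoothing), used in place of NLTK purely to show what the score measures; its numbers will not necessarily match the script's NLTK-based scores.

```python
import json
import math
from collections import Counter
from typing import Callable, List, Tuple


def load_pairs(path: str, src_lang: str, tgt_lang: str) -> List[Tuple[str, str]]:
    """Read (source, reference) pairs from a JSONL file.

    ASSUMPTION: each line is a JSON object keyed by language code,
    e.g. {"en": "...", "kk": "..."}. Adjust the keys to your test set.
    """
    pairs = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # tolerate blank lines
            row = json.loads(line)
            pairs.append((row[src_lang], row[tgt_lang]))
    return pairs


def ngram_counts(tokens: List[str], n: int) -> Counter:
    """Count every n-gram (as a tuple of tokens) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(references: List[str], hypotheses: List[str], max_n: int = 4) -> float:
    """Corpus-level BLEU: one reference per sentence, uniform weights, no smoothing."""
    matches = [0] * max_n  # clipped n-gram matches, summed over the corpus
    totals = [0] * max_n   # hypothesis n-gram counts, summed over the corpus
    ref_len = hyp_len = 0
    for ref, hyp in zip(references, hypotheses):
        r, h = ref.split(), hyp.split()  # naive whitespace tokenization
        ref_len += len(r)
        hyp_len += len(h)
        for n in range(1, max_n + 1):
            r_counts = ngram_counts(r, n)
            h_counts = ngram_counts(h, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # any zero precision makes the unsmoothed geometric mean zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output that is shorter than the references overall.
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity * math.exp(log_precision)


def evaluate(pairs: List[Tuple[str, str]], translate: Callable[[str], str],
             output_path: str) -> dict:
    """Translate every source, score the corpus, and save everything as JSON.

    HYPOTHETICAL output layout, not necessarily the script's actual format.
    """
    predictions = [translate(src) for src, _ in pairs]
    references = [ref for _, ref in pairs]
    result = {
        "bleu": corpus_bleu(references, predictions),
        "samples": [
            {"source": src, "reference": ref, "prediction": pred}
            for (src, ref), pred in zip(pairs, predictions)
        ],
    }
    with open(output_path, "w", encoding="utf-8") as f:
        json.dump(result, f, ensure_ascii=False, indent=2)
    return result
```

Passing ensure_ascii=False keeps Kazakh and Russian Cyrillic readable in the saved file instead of escaping every character as a \uXXXX sequence.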
Configuration
Modify these lines at the top of the script as needed:
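SRC_LANG = "en"                             # source language code
TGT_LANG = "kk"                             # target language code
MODEL_PATH = "/path/to/your/model"          # fine-tuned model (and tokenizer)
TEST_FILE = "/path/to/test_file.jsonl"      # FLORES-style JSONL test set
OUTPUT_JSON = "/path/to/output_file.jsonl"  # where predictions and scores are saved
MAX_NEW_TOKS = 64                           # max new tokens generated per sentence
DEVICE = "cuda"                             # or "cpu"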
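A minimal sketch of how these settings might drive inference, assuming the model directory loads with Gemma3ForCausalLM.from_pretrained and AutoTokenizer.from_pretrained. The prompt format in build_prompt and the LANG_NAMES table are hypothetical and must be replaced with whatever format the model was fine-tuned on. transformers is imported inside load_model so the helpers can be read and tested without it installed. Decoding is greedy (do_sample=False), and only the tokens generated after the prompt are decoded.

```python
from typing import Tuple

# Human-readable names for the language codes used in this README.
LANG_NAMES = {"en": "English", "kk": "Kazakh", "ru": "Russian"}


def build_prompt(text: str, src_lang: str, tgt_lang: str) -> str:
    """HYPOTHETICAL prompt format; replace with the one used during fine-tuning."""
    return (
        f"Translate from {LANG_NAMES[src_lang]} to {LANG_NAMES[tgt_lang]}:\n"
        f"{text}\n"
    )


def load_model(model_path: str, device: str) -> Tuple[object, object]:
    """Load tokenizer and model; transformers is imported lazily."""
    from transformers import AutoTokenizer, Gemma3ForCausalLM

    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = Gemma3ForCausalLM.from_pretrained(model_path).to(device)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model


def translate(text: str, tokenizer, model, src_lang: str, tgt_lang: str,
              device: str, max_new_tokens: int = 64) -> str:
    """Translate one sentence with greedy decoding."""
    prompt = build_prompt(text, src_lang, tgt_lang)
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    # Generate at most max_new_tokens tokens beyond the prompt.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # The output repeats the prompt; keep only the newly generated tokens.
    prompt_len = inputs["input_ids"].shape[1]
    return tokenizer.decode(output_ids[0, prompt_len:], skip_special_tokens=True).strip()
```

With the configuration above, this would be called as tokenizer, model = load_model(MODEL_PATH, DEVICE) followed by translate(sentence, tokenizer, model, SRC_LANG, TGT_LANG, DEVICE, MAX_NEW_TOKS).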
To choose which GPUs the script can use, set CUDA_VISIBLE_DEVICES before running it:
export CUDA_VISIBLE_DEVICES=2,3,4,5
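CUDA renumbers the devices listed in CUDA_VISIBLE_DEVICES from zero inside the process, so with the export above, physical GPU 2 appears as cuda:0, and a plain "cuda" device string refers to the current device, which is cuda:0 by default. The variable must be set before the process initializes CUDA. The helpers below (hypothetical names visible_gpus and physical_id, not part of the script) make that mapping explicit, for example when logging which physical GPU a run used.

```python
import os
from typing import List, Mapping, Optional


def visible_gpus(env: Optional[Mapping[str, str]] = None) -> Optional[List[str]]:
    """Physical GPU ids from CUDA_VISIBLE_DEVICES, in order.

    Returns None when the variable is unset, meaning every GPU is visible.
    """
    env = os.environ if env is None else env
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return None
    return [d.strip() for d in value.split(",") if d.strip()]


def physical_id(logical_index: int, env: Optional[Mapping[str, str]] = None) -> str:
    """Map an in-process index (cuda:<logical_index>) to the physical GPU id."""
    gpus = visible_gpus(env)
    if gpus is None:
        return str(logical_index)  # no remapping when the variable is unset
    return gpus[logical_index]
```

For example, physical_id(0, {"CUDA_VISIBLE_DEVICES": "2,3,4,5"}) returns "2".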
Run the Script
python eval_blue.py