---
library_name: JoeyNMT
task: Machine-translation
tags:
- JoeyNMT
- Machine-translation
language:
- rw
- en
datasets:
- DigitalUmuganda/kinyarwanda-english-machine-translation-dataset
widget:
- text: "Muraho neza, murakaza neza mu Rwanda."
  example_title: "Muraho neza, murakaza neza mu Rwanda."
---
# Kinyarwanda-to-English Machine Translation

This is a Kinyarwanda-to-English machine translation model built and trained with the JoeyNMT framework. It uses a Transformer encoder-decoder architecture and was trained on an English-Kinyarwanda parallel (bitext) dataset of 47,211 sentence pairs prepared by Digital Umuganda.


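## Model architecture

The encoder and the decoder share the following configuration:

* Type: Transformer
* Number of layers: 6
* Number of attention heads: 8
* Embedding dimension: 256
* Feed-forward size: 1024
* Dropout: 0.1
* Layer normalization: post
* Initializer: Xavier

Total parameters: 12,563,968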
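As a rough cross-check of where the parameters live, the sketch below counts the parameters of one Transformer encoder layer and one decoder layer from the dimensions above. It assumes a standard layout with biased Q/K/V/output projections, a two-layer feed-forward block, and one LayerNorm per sublayer (two per encoder layer, three per decoder layer); these layout details are assumptions, not read from the JoeyNMT source. The number of heads does not change the count, because the heads split the 256-dimensional projections rather than adding new ones.

```python
# Rough parameter count for the Transformer layers described above.
# ASSUMED layout (not taken from JoeyNMT's code): every linear layer has a bias,
# and each sublayer has one LayerNorm with a gain and a bias vector.

D_MODEL = 256   # embedding / hidden dimension
D_FF = 1024     # feed-forward inner size
N_LAYERS = 6    # layers in the encoder and in the decoder


def linear_params(d_in: int, d_out: int) -> int:
    """Weight matrix plus bias of a fully connected layer."""
    return d_in * d_out + d_out


def attention_params(d: int) -> int:
    """Q, K, V and output projections; the head count only reshapes them."""
    return 4 * linear_params(d, d)


def feed_forward_params(d: int, d_ff: int) -> int:
    """Two linear layers: d -> d_ff -> d."""
    return linear_params(d, d_ff) + linear_params(d_ff, d)


def layer_norm_params(d: int) -> int:
    return 2 * d  # gain and bias


def encoder_layer_params(d: int, d_ff: int) -> int:
    # self-attention + feed-forward, each with its own LayerNorm
    return attention_params(d) + feed_forward_params(d, d_ff) + 2 * layer_norm_params(d)


def decoder_layer_params(d: int, d_ff: int) -> int:
    # self-attention + cross-attention + feed-forward, each with its own LayerNorm
    return 2 * attention_params(d) + feed_forward_params(d, d_ff) + 3 * layer_norm_params(d)


if __name__ == "__main__":
    enc = N_LAYERS * encoder_layer_params(D_MODEL, D_FF)
    dec = N_LAYERS * decoder_layer_params(D_MODEL, D_FF)
    print(f"encoder layers: {enc:,}")           # 4,738,560
    print(f"decoder layers: {dec:,}")           # 6,320,640
    print(f"total (layers only): {enc + dec:,}")  # 11,059,200
```

Under these assumptions the stacked layers account for about 11.06M parameters. The remaining roughly 1.5M of the reported total would come from the embeddings and the output projection, whose size depends on the BPE vocabulary sizes (not reported here), plus any extra normalization layers.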

## Pre-processing

* Tokenizer type: subword-nmt (BPE)
* Number of merges (`num_merges`): 4000
* BPE merges learned on the bitext, with a separate vocabulary for each language
* Pre-tokenizer: none
* Lowercasing: none
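To illustrate what the 4,000 merges mean, here is a simplified, pure-Python sketch of BPE merge learning: every word starts as a sequence of characters plus an end-of-word marker, and each step merges the most frequent adjacent symbol pair. This is not subword-nmt's implementation, which is far more efficient and supports options such as a minimum pair frequency; the tie-breaking rule and the `</w>` marker handling here are illustrative choices.

```python
from collections import Counter


def learn_bpe(corpus: list[str], num_merges: int) -> list[tuple[str, str]]:
    """Simplified BPE learner (illustration only, not subword-nmt's code).

    Returns the merge operations in the order they were learned.
    """
    # Word frequencies; each word becomes a tuple of symbols ending in '</w>'.
    vocab = Counter()
    for line in corpus:
        for word in line.split():
            vocab[tuple(word) + ("</w>",)] += 1

    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break  # every word is already a single symbol
        # Most frequent pair; ties go to the lexicographically smallest pair.
        best = min(pairs, key=lambda p: (-pairs[p], p))
        merges.append(best)

        # Replace every occurrence of the best pair with the merged symbol.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges


if __name__ == "__main__":
    print(learn_bpe(["neza neza muraho"], 4))
    # [('a', '</w>'), ('e', 'z'), ('ez', 'a</w>'), ('n', 'eza</w>')]
```

After four merges the frequent word "neza" has become a single symbol, while "muraho", seen only once, is still split into characters. With `num_merges` set to 4000, the same process builds a much larger subword inventory for each language.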

## Training

* Optimizer: Adam
* Loss: cross-entropy
* Epochs: 30
* Batch size: 256
* Number of GPUs: 1
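For a rough sense of training length, the number of optimizer updates follows from the dataset size, batch size and epoch count. This sketch assumes the batch size counts sentences (JoeyNMT can also batch by tokens), that there is no gradient accumulation, and that the last, partial batch of each epoch is still one update. The card does not say how the 47,211 pairs were split into training, development and test sets, so plugging in the full dataset size gives only an upper bound.

```python
import math


def total_updates(num_examples: int, batch_size: int, epochs: int) -> int:
    """Optimizer steps, assuming one update per batch (partial last batch included)."""
    batches_per_epoch = math.ceil(num_examples / batch_size)
    return batches_per_epoch * epochs


if __name__ == "__main__":
    # Upper bound: uses the full dataset size, not the (unreported) training split.
    print(total_updates(47_211, 256, 30))  # 5550
```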



## Evaluation

* Evaluation metrics: BLEU, chrF
* Tokenization: none
* Beam width: 15
* Beam alpha (length penalty): 1.0
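The beam alpha controls length normalization during beam search. The sketch below assumes a GNMT-style length penalty, lp = ((5 + length) / 6) ** alpha, by which a hypothesis's log-probability is divided; treat it as an illustration of the idea rather than a copy of JoeyNMT's search code. The two hypotheses and their scores are made up to show why alpha = 1.0 can prefer a longer output that raw log-probability would reject.

```python
def length_penalty(length: int, alpha: float) -> float:
    """GNMT-style length penalty (assumed form): ((5 + length) / 6) ** alpha."""
    return ((5.0 + length) / 6.0) ** alpha


def normalized_score(log_prob: float, length: int, alpha: float) -> float:
    """Length-normalized beam score; alpha = 0 leaves the log-probability unchanged."""
    return log_prob / length_penalty(length, alpha)


if __name__ == "__main__":
    # Hypothetical finished hypotheses: (total log-probability, length in tokens).
    short_hyp = (-2.0, 3)
    long_hyp = (-3.0, 10)

    for alpha in (0.0, 1.0):
        short_score = normalized_score(*short_hyp, alpha)
        long_score = normalized_score(*long_hyp, alpha)
        winner = "short" if short_score > long_score else "long"
        print(f"alpha={alpha}: short={short_score:.3f} long={long_score:.3f} -> {winner}")
    # alpha=0.0: short=-2.000 long=-3.000 -> short
    # alpha=1.0: short=-1.500 long=-1.200 -> long
```

With alpha = 0 the shorter hypothesis wins on raw log-probability; with alpha = 1.0 the longer one is penalized less per token and wins instead.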

## Tools
* JoeyNMT 2.0.0
* datasets
* pandas
* numpy
* transformers
* sentencepiece
* PyTorch (with CUDA)
* sacrebleu
* protobuf>=3.20.1

## How to train

See the [JoeyNMT repository](https://github.com/joeynmt/joeynmt) for training instructions.

## Translation
To install JoeyNMT, run:
```
$ git clone https://github.com/joeynmt/joeynmt.git
$ cd joeynmt
$ pip install -e .
```

Interactive translation (reads sentences from stdin), where `args.yaml` is the model's configuration file:
```
$ python -m joeynmt translate args.yaml
```

File translation:
```
$ python -m joeynmt translate args.yaml < src_lang.txt > hypothesis_trg_lang.txt
```

## Accuracy measurement
Install sacrebleu:
```
$ pip install sacrebleu
```

Compute BLEU and chrF for the translated file against the reference translations:
```
$ sacrebleu reference.tsv -i hypothesis_trg_lang.txt -m bleu chrf
```
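chrF is an F-score over character n-grams. The following simplified, sentence-level sketch follows the usual definition (character n-grams up to order 6 with whitespace removed, precision and recall averaged over the orders, then combined with beta = 2 so that recall weighs more). It is meant to show what the metric measures and is not guaranteed to reproduce sacrebleu's numbers, which aggregate statistics over the whole corpus and handle edge cases differently.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Character n-grams of the text with all whitespace removed."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf(hypothesis: str, reference: str, max_order: int = 6, beta: float = 2.0) -> float:
    """Simplified sentence-level chrF on a 0-100 scale (not sacrebleu's implementation)."""
    precisions, recalls = [], []
    for n in range(1, max_order + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip orders longer than one of the strings
        matches = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(matches / sum(hyp.values()))
        recalls.append(matches / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    b2 = beta ** 2
    return 100 * (1 + b2) * p * r / (b2 * p + r)


if __name__ == "__main__":
    print(chrf("Welcome to Rwanda.", "Welcome to Rwanda."))  # 100.0
    print(chrf("abc", "xyz"))                                 # 0.0
```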

## To-do

>* Test the model on other datasets, including JW300.
>* Use the Digital Umuganda dataset with some available state-of-the-art (SOTA) models.
>* Expand the dataset.

## Result
The following scores were obtained with sacrebleu.

Kinyarwanda-to-English:
```
BLEU: 79.87
chrF: 84.40
```