OldestSalt/translation_enru
Bimba is an almost-linear SimulMT (simultaneous machine translation) model trained with a wait-k policy (k = 3, 5, 7, 9, 11) on an English-Russian (en-ru) translation dataset.
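For readers unfamiliar with wait-k, here is a minimal sketch of the read/write schedule such a policy implies: the decoder waits for k source tokens, then writes one target token for each additional source token it reads. The function name `wait_k_schedule` and the token counts are illustrative assumptions, not part of the Bimba codebase.

```python
def wait_k_schedule(source_len: int, target_len: int, k: int) -> list[int]:
    """Return, for each 0-based target position t, how many source tokens
    have been read when the t-th target token is written.

    Hypothetical helper illustrating a standard wait-k policy: the decoder
    lags k tokens behind the source until the source is exhausted.
    """
    return [min(k + t, source_len) for t in range(target_len)]


# Example with k = 3 on a 6-token source and a 6-token target.
schedule = wait_k_schedule(source_len=6, target_len=6, k=3)
print(schedule)  # [3, 4, 5, 6, 6, 6]
```

A smaller k gives lower latency but less source context per target token; a k larger than the source length behaves like ordinary full-sentence translation.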
The model has an encoder-decoder architecture in which the self-attention blocks are replaced with Mamba-2 blocks. This makes the encoder linear in the source length, but cross-attention still takes all encoder outputs as input, so the overall complexity of Bimba is O(S * T), where S and T are the source and target lengths. This is not exactly linear.
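To make the O(S * T) term concrete, the sketch below counts how many (target, source) pairs cross-attention has to score. It is only a back-of-the-envelope illustration, assuming S is the source length and T is the target length; the function `cross_attention_pairs` is hypothetical and does not measure Bimba's actual runtime.

```python
from typing import Optional


def cross_attention_pairs(source_len: int, target_len: int,
                          k: Optional[int] = None) -> int:
    """Count (target token, source token) pairs scored by cross-attention.

    k=None: every target token attends to all source tokens (S * T pairs).
    k=int:  assumed wait-k masking, where target token t (0-based) only
            sees the first min(k + t, S) source tokens.
    """
    if k is None:
        return source_len * target_len
    return sum(min(k + t, source_len) for t in range(target_len))


# Doubling both lengths quadruples the full cross-attention cost,
# while a linear-time encoder pass would only double.
print(cross_attention_pairs(100, 100))  # 10000
print(cross_attention_pairs(200, 200))  # 40000

# Under a wait-k mask the count is smaller but still bounded by S * T.
print(cross_attention_pairs(4, 4, k=2))  # 13
```

The wait-k mask lowers the constant factor, but the count still grows with the product of the two lengths, which is why the model is described as almost linear rather than linear.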
Bimba was developed and trained as part of a master's thesis, and I hope to continue research in the field of linear SimulMT.
To download Bimba, clone the GitHub repository and use the HybridMamba2MT class:
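```python
# model_classes is provided by the cloned GitHub repository
from model_classes import HybridMamba2MT
from transformers import AutoTokenizer

# Load the tokenizer and the model weights from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("OldestSalt/Bimba")
model = HybridMamba2MT.from_pretrained("OldestSalt/Bimba")
```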
I may add an example of simultaneous translation here someday.
This model was distilled from NLLB-200-1.3B, so Bimba uses its tokenizer.