Bimba

Bimba is an almost-linear simultaneous machine translation (SimulMT) model trained with the wait-k policy (k = 3, 5, 7, 9, 11) on an en-ru translation dataset.
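
For readers unfamiliar with wait-k: the decoder first waits for k source tokens and then alternates between writing one target token and reading one more source token. A minimal sketch of that standard schedule follows. The function name `wait_k_visible` is hypothetical and not part of the Bimba repository.

```python
def wait_k_visible(k: int, src_len: int, tgt_len: int) -> list:
    """Number of source tokens visible when emitting each target token.

    Standard wait-k schedule: target step t (1-based) sees
    min(k + t - 1, src_len) source tokens.
    """
    return [min(k + t - 1, src_len) for t in range(1, tgt_len + 1)]


# With k = 3 and a 6-token source, the first target token is written
# after 3 source tokens have been read, the second after 4, and so on.
print(wait_k_visible(k=3, src_len=6, tgt_len=6))  # [3, 4, 5, 6, 6, 6]
```

A smaller k lowers latency because translation starts earlier, at the cost of less source context per target token.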

Architecture

The model has an encoder-decoder architecture in which the self-attention blocks are replaced with Mamba-2 blocks. The encoder is therefore linear in the source length, but cross-attention still takes all encoder outputs as input, so the overall complexity of Bimba is O(S * T), with S the source length and T the target length, which is not exactly linear.
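
To make the O(S * T) term concrete, the sketch below counts the (target step, source token) pairs that cross-attention scores during decoding. It assumes S is the source length, T is the target length, and that under wait-k each target step attends only to the source tokens read so far. The helper name is hypothetical, and the count ignores constant factors such as the number of heads and the hidden size.

```python
from typing import Optional


def cross_attention_pairs(src_len: int, tgt_len: int, k: Optional[int] = None) -> int:
    """Count (target step, source token) pairs scored by cross-attention.

    k=None: every target step attends to the full source (S * T pairs).
    k=int:  wait-k masking, step t sees min(k + t - 1, S) source tokens.
    """
    if k is None:
        return src_len * tgt_len
    return sum(min(k + t - 1, src_len) for t in range(1, tgt_len + 1))


for n in (100, 200, 400):
    # Doubling both lengths roughly quadruples the count in either case.
    print(n, cross_attention_pairs(n, n), cross_attention_pairs(n, n, k=3))
```

Under these assumptions the wait-k mask roughly halves the count when S and T are similar, but growth remains quadratic, which is consistent with describing the model as almost linear rather than linear.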

[Figure: Bimba inference]

Bimba was developed and trained as part of a master's thesis, and I hope to continue research in the linear SimulMT field.

Usage

To download Bimba, clone the GitHub repository and load the model with the HybridMamba2MT class:

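# model_classes is assumed to be a module in the cloned Bimba repository,
# so run this from the repository root (or add it to PYTHONPATH).
from model_classes import HybridMamba2MT
from transformers import AutoTokenizer

# Load the tokenizer and the model weights from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("OldestSalt/Bimba")
model = HybridMamba2MT.from_pretrained("OldestSalt/Bimba")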

Translation

Maybe someday I will write an example of simultaneous translation here.

Tokenizer

This model was distilled from NLLB-200-1.3B, so Bimba uses its tokenizer.

Model size: 0.3B params (Safetensors, tensor type F32)

Dataset used to train OldestSalt/Bimba