Bimba

Bimba is almost linear SimulMT model trained with wait-k policy (k = 3, 5, 7, 9, 11) on en-ru translation dataset.

Architecture

The model has encoder-decoder architecture, where self-attention blocks are Mamba-2 blocks instead. It means that encoder is linear, but cross-attention's input is all outputs of encoder, and this means that complexity of Bimba is O(S * T), which is not exactly linear

Bimba was developed and trained as a part of master's thesis, and I hope that I will continue research in the Linear SimulMT field.

Using

To download Bimba you can clone the GitHub repository and use the HybridMamba2MT class:

from model_classes import HybridMamba2MT
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("OldestSalt/Bimba")
model = HybridMamba2MT.from_pretrained("OldestSalt/Bimba")

Translation

Maybe someday I will write here an example of simultaneous translation.

Tokenizer

This model was distilled from NLLB-200-1.3B, so Bimba uses its' tokenizer.

Code: https://github.com/OldestSalt/LinearSimultMT
Paper: Soon (I hope)

Downloads last month: 175

Safetensors

Model size

0.3B params

Tensor type

F32

OldestSalt
/

Bimba

Bimba

Architecture

Using

Translation

Tokenizer

Dataset used to train OldestSalt/Bimba