Preserve formatting across translation

#16
by WCDR - opened

Hi, thanks for this model,

I use it successfully with CTranslate2 + SentencePiece (C++).

Is there a way to tag a source token with extra information that can then be retrieved on the corresponding target token?

The aim is to preserve formatting: bold, italics, etc.

Extra question (on the same subject): I've noticed a problem with country names and surnames, where the last letter is missing or an extra letter has been added (examples below).

If I could mark a word, I could correct the target using the source.

Example, de -> fr: "Allemagnes" should be "Allemagne"
"Berlin ist nicht nur Weltmetropole und die Hauptstadt Deutschlands, sondern auch meine Heimatstadt."
"Berlin n’est pas seulement une métropole mondiale et la capitale de l'Allemagnes, mais aussi ma ville natale. "

Example, fr -> en: "Leclèr" should be "Leclère"
"À la hauteur de Civita-Vecchia, nous avons perdu ce brave capitaine Leclère."
"At the height of Civita-Vecchia, we lost this brave captain Leclèr."
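
To make the "correct the target using the source" idea concrete, here is a minimal sketch of the kind of post-processing I have in mind. It is not part of CTranslate2 or SentencePiece: `edit_distance`, `is_delim` and `restore_marked_word` are hypothetical helpers of my own, and the sketch assumes I already know which source word was marked. It can only help when the word should be copied unchanged (Leclère), not when it is itself translated (Deutschlands -> Allemagne).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Byte-level Levenshtein distance (insert, delete, substitute cost 1 each).
// Assumption: comparing UTF-8 bytes is good enough for a sketch; an accented
// character is 2 bytes, so the distance can be overestimated.
std::size_t edit_distance(const std::string& a, const std::string& b) {
    std::vector<std::size_t> prev(b.size() + 1), cur(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;
    for (std::size_t i = 1; i <= a.size(); ++i) {
        cur[0] = i;
        for (std::size_t j = 1; j <= b.size(); ++j) {
            std::size_t subst = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            cur[j] = std::min({prev[j] + 1, cur[j - 1] + 1, subst});
        }
        std::swap(prev, cur);
    }
    return prev[b.size()];
}

// ASCII whitespace and punctuation that separate words in this sketch.
bool is_delim(char c) {
    return std::string(" \t\n.,;:!?\"'()").find(c) != std::string::npos;
}

// Hypothetical helper: replace every word of `target` that is within
// `max_dist` edits of the marked source word (but not equal to it) by the
// marked word itself. Delimiters are copied through unchanged.
std::string restore_marked_word(const std::string& target,
                                const std::string& marked,
                                std::size_t max_dist = 1) {
    std::string out, word;
    auto flush = [&]() {
        if (!word.empty() && word != marked &&
            edit_distance(word, marked) <= max_dist) {
            out += marked;
        } else {
            out += word;
        }
        word.clear();
    };
    for (char c : target) {
        if (is_delim(c)) {
            flush();
            out += c;
        } else {
            word += c;
        }
    }
    flush();
    return out;
}
```

With this, calling `restore_marked_word` on the English output above with "Leclère" as the marked word would put the missing "e" back. A short marked word with `max_dist = 1` could also match unrelated words, so this would probably need a minimum length check.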

Thanks,
Regards
WCDR
