
document_translator

Project to translate files using BSC's models while keeping the formatting and style of the original file.

Requirements

Python 3.12

fast_align

Clone https://github.com/clab/fast_align, build it with the compilation commands given in that project's readme, and place the resulting fast_align and atools binaries (with the .exe extension on Windows) in this project's root.
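A quick way to confirm the binaries ended up in the right place is a small check like the one below. This is only a sketch: the function name `missing_binaries` and its `windows` parameter are hypothetical and not part of this project.

```python
import os
from pathlib import Path

# Binary names expected in the project root (see the fast_align section).
BINARIES = ["fast_align", "atools"]


def missing_binaries(project_root=".", windows=None):
    """Return the names of the fast_align binaries not found in project_root.

    `windows` defaults to the current platform; pass True or False to override.
    This helper is illustrative only and is not provided by the project.
    """
    if windows is None:
        windows = os.name == "nt"
    suffix = ".exe" if windows else ""
    root = Path(project_root)
    return [name + suffix for name in BINARIES if not (root / (name + suffix)).is_file()]


if __name__ == "__main__":
    missing = missing_binaries()
    if missing:
        print("Missing from project root:", ", ".join(missing))
    else:
        print("fast_align and atools found.")
```

Run it from the project's root after building fast_align; an empty result means both binaries are in place.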

fast_align fine-tuning files

I took the 4 files (ca-en.params, ca-en.err, en-ca.params and en-ca.err) from https://huggingface.co/projecte-aina/aina-translator-ca-en/tree/main. We could automate the download of these files in the future; for now, place them manually in config_folder (defined in main.py).
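One possible way to automate this download is sketched below, using `hf_hub_download` from the `huggingface_hub` package. It assumes the four files sit at the root of the repository's main branch and that `huggingface_hub` is installed (it may not be listed in requirements.txt). The helper names and the "config" folder used in the example are hypothetical; use the actual config_folder value from main.py.

```python
from pathlib import Path

# Repository and file names taken from the paragraph above.
REPO_ID = "projecte-aina/aina-translator-ca-en"
FILENAMES = ["ca-en.params", "ca-en.err", "en-ca.params", "en-ca.err"]


def missing_files(config_folder):
    """Return the fine-tuning files that are not yet present in config_folder."""
    folder = Path(config_folder)
    return [name for name in FILENAMES if not (folder / name).is_file()]


def download_fast_align_files(config_folder):
    """Download any missing fine-tuning files into config_folder.

    Assumption: the files live at the root of the repo's main branch.
    """
    # Imported here so the module can be loaded without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    Path(config_folder).mkdir(parents=True, exist_ok=True)
    for name in missing_files(config_folder):
        hf_hub_download(repo_id=REPO_ID, filename=name, local_dir=config_folder)


if __name__ == "__main__":
    # "config" is a placeholder; use the config_folder defined in main.py.
    download_fast_align_files("config")
```

Files that already exist in the folder are skipped, so the function is safe to call on every start-up.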

python requirements

pip install -r requirements.txt

mtuoc_aina_translator

To use this class, you also need to be running MTUOC's translation server with the appropriate translation models. There is no need to run fast_align on the server side, since this project already runs it.

salamandrata7b_translator

Class that uses Hugging Face's demo.

Docker

docker build -t document-translator .
docker run -p 7860:7860 -e HF_TOKEN=your_token_here --rm -it document-translator