s3nh 
posted an update Jan 20
GPU Poor POV: Low Hanging Fruits


Sometimes we have to work with a language other than English (what a surprise!), and it can be problematic because, as you may know, many algorithms are developed mainly for English.
I was involved in building a RAG system in Polish. First, we needed proper embeddings for Polish to feed into a lightweight LLM.
Looking through possible solutions, I became aware that the existing models were not accurate enough and performed much worse than their English equivalents.
The first thing that comes to mind is:
let's become a mad scientist, download all possible data, and train a model for months until we get a proper one.

But there are a few cons to this:
- It is computationally heavy.
- You are not a full-time researcher.
- You have potential clients who want to use your solution, and who are (optimistically) happy to use it.
Here come the low-hanging fruits.
We developed an easier, workable solution. Instead of training a new SOTA model, we can use a translation module like this one:

Helsinki-NLP/opus-mt-pl-en
translate the knowledge base to English, and then use a well-performing English embedding model.
I converted the existing model using ctranslate2:
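As a minimal sketch of the translation step (assuming the `transformers` package; the document list here is hypothetical), the Polish knowledge base can be translated with the same Helsinki-NLP model before embedding:

```python
# Translate Polish passages to English with the MarianMT model,
# then embed the English text with your embedding model of choice.
from transformers import pipeline

translator = pipeline("translation", model="Helsinki-NLP/opus-mt-pl-en")

docs_pl = ["Kot siedzi na macie."]  # hypothetical Polish knowledge base
docs_en = [out["translation_text"] for out in translator(docs_pl)]
```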

ct2-transformers-converter --model Helsinki-NLP/opus-mt-pl-en --output_dir opus-mt-pl-en

so inference is not heavy (we observed roughly a 5x speedup compared to the original version).
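Inference with the converted model can then be sketched like this (assuming the `ctranslate2` and `transformers` packages, and the `opus-mt-pl-en` output directory produced by the converter above):

```python
import ctranslate2
import transformers

# Load the CTranslate2 model from the converter's output directory
# and the matching tokenizer from the original Hugging Face repo.
translator = ctranslate2.Translator("opus-mt-pl-en")
tokenizer = transformers.AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-pl-en")

def translate(text: str) -> str:
    # CTranslate2 expects token strings, not ids.
    tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(text))
    result = translator.translate_batch([tokens])[0]
    ids = tokenizer.convert_tokens_to_ids(result.hypotheses[0])
    return tokenizer.decode(ids, skip_special_tokens=True)
```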

And by indexing the knowledge base, we can return the answer to the LLM in any language (the indexes of the context found in English are equal to the indexes in the native-language knowledge base).
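The index-alignment idea above can be shown in a tiny sketch (documents and retrieved positions are made up for illustration): retrieval runs over the English translations, but the returned positions index straight into the Polish originals.

```python
# Parallel lists: position i in docs_en is the translation of docs_pl[i].
docs_pl = ["Stolica Polski to Warszawa.", "Python to język programowania."]
docs_en = ["The capital of Poland is Warsaw.", "Python is a programming language."]

# Suppose the embedding search over docs_en returned these positions:
top_k = [1, 0]

# The same positions recover the native-language context for the LLM:
context_pl = [docs_pl[i] for i in top_k]
# → ["Python to język programowania.", "Stolica Polski to Warszawa."]
```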

Of course, some tweaks are required; we have to validate the accuracy of the translation.

It was a nice episode: the work is done and there are people who can use it, so it adds real value.
Have a great day and I wish you more effective deploys! <3

Let me know if this is interesting for you, I can jump into more details ^^


Hi @s3nh
Thanks for sharing this solution. Two quick questions if you don't mind:

  • I've never used ct2; what's the difference between it and onnx-runtime, for instance, when it comes to inference speed?
  • we have to validate accuracy of the translation.

Is this something done offline for a one-time evaluation, or do you do this on the fly for every question?
