German translation - an upside and 2 huge problems in actually using this


I compared the quality of German translations against models up to 70B in size: basically every German fine-tune available, all the Falcon variants, etc.
The upside:
The quality of this model's translations is flawless. Every other model I tested makes grammar mistakes and occasionally uses strange words.
Only GPT-4 appears to be good, and even GPT-4 makes grammar mistakes sometimes.

Now the problems that appear to make this unusable for now:

  1. The inference speed is roughly 1/10 of what it should be compared to other models.
    A 13B model on llama.cpp/ggml on a 4090 (in any quantization that fits) generates at 60-80 tokens/sec.
    This model, however, is not supported there. The best I could do was bitsandbytes on CUDA with 4-bit inference, which generates about 8 tokens/second and appears just as slow in prompt ingestion (which would run at roughly 2000 tokens/sec in ggml). A sketch of that setup follows the list below.

  2. Context!
    When fed a whole paragraph, the model just responds with gibberish: repeated phrases from the paragraph.
    When the paragraph is split into sentences, each prepended with <2de>, the output is of high quality, but the model no longer sees the previous sentences, so the translation can become totally wrong.
    For example: "A display case is green. The case is filled."
    The context is clear, it's a display case! But in the second sentence it suddenly turns into a "court case".
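
For reference, here is a minimal sketch of the kind of 4-bit bitsandbytes setup from point 1, including the per-sentence <2de> prefix from point 2. The checkpoint name is a placeholder (substitute this model's actual ID), and I'm assuming it loads as a T5-style seq2seq checkpoint; the tag format is taken from the usage above.

```python
# Minimal sketch, assuming a seq2seq checkpoint and 4-bit bitsandbytes quantization.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, BitsAndBytesConfig

MODEL_ID = "org/translation-10b"  # placeholder: replace with this model's actual checkpoint ID

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSeq2SeqLM.from_pretrained(
    MODEL_ID,
    quantization_config=bnb_config,
    device_map="auto",
)

def translate(sentence: str) -> str:
    # Prepend the target-language tag (<2de> for German) to the source sentence.
    inputs = tokenizer(f"<2de> {sentence}", return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=256)
    return tokenizer.decode(output[0], skip_special_tokens=True)

print(translate("A display case is green."))
```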

This model would need to be integrated into llama.cpp; then the speed could likely approach 100 tokens/sec on the 10B model, and it would also benefit from llama.cpp's superior quantization.
For the context problem I don't know; maybe there is a way to add previous generations into its total context so it understands what's going on.
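
One rough way to try that idea (purely a sketch, not something I've verified): carry a sliding window of previous source sentences into each request and keep only the newly translated sentence. It reuses the hypothetical translate() helper above, and the sentence splitting/alignment here is naive and would need proper handling.

```python
# Sketch: translate sentence by sentence while prepending preceding source
# sentences as disambiguation context.
def translate_with_context(sentences, window=2):
    results = []
    for i, sentence in enumerate(sentences):
        # Include up to `window` preceding source sentences in the input.
        context = " ".join(sentences[max(0, i - window):i])
        source = f"{context} {sentence}".strip()
        translated = translate(source)
        # Naively keep only the last output sentence; a real implementation
        # would need proper source/target sentence alignment.
        results.append(translated.split(". ")[-1])
    return results

print(translate_with_context(["A display case is green.", "The case is filled."]))
```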
