config for LM Studio

#6
by DataSoul - opened

I made a Q6_K GGUF and tried it in LM Studio. The translation quality is indeed quite good: there are some articles that other models fail to translate correctly, but this model translates them normally.
Unfortunately, my upload speed is too slow to upload the GGUF file itself, so I am sharing the configuration files below instead.

The JSON below can be saved as a preset file (e.g. "LM Studio ALMA-preset-template.json") for use by LM Studio.

```json
{
  "name": "ALMA-preset-example",
  "load_params": {
    "n_ctx": 4000,
    "n_batch": 512,
    "rope_freq_base": 0,
    "rope_freq_scale": 0,
    "n_gpu_layers": 60,
    "use_mlock": true,
    "main_gpu": 0,
    "tensor_split": [0],
    "seed": -1,
    "f16_kv": true,
    "use_mmap": true
  },
  "inference_params": {
    "n_threads": 4,
    "n_predict": -1,
    "top_k": 40,
    "top_p": 0.6,
    "temp": 0.9,
    "repeat_penalty": 1.1,
    "input_prefix": "\nEnglish:",
    "input_suffix": "\nChinese:",
    "antiprompt": [],
    "pre_prompt": "Translate this from English to Chinese.",
    "pre_prompt_prefix": "",
    "pre_prompt_suffix": "",
    "seed": -1,
    "tfs_z": 1,
    "typical_p": 1,
    "repeat_last_n": 64,
    "frequency_penalty": 0,
    "presence_penalty": 0,
    "n_keep": 0,
    "logit_bias": {},
    "mirostat": 0,
    "mirostat_tau": 5,
    "mirostat_eta": 0.1,
    "memory_f16": true,
    "multiline_input": false,
    "penalize_nl": true
  }
}
```
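For convenience, the preset above can also be generated programmatically. A minimal Python sketch (the output filename is illustrative, and only a subset of the keys is shown; where LM Studio actually looks for preset files varies by OS and version):

```python
import json
from pathlib import Path

# Write the preset file; adjust the path to wherever your
# LM Studio installation keeps its presets.
preset_path = Path("ALMA-preset-example.json")

preset = {
    "name": "ALMA-preset-example",
    "load_params": {
        "n_ctx": 4000,        # context window size at load time
        "n_batch": 512,
        "n_gpu_layers": 60,   # lower this if you run out of VRAM
        "use_mlock": True,
        "seed": -1,
        "f16_kv": True,
        "use_mmap": True,
    },
    "inference_params": {
        "n_threads": 4,
        "n_predict": -1,      # -1 = generate until EOS or context limit
        "top_k": 40,
        "top_p": 0.6,
        "temp": 0.9,
        "repeat_penalty": 1.1,
        # ALMA's expected translation prompt format:
        "input_prefix": "\nEnglish:",
        "input_suffix": "\nChinese:",
        "pre_prompt": "Translate this from English to Chinese.",
    },
}

preset_path.write_text(json.dumps(preset, indent=2, ensure_ascii=False))
```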

Another problem: with the sentence segmentation done by the "Immersive Translation" plugin, translation works well, but when I type a long text of hundreds to thousands of characters into the chat window, the translation comes out incomplete.

Thanks for converting our model into GGUF version!

The possible reasons for the incomplete translations could be:

  • the max-length setting is not large enough (similar issue here: https://github.com/fe1ixxu/ALMA/issues/20)
  • ALMA models are trained with a maximum length of 512 tokens, so they may behave unexpectedly on very long sentences (which we will improve in the future).
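Given the 512-token training limit, long inputs can be pre-segmented before translation. A minimal sketch, assuming a crude 4-characters-per-token heuristic (the model's real tokenizer will count differently, especially for Chinese):

```python
import re

def split_for_translation(text, max_tokens=512, chars_per_token=4):
    """Split text into sentence-sized chunks under a rough token budget.

    chars_per_token is a heuristic, not the model's tokenizer; a single
    sentence longer than the budget is still emitted as one chunk.
    """
    budget = max_tokens * chars_per_token
    # Naive split after terminal punctuation (English and CJK).
    sentences = re.split(r"(?<=[.!?\u3002\uff01\uff1f])\s*", text)
    chunks, current = [], ""
    for s in sentences:
        if current and len(current) + len(s) > budget:
            chunks.append(current)
            current = s
        else:
            current += s
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent through the English:/Chinese: prompt template separately, which is essentially what the Immersive Translation plugin's sentence segmentation achieves.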

Thanks again!

DataSoul changed discussion status to closed

Thanks for your reply! I found a similar option in LM Studio called "Tokens to generate" and tried modifying it, but the effect was not obvious; long texts are still quite difficult. Looking forward to future updates!

Due to the limited computing power of my machine, translating PDFs was too slow.

So I made a 7B Q2_K quantization file as well; the translation quality drops a little, but it is good enough.

Now it shows an amazing translation speed of 80.06 tok/s, which is enough to keep me satisfied!

For comparison, the Q6_K of the 13B ran at 28.46 tok/s and the Q8 of the 7B at 57.83 tok/s.

In the pursuit of speed, the 7B is probably more suitable for me.

As for the problem of long texts, for now it can be handled entirely by sentence segmentation. This model is very useful to me, thank you very much!

Great! It is nice to know the model is useful to you!!
