quantized version as app

#11
by sarahai - opened

Hello, thanks for these quantized versions. Could you please give a hint on how to implement this in an app? For now I just want it to run in the terminal, where the user is asked for the text to be translated. I am working on an offline app that will later be integrated into a system built in PHP (Yii2), where the user inputs text and gets the output in Russian. The quantized version is the best I have seen, since I have limited GPU memory. I would highly appreciate your help!

Right now this command works quite well in the terminal:

cargo run --example quantized-t5 --release -- --model-id "jbochi/madlad400-3b-mt" --weight-file "model-q4k.gguf" --prompt "<2ru>El silencio acolchado del bosque nevado, donde cada paso deja una huella efímera. El crujido del fuego en la chimenea, una sinfonía de calor y consuelo. El cielo plateado al crepúsculo, donde las estrellas brillan como diamantes esparcidos. El viento helado que muerde las mejillas y hace llorar los ojos. Las ramas desnudas de los árboles, dibujando siluetas esqueléticas contra el cielo gris. El paisaje inmaculado y silencioso, una belleza austera que inspira respeto. Las guirnaldas luminosas que adornan las calles, chispas de alegría en la noche de invierno. El aroma de vino caliente y pan de especias que flota en el aire, promesa de fiestas cálidas. Las risas de los niños que juegan en la nieve, huellas de alegría en el blanco inmaculado." --temperature 0

but I need it to show an "Enter text" prompt and then prepend the Russian translation token ("<2ru>") to the input text before translating.

One more question: which version allows longer input texts? I was using the 4-bit quantized version of the 3B model and it translates up to 128 tokens. Will the quantized versions of larger models translate longer texts?

Owner

Sorry for the late reply. The best way to do this is to fork the example at https://github.com/huggingface/candle/tree/main/candle-examples/examples/quantized-t5
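To make the forked example interactive, one option is a small stdin loop that reads the user's text and prepends the target-language token before handing the prompt to the model. A minimal sketch using only the standard library; the `build_prompt` helper and the hard-coded `ru` target are illustrative, not part of the candle example:

```rust
use std::io::{self, BufRead, Write};

// Hypothetical helper: prepend the target-language token used by
// MADLAD-400 prompts (e.g. "<2ru>" for Russian) to raw user input.
fn build_prompt(target: &str, text: &str) -> String {
    format!("<2{}>{}", target, text.trim())
}

fn main() -> io::Result<()> {
    let stdin = io::stdin();
    loop {
        print!("Enter text (empty line to quit): ");
        io::stdout().flush()?;
        let mut line = String::new();
        stdin.lock().read_line(&mut line)?;
        let line = line.trim();
        if line.is_empty() {
            break;
        }
        let prompt = build_prompt("ru", line);
        // In the forked example, pass `prompt` into the generation
        // loop instead of taking it from the `--prompt` CLI argument.
        println!("prompt sent to model: {prompt}");
    }
    Ok(())
}
```

The key change to the example itself is replacing the one-shot `--prompt` argument with this loop, re-running tokenization and generation for each line.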

I believe all model sizes can output up to 512 tokens, but quality may decrease for longer texts.
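For texts longer than the output limit, a common workaround is to translate sentence-sized chunks separately and concatenate the results. A naive sketch; splitting on punctuation and measuring chunk size in characters rather than tokens are both simplifying assumptions:

```rust
// Group sentences into chunks no longer than `max_chars`, so each
// chunk can be translated in a separate request. Sentence boundaries
// are approximated by '.', '!' and '?'.
fn chunk_sentences(text: &str, max_chars: usize) -> Vec<String> {
    let mut chunks = Vec::new();
    let mut current = String::new();
    for sentence in text.split_inclusive(['.', '!', '?']) {
        if !current.is_empty() && current.len() + sentence.len() > max_chars {
            chunks.push(current.trim().to_string());
            current.clear();
        }
        current.push_str(sentence);
    }
    if !current.trim().is_empty() {
        chunks.push(current.trim().to_string());
    }
    chunks
}
```

Each chunk would then get its own "<2ru>" prefix before translation; quality at chunk boundaries can suffer since the model loses cross-sentence context.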

Hi,
were you able to run it with llama.cpp?
Can you share a code example?
