question about quants #12
opened by prudant
Can this kind of "LLM" for embeddings be quantized, for example to AWQ or GPTQ format?
Regards!
Indeed, gte embedding models can be quantized to reduce their computational requirements and memory footprint.
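As a minimal illustration of what model quantization does, here is a sketch using PyTorch's dynamic int8 quantization on a toy stack of linear layers. The layer sizes are made up for the example; a real gte model would be loaded through `transformers` or `sentence-transformers`, and formats like AWQ/GPTQ use their own tooling rather than this API.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for an embedding model's dense layers
# (not the actual gte architecture).
model = nn.Sequential(nn.Linear(768, 768), nn.ReLU(), nn.Linear(768, 384))

# Dynamic quantization stores Linear weights as int8 and
# dequantizes on the fly during inference, shrinking memory use.
qmodel = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 768)
out = qmodel(x)
print(out.shape)  # torch.Size([1, 384])
```

The quantized model produces outputs of the same shape as the original; accuracy impact has to be measured on your own retrieval task.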
Can you give me a little info on how to get started with that? Which format, library, or useful starting point, please!
I am planning to quantize the vectors from 4 bytes to 2 bytes per dimension so that they fit under pgvector's 2K limit. https://jkatz05.com/post/postgres/pgvector-scalar-binary-quantization/
I'll report back on whether that works.
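The 4-byte-to-2-byte idea above can be sketched in plain NumPy: cast a float32 embedding to float16 (what pgvector 0.7+ exposes as `halfvec`), halving storage while keeping cosine similarity essentially unchanged. The vector length of 1024 here is just an example.

```python
import numpy as np

# Example embedding; 1024 dims is an assumption, not a gte-specific size.
rng = np.random.default_rng(0)
emb32 = rng.standard_normal(1024).astype(np.float32)

# Scalar quantization: 4 bytes/dim (float32) -> 2 bytes/dim (float16).
emb16 = emb32.astype(np.float16)
print(emb32.nbytes, emb16.nbytes)  # storage halves

# Cosine similarity between original and quantized vectors stays near 1.
a = emb32
b = emb16.astype(np.float32)
cos = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
print(round(cos, 4))
```

Whether recall holds up at float16 (or even binary) precision depends on the model and dataset, so measuring retrieval quality before and after is worth doing.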