Can BAAI/bge-reranker-v2-gemma be run quantized?

#3 · by dophys

Hello, I'm interested in the bge-reranker built on Gemma. My question is whether this model can be run in quantized form, which would greatly improve inference efficiency and reduce memory requirements.
I used PyTorch to quantize the model to int8, but FlagEmbedding doesn't seem to support running quantized models. Can anyone give me some guidance?
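
For context, one workaround I'm considering is bypassing FlagEmbedding entirely and loading the model with transformers plus bitsandbytes 8-bit quantization, then computing the reranker score as the logit of the "Yes" token at the last position, as described on the model card. Below is a minimal sketch of that idea; the simplified prompt construction is my own assumption (FlagEmbedding's preprocessing is more elaborate), and it requires a CUDA GPU with bitsandbytes installed:

```python
# Sketch: run BAAI/bge-reranker-v2-gemma in 8-bit via transformers + bitsandbytes
# instead of FlagEmbedding. The instruction prompt follows the model card; the
# simplified input formatting here is an assumption, not FlagEmbedding's exact code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "BAAI/bge-reranker-v2-gemma"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Left-pad so the last position of every sequence is a real token,
# which is where the "Yes" logit is read off.
tokenizer.padding_side = "left"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
model.eval()

# Instruction prompt from the model card; the score is the logit of "Yes".
prompt = ("Given a query A and a passage B, determine whether the passage "
          "contains an answer to the query by providing a prediction of "
          "either 'Yes' or 'No'.")
yes_id = tokenizer("Yes", add_special_tokens=False)["input_ids"][0]

pairs = [
    ("what is panda?", "hi"),
    ("what is panda?", "The giant panda is a bear species endemic to China."),
]
texts = [f"A: {q}\nB: {p}\n{prompt}" for q, p in pairs]
inputs = tokenizer(texts, padding=True, truncation=True,
                   max_length=1024, return_tensors="pt").to(model.device)

with torch.no_grad():
    logits = model(**inputs).logits          # (batch, seq_len, vocab)
    scores = logits[:, -1, yes_id].float()   # "Yes" logit at the last position

print(scores.tolist())  # higher score = more relevant passage
```

The same approach should work with `load_in_4bit=True` for an even smaller memory footprint, though I haven't verified how much reranking quality degrades at 4-bit.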
