
Poor performance

#5 by HAvietisov

All quantized variants, as well as fp16, perform extremely poorly on extractive question answering when inference is run via ctransformers.
Responses differ wildly between the avx and avx2 engines given the same prompt, and in general they are really bad and often don't contain an answer to the question at all, unlike un-quantized MPT-7B-instruct or quantized MPT-30B-instruct.
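
For reference, a minimal repro sketch with ctransformers; the repo and model file names below are assumptions (substitute whichever quantized file you are testing), and the `lib` argument is what selects the avx vs. avx2 backend:

```python
from ctransformers import AutoModelForCausalLM

# Hypothetical repo/file names -- replace with the quantized model under test.
llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/MPT-7B-Instruct-GGML",
    model_file="mpt-7b-instruct.ggmlv3.q4_0.bin",
    model_type="mpt",
    lib="avx2",  # change to "avx" to compare the two engines on the same prompt
)

# A simple extractive QA prompt: the answer is present verbatim in the context.
prompt = (
    "Answer the question using only the context.\n"
    "Context: The Eiffel Tower was completed in 1889.\n"
    "Question: When was the Eiffel Tower completed?\n"
    "Answer:"
)
print(llm(prompt, max_new_tokens=32))
```

Running this once with `lib="avx2"` and once with `lib="avx"` should make the engine-to-engine divergence easy to demonstrate.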
