Technical requirement to use NuExtract-v1.5

#5
by radema - opened

Hi, are there any technical requirements or suggestions for running NuExtract-v1.5? I've tried it on a few texts locally, but I got no result after several minutes, even with the provided example.

Do you have problems only with this model? Can you run other models (e.g. phi-3.5-mini)?

NuMind org

Hello!

To load the model in bfloat16 you will need ~8GB of memory. For inference you will then need a bit more than that (depending heavily on the size of your inputs). Let's say ~12-16GB total for comfortable use. Using a different inference framework (e.g. vLLM) can help optimize things further.
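
For reference, here is a minimal loading sketch with the standard transformers API (the exact prompt template and the example text are just illustrative assumptions; check the model card for the intended format):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "numind/NuExtract-v1.5"

# Load the tokenizer and the weights in bfloat16 (~8GB of memory for the model itself).
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",  # place the model on the available GPU(s)
)
model.eval()

# Template/text prompt format assumed from the model card; adjust to your schema.
prompt = (
    "<|input|>\n### Template:\n{\"name\": \"\"}\n"
    "### Text:\nJohn works at Acme.\n<|output|>\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# Print only the newly generated tokens (the extracted JSON).
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```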

If necessary, you can also use quantization. For example, with 8-bit quantization (e.g. via bitsandbytes) you can effectively halve the model size while still maintaining close to equivalent performance.
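
A rough sketch of what that looks like with bitsandbytes through transformers (model name and settings as above; requires the bitsandbytes package and a CUDA GPU):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "numind/NuExtract-v1.5"

# 8-bit weights roughly halve the memory footprint compared to bfloat16.
quant_config = BitsAndBytesConfig(load_in_8bit=True)

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quant_config,
    device_map="auto",
    trust_remote_code=True,
)
```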

Hi, thanks for your support. It turned out the issues I had were related to flash-attn and torch version compatibility on a Windows machine.
I'm now able to use it, although I still need to work out how to manage the GPU efficiently for batch inference.

@liamcripwell are there any plans to offer quantized model weights as well?

NuMind org

We don't currently provide any pre-quantized models, but there are some that have been put up by 3rd parties (e.g. https://huggingface.co/bartowski/NuExtract-v1.5-GGUF). Otherwise, using libraries like bitsandbytes with the transformers library can work.
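
If you go the GGUF route, a sketch with llama-cpp-python would look something like this (the quantization level in the filename glob is an assumption; pick one from the repo's file list):

```python
from llama_cpp import Llama

# Download and load one of the 3rd-party GGUF quantizations from the Hub.
llm = Llama.from_pretrained(
    repo_id="bartowski/NuExtract-v1.5-GGUF",
    filename="*Q4_K_M.gguf",  # glob matching the desired quant file (assumed)
    n_ctx=4096,               # context window; raise for longer documents
    n_gpu_layers=-1,          # offload all layers to GPU if one is available
)

# Same assumed template/text prompt format as with the transformers example.
prompt = (
    "<|input|>\n### Template:\n{\"name\": \"\"}\n"
    "### Text:\nJohn works at Acme.\n<|output|>\n"
)
out = llm(prompt, max_tokens=256, temperature=0)
print(out["choices"][0]["text"])
```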

Thanks so much for pointing me to that repo!
