add-quantized-gguf-files

#1
No description provided.

This PR is analogous to my previous contributions for NorwAI-Mistral, NorwAI-Mistral-Instruct and NorwAI-Llama: it adds quantized versions of the full model as GGUF files. These files can be used to run the model on a lower-performance machine than would otherwise be required, typically a laptop, through Ollama or a similar tool.
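As a rough sketch of that workflow (not part of this PR): once you've downloaded one of the GGUF files, you can register it with Ollama via a Modelfile and run it locally. The file name and model tag below are placeholders; substitute whichever quantization you download.

```python
# Sketch: register a downloaded GGUF file with Ollama and run a test prompt.
# Assumes Ollama is installed and a GGUF file from this repo is in the current
# directory. File name and model tag are placeholders, not names from this PR.
import subprocess
from pathlib import Path

GGUF_PATH = Path("NorwAI-Mixtral-8x7B.Q4_K_M.gguf")  # placeholder file name
MODEL_TAG = "norwai-mixtral-8x7b-q4"                 # any local tag works

# Ollama builds a local model from a Modelfile pointing at the GGUF weights.
Path("Modelfile").write_text(f"FROM ./{GGUF_PATH.name}\n")
subprocess.run(["ollama", "create", MODEL_TAG, "-f", "Modelfile"], check=True)

# Send a quick prompt to verify the model loads and responds.
result = subprocess.run(
    ["ollama", "run", MODEL_TAG, "Hva er hovedstaden i Norge?"],
    capture_output=True, text=True, check=True,
)
print(result.stdout)
```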

Important note: since the 8x7B model is very large, even these quantized versions are substantial. The GGUF files range from roughly 20 to 45 GB, so you'll still need a fairly powerful machine (in terms of RAM in particular, but potentially also GPU) to run them. Remember that Ollama can expose an API for interacting with the model (much like OpenAI's API). So as long as you have one machine that is powerful enough and can act as a server (for instance a cloud VM with the necessary ports exposed), you can interact with the model even if the machine you're querying from isn't powerful enough to run it on its own.
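For illustration, a minimal sketch of that remote setup: with Ollama serving the model on the server (e.g. started with `OLLAMA_HOST=0.0.0.0 ollama serve` so it listens on an external interface), any machine can query it over Ollama's standard REST endpoint. The host name and model tag below are placeholders, not values defined by this PR.

```python
# Sketch: query a remote Ollama server over its REST API (default port 11434).
# The host and model tag are placeholders; point them at your own server/model.
import requests

OLLAMA_URL = "http://my-cloud-vm.example.com:11434/api/generate"  # placeholder host
MODEL_TAG = "norwai-mixtral-8x7b-q4"  # tag used when creating the model with `ollama create`

response = requests.post(
    OLLAMA_URL,
    json={
        "model": MODEL_TAG,
        "prompt": "Hva er hovedstaden i Norge?",
        "stream": False,  # return a single JSON object instead of a token stream
    },
    timeout=300,
)
response.raise_for_status()
print(response.json()["response"])
```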

All good on my end, ready for review and then merge :)

espenhk changed pull request status to open
NorLLM-NTNU changed pull request status to merged
