
Just asking

#1
by ToTully

First of all, thank you for quantizing the model. What's the minimum requirement to run a given quantization, such as q4_k_m?

For the q4_k_m version, around 8~9 GB of RAM is ideal, plus roughly another 2 GB to handle a 4k~8k context.
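As a rough back-of-envelope check of those numbers: total RAM is approximately the GGUF file size plus the KV cache for the chosen context, plus a small runtime overhead. The sketch below is purely illustrative; the file size, per-token KV-cache cost, and overhead are assumed placeholder values, not measured figures for this model.

```python
def estimate_ram_gb(model_file_gb: float, n_ctx: int, kv_bytes_per_token: int) -> float:
    """Rough RAM estimate: weights + KV cache + assumed ~0.5 GB runtime overhead."""
    kv_gb = n_ctx * kv_bytes_per_token / 1e9
    return model_file_gb + kv_gb + 0.5

# Hypothetical inputs: an ~8 GB q4_k_m file and an assumed ~250 KB of
# KV cache per token, at an 8k context.
print(round(estimate_ram_gb(8.0, 8192, 250_000), 1))
```

With those assumed inputs this lands at roughly 10.5 GB, in line with the "8~9 GB plus ~2 GB for context" figure above; the real per-token KV cost depends on the model's architecture and KV-cache quantization.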
