Collection: Compressed LLMs for nm-vllm. LLMs compressed using SparseGPT and GPTQ for optimized inference with nm-vllm (https://github.com/neuralmagic/nm-vllm).
This repo contains 4-bit Marlin-format model files for abacaj's Phi-2 Super.
Base Model: microsoft/phi-2
The model uses the same chat template as found in Mistral instruct models:
# Parentheses make the adjacent string literals concatenate into a single prompt.
text = (
    "<|endoftext|>[INST] What is your favourite condiment? [/INST]"
    "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!<|endoftext|> "
    "[INST] Do you have mayonnaise recipes? [/INST]"
)
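The prompt layout above can also be assembled programmatically. The following is a minimal sketch in plain Python, inferred only from the example string: `build_prompt` and the `role`/`content` message layout are hypothetical names chosen for illustration, not part of this repo or of nm-vllm. It assumes the conversation alternates user and assistant turns; the chat template shipped with the model's tokenizer remains the authoritative source if the two ever disagree.

```python
from typing import Dict, List

# Assumption: "<|endoftext|>" opens the prompt and closes each assistant turn,
# as in the example string above.
END_OF_TEXT = "<|endoftext|>"


def build_prompt(messages: List[Dict[str, str]]) -> str:
    """Format a chat history in the Mistral-style [INST] layout shown above.

    Hypothetical helper: each message is a dict with a "role" key
    ("user" or "assistant") and a "content" key.
    """
    prompt = END_OF_TEXT
    for message in messages:
        role = message["role"]
        content = message["content"]
        if role == "user":
            # User turns are wrapped in [INST] ... [/INST].
            prompt += f"[INST] {content} [/INST]"
        elif role == "assistant":
            # Assistant turns end with the end-of-text token and a space.
            prompt += f"{content}{END_OF_TEXT} "
        else:
            raise ValueError(f"unsupported role: {role!r}")
    return prompt


messages = [
    {"role": "user", "content": "What is your favourite condiment?"},
    {"role": "assistant", "content": "Well, I'm quite partial to lemon juice."},
    {"role": "user", "content": "Do you have mayonnaise recipes?"},
]
text = build_prompt(messages)
```

The resulting `text` ends with an open `[/INST]`, so the model's next generated tokens form the assistant reply.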