imatrix

#2
by Bakanayatsu - opened

Hello, I made imatrix quants for this model! I like it a lot.

https://huggingface.co/Bakanayatsu/Fimbulvetr-Kuro-Lotus-10.7B-GGUF-imatrix
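For anyone curious how quants like these are produced: imatrix (importance-matrix) quants use activation statistics gathered over a calibration text to decide which weights to preserve more precisely. Below is only a rough sketch of the usual llama.cpp workflow, not my exact commands; binary names and flags change between llama.cpp versions (e.g. `imatrix` vs `llama-imatrix`), and the file names are placeholders.

```python
# Rough sketch of the imatrix workflow with llama.cpp's command-line tools,
# driven from Python for illustration. Treat the binary names and flags as
# assumptions and check --help for your llama.cpp build.
import subprocess

F16_GGUF = "Fimbulvetr-Kuro-Lotus-10.7B-f16.gguf"   # full-precision GGUF export (placeholder)
CALIB_TXT = "calibration.txt"                        # representative calibration text (placeholder)
IMATRIX = "imatrix.dat"

# 1) Collect importance-matrix statistics over the calibration text.
subprocess.run(
    ["./imatrix", "-m", F16_GGUF, "-f", CALIB_TXT, "-o", IMATRIX],
    check=True,
)

# 2) Quantize using those statistics, e.g. to IQ3_XS.
subprocess.run(
    ["./quantize", "--imatrix", IMATRIX,
     F16_GGUF, "Fimbulvetr-Kuro-Lotus-10.7B-IQ3_XS.gguf", "IQ3_XS"],
    check=True,
)
```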

Owner

Thank you! I'll add it to the top of the card :3

saishf pinned discussion

I have tried Bakanayatsu's imatrix versions, but they seem to be corrupted: when I run them in LM Studio, the program crashes. This model itself, on the other hand, is excellent. In fact, it is the best in its category of all the ones I have tried. A marvel.

Owner

> I have tried Bakanayatsu's imatrix versions, but they seem to be corrupted: when I run them in LM Studio, the program crashes.

Are you trying the IQ XS variants?
The IQ XS variants are pretty new and may not be supported by LM Studio yet.
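If you want to sanity-check the file outside of LM Studio, a quick probe with llama-cpp-python (just a sketch; the model path is a placeholder, and a genuinely unsupported quant may still abort inside the native code rather than raise a clean Python error) looks roughly like this:

```python
# Minimal probe: try to load the GGUF with llama-cpp-python and report the
# result. Assumes a recent `pip install llama-cpp-python`, since IQ quant
# support only landed in newer llama.cpp releases.
from llama_cpp import Llama

MODEL = "Fimbulvetr-Kuro-Lotus-10.7B.IQ3_XS.gguf"  # placeholder path

try:
    llm = Llama(model_path=MODEL, n_ctx=512, verbose=False)
    print("Loaded fine - this llama.cpp build supports the quant type.")
except Exception as exc:  # an unsupported or corrupt file usually fails here
    print(f"Failed to load: {exc}")
```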

Owner

[Screenshot: XS version running in koboldcpp]

Can I run this model with TensorRT-LLM?

Owner

I don't believe GGUF is supported by TensorRT-LLM, but I'm not completely sure.

FP16
FP8
INT8 & INT4 Weight-Only
SmoothQuant
Groupwise quantization (AWQ/GPTQ)
FP8 KV CACHE
INT8 KV CACHE (+ AWQ/per-channel weight-only)
Tensor Parallel
STRONGLY TYPED

That's everything listed on TensorRT-LLM's GitHub page under the support matrix for Llama, so the route would be to start from the original FP16 weights rather than the GGUF (rough sketch below).
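The outline below is only a sketch based on the examples/llama workflow in the TensorRT-LLM repository; the script location, the `trtllm-build` tool, and the flags are assumptions that vary between releases, so check their docs for the exact invocation.

```python
# Rough sketch: build a TensorRT-LLM engine from the original FP16 HF
# checkpoint (not the GGUF). Script path (examples/llama/) and flags are
# assumptions based on the TensorRT-LLM examples; verify for your version.
import subprocess

HF_MODEL_DIR = "saishf/Fimbulvetr-Kuro-Lotus-10.7B"  # local clone (or HF id) of the FP16 weights
CKPT_DIR = "trtllm_ckpt"
ENGINE_DIR = "trtllm_engine"

# 1) Convert the HF checkpoint into TensorRT-LLM's checkpoint format.
subprocess.run(
    ["python", "examples/llama/convert_checkpoint.py",
     "--model_dir", HF_MODEL_DIR,
     "--output_dir", CKPT_DIR,
     "--dtype", "float16"],
    check=True,
)

# 2) Compile the converted checkpoint into a TensorRT engine.
subprocess.run(
    ["trtllm-build", "--checkpoint_dir", CKPT_DIR, "--output_dir", ENGINE_DIR],
    check=True,
)
```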

Gotcha, but this should still be supported by TensorRT-LLM, right?

https://huggingface.co/saishf/Fimbulvetr-Kuro-Lotus-10.7B
