Walter-Klaus committed 499e994 (parent: 94203ae): Update README.md

Files changed (1): README.md (+2, -0)
README.md CHANGED
@@ -13,6 +13,8 @@ tags:
 ---
 
 # Walter-Klaus/Llama-3.1-Minitron-4B-Chat-Q4_K_M-GGUF
+Please note: as of now (Sept 4, 2024), the modified Minitron model architecture is not yet supported by llama.cpp or by any tool based on it, e.g. GPT4All, KoboldCpp, LM Studio, or llama-cli.
+
 This model was converted to GGUF format from [`rasyosef/Llama-3.1-Minitron-4B-Chat`](https://huggingface.co/rasyosef/Llama-3.1-Minitron-4B-Chat) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
 Refer to the [original model card](https://huggingface.co/rasyosef/Llama-3.1-Minitron-4B-Chat) for more details on the model.