hierholzer committed
Commit
8ac9b95
1 Parent(s): e899c1b

Create README.md

Files changed (1): README.md ADDED (+56 -0)
---
license: apache-2.0
language:
- en
---

# Model

This is a quantized version of Llama-3.1-70B-Instruct in GGUF format.

GGUF is designed for use with GGML and other executors.
GGUF was developed by @ggerganov, who is also the developer of llama.cpp, a popular C/C++ LLM inference framework.
Models initially developed in frameworks like PyTorch can be converted to GGUF format for use with these engines.
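
For example, a GGUF-aware executor can load this model's .gguf file directly. Below is a minimal sketch using the llama-cpp-python bindings for llama.cpp; the file name and generation parameters are illustrative assumptions, not values taken from this repository.

```python
# Minimal sketch: running a GGUF model with llama-cpp-python
# (pip install llama-cpp-python). The model_path below is a
# hypothetical example; point it at the .gguf file you downloaded.
from llama_cpp import Llama

llm = Llama(
    model_path="Llama-3.1-70B-Instruct-Q5_K_M.gguf",  # hypothetical file name
    n_ctx=4096,        # context window size
    n_gpu_layers=-1,   # offload all layers to the GPU if one is available
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize GGUF in one sentence."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```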


## Uploaded Quantization Types

Currently, I have uploaded 2 quantized versions:

- Q5_K_M : large, very low quality loss
- Q8_0 : very large, extremely low quality loss
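
If you want to pull one of these files programmatically, a sketch along these lines should work; the repo_id and filename are assumptions for illustration, so check this repository's "Files" tab for the exact names (large models are sometimes split into multiple parts).

```python
# Sketch: downloading a specific quantization from the Hugging Face Hub.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="hierholzer/Llama-3.1-70B-Instruct-GGUF",  # hypothetical repo id
    filename="Llama-3.1-70B-Instruct-Q5_K_M.gguf",     # hypothetical file name
)
print(path)  # local path to the downloaded GGUF file
```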

### All Quantization Types Possible

Here are all of the quantization types that are possible (the numbers are the IDs used by llama.cpp's llama-quantize tool). Let me know if you need any other versions.

| ID | Type   | Notes                                                         |
|----|--------|---------------------------------------------------------------|
| 2  | Q4_0   | small, very high quality loss - legacy, prefer using Q3_K_M   |
| 3  | Q4_1   | small, substantial quality loss - legacy, prefer using Q3_K_L |
| 8  | Q5_0   | medium, balanced quality - legacy, prefer using Q4_K_M        |
| 9  | Q5_1   | medium, low quality loss - legacy, prefer using Q5_K_M        |
| 10 | Q2_K   | smallest, extreme quality loss - not recommended              |
| 12 | Q3_K   | alias for Q3_K_M                                              |
| 11 | Q3_K_S | very small, very high quality loss                            |
| 12 | Q3_K_M | very small, very high quality loss                            |
| 13 | Q3_K_L | small, substantial quality loss                               |
| 15 | Q4_K   | alias for Q4_K_M                                              |
| 14 | Q4_K_S | small, significant quality loss                               |
| 15 | Q4_K_M | medium, balanced quality - *recommended*                      |
| 17 | Q5_K   | alias for Q5_K_M                                              |
| 16 | Q5_K_S | large, low quality loss - *recommended*                       |
| 17 | Q5_K_M | large, very low quality loss - *recommended*                  |
| 18 | Q6_K   | very large, extremely low quality loss                        |
| 7  | Q8_0   | very large, extremely low quality loss - not recommended      |
| 1  | F16    | extremely large, virtually no quality loss - not recommended  |
| 0  | F32    | absolutely huge, lossless - not recommended                   |
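
If you need a version that is not uploaded here, these files are typically produced with llama.cpp's llama-quantize tool, run against an F16 (or F32) GGUF conversion of the original model. A minimal sketch, assuming llama-quantize is built and on your PATH and that the hypothetical input file exists:

```python
# Sketch: invoking llama.cpp's llama-quantize to produce a Q5_K_M file.
# File names are hypothetical; the last argument is the type name
# (or its numeric ID, e.g. 17) from the table above.
import subprocess

subprocess.run(
    [
        "llama-quantize",
        "Llama-3.1-70B-Instruct-F16.gguf",     # hypothetical input GGUF
        "Llama-3.1-70B-Instruct-Q5_K_M.gguf",  # hypothetical output GGUF
        "Q5_K_M",
    ],
    check=True,
)
```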


## Uses

By using the GGUF version of Llama-3.1-70B-Instruct, you can run this LLM with significantly fewer resources than the non-quantized version requires.
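
As a rough back-of-the-envelope check (the bits-per-weight figures are approximations, and real file sizes also include metadata and vary with the exact quantization layout):

```python
# Approximate memory needed just for the weights of a 70B-parameter model.
params = 70e9  # parameter count

for name, bits_per_weight in [("F16", 16.0), ("Q8_0", 8.5), ("Q5_K_M", 5.5)]:
    gigabytes = params * bits_per_weight / 8 / 1e9
    print(f"{name}: ~{gigabytes:.0f} GB")
```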