---
license: apache-2.0
language:
- en
---

---
# Model

Here is a quantized version of Llama-3.1-70B-Instruct in GGUF format.

GGUF is designed for use with GGML and other executors.
GGUF was developed by @ggerganov, who is also the developer of llama.cpp, a popular C/C++ LLM inference framework.
Models initially developed in frameworks like PyTorch can be converted to GGUF format for use with those engines.
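
For illustration, such a conversion is usually driven by the converter script that ships with the llama.cpp repository. The following is a minimal sketch, not a definitive recipe: the script name and flags vary between llama.cpp versions, and all paths are placeholders.

```python
# Minimal sketch, assuming llama.cpp is checked out locally and its Python
# converter script is available. Script name/flags differ across versions;
# all paths below are placeholders.
import subprocess

subprocess.run(
    [
        "python", "convert_hf_to_gguf.py",    # converter shipped with llama.cpp
        "models/Llama-3.1-70B-Instruct",      # local Hugging Face checkpoint (placeholder)
        "--outfile", "llama-3.1-70b-instruct-f16.gguf",
        "--outtype", "f16",                   # keep full precision; quantize in a later step
    ],
    check=True,
)
```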

## Uploaded Quantization Types

Currently, I have uploaded two quantized versions:

- Q5_K_M : large, very low quality loss
- Q8_0 : very large, extremely low quality loss
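
If you want to fetch one of these files programmatically, here is a minimal sketch using huggingface_hub. The repo id and filename are assumptions; check them against this repository's file list (quantizations of a 70B model are tens of gigabytes and may be split into multiple parts).

```python
# Minimal sketch -- repo_id and filename below are assumptions; verify both
# against the repository's file list before running.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="hierholzer/Llama-3.1-70B-Instruct-GGUF",  # assumed repo id
    filename="Llama-3.1-70B-Instruct-Q5_K_M.gguf",     # assumed filename
)
print(path)  # local cache path of the downloaded file
```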

### All Quantization Types Possible

Here are all of the quantization types that are possible, as listed by llama.cpp's quantize tool. Let me know if you need any other versions.

| # | Type | Description |
|----|--------|-------------|
| 2 | Q4_0 | small, very high quality loss - legacy, prefer using Q3_K_M |
| 3 | Q4_1 | small, substantial quality loss - legacy, prefer using Q3_K_L |
| 8 | Q5_0 | medium, balanced quality - legacy, prefer using Q4_K_M |
| 9 | Q5_1 | medium, low quality loss - legacy, prefer using Q5_K_M |
| 10 | Q2_K | smallest, extreme quality loss - not recommended |
| 12 | Q3_K | alias for Q3_K_M |
| 11 | Q3_K_S | very small, very high quality loss |
| 12 | Q3_K_M | very small, very high quality loss |
| 13 | Q3_K_L | small, substantial quality loss |
| 15 | Q4_K | alias for Q4_K_M |
| 14 | Q4_K_S | small, significant quality loss |
| 15 | Q4_K_M | medium, balanced quality - *recommended* |
| 17 | Q5_K | alias for Q5_K_M |
| 16 | Q5_K_S | large, low quality loss - *recommended* |
| 17 | Q5_K_M | large, very low quality loss - *recommended* |
| 18 | Q6_K | very large, extremely low quality loss |
| 7 | Q8_0 | very large, extremely low quality loss - not recommended |
| 1 | F16 | extremely large, virtually no quality loss - not recommended |
| 0 | F32 | absolutely huge, lossless - not recommended |
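
These type names (or their numeric IDs) are what llama.cpp's quantization tool takes as its final argument. A minimal sketch, assuming an F16 GGUF produced as in the conversion step above and a llama.cpp build whose quantize binary is on your PATH (named llama-quantize in recent builds, quantize in older ones):

```python
# Minimal sketch: produce a Q5_K_M file from an F16 GGUF.
# Binary name and file names are assumptions for illustration.
import subprocess

subprocess.run(
    [
        "llama-quantize",                       # "quantize" in older llama.cpp builds
        "llama-3.1-70b-instruct-f16.gguf",      # input GGUF (placeholder)
        "llama-3.1-70b-instruct-Q5_K_M.gguf",   # output GGUF (placeholder)
        "Q5_K_M",                               # type name; the numeric id also works
    ],
    check=True,
)
```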

## Uses

By using the GGUF version of Llama-3.1-70B-Instruct, you can run this LLM while using significantly fewer resources than the non-quantized version would require.
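
For example, one common way to run a GGUF file from Python is the llama-cpp-python binding. A minimal sketch, assuming the Q5_K_M file downloaded above and a machine with enough memory for a 70B model; the filename and parameter values are assumptions to adjust for your setup:

```python
# Minimal sketch using llama-cpp-python (pip install llama-cpp-python).
# Filename and parameters are assumptions -- adjust to your hardware.
from llama_cpp import Llama

llm = Llama(
    model_path="Llama-3.1-70B-Instruct-Q5_K_M.gguf",  # assumed local filename
    n_ctx=4096,        # context window size
    n_gpu_layers=-1,   # offload all layers to GPU if built with GPU support
)

result = llm.create_chat_completion(
    messages=[{"role": "user", "content": "In one sentence, what is GGUF?"}],
    max_tokens=64,
)
print(result["choices"][0]["message"]["content"])
```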