Text Generation · GGUF · English · code
CISCai committed
Commit 9848e27
Parent: aceab81

Requantized everything with new pre-tokenizer

OpenCodeInterpreter-DS-6.7B.IQ1_S.gguf CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:e617a6d520032a2a782c6105aa153ae70a8f8b0ba38fdbda67f5c3d02f143f40
- size 1530209440
+ oid sha256:a09e844828446cf1c2f6fbfdda289b7c048f91f1bf64ed42486dc7ce74f00873
+ size 1530080384
OpenCodeInterpreter-DS-6.7B.IQ2_M.gguf CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:dc8b562221cc31b9b4db87e63d9a09b99481e90bf53b93e6d365bfc540cd95e2
- size 2361484448
+ oid sha256:7662673acdded25afa5846d17ea6391d8e0d8c57bd434e5ffa27a9761488140c
+ size 2361355392
OpenCodeInterpreter-DS-6.7B.IQ2_S.gguf CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:e2734866607bf4bf33d004bdc164af5f3581c9b19bba39b9c27a35e805039066
- size 2198299808
+ oid sha256:66e7d5bbe7eadc9166c5fd699082521e8a72b00ab12696912809583be567f321
+ size 2198170752
OpenCodeInterpreter-DS-6.7B.IQ2_XS.gguf CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:b91b0c74cfc8e8b40283311327836503dde18847da73ca9dd35d05e6165e52a9
- size 2036540576
+ oid sha256:0a13b4cb2a81348911d6d9ca9a587e9ff7b08478974cb5e30502792ca6ce2a46
+ size 2036411520
OpenCodeInterpreter-DS-6.7B.IQ2_XXS.gguf CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:3f94cdb4cbcbf3591f204a438e3724a310f91fa62617dc0dba6ced5c02dc86bb
- size 1856578720
+ oid sha256:d0660175e93185ca92b8e01a2cecc043f643cc23b4357306918be58948ba3ec4
+ size 1856449664
OpenCodeInterpreter-DS-6.7B.IQ3_M.gguf CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:5f6f9c0b8e5529de5f2914bd147e50d59b8e61475f2cd7e2d495e50b7a69b36a
- size 3116737696
+ oid sha256:4482aa902c96eeaebbabc76f81129bc2e67284182a2073487429ed150a58f6c5
+ size 3116608640
OpenCodeInterpreter-DS-6.7B.IQ3_S.gguf CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:8640de98871c18b430f47d84bc7361015adf8d5cf46cac500d1575be43de6aab
- size 2950177952
+ oid sha256:e1c65eb086d34bafbe005a1d320d9709a0dd080af89edef574c074ef73d27be9
+ size 2950048896
OpenCodeInterpreter-DS-6.7B.IQ3_XS.gguf CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:2980e66945bed767a280b13089ed9adabaf61f90a0b8102dca9ad564a60eb582
- size 2798396576
+ oid sha256:af3525ae7e915200a857a1d6012eb63eb5ebde0c0fdb077a047c57cbccc3812c
+ size 2798267520
OpenCodeInterpreter-DS-6.7B.IQ3_XXS.gguf CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:691218cc94838ef6e27d9f24796d8bcac2ad4c8d7e037059209584d2792fc2b7
- size 2587124896
+ oid sha256:22e420c184552cc7028b5547c436c138ca91415f75a229bc6f378232416ea980
+ size 2586995840
OpenCodeInterpreter-DS-6.7B.IQ4_XS.gguf CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:ce70c09cc1ab5d6415dd24ff452cb5b4e1b6a1f7fd5853ded6cbfc8006cf9b9e
- size 3621315744
+ oid sha256:41bcc0e72a94b988446daab9daa6f8f6327065d5d51330ed91bc8d7b810c3c79
+ size 3621186688
OpenCodeInterpreter-DS-6.7B.imatrix.dat CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:f8fd39c9b70789f1cf31e8b11ddc0ec39002f63f9b453aa277aaa8ec793ec9ab
- size 4562142
+ oid sha256:d799acf6c89364444c40223faadf8696a469ad80d6621e0327e2a84524f59c42
+ size 4562176
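Each LFS pointer above records the new SHA-256 (`oid`) and exact byte size of the requantized file, which is enough to verify a download locally. A minimal sketch, assuming the `huggingface_hub` package is installed; the expected values are copied from the updated IQ1_S pointer in this commit:

```python
import hashlib

from huggingface_hub import hf_hub_download

# Expected values, taken from the updated IQ1_S pointer above.
EXPECTED_SHA256 = "a09e844828446cf1c2f6fbfdda289b7c048f91f1bf64ed42486dc7ce74f00873"
EXPECTED_SIZE = 1530080384

path = hf_hub_download(
    repo_id="CISCai/OpenCodeInterpreter-DS-6.7B-SOTA-GGUF",
    filename="OpenCodeInterpreter-DS-6.7B.IQ1_S.gguf",
)

# Stream the file through SHA-256 while counting bytes.
digest = hashlib.sha256()
size = 0
with open(path, "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        digest.update(chunk)
        size += len(chunk)

assert size == EXPECTED_SIZE, f"size mismatch: got {size}"
assert digest.hexdigest() == EXPECTED_SHA256, "sha256 mismatch"
print("OK: file matches its LFS pointer")
```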
README.md CHANGED
@@ -23,9 +23,9 @@ quantized_by: CISC
 
  This repo contains State Of The Art quantized GGUF format model files for [OpenCodeInterpreter DS 6.7B](https://huggingface.co/m-a-p/OpenCodeInterpreter-DS-6.7B).
 
- Quantization was done with an importance matrix that was trained for ~1M tokens (2000 batches of 512 tokens) of answers from the [CodeFeedback-Filtered-Instruction](https://huggingface.co/datasets/m-a-p/CodeFeedback-Filtered-Instruction) dataset.
+ Quantization was done with an importance matrix that was trained for ~1M tokens (256 batches of 4096 tokens) of answers from the [CodeFeedback-Filtered-Instruction](https://huggingface.co/datasets/m-a-p/CodeFeedback-Filtered-Instruction) dataset.
 
- Even though the 1-bit quantized model file "works" it is **not recommended** for normal use ~~as it is extremely error-prone~~, I've requantized it with a 4K-context imatrix which seems to have improved it a little bit but it still defaults to infinite loops, you have been warned. 🧐
+ Everything has been reconverted and quantized with a new importance matrix using llama.cpp as of commit [f4ab2a4](https://github.com/ggerganov/llama.cpp/commit/f4ab2a41476600a98067a9474ea8f9e6db41bcfa) (April 29th 2024) to ensure correct pre-tokenization. The new GGUFs will work with older llama.cpp, but they may not generate correct prompt tokens; please use a recent build to ensure the best possible results!
 
  <!-- description end -->
 
@@ -59,6 +59,7 @@ They are also compatible with many third party UIs and libraries provided they a
  The new methods available are:
 
  * GGML_TYPE_IQ1_S - 1-bit quantization in super-blocks with an importance matrix applied, effectively using 1.56 bits per weight (bpw)
+ * GGML_TYPE_IQ1_M - 1-bit quantization in super-blocks with an importance matrix applied, effectively using 1.75 bpw
  * GGML_TYPE_IQ2_XXS - 2-bit quantization in super-blocks with an importance matrix applied, effectively using 2.06 bpw
  * GGML_TYPE_IQ2_XS - 2-bit quantization in super-blocks with an importance matrix applied, effectively using 2.31 bpw
  * GGML_TYPE_IQ2_S - 2-bit quantization in super-blocks with an importance matrix applied, effectively using 2.5 bpw
@@ -68,6 +69,7 @@ The new methods available are:
  * GGML_TYPE_IQ3_S - 3-bit quantization in super-blocks with an importance matrix applied, effectively using 3.44 bpw
  * GGML_TYPE_IQ3_M - 3-bit quantization in super-blocks with an importance matrix applied, effectively using 3.66 bpw
  * GGML_TYPE_IQ4_XS - 4-bit quantization in super-blocks with an importance matrix applied, effectively using 4.25 bpw
+ * GGML_TYPE_IQ4_NL - 4-bit non-linearly mapped quantization with an importance matrix applied, effectively using 4.5 bpw
 
  Refer to the Provided Files table below to see what files use which methods, and how.
  </details>
@@ -78,7 +80,7 @@ Refer to the Provided Files table below to see what files use which methods, and
 
  | Name | Quant method | Bits | Size | Max RAM required | Use case |
  | ---- | ---- | ---- | ---- | ---- | ----- |
- | [OpenCodeInterpreter-DS-6.7B.IQ1_S.gguf](https://huggingface.co/CISCai/OpenCodeInterpreter-DS-6.7B-SOTA-GGUF/blob/main/OpenCodeInterpreter-DS-6.7B.IQ1_S.gguf) | IQ1_S | 1 | 1.5 GB| 3.5 GB | smallest, significant quality loss - not recommended **at all** |
+ | [OpenCodeInterpreter-DS-6.7B.IQ1_S.gguf](https://huggingface.co/CISCai/OpenCodeInterpreter-DS-6.7B-SOTA-GGUF/blob/main/OpenCodeInterpreter-DS-6.7B.IQ1_S.gguf) | IQ1_S | 1 | 1.5 GB| 3.5 GB | smallest, significant quality loss - **TBD**: Waiting for [this issue](https://github.com/ggerganov/llama.cpp/issues/5996) to be resolved |
  | [OpenCodeInterpreter-DS-6.7B.IQ2_XXS.gguf](https://huggingface.co/CISCai/OpenCodeInterpreter-DS-6.7B-SOTA-GGUF/blob/main/OpenCodeInterpreter-DS-6.7B.IQ2_XXS.gguf) | IQ2_XXS | 2 | 1.8 GB| 3.8 GB | very small, high quality loss |
  | [OpenCodeInterpreter-DS-6.7B.IQ2_XS.gguf](https://huggingface.co/CISCai/OpenCodeInterpreter-DS-6.7B-SOTA-GGUF/blob/main/OpenCodeInterpreter-DS-6.7B.IQ2_XS.gguf) | IQ2_XS | 2 | 1.9 GB| 3.9 GB | very small, high quality loss |
  | [OpenCodeInterpreter-DS-6.7B.IQ2_S.gguf](https://huggingface.co/CISCai/OpenCodeInterpreter-DS-6.7B-SOTA-GGUF/blob/main/OpenCodeInterpreter-DS-6.7B.IQ2_S.gguf) | IQ2_S | 2 | 2.1 GB| 4.1 GB | small, substantial quality loss |
@@ -91,8 +93,6 @@ Refer to the Provided Files table below to see what files use which methods, and
 
  Generated importance matrix file: [OpenCodeInterpreter-DS-6.7B.imatrix.dat](https://huggingface.co/CISCai/OpenCodeInterpreter-DS-6.7B-SOTA-GGUF/blob/main/OpenCodeInterpreter-DS-6.7B.imatrix.dat)
 
- Generated importance matrix file (4K context): [OpenCodeInterpreter-DS-6.7B.imatrix-4096.dat](https://huggingface.co/CISCai/OpenCodeInterpreter-DS-6.7B-SOTA-GGUF/blob/main/OpenCodeInterpreter-DS-6.7B.imatrix-4096.dat)
-
  **Note**: the above RAM figures assume no GPU offloading with 4K context. If layers are offloaded to the GPU, this will reduce RAM usage and use VRAM instead.
 
  <!-- README_GGUF.md-provided-files end -->
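As context for the imatrix change in the description above (256 batches of 4096 tokens instead of 2000 batches of 512), here is a rough sketch of how such a matrix is generated with llama.cpp's `imatrix` tool, driven from Python. The binary path and the calibration file name are assumptions, not taken from this repo:

```python
import subprocess

IMATRIX_BIN = "./imatrix"                        # llama.cpp importance-matrix tool (path assumed)
MODEL = "OpenCodeInterpreter-DS-6.7B.fp16.gguf"  # unquantized conversion (name assumed)
CALIB = "codefeedback-answers.txt"               # hypothetical text dump of dataset answers

# 256 chunks of a 4096-token context ~= 1M tokens, matching the README.
subprocess.run(
    [
        IMATRIX_BIN,
        "-m", MODEL,
        "-f", CALIB,
        "-o", "OpenCodeInterpreter-DS-6.7B.imatrix.dat",
        "-c", "4096",       # tokens per chunk
        "--chunks", "256",  # number of chunks to process
    ],
    check=True,
)
```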
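The pre-tokenization fix that the commit message refers to is stored in GGUF metadata, so a downloaded file can be checked for it. A sketch using the `gguf` Python package; the field-decoding details are an assumption and may vary between package versions:

```python
from gguf import GGUFReader  # pip install gguf

reader = GGUFReader("OpenCodeInterpreter-DS-6.7B.IQ2_M.gguf")

# llama.cpp records the pre-tokenizer name under tokenizer.ggml.pre;
# files converted before the fix lack this field entirely.
field = reader.fields.get("tokenizer.ggml.pre")
if field is None:
    print("no pre-tokenizer recorded: reconvert with a recent llama.cpp")
else:
    name = bytes(field.parts[field.data[0]]).decode("utf-8")
    print("pre-tokenizer:", name)
```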
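As a sanity check on the bpw figures in the methods list: a pure 1.56 bpw encoding of the model's roughly 6.7B parameters would come to about 1.3 GB, slightly under the actual 1.53 GB IQ1_S file, since llama.cpp keeps some tensors (e.g. embeddings and output) at higher-precision types. A quick back-of-the-envelope, where the parameter count is an approximation:

```python
params = 6.74e9  # approximate parameter count of the DS 6.7B model (assumption)
bpw = 1.56       # effective bits per weight for GGML_TYPE_IQ1_S

estimate = params * bpw / 8                             # bits -> bytes
print(f"pure IQ1_S estimate: {estimate / 1e9:.2f} GB")  # ~1.31 GB
print("actual IQ1_S file:   1.53 GB (1530080384 bytes, per the LFS pointer)")
```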
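Finally, on the RAM note at the end of the diff: with llama-cpp-python, for instance, GPU offloading is a single constructor argument, trading RAM for VRAM layer by layer. A minimal sketch, assuming a local copy of one of the quants above:

```python
from llama_cpp import Llama  # pip install llama-cpp-python

llm = Llama(
    model_path="OpenCodeInterpreter-DS-6.7B.IQ3_M.gguf",
    n_ctx=4096,       # the 4K context the RAM figures assume
    n_gpu_layers=-1,  # offload all layers to the GPU; 0 keeps everything in RAM
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Write a Python function that reverses a string."}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```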