dranger003/dbrx-instruct-iMat.GGUF

dranger003 committed on Apr 7

Commit 63b33a1 • 1 Parent(s): 76431e2

Update README.md

README.md +5 -2

@@ -6,13 +6,16 @@ license_link: https://www.databricks.com/legal/open-model-license
 pipeline_tag: text-generation
 base_model: databricks/dbrx-instruct
 ---
-**2024-04-06**: Support for this model is still being worked on - [`PR #6515`](https://github.com/ggerganov/llama.cpp/pull/6515).
+**2024-04-07**: Support for this model is still being worked on - [`PR #6515`](https://github.com/ggerganov/llama.cpp/pull/6515).
+We are currently testing quants and I will upload them once they are working.
 * GGUF importance matrix (imatrix) quants for https://huggingface.co/databricks/dbrx-instruct
 * The importance matrix is trained for ~100K tokens (200 batches of 512 tokens) using [wiki.train.raw](https://huggingface.co/datasets/wikitext).
-* [Which GGUF is right for me? (from Artefact2)](https://gist.github.com/Artefact2/b5f810600771265fc1e39442288e8ec9)
+* [Which GGUF is right for me? (from Artefact2)](https://gist.github.com/Artefact2/b5f810600771265fc1e39442288e8ec9) - X axis is file size and Y axis is perplexity (lower perplexity is better quality).
 * The [imatrix is being used on the K-quants](https://github.com/ggerganov/llama.cpp/pull/4930) as well (only for < Q6_K).
 * You can merge GGUFs with `gguf-split --merge <first-chunk> <output-file>` although this is not required since [f482bb2e](https://github.com/ggerganov/llama.cpp/commit/f482bb2e4920e544651fb832f2e0bcb4d2ff69ab).
+* What is importance matrix (imatrix)? You can [read more about it from the author here](https://github.com/ggerganov/llama.cpp/pull/4861).
+* How do I use imatrix quants? Just like any other GGUF, the `.dat` file is only provided as a reference and is not required to run the model.
 > DBRX is a transformer-based decoder-only large language model (LLM) that was trained using next-token prediction. It uses a fine-grained mixture-of-experts (MoE) architecture with 132B total parameters of which 36B parameters are active on any input. It was pre-trained on 12T tokens of text and code data. Compared to other open MoE models like Mixtral-8x7B and Grok-1, DBRX is fine-grained, meaning it uses a larger number of smaller experts. DBRX has 16 experts and chooses 4, while Mixtral-8x7B and Grok-1 have 8 experts and choose 2. This provides 65x more possible combinations of experts and we found that this improves model quality. DBRX uses rotary position encodings (RoPE), gated linear units (GLU), and grouped query attention (GQA). It uses the GPT-4 tokenizer as provided in the tiktoken repository. We made these choices based on exhaustive evaluation and scaling experiments.
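
Reviewer note: two figures in the README text can be checked with simple arithmetic. The sketch below is not part of the commit. It assumes "combinations of experts" means unordered subsets of experts chosen per token, and that "~100K tokens" is the product of the stated batch count and batch size. The variable names are illustrative only.

```python
from math import comb

# imatrix calibration size quoted in the README: 200 batches of 512 tokens.
imatrix_tokens = 200 * 512

# Number of distinct expert subsets a router could select per token,
# assuming the order of the chosen experts does not matter.
dbrx_combos = comb(16, 4)     # DBRX: 16 experts, 4 active
mixtral_combos = comb(8, 2)   # Mixtral-8x7B and Grok-1: 8 experts, 2 active

# Integer division is exact here because 1820 is a multiple of 28.
ratio = dbrx_combos // mixtral_combos

print(imatrix_tokens)                      # 102400
print(dbrx_combos, mixtral_combos, ratio)  # 1820 28 65
```

Under those assumptions the numbers agree with the README: 102,400 tokens rounds to "~100K", and 1820 / 28 gives the quoted 65x.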