---
library_name: pruna-engine
thumbnail: >-
  https://assets-global.website-files.com/646b351987a8d8ce158d1940/64ec9e96b4334c0e1ac41504_Logo%20with%20white%20text.svg
metrics:
  - memory_disk
  - memory_inference
  - inference_latency
  - inference_throughput
  - inference_CO2_emissions
  - inference_energy_consumption
---
Simply make AI models cheaper, smaller, faster, and greener!
- Give a thumbs up if you like this model!
- Contact us and tell us which model to compress next here.
- Request access to easily compress your own AI models here.
- Read the documentation to learn more here.
- Join Pruna AI community on Discord here to share feedback/suggestions or get help.
Frequently Asked Questions
- How does the compression work? The model is compressed with bitsandbytes (see the sketch after this FAQ).
- How does the model quality change? The quality of the model output will slightly degrade.
- What is the model format? We use the standard safetensors format.
- How to compress my own models? You can request premium access to more compression methods and tech support for your specific use-cases here.
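For illustration, here is a minimal sketch of how a model can be quantized to 4-bit with bitsandbytes through transformers. The exact settings used to produce this repository are not published here, so the config values below are assumptions, not Pruna's actual recipe.

# Hypothetical sketch: 4-bit quantization with bitsandbytes via transformers.
# The config values are illustrative assumptions, not Pruna's actual settings.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store weights in 4-bit
    bnb_4bit_compute_dtype=torch.bfloat16,  # run matmuls in bfloat16
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
)

model = AutoModelForCausalLM.from_pretrained(
    "databricks/dbrx-base",                 # the original base model
    quantization_config=quant_config,
    device_map="auto",
    trust_remote_code=True,
)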
Usage
There are several general ways to use the DBRX models:
- DBRX Base and DBRX Instruct are available for download on HuggingFace (see our Quickstart guide below). This is the HF repository for DBRX Base; DBRX Instruct can be found here.
- The DBRX model repository can be found on GitHub here.
- DBRX Base and DBRX Instruct are available with Databricks Foundation Model APIs via both Pay-per-token and Provisioned Throughput endpoints. These are enterprise-ready deployments.
- For more information on how to fine-tune using LLM-Foundry, please take a look at our LLM pretraining and fine-tuning documentation.
Quickstart Guide
Getting started with DBRX models is easy with the transformers
library. The model requires ~264GB of RAM and the following packages:
pip install "torch==2.4.0" "transformers>=4.39.2" "tiktoken>=0.6.0" "bitsandbytes"
If you'd like to speed up download time, you can use the hf_transfer
package as described by Hugging Face here.
pip install hf_transfer
export HF_HUB_ENABLE_HF_TRANSFER=1
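As a minimal sketch, you can also pre-download the weights from Python with hf_transfer enabled, using huggingface_hub (which ships as a transformers dependency); hf_YOUR_TOKEN is a placeholder.

# Sketch: pre-download the weights with hf_transfer enabled.
import os
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"  # set before the download starts

from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="PrunaAI/dbrx-base-bnb-4bit",
    token="hf_YOUR_TOKEN",  # gated repo: a read-scoped token is required (see below)
)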
You will need to request access to this repository to download the model. Once this is granted,
obtain an access token with read
permission, and supply the token below.
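Instead of passing the token to every call, you can register it once with huggingface_hub; a short sketch, where hf_YOUR_TOKEN is a placeholder for your own token:

# Optional: register the token once instead of passing token= to every call.
from huggingface_hub import login

login(token="hf_YOUR_TOKEN")  # placeholder; use your own read-scoped token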
Run the model on multiple GPUs:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the tokenizer and the 4-bit model; replace hf_YOUR_TOKEN with your own read-scoped token.
tokenizer = AutoTokenizer.from_pretrained("PrunaAI/dbrx-base-bnb-4bit", trust_remote_code=True, token="hf_YOUR_TOKEN")
model = AutoModelForCausalLM.from_pretrained("PrunaAI/dbrx-base-bnb-4bit", device_map="auto", torch_dtype=torch.bfloat16, trust_remote_code=True, token="hf_YOUR_TOKEN")

# Tokenize the prompt and move the tensors (input_ids and attention_mask) to the GPU.
input_text = "Databricks was founded in "
inputs = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0]))
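If you'd rather see tokens printed as they are generated instead of all at once, transformers provides TextStreamer; a short sketch reusing the model, tokenizer, and inputs from above:

# Optional: stream tokens to stdout as they are generated.
from transformers import TextStreamer

streamer = TextStreamer(tokenizer, skip_prompt=True)  # don't echo the prompt
model.generate(**inputs, max_new_tokens=100, streamer=streamer)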
Credits & License
The license of the smashed model follows the license of the original model. Please check the license of the original model, databricks/dbrx-base, which provided the base model, before using this one. The license of pruna-engine
is here on PyPI.