alokabhishek
/

Mistral-7B-Instruct-v0.2-bnb-8bit

Text Generation

text-generation-inference

Inference Endpoints

8-bit precision

Model card Files Files and versions Community

Mistral-7B-Instruct-v0.2-bnb-8bit / README.md

alokabhishek's picture

Updated Readme

a859fae verified 4 months ago

|

history blame contribute delete

No virus

3.52 kB

	---
	library_name: transformers
	license: apache-2.0
	pipeline_tag: text-generation
	tags:
	- bitsandbytes
	- quantized
	- 8bit
	- Mistral
	- Mistral-7B
	- bnb
	---

	# Model Card for alokabhishek/Mistral-7B-Instruct-v0.2-bnb-8bit

	<!-- Provide a quick summary of what the model is/does. -->
	This repo contains 8-bit quantized (using bitsandbytes) model Mistral AI_'s Mistral-7B-Instruct-v0.2



	## Model Details

	- Model creator: [Mistral AI_](https://huggingface.co/mistralai)
	- Original model: [Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)


	### About 8 bit quantization using bitsandbytes

	- QLoRA: Efficient Finetuning of Quantized LLMs: [arXiv - QLoRA: Efficient Finetuning of Quantized LLMs](https://arxiv.org/abs/2305.14314)

	- Hugging Face Blog post on 8-bit quantization using bitsandbytes: [A Gentle Introduction to 8-bit Matrix Multiplication for transformers at scale using Hugging Face Transformers, Accelerate and bitsandbytes](https://huggingface.co/blog/hf-bitsandbytes-integration)

	- bitsandbytes github repo: [bitsandbytes github repo](https://github.com/TimDettmers/bitsandbytes)



	# How to Get Started with the Model

	Use the code below to get started with the model.


	## How to run from Python code

	#### First install the package
	```shell
	!pip install --quiet bitsandbytes
	!pip install --quiet --upgrade transformers # Install latest version of transformers
	!pip install --quiet --upgrade accelerate
	!pip install --quiet sentencepiece
	pip install flash-attn --no-build-isolation
	```

	# Import

	```python
	import torch
	import os
	from torch import bfloat16
	from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline, BitsAndBytesConfig, LlamaForCausalLM
	```

	# Use a pipeline as a high-level helper

	```python
	model_id_mistral = "alokabhishek/Mistral-7B-Instruct-v0.2-bnb-8bit"

	tokenizer_mistral = AutoTokenizer.from_pretrained(model_id_mistral, use_fast=True)

	model_mistral = AutoModelForCausalLM.from_pretrained(
	model_id_mistral,
	device_map="auto"
	)


	pipe_mistral = pipeline(model=model_mistral, tokenizer=tokenizer_mistral, task='text-generation')

	prompt_mistral = "Tell me a funny joke about Large Language Models meeting a Blackhole in an intergalactic Bar."

	output_mistral = pipe_llama(prompt_mistral, max_new_tokens=512)

	print(output_mistral[0]["generated_text"])

	```


	## Uses

	<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

	### Direct Use

	<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

	[More Information Needed]

	### Downstream Use [optional]

	<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->

	[More Information Needed]

	### Out-of-Scope Use

	<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

	[More Information Needed]

	## Bias, Risks, and Limitations

	<!-- This section is meant to convey both technical and sociotechnical limitations. -->

	[More Information Needed]



	## Evaluation

	<!-- This section describes the evaluation protocols and provides the results. -->


	#### Metrics

	<!-- These are the evaluation metrics being used, ideally with a description of why. -->

	[More Information Needed]

	### Results

	[More Information Needed]


	## Model Card Authors [optional]

	[More Information Needed]

	## Model Card Contact

	[More Information Needed]