Update README.md
README.md CHANGED
@@ -16,123 +16,55 @@ Mistral-7B-v0.3 has the following changes compared to [Mistral-7B-v0.2](https://
 - Supports v3 Tokenizer
 - Supports function calling
 
-## Installation
-
-```
-pip install mistral_inference
-```
-
-## Download
-
-```py
-from huggingface_hub import snapshot_download
-from pathlib import Path
-
-mistral_models_path = Path.home().joinpath('mistral_models', '7B-Instruct-v0.3')
-mistral_models_path.mkdir(parents=True, exist_ok=True)
-
-snapshot_download(repo_id="mistralai/Mistral-7B-Instruct-v0.3", allow_patterns=["params.json", "consolidated.safetensors", "tokenizer.model.v3"], local_dir=mistral_models_path)
-```
-
-### Chat
-
-After installing `mistral_inference`, a `mistral-chat` CLI command should be available in your environment. You can chat with the model using
-
-```
-mistral-chat $HOME/mistral_models/7B-Instruct-v0.3 --instruct --max_tokens 256
-```
-
-### Instruct following
-
-```py
-from mistral_inference.model import Transformer
-from mistral_inference.generate import generate
-
-from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
-from mistral_common.protocol.instruct.messages import UserMessage
-from mistral_common.protocol.instruct.request import ChatCompletionRequest
-
-
-tokenizer = MistralTokenizer.from_file(f"{mistral_models_path}/tokenizer.model.v3")
-model = Transformer.from_folder(mistral_models_path)
-
-completion_request = ChatCompletionRequest(messages=[UserMessage(content="Explain Machine Learning to me in a nutshell.")])
-
-tokens = tokenizer.encode_chat_completion(completion_request).tokens
-
-out_tokens, _ = generate([tokens], model, max_tokens=64, temperature=0.0, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
-result = tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0])
-
-print(result)
-```
-
-### Function calling
-
-```py
-from mistral_common.protocol.instruct.tool_calls import Function, Tool
-from mistral_inference.model import Transformer
-from mistral_inference.generate import generate
-
-from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
-from mistral_common.protocol.instruct.messages import UserMessage
-from mistral_common.protocol.instruct.request import ChatCompletionRequest
-
-
-tokenizer = MistralTokenizer.from_file(f"{mistral_models_path}/tokenizer.model.v3")
-model = Transformer.from_folder(mistral_models_path)
-
-completion_request = ChatCompletionRequest(
-    tools=[
-        Tool(
-            function=Function(
-                name="get_current_weather",
-                description="Get the current weather",
-                parameters={
-                    "type": "object",
-                    "properties": {
-                        "location": {
-                            "type": "string",
-                            "description": "The city and state, e.g. San Francisco, CA",
-                        },
-                        "format": {
-                            "type": "string",
-                            "enum": ["celsius", "fahrenheit"],
-                            "description": "The temperature unit to use. Infer this from the users location.",
-                        },
-                    },
-                    "required": ["location", "format"],
-                },
-            )
-        )
-    ],
-    messages=[
-        UserMessage(content="What's the weather like today in Paris?"),
-    ],
-)
-
-tokens = tokenizer.encode_chat_completion(completion_request).tokens
-
-out_tokens, _ = generate([tokens], model, max_tokens=64, temperature=0.0, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
-result = tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0])
-
-print(result)
-```
-
-## Generate with `transformers`
-
-If you want to use Hugging Face `transformers` to generate text, you can do something like this.
-
-```py
-from transformers import pipeline
-
-messages = [
-    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
-    {"role": "user", "content": "Who are you?"},
-]
-chatbot = pipeline("text-generation", model="mistralai/Mistral-7B-Instruct-v0.3")
-chatbot(messages)
+## Generate with `transformers`
+
+If you want to use Hugging Face `transformers` to generate text, you can do something like this.
+
+```py
+import torch
+from transformers import AutoTokenizer, AutoModelForCausalLM
+
+pretrained_model_name = "thesven/Mistral-7B-Instruct-v0.3-GPTQ"
+device = "cuda:0"
+
+# Load the tokenizer
+tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name)
+
+# Load the model with the specified configuration and move to device
+model = AutoModelForCausalLM.from_pretrained(
+    pretrained_model_name,
+    device_map="auto",
+)
+
+print(model)
+
+# Set EOS token ID
+model.eos_token_id = tokenizer.eos_token_id
+
+# Move model to the specified device
+model.to(device)
+
+# Define the input text
+input_text = "What is PEFT finetuning?"
+
+# Encode the input text
+input_ids = tokenizer.encode(input_text, return_tensors="pt").to(device)
+
+# Generate output
+output = model.generate(input_ids, max_length=1000)
+
+# Decode the generated output
+decoded_output = tokenizer.batch_decode(output, skip_special_tokens=True)
+
+# Print the decoded output
+for i, sequence in enumerate(decoded_output):
+    print(f"Generated Sequence {i+1}: {sequence}")
+
+del model
+torch.cuda.empty_cache()
+
 ```
 
 ## Limitations
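One practical note on the `transformers` snippet added above: `thesven/Mistral-7B-Instruct-v0.3-GPTQ` is a GPTQ-quantized checkpoint, so `AutoModelForCausalLM.from_pretrained` generally needs a GPTQ backend (for example `optimum` with `auto-gptq`, or `gptqmodel` on newer stacks) installed next to `transformers`, and with `device_map="auto"` the weights are already placed on the GPU, so the extra `model.to(device)` is typically redundant. The snippet also encodes the raw prompt, while Mistral Instruct models are trained on the `[INST]` chat format. The sketch below is an editor's illustration rather than part of the commit: it assumes the repo ships the base Instruct model's chat template and shows the same flow using `apply_chat_template` and a `max_new_tokens` budget.

```py
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "thesven/Mistral-7B-Instruct-v0.3-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" places the (GPTQ-quantized) weights on the available GPU(s);
# no explicit model.to(device) is needed afterwards.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Wrap the prompt in the model's chat template ([INST] ... [/INST])
# instead of encoding the raw string. Assumes a chat template is present
# in the repo's tokenizer config.
messages = [{"role": "user", "content": "What is PEFT finetuning?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Budget new tokens directly instead of capping the total sequence length.
output = model.generate(input_ids, max_new_tokens=256)

# Decode only the tokens generated after the prompt.
print(tokenizer.decode(output[0, input_ids.shape[-1]:], skip_special_tokens=True))
```

Using `max_new_tokens` keeps the completion budget independent of the prompt length, whereas a fixed `max_length=1000` leaves less room for the answer as the prompt grows.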