alokabhishek committed
Commit
70e0a24
1 Parent(s): 7c87486

Updated Readme

Files changed (1)
  1. README.md +137 -6
README.md CHANGED
@@ -1,15 +1,146 @@
---
license: apache-2.0
pipeline_tag: text-generation
tags:
- - finetuned
- inference: true
- widget:
- - messages:
-   - role: user
-     content: What is your favorite condiment?
---

# Model Card for Mistral-7B-Instruct-v0.2

The Mistral-7B-Instruct-v0.2 Large Language Model (LLM) is an instruct fine-tuned version of the Mistral-7B-v0.2.
 
---
+ library_name: transformers
license: apache-2.0
pipeline_tag: text-generation
tags:
+ - ExLlamaV2
+ - 8bit
+ - Mistral
+ - Mistral-7B
+ - quantized
+ - exl2
+ - 8.0-bpw
---

+ # Model Card for alokabhishek/Mistral-7B-Instruct-v0.2-8.0-bpw-exl2
+
+ <!-- Provide a quick summary of what the model is/does. -->
+ This repo contains an 8-bit (8.0 bits per weight) ExLlamaV2-quantized version of Mistral AI_'s Mistral-7B-Instruct-v0.2.
+
+
+ ## Model Details
+
+ - Model creator: [Mistral AI_](https://huggingface.co/mistralai)
+ - Original model: [Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)
+
+
+ ### About quantization using ExLlamaV2
+
+ - ExLlamaV2 GitHub repo: [turboderp/exllamav2](https://github.com/turboderp/exllamav2)
+
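+ For reference, exl2 quants such as this one are typically produced with the `convert.py` script from the ExLlamaV2 repo. A minimal sketch, not the exact command used for this repo: the input and output directory names are placeholders, the input must be a local copy of the original fp16 model, and the default calibration settings are used.
+
+ ```shell
+ # Quantize the original fp16 model to 8.0 bits per weight (directory names are placeholders)
+ !python exllamav2/convert.py -i Mistral-7B-Instruct-v0.2 -o exl2-working-dir -cf Mistral-7B-Instruct-v0.2-8.0-bpw-exl2 -b 8.0
+ ```
+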
+ # How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+
+ ## How to run from Python code
+
+ #### First install the package
+ ```shell
+ # Install ExLlamaV2 from source
+ !git clone https://github.com/turboderp/exllamav2
+ !pip install -e exllamav2
+ ```
+
+ #### Import
+
+ ```python
+ # Hub utilities for authentication and repo management, plus torch and OS helpers
+ from huggingface_hub import login, HfApi, create_repo
+ from torch import bfloat16
+ import locale
+ import torch
+ import os
+ ```
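+
+ The `login` helper imported above can be used to authenticate with the Hugging Face Hub before cloning or pushing repos. A minimal sketch; the token string is a placeholder for your own access token:
+
+ ```python
+ # Authenticate with the Hugging Face Hub (replace with your own token)
+ login(token="hf_your_token_here")
+ ```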
+
+ #### Set up variables
+
+ ```python
+ # Define the model ID for the desired model
+ model_id = "alokabhishek/Mistral-7B-Instruct-v0.2-8.0-bpw-exl2"
+ BPW = 8.0
+
+ # Derive the local folder name from the model ID
+ model_name = model_id.split("/")[-1]
+ ```
+
+ #### Download the quantized model
+ ```shell
+ !git lfs install
+ # Download the model to a local directory
+ !git clone https://{username}:{HF_TOKEN}@huggingface.co/{model_id} {model_name}
+ ```
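+
+ Alternatively, the repo can be fetched with `huggingface_hub` instead of a git clone. A minimal sketch, reusing the `model_id` and `model_name` variables defined above:
+
+ ```python
+ from huggingface_hub import snapshot_download
+
+ # Download all model files into a local directory named after the model
+ snapshot_download(repo_id=model_id, local_dir=model_name)
+ ```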
+
+ #### Run inference on the quantized model using the test_inference.py script
+ ```shell
+ # Run the model with a test prompt
+ !python exllamav2/test_inference.py -m {model_name}/ -p "Tell me a funny joke about Large Language Models meeting a Blackhole in an intergalactic Bar."
+ ```
+
+
+ Alternatively, you can run inference directly from Python using the ExLlamaV2 API:
+
+ ```python
+ import sys, os
+
+ # Only needed when running this script from inside the cloned exllamav2 repo,
+ # so the package can be imported without installing it first
+ sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
+
+ from exllamav2 import (
+     ExLlamaV2,
+     ExLlamaV2Config,
+     ExLlamaV2Cache,
+     ExLlamaV2Tokenizer,
+ )
+
+ from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler
+
+ import time
+
+ # Initialize model and cache
+
+ model_directory = "/model_path/Mistral-7B-Instruct-v0.2-8.0-bpw-exl2/"
+ print("Loading model: " + model_directory)
+
+ config = ExLlamaV2Config(model_directory)
+ model = ExLlamaV2(config)
+ cache = ExLlamaV2Cache(model, lazy=True)
+ model.load_autosplit(cache)
+ tokenizer = ExLlamaV2Tokenizer(config)
+
+ # Initialize generator
+
+ generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)
+
+ # Set sampling parameters and generate some text
+
+ settings = ExLlamaV2Sampler.Settings()
+ settings.temperature = 0.85
+ settings.top_k = 50
+ settings.top_p = 0.8
+ settings.token_repetition_penalty = 1.01
+ settings.disallow_tokens(tokenizer, [tokenizer.eos_token_id])
+
+ prompt = "Tell me a funny joke about Large Language Models meeting a Blackhole in an intergalactic Bar."
+
+ max_new_tokens = 512
+
+ generator.warmup()
+ time_begin = time.time()
+
+ output = generator.generate_simple(prompt, settings, max_new_tokens, seed=1234)
+
+ time_end = time.time()
+ time_total = time_end - time_begin
+
+ print(output)
+ print()
+ print(f"Response generated in {time_total:.2f} seconds")
+ ```
+
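+ Note that Mistral-7B-Instruct models expect prompts in the `[INST] ... [/INST]` instruction format, so wrapping the prompt accordingly generally gives better responses. A minimal sketch reusing the `prompt`, `settings`, `max_new_tokens`, and `generator` objects defined above:
+
+ ```python
+ # Mistral-7B-Instruct expects the [INST] ... [/INST] instruction format
+ instruct_prompt = f"[INST] {prompt} [/INST]"
+
+ output = generator.generate_simple(instruct_prompt, settings, max_new_tokens, seed=1234)
+ print(output)
+ ```
+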
# Model Card for Mistral-7B-Instruct-v0.2

The Mistral-7B-Instruct-v0.2 Large Language Model (LLM) is an instruct fine-tuned version of the Mistral-7B-v0.2.