mobiuslabsgmbh/Llama-2-70b-chat-hf-2bit_g16_s128-HQQ

Text Generation

Transformers

llama

conversational

text-generation-inference

mobicham committed on Nov 25, 2023

Commit 70e587f • 1 Parent(s): 67db312

Update README.md

README.md +92 -0

 ## Llama-2-70b-chat-hf-2bit_g16_s128-HQQ
 This is a version of the Llama-2-70B-chat-hf model quantized to 2 bits via Half-Quadratic Quantization (HQQ): https://mobiusml.github.io/hqq_blog/
+### Basic Usage
 To run the model, install the HQQ library from https://github.com/mobiusml/hqq and use it as follows:
 ``` Python
 from hqq.models.llama_hf import LlamaHQQ
 model_id = 'mobiuslabsgmbh/Llama-2-70b-chat-hf-2bit_g16_s128-HQQ'
 #Load the model
 model = LlamaHQQ.from_quantized(model_id)
 ```
+### Basic Chat Example
+``` Python
+import transformers
+from hqq.models.llama_hf import LlamaHQQ
+model_id  = 'mobiuslabsgmbh/Llama-2-70b-chat-hf-2bit_g16_s128-HQQ'
+#Load the tokenizer
+tokenizer = transformers.AutoTokenizer.from_pretrained(model_id)
+#Load the model
+model     = LlamaHQQ.from_quantized(model_id)
+##########################################################################################################
+from threading import Thread
+from sys import stdout
+def print_flush(data):
+	stdout.write("\r" + data)
+	stdout.flush()
+#Adapted from https://huggingface.co/spaces/huggingface-projects/llama-2-7b-chat/blob/main/app.py
+def process_conversation(chat):
+	system_prompt = chat['system_prompt']
+	chat_history  = chat['chat_history']
+	message       = chat['message']
+	conversation = []
+	if system_prompt:
+		conversation.append({"role": "system", "content": system_prompt})
+	for user, assistant in chat_history:
+		conversation.extend([{"role": "user", "content": user}, {"role": "assistant", "content": assistant}])
+	conversation.append({"role": "user", "content": message})
+	return tokenizer.apply_chat_template(conversation, return_tensors="pt").to('cuda')
+def chat_processor(chat, max_new_tokens=100, do_sample=True):
+	tokenizer.use_default_system_prompt = False
+	streamer = transformers.TextIteratorStreamer(tokenizer, timeout=10.0, skip_prompt=True, skip_special_tokens=True)
+	generate_params = dict(
+		{"input_ids": process_conversation(chat)},
+		streamer=streamer,
+		max_new_tokens=max_new_tokens,
+		do_sample=do_sample,
+		top_p=0.90,
+		top_k=50,
+		temperature=0.6,
+		num_beams=1,
+		repetition_penalty=1.2,
+	)
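+	#Run generation in a background thread so the streamer can be consumed here
+	t = Thread(target=model.generate, kwargs=generate_params)
+	t.start()
+	outputs = []
+	for text in streamer:
+		outputs.append(text)
+		print_flush("".join(outputs))
+	#Wait for the generation thread to finish before returning
+	t.join()
+	return outputs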
+###################################################################################################
+outputs = chat_processor({'system_prompt':"You are a helpful assistant.",
+						'chat_history':[],
+						'message':"How can I build a car?"
+						},
+						 max_new_tokens=1000, do_sample=False)
+```
+<b>Output</b>:
+<p>
+Building a car is a complex process that involves designing, prototyping, testing, and manufacturing. Here are some general steps you can follow to build a car:
+1. Design the car: Determine the type of car you want to build, including the size, shape, and features. Create a detailed set of blueprints or computer-aided design (CAD) drawings to guide your building process.
+2. Source materials: Purchase or gather all the necessary materials, such as steel, aluminum, rubber, plastics, and any other components required for the car's body, frame, and engine.
+3. Build the frame: Construct the frame, which is the foundation of the car. This includes creating the chassis, suspension, and steering systems.
+4. Install the engine: Choose an appropriate engine and install it in the frame. Connect the engine to the transmission, exhaust system, and cooling system.
+5. Add the body: Attach the body panels to the frame, including the hood, doors, trunk lid, and roof. Ensure proper alignment and fitment.
+6. Install the electrical system: Connect the battery, starter, alternator, and wiring harness to the engine and other components. Install headlights, taillights, and other electrical accessories.
+7. Add the brakes: Install the brake system, including the brake pads, rotors, calipers, and master cylinder. Connect the brake lines and bleed the system to remove air bubbles.
+8. Install the interior: Fit the seats, dashboard, carpeting, and other interior components. Install the steering column, pedals, and shifter.
+9. Test and inspect: Check the car's systems, including the brakes, suspension, and engine performance. Make sure everything is functioning properly and safely.
+10. Register and insure: Obtain registration and insurance for your newly built car. Comply with local regulations and laws regarding vehicle ownership and operation.
+Please note that this is a high-level overview of the process, and building a car can be a complex and time-consuming task. It requires specialized knowledge, skills, and tools, as well as a clean and organized workspace. Additionally, safety precautions should always be taken when working on vehicles, as they can be dangerous if mishandled.
+If you are not experienced in automotive construction, it may be advisable to seek guidance from professionals or take a course in automotive mechanics before attempting to build a car.
+----------------------------------------------------------------------------------------------------------------------------------
+</p>
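The message layout that `process_conversation` hands to `tokenizer.apply_chat_template` can be checked without loading the model or a GPU. The sketch below is a tokenizer-free stand-in: `build_conversation` is a hypothetical helper name, not part of HQQ or Transformers, and it assumes `chat_history` holds `(user, assistant)` pairs as in the chat example above.

```python
def build_conversation(chat):
    """Flatten a chat dict into the role/content message list used by chat templates."""
    conversation = []
    # The optional system prompt goes first.
    if chat.get("system_prompt"):
        conversation.append({"role": "system", "content": chat["system_prompt"]})
    # Each history entry is assumed to be a (user, assistant) pair of past turns.
    for user, assistant in chat.get("chat_history", []):
        conversation.append({"role": "user", "content": user})
        conversation.append({"role": "assistant", "content": assistant})
    # The new user message always comes last.
    conversation.append({"role": "user", "content": chat["message"]})
    return conversation


example = build_conversation({
    "system_prompt": "You are a helpful assistant.",
    "chat_history": [("Hi", "Hello! How can I help?")],
    "message": "How can I build a car?",
})
for turn in example:
    print(turn["role"], "->", turn["content"])
```

Passing earlier turns through `chat_history` in this way is how a multi-turn conversation would be replayed to the model on each call.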
 *Limitations*: <br>
 -Supports single-GPU runtime only.<br>
 -Not compatible with Hugging Face's PEFT library.<br>
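
A note on the threading in `chat_processor`: `model.generate` is started on a separate thread because the streamer is consumed by iterating over it in the main thread while generation is still producing text. The standard-library sketch below mimics that producer/consumer pattern with a queue. `ToyStreamer` and `fake_generate` are hypothetical stand-ins for illustration only; they are not the Transformers `TextIteratorStreamer` or the HQQ model.

```python
from queue import Queue
from threading import Thread


class ToyStreamer:
    """Minimal iterator fed from another thread (stand-in for a text streamer)."""

    _END = object()  # sentinel marking the end of the stream

    def __init__(self, timeout=10.0):
        self._queue = Queue()
        self._timeout = timeout

    def put(self, text):
        self._queue.put(text)

    def end(self):
        self._queue.put(self._END)

    def __iter__(self):
        return self

    def __next__(self):
        # Blocks until the producer pushes a chunk; raises queue.Empty on timeout.
        item = self._queue.get(timeout=self._timeout)
        if item is self._END:
            raise StopIteration
        return item


def fake_generate(streamer, words):
    """Hypothetical producer: pushes chunks the way a generate call would."""
    for word in words:
        streamer.put(word + " ")
    streamer.end()


streamer = ToyStreamer()
worker = Thread(
    target=fake_generate,
    kwargs={"streamer": streamer, "words": ["Building", "a", "car"]},
)
worker.start()

outputs = []
for text in streamer:  # consume chunks as they arrive
    outputs.append(text)
worker.join()  # make sure the producer has finished

print("".join(outputs))
```

If the producer ran on the main thread instead, the loop over the streamer would not start until every chunk had been queued, so nothing would be printed incrementally.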