
Llama-2-7b-chat-hf-4bit_g64-HQQ

This is a version of the Llama-2-7b-chat-hf model quantized to 4 bits via Half-Quadratic Quantization (HQQ): https://mobiusml.github.io/hqq_blog/
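
To give a feel for what 4-bit, group-wise quantization means, here is a minimal sketch in plain Python. It assumes the `4bit_g64` suffix in the model name stands for 4-bit weights with a group size of 64, and it uses simple min/max round-to-nearest quantization rather than HQQ's actual half-quadratic solver. All function names below are hypothetical and are not part of the HQQ library.

```python
import random

def quantize_group(weights, nbits=4):
    """Min/max round-to-nearest quantization of one group (not the HQQ solver)."""
    levels = 2 ** nbits - 1                      # 15 for 4-bit, so codes span 0..15
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / levels or 1.0      # fall back to 1.0 for a constant group
    codes = [min(levels, max(0, round((w - w_min) / scale))) for w in weights]
    return codes, scale, w_min                   # w_min acts as the zero offset

def quantize(weights, group_size=64, nbits=4):
    """Split a flat weight list into groups and quantize each one independently."""
    return [quantize_group(weights[i:i + group_size], nbits)
            for i in range(0, len(weights), group_size)]

def dequantize(groups):
    """Map the integer codes back to approximate floating-point weights."""
    restored = []
    for codes, scale, zero in groups:
        restored.extend(c * scale + zero for c in codes)
    return restored

random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(256)]  # toy stand-in for a weight row
groups = quantize(weights)
restored = dequantize(groups)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(f"{len(groups)} groups, max reconstruction error {max_error:.6f}")
```

Each group keeps its own scale and offset, so smaller groups track the local weight range more closely at the cost of more metadata per weight.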

Basic Usage

To run the model, install the HQQ library from https://github.com/mobiusml/hqq and use it as follows:

model_id = 'mobiuslabsgmbh/Llama-2-7b-chat-hf-4bit_g64-HQQ'

from hqq.engine.hf import HQQModelForCausalLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)
model     = HQQModelForCausalLM.from_quantized(model_id)

Basic Chat Example

model_id  = 'mobiuslabsgmbh/Llama-2-7b-chat-hf-4bit_g64-HQQ'

from hqq.engine.hf import HQQModelForCausalLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)
model     = HQQModelForCausalLM.from_quantized(model_id)

##########################################################################################################
import transformers
from threading import Thread

from sys import stdout
def print_flush(data):
    stdout.write("\r" + data)
    stdout.flush()

#Adapted from https://huggingface.co/spaces/huggingface-projects/llama-2-7b-chat/blob/main/app.py
def process_conversation(chat):
    system_prompt = chat['system_prompt']
    chat_history  = chat['chat_history']
    message       = chat['message']

    conversation = []
    if system_prompt:
        conversation.append({"role": "system", "content": system_prompt})
    for user, assistant in chat_history:
        conversation.extend([{"role": "user", "content": user}, {"role": "assistant", "content": assistant}])
    conversation.append({"role": "user", "content": message})

    return tokenizer.apply_chat_template(conversation, return_tensors="pt").to('cuda')

def chat_processor(chat, max_new_tokens=100, do_sample=True):
    tokenizer.use_default_system_prompt = False
    # Yields decoded text as it is generated, skipping the prompt and special tokens
    streamer = transformers.TextIteratorStreamer(tokenizer, timeout=10.0, skip_prompt=True, skip_special_tokens=True)

    # top_p, top_k and temperature are only used when do_sample=True
    generate_params = dict(
        {"input_ids": process_conversation(chat)},
        streamer=streamer,
        max_new_tokens=max_new_tokens,
        do_sample=do_sample,
        top_p=0.90,
        top_k=50,
        temperature=0.6,
        num_beams=1,
        repetition_penalty=1.2,
    )

    # Run generation in a background thread so the streamer can be consumed here
    t = Thread(target=model.generate, kwargs=generate_params)
    t.start()

    outputs = []
    for text in streamer:
        outputs.append(text)
        print_flush("".join(outputs))

    # Wait for the generation thread to finish before returning
    t.join()

    return outputs

###################################################################################################

outputs = chat_processor({'system_prompt': "You are a helpful assistant.",
                          'chat_history': [],
                          'message': "How can I build a car?"},
                         max_new_tokens=1000, do_sample=False)

Output:

Building a car is an incredibly complex and challenging project that requires extensive knowledge and expertise in various fields, including engineering, mechanics, design, and manufacturing. However, if you're interested in learning about the basics of how cars are built, here are some general steps involved:

  1. Design and Planning: The first step in building a car is to design and plan its structure, layout, and functionality. This involves creating detailed drawings and models using computer-aided design (CAD) software or other tools.
  2. Material Selection: Once the design is finalized, the next step is to select the materials needed for construction, such as steel, aluminum, plastics, rubber, and other components. These materials must be carefully chosen based on their strength, durability, and compatibility with each other.
  3. Frame and Body Construction: The frame is the skeleton of the car, providing structural support and stability. It is typically made of steel or aluminum and is assembled by welding or bonding individual parts together. The body panels, which include the hood, doors, trunk lid, and roof, are attached to the frame using adhesives, rivets, or spot welds.
  4. Engine and Transmission Installation: The engine and transmission are critical components of any car, responsible for propelling it forward. The engine must be properly installed and connected to the transmission, which transfers power from the engine to the wheels.
  5. Electrical Systems: The electrical systems of a car include the battery, starter motor, alternator, and wiring harness. These components work together to provide power to the various accessories and systems within the vehicle, such as the lights, radio, and climate control.
  6. Suspension and Steering: The suspension system helps absorb shocks and bumps while driving, while the steering system allows the driver to maneuver the vehicle. Both of these systems require careful attention during the assembly process to ensure proper function and safety.
  7. Interior Fitting Out: Once the major components are in place, the interior of the car can be fitted out with seats, dashboard, carpeting, and other elements.
  8. Painting and Finishing: After all the components are in place, the car can be painted and finished according to personal preference or factory specifications.
  9. Testing and Quality Control: Before the car is ready for use, it undergoes rigorous testing and quality control measures to ensure that it meets safety and performance standards.

Please note that this is a very high-level overview of the process, and actual car production involves many more details and considerations. Additionally, modern car manufacturing often relies heavily on automation and robotic processes, making the actual assembly line process much more complex than what is described above.
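
The example above uses an empty chat_history. As a minimal sketch of how a multi-turn chat would be flattened into role-tagged messages before tokenization, the snippet below mirrors the list-building part of process_conversation without the tokenizer or the GPU. The name `build_conversation` is hypothetical, and the assistant reply in the history is a shortened placeholder, not real model output.

```python
def build_conversation(chat):
    """Flatten a chat dict into the role/content list expected by a chat template."""
    conversation = []
    if chat.get("system_prompt"):
        conversation.append({"role": "system", "content": chat["system_prompt"]})
    # Each history entry is a (user message, assistant reply) pair
    for user, assistant in chat.get("chat_history", []):
        conversation.append({"role": "user", "content": user})
        conversation.append({"role": "assistant", "content": assistant})
    # The new user message always comes last
    conversation.append({"role": "user", "content": chat["message"]})
    return conversation

chat = {
    "system_prompt": "You are a helpful assistant.",
    "chat_history": [("How can I build a car?", "(placeholder for the previous reply)")],
    "message": "Which of those steps is the hardest?",
}
conversation = build_conversation(chat)
roles = [turn["role"] for turn in conversation]
print(roles)
```

Passing the same dictionary to chat_processor would continue the conversation, provided each earlier exchange is appended to chat_history as a (user, assistant) pair.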


Limitations:
- Only supports single-GPU runtime.
- Not compatible with Hugging Face's PEFT.
