
InvestLM

This is the repo for InvestLM, a financial-domain large language model tuned from Mixtral-8x7B-v0.1 on a carefully curated instruction dataset related to financial investment. This card provides guidance on using InvestLM for inference.

GitHub Link: InvestLM

About HQQ

HQQ (Half-Quadratic Quantization) is a fast and accurate model quantizer that requires no calibration data, making it possible to quantize even the largest models in just a few minutes.
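
A minimal sketch of how a model is quantized with HQQ, assuming the hqq package's HQQModelForCausalLM API; the base model id and the 4-bit settings below are illustrative, not the exact configuration used for this checkpoint:

from hqq.engine.hf import HQQModelForCausalLM
from hqq.core.quantize import BaseQuantizeConfig

# Load a full-precision base model (id shown for illustration only),
# then quantize it in place; no calibration data is required.
model = HQQModelForCausalLM.from_pretrained('mistralai/Mixtral-8x7B-v0.1')
quant_config = BaseQuantizeConfig(nbits=4, group_size=64)  # illustrative 4-bit config
model.quantize_model(quant_config=quant_config)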

Inference

Please log in to Hugging Face first using the following command:

huggingface-cli login
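
Alternatively, you can log in from Python with huggingface_hub (the token below is a placeholder for your own access token):

from huggingface_hub import login
login(token="hf_xxx")  # placeholder; use your own Hugging Face access token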

Prompt template

[INST] {prompt} [/INST]
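
For example, with the question used in the code below, the formatted prompt becomes:

[INST] What is finance? [/INST]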

How to use this HQQ model from Python code

pip3 install --upgrade hqq "transformers>=4.35.0"
import transformers
from threading import Thread

from hqq.engine.hf import HQQModelForCausalLM, AutoTokenizer
from hqq.core.quantize import HQQLinear, HQQBackend

model_id = 'yixuantt/InvestLM-mistral-8x7B-v2-HQQ'

# Load the quantized model and its tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)
model     = HQQModelForCausalLM.from_quantized(model_id)

# Optional: switch to the compiled CUDA backend for faster inference.
# The CUDA kernels must be built beforehand:
#   git clone https://github.com/mobiusml/hqq/
#   cd hqq/kernels && python setup_cuda.py install
HQQLinear.set_backend(HQQBackend.ATEN_BACKPROP)


# Build the prompt from the template
prompt_template = "[INST] {prompt} [/INST]"
prompt = "What is finance?"

def chat_processor(chat, max_new_tokens=100, do_sample=True):
    tokenizer.use_default_system_prompt = False
    streamer = transformers.TextIteratorStreamer(tokenizer, timeout=10.0, skip_prompt=True, skip_special_tokens=True)

    # Tokenize the formatted prompt and set up the generation parameters
    generate_params = dict(
        tokenizer(prompt_template.format(prompt=chat), return_tensors="pt").to('cuda'),
        streamer=streamer,
        max_new_tokens=max_new_tokens,
        do_sample=do_sample,
        temperature=0.5,
        repetition_penalty=1.2,
    )

    # Run generation in a background thread and stream tokens as they arrive
    t = Thread(target=model.generate, kwargs=generate_params)
    t.start()
    outputs = []
    for text in streamer:
        outputs.append(text)
        print(text, end="", flush=True)
    t.join()

    return outputs

################################################################################################
# Generation
outputs = chat_processor(prompt, max_new_tokens=1000, do_sample=True)
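
If streaming output is not needed, a plain generate call also works. A minimal sketch; the sampling settings below are illustrative:

# Non-streaming alternative: generate the full answer, then decode it.
inputs = tokenizer(prompt_template.format(prompt="What is finance?"), return_tensors="pt").to('cuda')
output_ids = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.5)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))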
