
This model is a fine-tune of meta-llama/Llama-2-7b-chat-hf on MedQuAD (Medical Question Answering Dataset).
If you are interested in how to fine-tune Llama-2 or other LLMs, the repo explains the process.

Usage

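import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

base_model = "meta-llama/Llama-2-7b-chat-hf"
adapter = "EdwardYu/llama-2-7b-MedQuAD"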

tokenizer = AutoTokenizer.from_pretrained(adapter)

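# Load the base model with 4-bit NF4 quantization. The 4-bit settings are given
# only through quantization_config; recent transformers versions reject
# load_in_4bit passed as a separate keyword alongside it.
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type="nf4",
    ),
)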
model = PeftModel.from_pretrained(model, adapter)

question = 'What are the side effects or risks of Glucagon?'
inputs = tokenizer(question, return_tensors="pt").to("cuda")
# Pass input_ids and attention_mask together; max_length counts prompt plus generated tokens.
outputs = model.generate(**inputs, max_length=1024)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
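
For decoder-only models such as Llama-2, `generate` returns the prompt tokens followed by the newly generated ones, so the decoded text above begins with the question itself. Below is a minimal sketch of dropping the prompt before decoding. `strip_prompt` is a hypothetical helper (not part of transformers), and the token ids are made-up values standing in for real tokenizer output.

```python
def strip_prompt(output_ids, prompt_len):
    """Return only the tokens that follow the first prompt_len tokens."""
    if prompt_len < 0 or prompt_len > len(output_ids):
        raise ValueError("prompt_len must be between 0 and len(output_ids)")
    return output_ids[prompt_len:]


# Toy ids standing in for real tokenizer output (illustrative values only).
prompt_ids = [1, 1724, 526]
output_ids = prompt_ids + [450, 2625, 2]

answer_ids = strip_prompt(output_ids, len(prompt_ids))
print(answer_ids)
```

With the snippet above, the same slice would be `outputs[0][inputs.input_ids.shape[1]:]`, which can then be passed to `tokenizer.decode`.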

To run inference faster, you can load the base model in 16-bit precision (bfloat16) instead of using 4-bit quantization. This requires more GPU memory.

model = AutoModelForCausalLM.from_pretrained(
    base_model,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter)
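
To get a rough sense of the memory trade-off between the two loading modes, here is a back-of-the-envelope sketch. It assumes a round 7 billion parameters and counts weights only. It ignores activations, the KV cache, quantization constants and the adapter weights, so real usage will be higher. `weight_memory_gib` is a hypothetical helper, not a library function.

```python
def weight_memory_gib(num_params, bits_per_param):
    """Approximate memory for model weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3


NUM_PARAMS = 7_000_000_000  # assumed round figure for a "7B" model

bf16_gib = weight_memory_gib(NUM_PARAMS, 16)  # 16-bit weights
nf4_gib = weight_memory_gib(NUM_PARAMS, 4)    # 4-bit quantized weights

print(f"bf16: ~{bf16_gib:.1f} GiB, 4-bit: ~{nf4_gib:.1f} GiB")
```

Under these assumptions the 16-bit weights take about four times the memory of the 4-bit ones, which is the cost of the faster inference path.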