---
language:
- en
library_name: peft
pipeline_tag: text-generation
tags:
- medical
license: cc-by-nc-3.0
---

# MedFalcon 40b LoRA

## Model Description

### Architecture

`nmitchko/medfalcon-40b-lora` is a LoRA adapter fine-tuned for medical-domain tasks. It is based on [`Falcon-40b-instruct`](https://huggingface.co/tiiuae/falcon-40b-instruct/), a 40-billion-parameter model. The primary goal of this model is to improve question answering and medical dialogue. It was trained using [LoRA](https://arxiv.org/abs/2106.09685), specifically [QLoRA](https://github.com/artidoro/qlora), to reduce the memory footprint of fine-tuning.

> This LoRA supports 4-bit and 8-bit modes.

### Requirements

```
bitsandbytes>=0.39.0
peft
transformers
```

Steps to load this model:

1. Load the base model with 8-bit (or 4-bit) quantization
2. Apply the LoRA adapter using peft

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import transformers
import torch
from peft import PeftModel

model_name = "tiiuae/falcon-40b-instruct"
lora_name = "nmitchko/medfalcon-40b-lora"

tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load the base model in 8-bit to reduce memory usage
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_8bit=True,
    torch_dtype=torch.float16,
    trust_remote_code=True,
    device_map="auto",
)

# Apply the medical LoRA adapter on top of the base model
model = PeftModel.from_pretrained(model, lora_name)

pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

sequences = pipeline(
    "What does the drug ceftriaxone do?\nDoctor:",
    max_length=200,
    do_sample=True,
    top_k=40,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)

for seq in sequences:
    print(f"Result: {seq['generated_text']}")
```
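
If you prefer to load the base model in 4-bit mode instead (the note above says the LoRA supports both 4-bit and 8-bit), a minimal sketch using `BitsAndBytesConfig` follows. The NF4 quantization type and fp16 compute dtype are assumptions matching QLoRA's defaults, not values taken from this card.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

base_model = "tiiuae/falcon-40b-instruct"
lora_name = "nmitchko/medfalcon-40b-lora"

# Illustrative 4-bit config: NF4 quantization with fp16 compute,
# as used by QLoRA (these settings are assumptions, not from the card)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    trust_remote_code=True,
    device_map="auto",
)

# The LoRA adapter is applied the same way as in the 8-bit example
model = PeftModel.from_pretrained(model, lora_name)
```

4-bit loading roughly halves the memory footprint relative to 8-bit, at some cost in output quality, which can make the 40b base model fit on a single consumer GPU setup that 8-bit would not.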