LLama-2-MedText-13b-GGUF

Quantized GGUF versions of https://huggingface.co/truehealth/LLama-2-MedText-13b

Usage

Interactive llama.cpp session:

llama-cpp \
  --instruct \
  --color \
  --in-prefix "[INST] " \
  --in-suffix "[\INST] " \
  --model LLama-2-MedText-13b-q8_0.gguf

== Running in interactive mode. ==
 - Press Ctrl+C to interject at any time.
 - Press Return to return control to LLaMa.
 - To return control without starting a new line, end your input with '/'.
 - If you want to submit another line, end your input with '\'.


> [INST] How confident are you in your knowledge and abilities?
[\INST] [RSP] As an AI language model, I can provide information to the best of my ability based on the resources I was trained on, which were primarily before <DATE>. While I strive to provide useful and accurate responses, my knowledge is not infinite, and I might not be able to provide professional medical advice or predictions in all cases. Additionally, healthcare decisions should always be evaluated in the context of an individual's unique circumstances and should be evaluated by a healthcare professional.

Model card from truehealth/Llama-2-MedText-Delta-Preview

Trained on https://huggingface.co/datasets/BI55/MedText.

These are PEFT delta weights; they must be merged into Llama-2-13b before they can be used for inference.
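As a rough sketch of that merge step, the following could work. It assumes the delta is a LoRA-style adapter that PEFT can load and that the base checkpoint is available as meta-llama/Llama-2-13b-hf; neither is stated in the model card. The function name merge_adapter and the output directory are hypothetical.

```python
def merge_adapter(base_id: str, adapter_id: str, out_dir: str) -> None:
    """Merge PEFT adapter weights into a base model and save the result."""
    # Heavy dependencies are imported lazily, so defining the function
    # does not require torch, peft, or transformers to be installed.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Load the base model in half precision (assumption: fp16 is acceptable).
    base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)

    # Attach the adapter, then fold its weights into the base layers.
    model = PeftModel.from_pretrained(base, adapter_id)
    merged = model.merge_and_unload()

    # Save the merged weights together with the base tokenizer.
    merged.save_pretrained(out_dir)
    AutoTokenizer.from_pretrained(base_id).save_pretrained(out_dir)
```

A call might look like merge_adapter("meta-llama/Llama-2-13b-hf", "truehealth/Llama-2-MedText-Delta-Preview", "./merged"), where the base model ID and output path are placeholders.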

library_name: peft

Training procedure

The following bitsandbytes quantization config was used during training:

load_in_8bit: False
load_in_4bit: True
llm_int8_threshold: 6.0
llm_int8_skip_modules: None
llm_int8_enable_fp32_cpu_offload: False
llm_int8_has_fp16_weight: False
bnb_4bit_quant_type: nf4
bnb_4bit_use_double_quant: True
bnb_4bit_compute_dtype: float16
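For readers who want to reproduce this setup, the values above map onto the keyword arguments of transformers' BitsAndBytesConfig. The sketch below is an illustration, not the authors' training script; the names BNB_KWARGS and make_bnb_config are hypothetical.

```python
# The quantization settings listed above, as plain keyword arguments.
BNB_KWARGS = {
    "load_in_8bit": False,
    "load_in_4bit": True,
    "llm_int8_threshold": 6.0,
    "llm_int8_skip_modules": None,
    "llm_int8_enable_fp32_cpu_offload": False,
    "llm_int8_has_fp16_weight": False,
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": True,
    "bnb_4bit_compute_dtype": "float16",  # resolved to a torch dtype below
}


def make_bnb_config():
    """Build a BitsAndBytesConfig from BNB_KWARGS (needs torch and transformers)."""
    import torch
    from transformers import BitsAndBytesConfig

    kwargs = dict(BNB_KWARGS)
    # Turn the dtype name into the actual torch dtype object.
    kwargs["bnb_4bit_compute_dtype"] = getattr(torch, kwargs["bnb_4bit_compute_dtype"])
    return BitsAndBytesConfig(**kwargs)
```

The resulting object would typically be passed as quantization_config to AutoModelForCausalLM.from_pretrained when loading the base model.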

Framework versions

PEFT 0.5.0.dev0

Setup Notes

Download the PyTorch model

This example uses hfdownloader to download the PyTorch model from Hugging Face into a local Storage directory (the conversion commands below read it from ./models/Storage).

./hfdownloader -m truehealth/LLama-2-MedText-13b

If necessary, install hfdownloader from https://github.com/bodaay/HuggingFaceModelDownloader

bash <(curl -sSL https://raw.githubusercontent.com/bodaay/HuggingFaceModelDownloader/master/scripts/gist_gethfd.sh) -h

Quantize the PyTorch model with llama.cpp

To quantize directly to q8_0 in a single step:

llama.cpp/convert.py --outtype q8_0 --outfile LLama-2-MedText-13b-q8_0.gguf ./models/Storage/truehealth_LLama-2-MedText-13b/pytorch_model-00001-of-00003.bin

Alternatively, for other quantization types, first convert to an f32 GGUF:

llama.cpp/convert.py --outtype f32 --outfile LLama-2-MedText-13b-f32.gguf ./models/Storage/truehealth_LLama-2-MedText-13b/pytorch_model-00001-of-00003.bin

Then quantize the f32 GGUF to lower bit widths:

llama.cpp/build/bin/quantize LLama-2-MedText-13b-f32.gguf LLama-2-MedText-13b-Q3_K_L.gguf Q3_K_L
llama.cpp/build/bin/quantize LLama-2-MedText-13b-f32.gguf LLama-2-MedText-13b-Q6_K.gguf Q6_K
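When producing several quantization types, the two commands above can be generated in a loop. The sketch below assumes the quantize binary lives at the path used above and follows the same output naming; the helper names quantize_commands and run_all are hypothetical.

```python
import subprocess
from pathlib import Path

# Path to the quantize binary, as used in the commands above.
QUANTIZE_BIN = "llama.cpp/build/bin/quantize"


def quantize_commands(f32_gguf: str, quant_types: list[str]) -> list[list[str]]:
    """Return one quantize command per requested quantization type."""
    # Derive the model name by dropping the "-f32.gguf" suffix.
    # Output files are named relative to the current directory.
    stem = Path(f32_gguf).name.removesuffix("-f32.gguf")
    return [[QUANTIZE_BIN, f32_gguf, f"{stem}-{q}.gguf", q] for q in quant_types]


def run_all(commands: list[list[str]]) -> None:
    """Run each command in turn, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in quantize_commands("LLama-2-MedText-13b-f32.gguf", ["Q3_K_L", "Q6_K"]):
        print(" ".join(cmd))
```

Printing the commands first makes it easy to check the file names before passing the list to run_all, since each quantization run on a 13b model takes a while.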

Distributing the model through Hugging Face

mkvirtualenv -p `which python3.11` -a . ${PWD##*/}
python -m pip install huggingface_hub
huggingface-cli login
huggingface-cli lfs-enable-largefiles .
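The commands above prepare a git-based workflow. As an alternative that is not part of the original notes, the huggingface_hub Python API can upload a GGUF file directly, reusing the token stored by huggingface-cli login. The function name upload_gguf and the repository ID in the usage note are hypothetical.

```python
from pathlib import Path


def upload_gguf(repo_id: str, gguf_path: str) -> None:
    """Upload a single GGUF file to a model repository on the Hugging Face Hub."""
    # Imported lazily so defining the function does not require huggingface_hub.
    from huggingface_hub import HfApi

    api = HfApi()
    # Create the repository if it does not exist yet.
    api.create_repo(repo_id, repo_type="model", exist_ok=True)
    # Store the file at the repository root under its own file name.
    api.upload_file(
        path_or_fileobj=gguf_path,
        path_in_repo=Path(gguf_path).name,
        repo_id=repo_id,
        repo_type="model",
    )
```

For example, upload_gguf("your-username/LLama-2-MedText-13b-GGUF", "LLama-2-MedText-13b-q8_0.gguf"), with the repository ID replaced by your own.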