---
library_name: peft
---

## Training procedure

The following `bitsandbytes` quantization config was used during training:

- load_in_8bit: False
- load_in_4bit: True
- llm_int8_threshold: 6.0
- llm_int8_skip_modules: None
- llm_int8_enable_fp32_cpu_offload: False
- llm_int8_has_fp16_weight: False
- bnb_4bit_quant_type: nf4
- bnb_4bit_use_double_quant: True
- bnb_4bit_compute_dtype: bfloat16

### Framework versions

- PEFT 0.4.0.dev0

### Additional notes

This is an adapter model fine-tuned from LLaMA with QLoRA.

```python
# Imports
import torch
from peft import PeftModel
from transformers import LlamaForCausalLM, LlamaTokenizer

# Create the tokenizer for the base model
base_model = "huggyllama/llama-7b"
tokenizer = LlamaTokenizer.from_pretrained(base_model)

# Load the base model
model = LlamaForCausalLM.from_pretrained(
    base_model,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Attach the LoRA PEFT adapter
adapter_model = "AtomGradient/adjust_llama-7b"
model = PeftModel.from_pretrained(
    model,
    adapter_model,
    # torch_dtype=torch.float16,
)
model.eval()

# Prompt ("Who is the president of the United States?")
prompt = "美国的总统是谁"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate
generate_ids = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0])
```
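The example above loads the base model in fp16. If you instead want to load it with the same 4-bit NF4 quantization listed in the training config, that config maps onto `transformers`' `BitsAndBytesConfig`. The following is a minimal sketch, not taken from the original training code; it reuses the model and adapter names from the example above and assumes `bitsandbytes` is installed.

```python
# Sketch: loading the base model in 4-bit with the quantization settings
# from the training config above (an assumption: this mirrors, but does
# not reproduce, the original QLoRA setup).
import torch
from peft import PeftModel
from transformers import BitsAndBytesConfig, LlamaForCausalLM, LlamaTokenizer

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # load_in_4bit: True
    bnb_4bit_quant_type="nf4",              # bnb_4bit_quant_type: nf4
    bnb_4bit_use_double_quant=True,         # bnb_4bit_use_double_quant: True
    bnb_4bit_compute_dtype=torch.bfloat16,  # bnb_4bit_compute_dtype: bfloat16
)

base_model = "huggyllama/llama-7b"
tokenizer = LlamaTokenizer.from_pretrained(base_model)
model = LlamaForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    device_map="auto",
)

# The adapter attaches to the quantized base model the same way as above.
model = PeftModel.from_pretrained(model, "AtomGradient/adjust_llama-7b")
model.eval()
```

Loading in 4-bit substantially reduces GPU memory compared with the fp16 load shown earlier, at some cost in output quality.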