doshisha-mil/llama-2-70b-chat-4bit-japanese-v1

This model is Llama-2-Chat 70B fine-tuned on the following Japanese version of the Alpaca dataset:

https://github.com/shi3z/alpaca_ja

Copyright Notice

Because this model is built on Meta's Llama 2 series and inherits its license terms, users of this model must also agree to Meta's license:

https://ai.meta.com/llama/

How to use

# Log in to Hugging Face to access the gated meta-llama base model
from huggingface_hub import notebook_login
notebook_login()

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Load the base Llama-2-70b-chat model with 4-bit NF4 quantization
model_id = "meta-llama/Llama-2-70b-chat-hf"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map="auto")

# Attach the LoRA adapter; is_trainable=True is only needed if you intend to continue fine-tuning
peft_name = "doshisha-mil/llama-2-70b-chat-4bit-japanese-v1"
model = PeftModel.from_pretrained(
    model,
    peft_name,
    is_trainable=True
)
model.eval()

device = "cuda:0"

# Prompt format used for fine-tuning: "# Q: <question> # A: "
text = "# Q: 日本一高い山は何ですか? # A: "
inputs = tokenizer(text, return_tensors="pt").to(device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
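For convenience, the generation step above can be wrapped in a small helper that applies the same "# Q: ... # A: " prompt format to arbitrary questions. The sketch below is illustrative only and not part of the released code; the ask function, its defaults, and the sample question are assumptions, and it reuses the model, tokenizer, and device defined above.

# Hypothetical helper (not part of the released code) reusing the objects loaded above.
import torch

def ask(question: str, max_new_tokens: int = 100) -> str:
    # Build the prompt in the "# Q: ... # A: " format used for fine-tuning
    prompt = f"# Q: {question} # A: "
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    with torch.no_grad():
        outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    decoded = tokenizer.decode(outputs[0], skip_special_tokens=True)
    # Approximate: assumes the decoded text begins with the prompt verbatim
    return decoded[len(prompt):].strip()

print(ask("日本で一番長い川は何ですか?"))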

Training procedure

The following bitsandbytes quantization config was used during training (a code reconstruction follows the list):

  • load_in_8bit: False
  • load_in_4bit: True
  • llm_int8_threshold: 6.0
  • llm_int8_skip_modules: None
  • llm_int8_enable_fp32_cpu_offload: False
  • llm_int8_has_fp16_weight: False
  • bnb_4bit_quant_type: nf4
  • bnb_4bit_use_double_quant: True
  • bnb_4bit_compute_dtype: float32
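For reference, the listed settings correspond to the BitsAndBytesConfig below. This is a reconstruction from the list above, not the original training script; note that bnb_4bit_compute_dtype was float32 during training, whereas the usage example above loads the model with bfloat16 compute.

import torch
from transformers import BitsAndBytesConfig

# Reconstructed from the training configuration listed above (not the original script)
training_bnb_config = BitsAndBytesConfig(
    load_in_8bit=False,
    load_in_4bit=True,
    llm_int8_threshold=6.0,
    llm_int8_skip_modules=None,
    llm_int8_enable_fp32_cpu_offload=False,
    llm_int8_has_fp16_weight=False,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float32,  # the inference example above uses bfloat16 instead
)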

Framework versions

  • PEFT 0.4.0