# GenZ 13B v2 4bit
An instruction-finetuned model with a 4K input length, finetuned on top of the pretrained LLaMa2 13B base model and quantized to 4-bit with GPTQ.
## Inference
```python
from transformers import LlamaTokenizer
from auto_gptq import AutoGPTQForCausalLM

base_model = 'budecosystem/genz-13b-v2-4bit'

# Load the tokenizer and the 4-bit GPTQ-quantized weights
tokenizer = LlamaTokenizer.from_pretrained(base_model)
model = AutoGPTQForCausalLM.from_quantized(
    model_name_or_path=base_model,
    model_basename="gptq_model-4bit-128g",
    use_safetensors=True,
    trust_remote_code=True,
)

prompt = """A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
USER: who are you? ASSISTANT: """

# Move the input tensors to the same device as the model before generating
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
sample = model.generate(**inputs, max_length=128)
print(tokenizer.decode(sample[0]))
```
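Note that the decoded string above echoes the prompt before the assistant's reply. An optional sketch to print only the newly generated tokens, reusing the `inputs` and `sample` variables from the block above:

```python
# Slice off the prompt tokens so only the assistant's reply is decoded
reply = sample[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(reply, skip_special_tokens=True))
```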
Use the following prompt template:

```
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Hi, how are you? ASSISTANT:
```
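For programmatic use, the template can be assembled with a small helper. This is an illustrative sketch, not part of the official card; it assumes multi-turn history is simply concatenated in the same `USER:`/`ASSISTANT:` format:

```python
SYSTEM = ("A chat between a curious user and an artificial intelligence assistant. "
          "The assistant gives helpful, detailed, and polite answers to the user's questions.")

def build_prompt(turns):
    """turns: list of (user_message, assistant_reply) pairs; reply is None for the pending turn."""
    prompt = SYSTEM
    for user, assistant in turns:
        prompt += f" USER: {user} ASSISTANT: "
        if assistant is not None:
            prompt += assistant
    return prompt

print(build_prompt([("Hi, how are you?", None)]))
```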
## Finetuning
```bash
python finetune.py \
   --model_name meta-llama/Llama-2-13b \
   --data_path dataset.json \
   --output_dir output \
   --trust_remote_code \
   --prompt_column instruction \
   --response_column output
```
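The `--prompt_column` and `--response_column` flags imply that `dataset.json` holds records with `instruction` and `output` fields. A hypothetical example of that layout (the exact schema may differ; check the repository):

```python
import json

# Hypothetical records; the field names match --prompt_column / --response_column above
examples = [
    {
        "instruction": "Summarize the water cycle in one sentence.",
        "output": "Water evaporates, condenses into clouds, and falls back as precipitation.",
    },
]

with open("dataset.json", "w") as f:
    json.dump(examples, f, indent=2)
```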
Check the GitHub repository for the code -> GenZ