Uploaded model

  • Developed by: Aratan
  • License: apache-2.0
  • Finetuned from model: llama-3.1-8b-bnb-4bit

Inference
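
The snippet below assumes that model, tokenizer, and the alpaca_prompt template were already defined earlier in the session (the "Copied from above" comment). A minimal sketch of that setup, assuming the standard Unsloth loading API and the usual Alpaca template; the repository name "Aratan/lora_model" is hypothetical:

from unsloth import FastLanguageModel

# Hypothetical repository name; substitute the actual fine-tuned checkpoint.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "Aratan/lora_model",
    max_seq_length = 2048,
    load_in_4bit = True,
)

# Standard Alpaca-style prompt template used in the Unsloth notebooks.
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""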

# alpaca_prompt = Copied from above
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
inputs = tokenizer(
[
    alpaca_prompt.format(
        "Continue the fibonnaci sequence.", # instruction
        "1, 1, 2, 3, 5, 8", # input
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")

outputs = model.generate(**inputs, max_new_tokens = 64, use_cache = True)
tokenizer.batch_decode(outputs)
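
The Ollama section below references a Q4_K_M GGUF file (aratan_lora_model.Q4_K_M.gguf). Unsloth can export the fine-tuned model to GGUF directly; a minimal sketch, assuming Unsloth's save_pretrained_gguf helper:

# Export the fine-tuned model to GGUF with Q4_K_M quantization.
model.save_pretrained_gguf("aratan_lora_model", tokenizer, quantization_method = "q4_k_m")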

If you use Ollama

FROM aratan_lora_model.Q4_K_M.gguf

TEMPLATE """ Below are some instructions that describe some tasks. Write responses that appropriately complete each request.{{ if .Prompt }}

Instruction:

{{ .Prompt }}{{ end }}

Response:

{{ .Response }}<|eot_id|> """

system """Responde solo a la pregunta, no inventes, se concreto."""

PARAMETER stop "<|eom_id|>" PARAMETER stop "<|end_header_id|>" PARAMETER stop "<|start_header_id|>" PARAMETER stop "<|finetune_right_pad_id|>" PARAMETER stop "<|python_tag|>" PARAMETER stop "<|end_of_text|>" PARAMETER stop "<|eot_id|>" PARAMETER stop "<|reserved_special_token_"
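
Save the directives above as a file named Modelfile next to the GGUF, then register and run the model with the Ollama CLI (the model name aratan-model is just an example):

ollama create aratan-model -f Modelfile
ollama run aratan-model "Continue the Fibonacci sequence: 1, 1, 2, 3, 5, 8"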

Model details

  • Format: GGUF (4-bit)
  • Model size: 8.03B params
  • Architecture: llama