Uploaded model

  • Developed by: Aratan
  • License: apache-2.0
  • Finetuned from model: llama-3.1-8b-bnb-4bit

Inference
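
The snippet below assumes that model, tokenizer, and the alpaca_prompt template were already defined earlier in the session (the "Copied from above" comment). A minimal sketch of that setup, assuming the standard Unsloth loading API and the usual Alpaca template; the repository name "Aratan/lora_model" is hypothetical:

from unsloth import FastLanguageModel

# Hypothetical repository name; substitute the actual fine-tuned checkpoint.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "Aratan/lora_model",
    max_seq_length = 2048,
    load_in_4bit = True,
)

# Standard Alpaca-style prompt template used in the Unsloth notebooks.
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""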

# alpaca_prompt = Copied from above
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
inputs = tokenizer(
[
    alpaca_prompt.format(
        "Continue the fibonnaci sequence.", # instruction
        "1, 1, 2, 3, 5, 8", # input
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")

outputs = model.generate(**inputs, max_new_tokens = 64, use_cache = True)
tokenizer.batch_decode(outputs)
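
The Ollama section below references a Q4_K_M GGUF file (aratan_lora_model.Q4_K_M.gguf). Unsloth can export the fine-tuned model to GGUF directly; a minimal sketch, assuming Unsloth's save_pretrained_gguf helper:

# Export the fine-tuned model to GGUF with Q4_K_M quantization.
model.save_pretrained_gguf("aratan_lora_model", tokenizer, quantization_method = "q4_k_m")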

If you use Ollama

FROM aratan_lora_model.Q4_K_M.gguf

TEMPLATE """ Below are some instructions that describe some tasks. Write responses that appropriately complete each request.{{ if .Prompt }}

Instruction:

{{ .Prompt }}{{ end }}

Response:

{{ .Response }}<|eot_id|> """

system """Responde solo a la pregunta, no inventes, se concreto."""

PARAMETER stop "<|eom_id|>" PARAMETER stop "<|end_header_id|>" PARAMETER stop "<|start_header_id|>" PARAMETER stop "<|finetune_right_pad_id|>" PARAMETER stop "<|python_tag|>" PARAMETER stop "<|end_of_text|>" PARAMETER stop "<|eot_id|>" PARAMETER stop "<|reserved_special_token_"
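
Save the directives above as a file named Modelfile next to the GGUF, then register and run the model with the Ollama CLI (the model name aratan-model is just an example):

ollama create aratan-model -f Modelfile
ollama run aratan-model "Continue the Fibonacci sequence: 1, 1, 2, 3, 5, 8"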

Model details

  • Format: GGUF (4-bit)
  • Model size: 8.03B params
  • Architecture: llama