Uploaded model

  • Developed by: thesven
  • License: apache-2.0
  • Finetuned from model: unsloth/mistral-7b-v0.3-bnb-4bit

This model is Mistral 7B v0.3 fine-tuned with supervised fine-tuning (SFT) on the AetherCode-v1 dataset for code-related tasks. The specialized training is intended to improve the base model's performance in software development contexts.

Usage

from unsloth import FastLanguageModel

max_seq_length = 2048 # Any length can be chosen; Unsloth handles RoPE scaling internally.
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.

alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "thesven/Aether-Code-Mistral-7B-0.3-v1", # fine-tuned model to load from the Hugging Face Hub
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)
FastLanguageModel.for_inference(model) # Enable native 2x faster inference

# Reuse the alpaca_prompt template defined above.

inputs = tokenizer(
[
    alpaca_prompt.format(
        "You are an expert python developer, help me with my questions.", # instruction
        "How can I use puppeteer to get a mobile screen shot of a website?", # input
        "", # output - leave this blank for generation!
    ),
], return_tensors = "pt").to("cuda")

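# Generate a completion. Note that max_new_tokens (4000) is larger than the
# max_seq_length (2048) set above; lower it if generation fails or runs out of memory.
outputs = model.generate(**inputs, max_new_tokens = 4000, use_cache = True)
print(tokenizer.batch_decode(outputs))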
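
The decoded output from the snippet above contains the full prompt followed by the completion. A minimal sketch of how the prompt could be built and the completion separated from it is shown below. The helper names `build_prompt` and `extract_response` are hypothetical and not part of Unsloth or Transformers, and the default `"</s>"` end-of-sequence string is an assumption; passing `tokenizer.eos_token` explicitly is safer.

```python
# Same Alpaca-style template as in the usage snippet.
ALPACA_PROMPT = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

RESPONSE_MARKER = "### Response:"


def build_prompt(instruction: str, user_input: str) -> str:
    """Fill the template, leaving the response slot empty for generation."""
    return ALPACA_PROMPT.format(instruction, user_input, "")


def extract_response(decoded: str, eos_token: str = "</s>") -> str:
    """Return the text after the first response marker, without the EOS token.

    The "</s>" default is an assumption; pass tokenizer.eos_token to be sure.
    If the marker is missing, the whole string is returned (stripped).
    """
    head, marker, tail = decoded.partition(RESPONSE_MARKER)
    text = tail if marker else head
    return text.replace(eos_token, "").strip()
```

With the model loaded as above, this might be used as `extract_response(tokenizer.batch_decode(outputs)[0], tokenizer.eos_token)` to print only the generated answer.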

This Mistral model was trained 2x faster with Unsloth and Hugging Face's TRL library.

Model size: 3.87B params (Safetensors)
Tensor type: F32 · BF16 · U8