
GusLovesMath/LlaMATH-3-8B-Instruct-4bit

This model was converted to MLX format from mlx-community/Meta-Llama-3-8B-Instruct-4bit using mlx-lm version 0.12.1. Refer to the original model card for more details on the model. Note: this model was trained locally on an M2 Pro chip with 16 GB of RAM and a 16-core GPU.
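
For reference, conversions like this are typically done with mlx-lm's convert utility. Below is a minimal sketch using the Python API; the source repo and output path are illustrative placeholders, not the author's exact command or settings.

from mlx_lm import convert

# Sketch: quantize a Hugging Face checkpoint to 4-bit MLX format.
# Repo and output path are placeholders, not the author's actual values.
convert(
    "meta-llama/Meta-Llama-3-8B-Instruct",  # source checkpoint (placeholder)
    mlx_path="./my-llama-3-8b-4bit",        # local output directory
    quantize=True,                          # defaults to 4-bit quantization
)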

Use with mlx

pip install mlx-lm

from mlx_lm import load, generate

model, tokenizer = load("GusLovesMath/LlaMATH-3-8B-Instruct-4bit")
response = generate(model, tokenizer, prompt="hello", verbose=True)
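
Since this is a Llama-3 Instruct model, results are often better when the prompt is wrapped in the chat template first. A minimal sketch, assuming the tokenizer ships a chat template (the example question is illustrative):

# Sketch: apply the Llama-3 chat template before generating.
messages = [{"role": "user", "content": "What is 12 * 17?"}]
chat_prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
response = generate(model, tokenizer, prompt=chat_prompt, verbose=True)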

Try the following prompt.

# Our Prompt
prompt = """
Q A new program had 60 downloads in the first month.
The number of downloads in the second month was three
times as many as the downloads in the first month,
but then reduced by 30% in the third month. How many
downloads did the program have total over the three months?
"""
print(f"Our Test Prompt")
print(f"Q {prompt}")

# Testing model with prompt
response = generate(
    model,
    tokenizer,
    prompt=prompt,
    max_tokens=132,
    temp=0.0, 
    verbose=False
)

# Printing the model's response
print("LlaMATH Response")
print(response)
Output:

A: The number of downloads in the first month was 60.
The number of downloads in the second month was three times as many as the downloads in the first month, so it was 60 * 3 = <<60*3=180>>180.
The number of downloads in the third month was 30% less than the number of downloads in the second month, so it was 180 * 0.7 = <<180*0.7=126>>126.
The total number of downloads over the three months was 60 + 180 + 126 = <<60+180+126=366>>366.
#### 366
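
The chain of thought checks out; a few lines of Python reproduce the arithmetic:

# Verifying the model's arithmetic by hand
first = 60
second = 3 * first        # 180
third = second * 0.7      # 30% reduction -> 126
print(first + second + third)  # 366.0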
Safetensors model size: 1.7B params · Tensor types: FP16, U32