Llama-3.2-Taiwan-3B-Instruct-Arbor-4bit

masato25/Llama-3.2-Taiwan-3B-Instruct-Arbor-4bit is an Apple MLX 4-bit quantized derivative of lianghsun/Llama-3.2-Taiwan-3B-Instruct.

This Instruct conversion was prepared directly from the upstream safetensors weights, not by converting GGUF weights back into another format. The original request referenced the non-instruct GGUF repository QuantFactory/Llama-3.2-Taiwan-3B-GGUF; this repository publishes the Instruct variant instead, for better chat behavior.

This conversion is intended for local inference on Apple Silicon using MLX / MLX-LM. It preserves the upstream model architecture, tokenizer, and chat template files, while quantizing supported linear weights to 4-bit for a smaller memory footprint.

Attribution and license

A NOTICE file is included in this repository. By using, copying, modifying, redistributing, deploying, or making this derivative model available to others, you are responsible for complying with all applicable upstream terms, including the Llama 3.2 Community License and any additional terms, notices, access requirements, usage instructions, export-control, sanctions, or other legal requirements that apply to the upstream models.

No affiliation, sponsorship, endorsement, or trademark grant

This repository is independently prepared and published by the repository owner. It is not affiliated with, sponsored by, approved by, or endorsed by Meta, Llama, lianghsun, QuantFactory, or their affiliates unless they explicitly state otherwise.

The names "Llama", "Meta", "Llama-3.2-Taiwan", "lianghsun", "QuantFactory", and related marks are used here only for reasonable descriptive attribution and identification of the upstream source/base models. No trademark license or other rights in those marks are granted by this repository.

Conversion details

  • Source: lianghsun/Llama-3.2-Taiwan-3B-Instruct
  • Related reference (non-instruct GGUF, not used as a conversion source): QuantFactory/Llama-3.2-Taiwan-3B-GGUF
  • Format: MLX / MLX-LM
  • Quantization: 4-bit
  • Group size: 64

Equivalent conversion command:

python -m mlx_lm convert \
  --hf-path lianghsun/Llama-3.2-Taiwan-3B-Instruct \
  --mlx-path ./Llama-3.2-Taiwan-3B-Instruct-Arbor-4bit \
  -q \
  --q-bits 4 \
  --q-group-size 64
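
As a rough illustration of what the 4-bit, group-size-64 settings above mean for storage, the following sketch estimates the effective bits per weight. It assumes each group of 64 weights carries one 16-bit scale and one 16-bit bias alongside the 4-bit values. The figure of 3 billion quantized weights is a hypothetical round number, not a measurement of this repository. Actual file sizes will differ, since embeddings, norms, and any unquantized layers are stored differently.

```python
def effective_bits_per_weight(q_bits: int, group_size: int, meta_bits: int = 32) -> float:
    """Bits stored per weight, including per-group metadata.

    meta_bits is an assumption: one 16-bit scale plus one 16-bit bias per group.
    """
    return q_bits + meta_bits / group_size


def estimated_size_gb(num_weights: float, bits_per_weight: float) -> float:
    """Approximate storage in gigabytes (10^9 bytes)."""
    return num_weights * bits_per_weight / 8 / 1e9


bpw = effective_bits_per_weight(q_bits=4, group_size=64)
# Hypothetical round number of quantized weights, for illustration only.
approx_gb = estimated_size_gb(3e9, bpw)
print(f"{bpw} bits/weight, roughly {approx_gb:.2f} GB")
```

Under these assumptions the per-group metadata adds half a bit per weight, so halving the group size would raise the overhead to a full bit per weight.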

Usage

Install MLX-LM:

pip install -U mlx-lm

Chat/generate with MLX-LM:

mlx_lm.generate \
  --model masato25/Llama-3.2-Taiwan-3B-Instruct-Arbor-4bit \
  --prompt "請用繁體中文簡短介紹台灣的夜市文化。" \
  --max-tokens 160

Python example using the chat template:

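from mlx_lm import load, generate

# Download (on first use) and load the quantized weights and tokenizer.
model, tokenizer = load("masato25/Llama-3.2-Taiwan-3B-Instruct-Arbor-4bit")

# Wrap the user turn in the model's chat template so the prompt matches the Instruct format.
messages = [{"role": "user", "content": "請用繁體中文簡短介紹台灣的夜市文化。"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Generate up to 160 new tokens and print the reply.
response = generate(model, tokenizer, prompt=prompt, max_tokens=160)
print(response)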
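
The example above can be wrapped in a small helper that also accepts an optional system message. This is a minimal sketch, not part of the upstream usage instructions: `build_messages` and `chat_once` are hypothetical names introduced here. It assumes the bundled chat template accepts a `system` role, as Llama 3.2 templates generally do. The MLX-LM import is deferred into `chat_once`, so the message-building helper can be used without MLX installed.

```python
from typing import Optional


def build_messages(user_text: str, system_text: Optional[str] = None) -> list:
    """Return a chat message list, optionally prefixed with a system message."""
    messages = []
    if system_text:
        messages.append({"role": "system", "content": system_text})
    messages.append({"role": "user", "content": user_text})
    return messages


def chat_once(
    repo_id: str,
    user_text: str,
    system_text: Optional[str] = None,
    max_tokens: int = 160,
) -> str:
    """Load the model and generate one reply. Requires mlx-lm on Apple Silicon."""
    # Imported lazily so build_messages works on machines without MLX.
    from mlx_lm import load, generate

    model, tokenizer = load(repo_id)
    prompt = tokenizer.apply_chat_template(
        build_messages(user_text, system_text),
        tokenize=False,
        add_generation_prompt=True,
    )
    return generate(model, tokenizer, prompt=prompt, max_tokens=max_tokens)
```

A call such as `chat_once("masato25/Llama-3.2-Taiwan-3B-Instruct-Arbor-4bit", "請介紹台灣的夜市文化。", system_text="請一律使用繁體中文回答。")` would load the model and return a single reply. Because it reloads the model on every call, load once and reuse the `model` and `tokenizer` objects for repeated use.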

Important safety, quality, and compliance notes

Quantization may change model behavior, quality, robustness, refusal behavior, calibration, and safety characteristics compared with the upstream model. No claim is made that this derivative is equivalent to, safer than, or better than the upstream model.

Large language model outputs may be inaccurate, unsafe, biased, offensive, incomplete, or unsuitable for your intended use. You are responsible for evaluating outputs and for complying with applicable laws, licenses, policies, and platform rules.

Warranty and liability disclaimer

This derivative model and associated files are provided as-is and without warranties or conditions of any kind, express or implied, including without limitation warranties of merchantability, fitness for a particular purpose, title, non-infringement, accuracy, availability, or error-free operation.

To the maximum extent permitted by applicable law, the repository owner is not liable for any direct, indirect, incidental, special, consequential, exemplary, punitive, or other damages arising from or related to use of this repository, the derivative model, or model outputs.

Nothing in this README is legal advice. You are responsible for reviewing and complying with the applicable license terms and laws.
