Llama-3.2-Taiwan-3B-Instruct-Arbor-4bit
masato25/Llama-3.2-Taiwan-3B-Instruct-Arbor-4bit is an Apple MLX 4-bit quantized derivative of lianghsun/Llama-3.2-Taiwan-3B-Instruct.
This chat/instruct conversion was prepared from the upstream safetensors weights, not by converting GGUF weights back into another format. The request originally referenced the non-instruct GGUF repository QuantFactory/Llama-3.2-Taiwan-3B-GGUF; this repository instead publishes the Instruct variant, which is better suited to chat.
This conversion is intended for local inference on Apple Silicon using MLX / MLX-LM. It preserves the upstream model architecture, tokenizer, and chat template files, while quantizing supported linear weights to 4-bit for a smaller memory footprint.
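As a rough illustration of the smaller footprint, the sketch below estimates weight storage under two assumptions: every group of 64 weights (the group size listed under Conversion details) also stores one 16-bit scale and one 16-bit bias, and nearly all parameters are quantized. The parameter count of about 3.21 billion for Llama 3.2 3B is an approximation, and the function names are illustrative only.

```python
# Back-of-envelope size estimate for 4-bit affine group quantization.
# Assumptions: each group of `group_size` weights stores one scale and one bias
# in a 16-bit dtype, and (almost) every parameter is quantized.


def effective_bits_per_weight(bits, group_size, meta_bits=16):
    """Quantized bits plus the per-group scale and bias, amortized per weight."""
    return bits + 2 * meta_bits / group_size


def estimated_bytes(num_params, bits, group_size, meta_bits=16):
    """Approximate bytes needed to store the quantized weights."""
    return num_params * effective_bits_per_weight(bits, group_size, meta_bits) / 8


if __name__ == "__main__":
    n = 3.21e9  # approximate parameter count of Llama 3.2 3B (assumption)
    bf16_bytes = n * 2  # 16 bits per weight
    q4_bytes = estimated_bytes(n, bits=4, group_size=64)
    print(f"bits per weight: {effective_bits_per_weight(4, 64):.2f}")
    print(f"bf16: {bf16_bytes / 1e9:.2f} GB, 4-bit (group 64): {q4_bytes / 1e9:.2f} GB")
```

Under these assumptions the overhead is 0.5 bits per weight (4.5 in total), or roughly 1.8 GB of weights versus about 6.4 GB in bf16. Real memory use at inference time is higher because of unquantized layers, the KV cache, and runtime buffers.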
Attribution and license
- Upstream Taiwan Instruct model: lianghsun/Llama-3.2-Taiwan-3B-Instruct
- Original base model: meta-llama/Llama-3.2-3B-Instruct
- Upstream license: Llama 3.2 Community License (llama3.2)
- License file: https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct/blob/main/LICENSE.txt
- This repository is a derivative conversion/quantization and is not the original Meta Llama or Llama-3.2-Taiwan release.
A NOTICE file is included in this repository. By using, copying, modifying, redistributing, deploying, or making this derivative model available to others, you are responsible for complying with all applicable upstream terms, including the Llama 3.2 Community License and any additional terms, notices, access requirements, usage instructions, export-control, sanctions, or other legal requirements that apply to the upstream models.
No affiliation, sponsorship, endorsement, or trademark grant
This repository is independently prepared and published by the repository owner. It is not affiliated with, sponsored by, approved by, or endorsed by Meta, Llama, lianghsun, QuantFactory, or their affiliates unless they explicitly state otherwise.
The names "Llama", "Meta", "Llama-3.2-Taiwan", "lianghsun", "QuantFactory", and related marks are used here only for reasonable descriptive attribution and identification of the upstream source/base models. No trademark license or other rights in those marks are granted by this repository.
Conversion details
- Source: lianghsun/Llama-3.2-Taiwan-3B-Instruct
- Related request reference: QuantFactory/Llama-3.2-Taiwan-3B-GGUF
- Format: MLX / MLX-LM
- Quantization: 4-bit
- Group size: 64
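To show what "4-bit, group size 64" means, here is a simplified pure-Python sketch of affine group quantization: each group of 64 weights shares one scale and one bias, and every weight becomes an integer in 0..15. It follows the general min/max affine formula described in the `mlx.core.quantize` documentation, but it is an illustration only, not MLX's kernel (MLX also packs the integers into 32-bit words and handles edge cases differently). All function names here are hypothetical.

```python
# Simplified affine group quantization (illustration only, not MLX's kernel).
import random


def quantize_group(values, bits=4):
    """Map floats to integers in [0, 2**bits - 1] with one shared scale and bias."""
    levels = 2 ** bits - 1  # 15 for 4-bit
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels if hi > lo else 1.0  # constant group: any scale works
    bias = lo
    q = [round((v - bias) / scale) for v in values]
    return q, scale, bias


def quantize(weights, bits=4, group_size=64):
    """Split a flat weight list into groups and quantize each group independently."""
    if len(weights) % group_size != 0:
        raise ValueError("number of weights must be a multiple of group_size")
    return [
        quantize_group(weights[start:start + group_size], bits)
        for start in range(0, len(weights), group_size)
    ]


def dequantize(groups):
    """Reconstruct approximate weights: w_hat = scale * q + bias."""
    out = []
    for q, scale, bias in groups:
        out.extend(scale * x + bias for x in q)
    return out


if __name__ == "__main__":
    random.seed(0)
    w = [random.gauss(0.0, 0.02) for _ in range(128)]  # two groups of 64
    groups = quantize(w, bits=4, group_size=64)
    w_hat = dequantize(groups)
    max_err = max(abs(a - b) for a, b in zip(w, w_hat))
    print(f"groups={len(groups)} max_abs_error={max_err:.6f}")
```

Because each group gets its own scale, the rounding error for a weight is at most half of its group's scale; smaller groups track the local weight range more closely but store more scales and biases.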
Equivalent conversion command:
python -m mlx_lm convert \
--hf-path lianghsun/Llama-3.2-Taiwan-3B-Instruct \
--mlx-path ./Llama-3.2-Taiwan-3B-Instruct-Arbor-4bit \
-q \
--q-bits 4 \
--q-group-size 64
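After converting, you may want to confirm that the output directory records the intended settings. The sketch below assumes that mlx-lm writes them into `config.json` under a `quantization` key such as `{"group_size": 64, "bits": 4}`, as recent versions do; the key name may differ between versions, so the code also checks `quantization_config`. The function `read_quantization` is a hypothetical helper, not part of mlx-lm.

```python
# Sketch: read the quantization settings stored in a converted model's config.json.
# Assumption: mlx-lm records them under "quantization" (or "quantization_config").
import json
import sys
from pathlib import Path


def read_quantization(model_dir):
    """Return (bits, group_size) from config.json, or None if no settings are found."""
    config_path = Path(model_dir) / "config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    q = config.get("quantization") or config.get("quantization_config")
    if q is None:
        return None  # not quantized, or stored under a different key
    return q.get("bits"), q.get("group_size")


if __name__ == "__main__":
    default_dir = "./Llama-3.2-Taiwan-3B-Instruct-Arbor-4bit"  # --mlx-path from the command above
    model_dir = Path(sys.argv[1] if len(sys.argv) > 1 else default_dir)
    if (model_dir / "config.json").exists():
        settings = read_quantization(model_dir)
        print(f"quantization settings: {settings}")
        if settings != (4, 64):
            print("warning: settings differ from 4-bit / group size 64")
    else:
        print(f"no config.json found in {model_dir}")
```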
Usage
Install MLX-LM:
pip install -U mlx-lm
Chat/generate with MLX-LM:
mlx_lm.generate \
--model masato25/Llama-3.2-Taiwan-3B-Instruct-Arbor-4bit \
--prompt "請用繁體中文簡短介紹台灣的夜市文化。" \
--max-tokens 160
Python example using the chat template:
from mlx_lm import load, generate
model, tokenizer = load("masato25/Llama-3.2-Taiwan-3B-Instruct-Arbor-4bit")
messages = [{"role": "user", "content": "請用繁體中文簡短介紹台灣的夜市文化。"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
response = generate(model, tokenizer, prompt=prompt, max_tokens=160)
print(response)
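The example above sends a single user message. For a multi-turn conversation, one approach is to keep the message history and re-render it through the chat template on every turn so the model sees the earlier context. This is a minimal sketch assuming mlx-lm is installed on Apple Silicon; the helpers `new_history`, `add_message`, and `chat_turn` are hypothetical, and `generate_fn` exists only so the history logic can be exercised without MLX.

```python
# Multi-turn chat sketch. new_history, add_message, and chat_turn are
# hypothetical helpers, not part of the mlx-lm API.


def new_history(system_prompt=None):
    """Start a conversation, optionally with a system message."""
    history = []
    if system_prompt:
        history.append({"role": "system", "content": system_prompt})
    return history


def add_message(history, role, content):
    """Append one message using the role names the chat template expects."""
    if role not in ("system", "user", "assistant"):
        raise ValueError(f"unexpected role: {role!r}")
    history.append({"role": role, "content": content})
    return history


def chat_turn(model, tokenizer, history, user_text, max_tokens=160, generate_fn=None):
    """Add a user message, generate a reply from the full history, and store the reply."""
    if generate_fn is None:
        from mlx_lm import generate as generate_fn  # lazy import: helpers work without MLX

    add_message(history, "user", user_text)
    # Re-render the whole conversation each turn so the model sees prior context.
    prompt = tokenizer.apply_chat_template(history, tokenize=False, add_generation_prompt=True)
    reply = generate_fn(model, tokenizer, prompt=prompt, max_tokens=max_tokens)
    add_message(history, "assistant", reply)
    return reply


if __name__ == "__main__":
    try:
        from mlx_lm import load
    except ImportError:
        print("mlx-lm is not installed; run `pip install -U mlx-lm` on Apple Silicon.")
    else:
        model, tokenizer = load("masato25/Llama-3.2-Taiwan-3B-Instruct-Arbor-4bit")
        # System prompt: "Please answer in Traditional Chinese."
        history = new_history("請使用繁體中文回答。")
        # "Briefly introduce Taiwan's night market culture."
        print(chat_turn(model, tokenizer, history, "請簡短介紹台灣的夜市文化。"))
        # Follow-up that relies on the previous turn: "What snacks are common there?"
        print(chat_turn(model, tokenizer, history, "那裡常見的小吃有哪些？"))
```

The prompt grows with every turn, so long conversations take more time and memory; trimming old turns is a common trade-off.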
Important safety, quality, and compliance notes
Quantization may change model behavior, quality, robustness, refusal behavior, calibration, and safety characteristics compared with the upstream model. No claim is made that this derivative is equivalent to, safer than, or better than the upstream model.
Large language model outputs may be inaccurate, unsafe, biased, offensive, incomplete, or unsuitable for your intended use. You are responsible for evaluating outputs and for complying with applicable laws, licenses, policies, and platform rules.
Warranty and liability disclaimer
This derivative model and associated files are provided as-is and without warranties or conditions of any kind, express or implied, including without limitation warranties of merchantability, fitness for a particular purpose, title, non-infringement, accuracy, availability, or error-free operation.
To the maximum extent permitted by applicable law, the repository owner is not liable for any direct, indirect, incidental, special, consequential, exemplary, punitive, or other damages arising from or related to use of this repository, the derivative model, or model outputs.
Nothing in this README is legal advice. You are responsible for reviewing and complying with the applicable license terms and laws.
Model tree for masato25/Llama-3.2-Taiwan-3B-Instruct-Arbor-4bit
Base model
meta-llama/Llama-3.2-3B