docs: add MLX 4-bit format, update quant links in model card
README.md
CHANGED
@@ -13,6 +13,9 @@ tags:
- gguf
- llama-cpp
- mlx
+- apple-silicon
+- 4bit
+- quantized
base_model: Qwen/Qwen2.5-7B-Instruct
pipeline_tag: text-generation
---
@@ -52,10 +55,9 @@ Performs structured investment analysis based on Reasoning Traces.

| Format | File | Size | Use Case |
|--------|------|------|----------|
-| **BF16** (safetensors) | `model-*.safetensors` | 14.5 GB | Full precision, GPU inference |
-| **GGUF Q4_K_M** | `vela-dpo-v6-q4_k_m.gguf` | 4.4 GB |
-
-> MLX 4-bit quantized model will be provided in a separate repo (Apple Silicon optimized)
+| **BF16** (safetensors) | [`model-*.safetensors`](https://huggingface.co/intrect/VELA/tree/main) | 14.5 GB | Full precision, GPU inference |
+| **GGUF Q4_K_M** | [`vela-dpo-v6-q4_k_m.gguf`](https://huggingface.co/intrect/VELA/blob/main/vela-dpo-v6-q4_k_m.gguf) | 4.4 GB | llama.cpp / Ollama / LM Studio |
+| **MLX 4-bit** | [`mlx-int4/`](https://huggingface.co/intrect/VELA/tree/main/mlx-int4) | 4.0 GB | Apple Silicon (M1/M2/M3/M4) |

---
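The sizes in the table line up with a back-of-envelope estimate (a rough sketch; the ~7.6B parameter count assumed here for Qwen2.5-7B-Instruct and the per-parameter byte costs are approximations, and 4-bit quantization adds scale/metadata overhead on top of the raw weights):

```python
# Rough weight-size estimates for a ~7.6B-parameter model (assumed count).
params = 7.6e9

bf16_gib = params * 2 / 2**30    # 2 bytes per parameter for BF16
q4_gib = params * 0.5 / 2**30    # 0.5 bytes per parameter for 4-bit weights

print(f"BF16  ≈ {bf16_gib:.1f} GiB")  # ballpark of the 14.5 GB entry
print(f"4-bit ≈ {q4_gib:.1f} GiB")    # scales/metadata push this toward 4.0 GB
```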
@@ -300,7 +302,10 @@ print(outputs[0].outputs[0].text)
```python
from mlx_lm import load, generate

-
+# Download only the mlx-int4 folder from HF
+from huggingface_hub import snapshot_download
+mlx_path = snapshot_download("intrect/VELA", allow_patterns="mlx-int4/*")
+model, tokenizer = load(f"{mlx_path}/mlx-int4")

response = generate(
    model,
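For context on the new download step: `snapshot_download`'s `allow_patterns` filters repository files with shell-style globs (fnmatch semantics, as far as I know), so `"mlx-int4/*"` pulls only that folder instead of the full 14.5 GB repo. A quick stdlib illustration (the file names below are hypothetical repo paths, not a listing of the actual repo):

```python
from fnmatch import fnmatch

pattern = "mlx-int4/*"
repo_files = [
    "mlx-int4/config.json",              # hypothetical paths inside the folder
    "mlx-int4/model.safetensors",
    "vela-dpo-v6-q4_k_m.gguf",           # outside the folder: filtered out
    "model-00001-of-00004.safetensors",
]

kept = [f for f in repo_files if fnmatch(f, pattern)]
print(kept)  # only the two mlx-int4/ paths match
```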
|