docs: add MLX 4-bit format, update quant links in model card
README.md
CHANGED
@@ -13,6 +13,9 @@ tags:
- gguf
- llama-cpp
- mlx
+- apple-silicon
+- 4bit
+- quantized
base_model: Qwen/Qwen2.5-7B-Instruct
pipeline_tag: text-generation
---
@@ -52,10 +55,9 @@ Performs structured investment analysis based on Reasoning Traces.

| Format | File | Size | Use Case |
|--------|------|------|----------|
-| **BF16** (safetensors) | `model-*.safetensors` | 14.5 GB | Full precision, GPU inference |
-| **GGUF Q4_K_M** | `vela-dpo-v6-q4_k_m.gguf` | 4.4 GB |
-
-> MLX 4-bit quantized model will be provided in a separate repo (Apple Silicon optimized)
+| **BF16** (safetensors) | [`model-*.safetensors`](https://huggingface.co/intrect/VELA/tree/main) | 14.5 GB | Full precision, GPU inference |
+| **GGUF Q4_K_M** | [`vela-dpo-v6-q4_k_m.gguf`](https://huggingface.co/intrect/VELA/blob/main/vela-dpo-v6-q4_k_m.gguf) | 4.4 GB | llama.cpp / Ollama / LM Studio |
+| **MLX 4-bit** | [`mlx-int4/`](https://huggingface.co/intrect/VELA/tree/main/mlx-int4) | 4.0 GB | Apple Silicon (M1/M2/M3/M4) |

---
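The sizes in the table line up with a back-of-envelope estimate (a rough sketch; the ~7.6B parameter count assumed here for Qwen2.5-7B-Instruct and the per-parameter byte costs are approximations, and 4-bit quantization adds scale/metadata overhead on top of the raw weights):

```python
# Rough weight-size estimates for a ~7.6B-parameter model (assumed count).
params = 7.6e9

bf16_gib = params * 2 / 2**30    # 2 bytes per parameter for BF16
q4_gib = params * 0.5 / 2**30    # 0.5 bytes per parameter for 4-bit weights

print(f"BF16  ≈ {bf16_gib:.1f} GiB")  # ballpark of the 14.5 GB entry
print(f"4-bit ≈ {q4_gib:.1f} GiB")    # scales/metadata push this toward 4.0 GB
```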
@@ -300,7 +302,10 @@ print(outputs[0].outputs[0].text)
```python
from mlx_lm import load, generate

-
+# Download only the mlx-int4 folder from HF
+from huggingface_hub import snapshot_download
+mlx_path = snapshot_download("intrect/VELA", allow_patterns="mlx-int4/*")
+model, tokenizer = load(f"{mlx_path}/mlx-int4")

response = generate(
    model,
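For context on the new download step: `snapshot_download`'s `allow_patterns` filters repository files with shell-style globs (fnmatch semantics, as far as I know), so `"mlx-int4/*"` pulls only that folder instead of the full 14.5 GB repo. A quick stdlib illustration (the file names below are hypothetical repo paths, not a listing of the actual repo):

```python
from fnmatch import fnmatch

pattern = "mlx-int4/*"
repo_files = [
    "mlx-int4/config.json",              # hypothetical paths inside the folder
    "mlx-int4/model.safetensors",
    "vela-dpo-v6-q4_k_m.gguf",           # outside the folder: filtered out
    "model-00001-of-00004.safetensors",
]

kept = [f for f in repo_files if fnmatch(f, pattern)]
print(kept)  # only the two mlx-int4/ paths match
```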
|