Instructions to use maczzzzzz/Ornstein-3.5-9B-V2-ROCmFPX-STRIX_LEAN-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use maczzzzzz/Ornstein-3.5-9B-V2-ROCmFPX-STRIX_LEAN-GGUF with llama-cpp-python:
# !pip install llama-cpp-python from llama_cpp import Llama llm = Llama.from_pretrained( repo_id="maczzzzzz/Ornstein-3.5-9B-V2-ROCmFPX-STRIX_LEAN-GGUF", filename="Ornstein-3.5-9B-V2-ROCmFPX-STRIX_LEAN.gguf", )
llm.create_chat_completion( messages = "No input example has been defined for this model task." )
- Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- llama.cpp
How to use maczzzzzz/Ornstein-3.5-9B-V2-ROCmFPX-STRIX_LEAN-GGUF with llama.cpp:
Install (macOS, Linux)
curl -LsSf https://llama.app/install.sh | sh # Start a local OpenAI-compatible server with a web UI: llama serve -hf maczzzzzz/Ornstein-3.5-9B-V2-ROCmFPX-STRIX_LEAN-GGUF # Run inference directly in the terminal: llama cli -hf maczzzzzz/Ornstein-3.5-9B-V2-ROCmFPX-STRIX_LEAN-GGUF
Install from WinGet (Windows)
winget install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama serve -hf maczzzzzz/Ornstein-3.5-9B-V2-ROCmFPX-STRIX_LEAN-GGUF # Run inference directly in the terminal: llama cli -hf maczzzzzz/Ornstein-3.5-9B-V2-ROCmFPX-STRIX_LEAN-GGUF
Use pre-built binary
# Download pre-built binary from: # https://github.com/ggerganov/llama.cpp/releases # Start a local OpenAI-compatible server with a web UI: ./llama-server -hf maczzzzzz/Ornstein-3.5-9B-V2-ROCmFPX-STRIX_LEAN-GGUF # Run inference directly in the terminal: ./llama-cli -hf maczzzzzz/Ornstein-3.5-9B-V2-ROCmFPX-STRIX_LEAN-GGUF
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git cd llama.cpp cmake -B build cmake --build build -j --target llama-server llama-cli # Start a local OpenAI-compatible server with a web UI: ./build/bin/llama-server -hf maczzzzzz/Ornstein-3.5-9B-V2-ROCmFPX-STRIX_LEAN-GGUF # Run inference directly in the terminal: ./build/bin/llama-cli -hf maczzzzzz/Ornstein-3.5-9B-V2-ROCmFPX-STRIX_LEAN-GGUF
Use Docker
docker model run hf.co/maczzzzzz/Ornstein-3.5-9B-V2-ROCmFPX-STRIX_LEAN-GGUF
- LM Studio
- Jan
- Ollama
How to use maczzzzzz/Ornstein-3.5-9B-V2-ROCmFPX-STRIX_LEAN-GGUF with Ollama:
ollama run hf.co/maczzzzzz/Ornstein-3.5-9B-V2-ROCmFPX-STRIX_LEAN-GGUF
- Unsloth Studio
How to use maczzzzzz/Ornstein-3.5-9B-V2-ROCmFPX-STRIX_LEAN-GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for maczzzzzz/Ornstein-3.5-9B-V2-ROCmFPX-STRIX_LEAN-GGUF to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for maczzzzzz/Ornstein-3.5-9B-V2-ROCmFPX-STRIX_LEAN-GGUF to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for maczzzzzz/Ornstein-3.5-9B-V2-ROCmFPX-STRIX_LEAN-GGUF to start chatting
- Pi
How to use maczzzzzz/Ornstein-3.5-9B-V2-ROCmFPX-STRIX_LEAN-GGUF with Pi:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama serve -hf maczzzzzz/Ornstein-3.5-9B-V2-ROCmFPX-STRIX_LEAN-GGUF
Configure the model in Pi
# Install Pi: npm install -g @mariozechner/pi-coding-agent # Add to ~/.pi/agent/models.json: { "providers": { "llama-cpp": { "baseUrl": "http://localhost:8080/v1", "api": "openai-completions", "apiKey": "none", "models": [ { "id": "maczzzzzz/Ornstein-3.5-9B-V2-ROCmFPX-STRIX_LEAN-GGUF" } ] } } }Run Pi
# Start Pi in your project directory: pi
- Hermes Agent new
How to use maczzzzzz/Ornstein-3.5-9B-V2-ROCmFPX-STRIX_LEAN-GGUF with Hermes Agent:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama serve -hf maczzzzzz/Ornstein-3.5-9B-V2-ROCmFPX-STRIX_LEAN-GGUF
Configure Hermes
# Install Hermes: curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash hermes setup # Point Hermes at the local server: hermes config set model.provider custom hermes config set model.base_url http://127.0.0.1:8080/v1 hermes config set model.default maczzzzzz/Ornstein-3.5-9B-V2-ROCmFPX-STRIX_LEAN-GGUF
Run Hermes
hermes
- Atomic Chat new
- OpenClaw new
How to use maczzzzzz/Ornstein-3.5-9B-V2-ROCmFPX-STRIX_LEAN-GGUF with OpenClaw:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama serve -hf maczzzzzz/Ornstein-3.5-9B-V2-ROCmFPX-STRIX_LEAN-GGUF
Configure OpenClaw
# Install OpenClaw: npm install -g openclaw@latest # Register the local server and set it as the default model: openclaw onboard --non-interactive --mode local \ --auth-choice custom-api-key \ --custom-base-url http://127.0.0.1:8080/v1 \ --custom-model-id "maczzzzzz/Ornstein-3.5-9B-V2-ROCmFPX-STRIX_LEAN-GGUF" \ --custom-provider-id llama-cpp \ --custom-compatibility openai \ --custom-text-input \ --accept-risk \ --skip-health
Run OpenClaw
openclaw agent --local --agent main --message "Hello from Hugging Face"
- Docker Model Runner
How to use maczzzzzz/Ornstein-3.5-9B-V2-ROCmFPX-STRIX_LEAN-GGUF with Docker Model Runner:
docker model run hf.co/maczzzzzz/Ornstein-3.5-9B-V2-ROCmFPX-STRIX_LEAN-GGUF
- Lemonade
How to use maczzzzzz/Ornstein-3.5-9B-V2-ROCmFPX-STRIX_LEAN-GGUF with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/ lemonade pull maczzzzzz/Ornstein-3.5-9B-V2-ROCmFPX-STRIX_LEAN-GGUF
Run and chat with the model
lemonade run user.Ornstein-3.5-9B-V2-ROCmFPX-STRIX_LEAN-GGUF-{{QUANT_TAG}}List all available models
lemonade list
Ornstein-3.5-9B-V2 ROCmFPX STRIX_LEAN — GGUF
ROCmFPX Q4_0_ROCMFP4_STRIX_LEAN quant of GestaltLabs/Ornstein-3.5-9B-V2 (Qwen3.5 9B + RLVR/GRPO post-training, 9.2 B params, native multimodal with vision tower, single-file MTP head for speculative decoding).
Built with charlie12345/ROCmFPX on a Radeon RX 9060 XT 16 GB (gfx1200), ROCm 7.2.3, NixOS 25.11. Quantized 2026-06-28 with build commit 11d76c2.
| File | Size | Quant | BPW |
|---|---|---|---|
Ornstein-3.5-9B-V2-ROCmFPX-STRIX_LEAN.gguf |
4.84 GB | Q4_0_ROCMFP4_STRIX_LEAN (4-bit ROCmFP4 + Strix K/V + Q5_K embed) |
4.42 |
This is not a stock llama.cpp quant; you need a ROCmFPX build of llama-server / llama-cli / llama-quantize to load it. The ROCmFP4 weight format is unknown to stock llama.cpp and will fail with unknown quantization.
Multimodal note: this is a vision-capable model. To use the vision tower, also load the mmproj-ornstein-v2-f16.gguf (~921 MB) companion file via --mmproj <path>. The companion is published separately by GestaltLabs in Ornstein-3.5-9B-V2-GGUF. Vision is verified to work in mesh_eval (1×1 test pixel identified as "Red").
Scope of these benchmarks — read this first
These numbers are a light baseline, not a thorough ROCmFPX evaluation. The mesh's bench framework is built for production agent workload regression-detection on the local stack, not for the kind of multi-axis sweep that upstream quant maintainers typically publish. Specifically:
- Harness scope is bounded. The numbers below come from the mesh's
mesh_eval(6 tests, 4 deterministic + throughput + vision) +hermes_loop_eval(5 agent scenarios) + actx_scaling_benchrun at 4 K → 32 K (64 K+ blocked by harness HTTP timeout, not model capability). That's a regression suite, not a quality benchmark — it answers "does this quant still serve the mesh's agent stack correctly," not "is this the best possible 4-bit ROCmFP4 quant of this model." - Sample sizes are small. Throughput numbers are 3 reps on a single GPU; hermes_loop is 5 scenarios with one-shot generation. None are powered for statistical significance on a per-token level.
- No perplexity / wikitext / MMLU / GSM8K. The mesh's stack isn't a quality benchmark — those are upstream ROCmFPX's territory. If you need a quality signal, charlie's own validation ladder or an
lm-eval-harnessrun is the right tool. (Note: GestaltLabs's own published GPQA / reasoning numbers on the parent V2 model are 1.00 / 1.00 on the GBS-200 suite — those are upstream's numbers, not ours.) - Single GPU class. All measurements are on a 16 GB RDNA4 (RX 9060 XT, gfx1200). No Strix unified-memory, no CDNA, no multi-GPU, no Vulkan, no CUDA. Cross-hardware generalization is not implied.
- No human eval. "Faster and same-coherent on the regression tests" is not a quality verdict on this specific quant.
What this IS good for: a quick signal that the quant (a) loads, (b) runs at sane throughput, (c) doesn't break the mesh's agent tool-calling, (d) handles the vision path, (e) scales predictably with context. What this is NOT good for: claiming "this is the best quant of this model," reproducing academic benchmark results, or substituting for upstream's validation work.
For a rigorous view, the parent repo GestaltLabs/Ornstein-3.5-9B-V2 (which itself includes a published GBS-200 benchmark table) and the model's stock GGUF variants on GestaltLabs/Ornstein-3.5-9B-V2-GGUF and mradermacher/Ornstein-3.5-9B-V1.5-i1-GGUF are the place to look.
What we measured
Hardware: Node B, AMD Ryzen 9 5900XT 16-core, Radeon RX 9060 XT 16 GB (gfx1200), ROCm 7.2.3, NixOS 25.11
Software: charlie12345/ROCmFPX main @ 11d76c2
Source GGUF: ornstein-v2-f16.gguf (F16, 18.4 GB — includes the single-file MTP draft head baked into the same file) sourced from GestaltLabs/Ornstein-3.5-9B-V2-GGUF
Companion file for vision: mmproj-ornstein-v2-f16.gguf (~921 MB) — not part of this upload, see GestaltLabs repo
Same-stack comparison: none — the only same-source Q3_0_ROCMFPX quant for Ornstein we have is a baseline reference, not a same-harness A/B (different sampler settings, different time). The headline below is the model itself, not a comparison.
Throughput (mesh_eval, 4 K ctx, MTP-ON, turbo4 KV, rep_pen=1.1)
3 reps of 256-token completion, gen t/s mean 47.2 ± 0.4 (45.3, 47.5, 47.4 individual reps). This matches Vibetuned STRIX_LEAN's 47-48 t/s range on the same Node B card despite Ornstein being 40 % smaller (4.84 GB vs 7.0 GB).
Agent / loop validation (raw JSON: raw-hermes-loop-ornstein-v2-strix-lean-reppen1.1.json)
mesh_eval.py 4 deterministic + vision (raw-mesh-eval-ornstein-v2-strix-lean.json):
| Test | Result |
|---|---|
gibberish (no degenerate repetition) |
OK (47 words, 0 repeated chars) |
thinking_leak (no <think> leakage) |
CLEAN |
tool_calling (single call) |
PASS — get_weather(location=Tokyo) |
coding (merge_sorted_lists) |
PASS — correct two-pointer impl, tests pass |
uncensored (no refusal) |
PASS — ss -tuln answer |
throughput (3×256-token gen) |
47.2 t/s mean, ±0.4 stdev |
vision (1×1 test pixel) |
PASS — identifies "Red" |
overall_status |
PASS, 4/4 + vision |
hermes_loop_eval.py 5 scenarios with rep_pen=1.1 (raw-hermes-loop-ornstein-v2-strix-lean-reppen1.1.json):
| Scenario | Result | avg t/s |
|---|---|---|
single (one tool call) |
PASS — final answer correct | 33.9 |
chained (calc → use result) |
PASS — 15 × 37 = 555 |
27.9 |
multi_step (compare 2 cities) |
PASS — table + conclusion | 39.7 |
search (web search + extract) |
PASS — Eiffel Tower height | 25.6 |
error_recovery (file not found) |
PASS — clean | 25.4 |
overall_status |
PASS, 5/5 | mean 30.5 |
rep_pen=1.1 is required for this model (and for Fable5, the other RLVR/GRPO-trained model in the mesh's stack). Without it, the model loops on chained and multi_step scenarios — the same tool-loop pattern the mesh's Fable5 work hit. With rep_pen=1.1 the model passes 5/5. The baseline (default sampler) was 3/5; that raw JSON is included as raw-hermes-loop-ornstein-v2-strix-lean-baseline.json for reference. The 2 missing scenarios are tool-loop failures, not quant defects.
Context scaling (raw JSON: ctx-scaling-ornstein-v2-strix-lean-20260628-212553.json)
| Ctx target | pp t/s | tg t/s | Result |
|---|---|---|---|
| 4 K | 1081 | (per server logs) | PASS (per SUMMARY.md) |
| 8 K | 701 | (per server logs) | PASS |
| 16 K | 540 | (per server logs) | PASS |
| 32 K | 1140 | 50.0 | PASS, server healthy |
| 64 K | n/a | n/a | harness HTTP timeout (120s), not a model defect |
| 128 K | n/a | n/a | server OOM at 8 GB cache-ram; resolved with cram=24576 |
Findings:
- 32 K prompt processing holds at 1140 pp t/s — the model handles 32 K comfortably on a 16 GB card with KV offload.
- Decode throughput holds at 50 t/s at 32 K (matches Ornith 9B and Vibetuned 14B on the same card).
- 64 K+ ctx scaling is harness-limited, not model-limited. The harness's 120 s
urlopentimeout blocks measurement before the model can finish. Server health at 128 K is verified separately (withcram=24576, server is healthy at 71 % VRAM and processes 64 K+ prompts in ~7 min). The ctx_scaling_bench harness needs a longer HTTP timeout for proper 64 K+ measurement — that's a separate follow-up, not a model issue. - The 128 K test that initially OOM'd was due to
cache-ram=8192being too small for 128 K; bumping tocache-ram=24576(24 GB DDR4 budget on Node B's 48 GB) resolves it.
KV cache type (head_dim=128, same as Ornith + Vibetuned)
The mesh's KV-type sweep was run on the head_dim=128 Qwen family. turbo4 is the production default for any head_dim=128 model in the ROCmFPX build: -0.7-1.1 GB VRAM, same throughput vs q8_0. See the Ornith 9B ROCmFPX STRIX_LEAN repo for the full sweep data. turbo3/4 are TheTom's turboquant types, absorbed into ROCmFPX main via PlunderStruck commits d859c9e + d0141e8.
Quick start
# Build llama.cpp with ROCmFPX
git clone https://github.com/charlie12345/ROCmFPX
cd ROCmFPX
cmake -S . -B build -DGGML_HIP=ON -DGGML_VULKAN=OFF -DGGML_CUDA=OFF \
-DCMAKE_HIP_ARCHITECTURES=gfx1200 ...
cmake --build build --target llama-server llama-cli llama-quantize
# Download the mmproj companion for vision (separately published by GestaltLabs)
# wget https://huggingface.co/GestaltLabs/Ornstein-3.5-9B-V2-GGUF/resolve/main/mmproj-ornstein-v2-f16.gguf
# Serve (131 072 ctx, turbo4 KV for head_dim=128, fa=on, MTP-ON, rep_pen=1.1)
./build/bin/llama-server \
-m Ornstein-3.5-9B-V2-ROCmFPX-STRIX_LEAN.gguf \
--mmproj mmproj-ornstein-v2-f16.gguf \
-np 1 -c 131072 \
-ctk turbo4 -ctv turbo4 \
-kvo -cram 24576 -fa on \
--spec-type draft-mtp \
--spec-draft-n-max 3 --spec-draft-p-min 0.75 \
--repeat-penalty 1.1
--spec-type draft-mtp is the correct flag, not mtp (the --spec-type mtp form in the upstream GestaltLabs HF model card is a typo; the ROCmFPX llama-server rejects mtp with a list of valid types). This is the same typo pattern that hit the SABER card upstream.
Reproduce the quant
# Source (F16 GGUF with MTP head baked in, from GestaltLabs)
SRC=/mnt/e/llms-models-data/ornstein/ornstein-v2-f16.gguf
# ROCmFPX llama-quantize (preset is built in; see `llama-quantize --help`)
~/ROCmFPX/build-rdna4/bin/llama-quantize \
"$SRC" \
Ornstein-3.5-9B-V2-ROCmFPX-STRIX_LEAN.gguf \
Q4_0_ROCMFP4_STRIX_LEAN
Quantize time: ~4 min for 18.4 GB F16 source, CPU-only, no GPU required.
Files in this repo
| File | What it is |
|---|---|
Ornstein-3.5-9B-V2-ROCmFPX-STRIX_LEAN.gguf |
The quant. Load only with a ROCmFPX llama-server. |
README.md |
This file |
raw-mesh-eval-ornstein-v2-strix-lean.json |
mesh_eval.py output (2026-06-29 01:04 UTC) — 4/4 + vision |
raw-hermes-loop-ornstein-v2-strix-lean-baseline.json |
hermes_loop_eval.py output WITHOUT rep_pen (2026-06-29 01:05 UTC) — 3/5 |
raw-hermes-loop-ornstein-v2-strix-lean-reppen1.1.json |
hermes_loop_eval.py output WITH rep_pen=1.1 (2026-06-29 01:06 UTC) — 5/5 |
ctx-scaling-ornstein-v2-strix-lean-20260628-212553.json |
4 K → 32 K ctx scaling (32 K pp 1140, tg 50) |
ctx-scaling-ornstein-v2-strix-lean-20260628-213758.json |
64 K / 128 K attempt (harness timeout) |
quant-command.sh |
The exact llama-quantize invocation used |
Not in this repo (intentionally): the mmproj-ornstein-v2-f16.gguf (~921 MB) is a separate file published by GestaltLabs in Ornstein-3.5-9B-V2-GGUF. The model card for the parent quant list explicitly says "GGUF" includes the mmproj in the same repo. We don't redistribute it here to avoid a third-party redistribution; download directly from GestaltLabs.
What's NOT in this repo (caveats)
- Stock llama.cpp will not load this file. The ROCmFP4 weight format is unique to charlie12345/ROCmFPX. Use that fork's
llama-server/llama-cli/llama-quantize. - No CUDA / non-AMD GPU bench. All measurements are RDNA4 (gfx1200). Vulkan path on RDNA4 has a known upstream regression (charlie12345/rocmfp4-llama issue #6) — we did not test it.
- 64 K+ ctx scaling is harness-limited, not model-limited. The
ctx_scaling_bench.py120 s HTTP timeout blocks measurement at 64 K. The model itself handles 128 K ctx (verified separately withcram=24576). Proper 64 K+ numbers will require a harness fix (longer timeout or async polling) — that's a separate follow-up. - The source GGUF is GestaltLabs-distributed (per
general.quantized_byin the F16 source metadata). The actual parent isGestaltLabs/Ornstein-3.5-9B-V2(the safetensors model), itself a finetune ofGestaltLabs/Ornstein-3.5-9B-V1.5, itself a finetune ofQwen/Qwen3.5-9B. The chain is: Qwen3.5-9B → V1.5 (SFT) → V2 (DPO + GRPO/RLVR post-training) → GestaltLabs F16 GGUF → our STRIX_LEAN. - 5 GB minimum VRAM for the GGUF alone; 12 GB with KV offload at 128 K. The mesh's 16 GB card runs it with ~3 GB headroom at 128 K ctx.
rep_pen=1.1is mandatory for the agent loop. Without it, the model loops onchainedandmulti_step(3/5 PASS). This is the same Fable5 tool-loop pattern — a property of the RLVR/GRPO SFT family, not the quant. The fix is universal: add--repeat-penalty 1.1to the serve command. (Note: this is unusual for an Apache-2.0 release to require; upstream's HF card does not document it. A friendly bug report to GestaltLabs is the right next step.)--spec-type mtpis a typo in the upstream HF model card. The correct flag for llama-server is--spec-type draft-mtp. Themtpform is rejected with a list of valid types. This is a separate upstream bug.- Vision requires the
mmproj-ornstein-v2-f16.ggufcompanion file. Not bundled in this repo; download fromGestaltLabs/Ornstein-3.5-9B-V2-GGUF. The model card there labels the mmproj asmmproj-ornstein-v2-f16.ggufand notes it covers image + video input. - No MTP / speculative-decode sweep on this file beyond the default
--spec-draft-n-max 3 --spec-draft-p-min 0.75. The mesh's MTP sweep was on Fable5 (Node D CUDA, +81% decode on Fable5 Q8 + MTP). Ornstein MTP settings may have a different optimal; we used the upstream-recommended values. - No quality benchmark (perplexity, MMLU, GSM8K). GestaltLabs's own published GPQA / reasoning numbers on the parent V2 model are 1.00 / 1.00 on the GBS-200 suite (per their HF card) — those are upstream's numbers, not ours.
Provenance
- Source model:
GestaltLabs/Ornstein-3.5-9B-V2— 9.2 B params, Qwen3.5 9B base + DPO + GRPO/RLVR post-training, vision tower + MTP head baked in - Source model license: apache-2.0
- Source GGUF uploader: GestaltLabs (the model authors themselves)
- Companion file:
mmproj-ornstein-v2-f16.gguf(~921 MB) inGestaltLabs/Ornstein-3.5-9B-V2-GGUF(NOT in this repo — see "Files in this repo" above) - Quantizer: charlie12345/ROCmFPX
main@11d76c2(2026-06-27) - Quantizer license: MIT
- Build hardware: Node B, AMD Ryzen 9 5900XT 16-core, Radeon RX 9060 XT 16 GB (gfx1200), ROCm 7.2.3, NixOS 25.11
- Build tooling: NixOS 25.11, ROCm store paths dynamic-discovered. See the
meshinarepo'sreferences/nixos-rocm-external-build-recipe.mdfor the build env setup. - Bench harnesses:
scripts/mesh-bench/mesh_eval.py+scripts/mesh-bench/hermes_loop_eval.py+scripts/mesh-bench/ctx_scaling_bench.pyfrom the meshina repo (private) - Original bench report:
raw/benchmarks/2026-06-28-ornstein-charlie-bench/SUMMARY.mdin the meshina repo (177 lines, full session record + cross-model comparison + 6 caveats) - Research note on 27B Ornstein feasibility:
raw/research/2026-06-28-ornstein-27b-charlie-size-math.md(concludes 27B Ornstein is not feasible on 16 GB at 128 K; defer to 24 GB+ hardware)
License
- The Ornstein 3.5 9B V2 parent is apache-2.0 (per its HF model card).
- The
charlie12345/ROCmFPXquantizer is MIT. - The GGUF in this repo is a derivative of the apache-2.0 parent, produced with the MIT-licensed quantizer. Both upstream licenses are preserved.
- Downloads last month
- 259
We're not able to determine the quantization variants.
Model tree for maczzzzzz/Ornstein-3.5-9B-V2-ROCmFPX-STRIX_LEAN-GGUF
Base model
Qwen/Qwen3.5-9B-Base