Qwen3-Coder-30B-A3B — IES4 Turtle Generation (research prototype)

A LoRA fine-tune of mlx-community/Qwen3-Coder-30B-A3B-Instruct-8bit that generates IES4 (UK Government Information Exchange Standard) RDF/Turtle from natural-language scenarios, and follows a supplied target ontology for general knowledge-graph extraction. To our knowledge this is the first openly published LLM fine-tune targeting IES4 (checked against the Hugging Face API, GitHub and arXiv at release time). It is a research prototype: validate all output before production use.

IES4 is a 4D ontology specified as an RDF Schema, developed by UK Government (Dstl, MOD, Home Office, Metropolitan Police, HMRC, DBT) with technical support from Telicent and Aurora Consulting. Repo: dstl/IES4.

Why fine-tune at all?

The untuned base model cannot produce real IES4: 93.7% of the ies: terms it emits do not exist in the ontology (0% term conformance). After LoRA:

Metric (held-out, in-distribution)	Base model	This model
Syntactic validity	93.2%	95.5%
IES4 term conformance	0.0%	88.6%
Hallucinated-term rate	0.937	0.010
Structural conformance (domain/range)	0.932*	0.955
Namespace fidelity (when instructed)	—	100%

Out-of-distribution (real dstl sample-data scenarios)	Base	This model
Syntactic validity	90.0%	70.0%
IES4 term conformance	0.0%	30.0%
Structural conformance	0.900*	0.640

Ontology-conditioned extraction (Text2KGBench slice)	Base	This model
Syntactic validity	50.0%	91.7%
Relation conformance	75.0%	91.7%
IES-vocabulary bleed	0%	0%

* Baseline structural numbers are inflated: with mostly hallucinated vocabulary there are few checkable property usages. Metrics follow the spirit of Text2KGBench: validity, conformance, hallucination. Eval code ships with the dataset repo; the OOD row is deliberately reported although it is the model's weakest surface.

Training data (correct-by-construction)

1,589 IES pairs: graphs built programmatically with telicent-ies-tool across 14 scenario patterns (employment, birth/death, events, identifiers, ownership, posts, location-states, access, possession, communication, composites), human-plausible instance IRIs, 35% namespace-varied with explicit namespace instructions. Every graph passed BOTH the telicent validation AND an independent term-membership validator built from the published ontology (510 classes, 204 properties). Descriptions are deterministic plus fact-checked local-LLM paraphrases (paraphrases dropping any name or year were discarded).
210 vocabulary/boundary pairs: class and property definitions verbatim from the ontology, plus refusal examples teaching what IES4 cannot express (opinions, speculation, causal claims).
448 ontology-conditioned extraction pairs from Text2KGBench (Wikidata-TekGen), predicates restricted to each domain ontology.

Split by target graph (no paraphrase leakage); OOD test set = descriptions of the real dstl sample-data files, never trained on.

Method

LoRA (16 layers) via mlx-lm 0.31.3 on Apple Silicon (M3 Max), QLoRA on the 8-bit MoE base, 1,000 iterations, batch 2, seq 2048, final val loss 0.15. The repo contains the fused 8-bit MLX model; the raw adapter is in adapters/ for applying to the bf16 base with other toolchains.

Usage (MLX)

pip install mlx-lm
python -m mlx_lm generate --model fabsssss/qwen3-coder-30b-a3b-ies4 --max-tokens 600 --prompt \
  "Encode the following scenario as IES4 RDF/Turtle. Use only real IES4 terms and the
  4D state/period pattern where relevant. Output only Turtle.

  Scenario: Priya Patel has worked for Meridian Bank since 2019-03-01 and attended a
  security briefing at Heathrow Terminal 4 on 2024-05-02 from 09:00 to 11:00."

Limitations

Out-of-distribution performance (rich, idiomatic IES exchanges) is markedly lower than in-distribution; treat complex outputs as drafts for expert review.
Coverage: measures, representation/document patterns and intelligence-assessment structures are under-represented.
MLX 8-bit format; GGUF conversion not yet provided. Use the adapter on the bf16 base if you need other runtimes.
Always validate output (e.g. with telicent-ies-tool or the shipped validator) before exchange.

Training data licensing & attribution

IES4 ontology: MIT, © Crown copyright, Defence Science and Technology Laboratory (Dstl). This model card retains that notice.
telicent-ies-tool: used only to generate training graphs (library not redistributed).
~448 training pairs derive from Text2KGBench (data licence CC BY-SA 4.0; sources Wikidata-TekGen / DBpedia-WebNLG). Attribution: "Data derived from Text2KGBench (Mihindukulasooriya, Tiwari, Enguix, Lata; ISWC 2023), licensed CC BY-SA 4.0." The published dataset repo marks that slice separately under CC BY-SA 4.0.
Model weights: MIT. Weight releases are not, on current consensus, derivative works of training data; attribution obligations above are honoured regardless.

Provenance

Built and adversarially red-teamed (dataset design, eval integrity, licensing, tooling) before release; the eval harness and dataset are published for reproduction. By The Tesseract Academy.

Downloads last month: -

Safetensors

Model size

31B params

Tensor type

BF16

U32

MLX

Hardware compatibility

8-bit

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for fabsssss/qwen3-coder-30b-a3b-ies4

Base model

Qwen/Qwen3-Coder-30B-A3B-Instruct

Quantized

mlx-community/Qwen3-Coder-30B-A3B-Instruct-8bit

Adapter

(1)

this model

Paper for fabsssss/qwen3-coder-30b-a3b-ies4

Text2KGBench: A Benchmark for Ontology-Driven Knowledge Graph Generation from Text

Paper • 2308.02357 • Published Aug 4, 2023 • 3