Model Card for Illustrious Spatial Grammar LoRA

This model is a specialized Low-Rank Adaptation (LoRA) designed to power the Illustrious Studio Engine. It interprets natural language scene descriptions and translates them into a highly structured, token-minimum spatial layout grammar for 3D world generation and dynamic WebGL/WebGPU canvases.

Model Details

Model Description

The Illustrious Spatial LoRA extracts explicit spatial relationships from human language—mapping distances, structural orientations, and morph behaviors directly into a customized bounding coordinate array: [X, Y, Z, Pitch, Yaw, Roll, Scale].

Instead of guessing arbitrary floats, the engine operates on relative bounding variables (e.g., fw for Full Width, fh for Full Height) to stack, space, and anchor geometry dynamically. It understands compound placement rules ("tucked slightly behind"), rotational operators (sym(X) for symmetric mirroring), absolute overrides (abs), and primitive instantiation ([sphere], [box], [mesh]).

Model Sources

Uses

Direct Use

The primary function of this model is to operate inside a multi-threaded web worker environment or headless GPU cluster, intercepting textual prompts to dynamically layout 3D objects, point clouds, and environments.

It handles:

  1. Scene modification: Altering specific features of a map (like a winter to summer transition) by generating structural layout tokens.
  2. Dynamic asset placement: Placing Objaverse models contextually (e.g., generating layout tokens to "insert a bicycle behind the bench").
  3. Procedural scene generation: Turning descriptive text into layout meshes, CSG booleans, and localized shader attributes (like noise).

Out-of-Scope Use

This model is heavily fine-tuned to generate spatial arrays and layout syntax. It is not intended for conversational chatting, general Q&A, or creative writing outside the scope of 3D environment construction.

Bias, Risks, and Limitations

Because the model translates open-ended language into rigid spatial arrays, ambiguous prompts may default to origin overlap if explicit human-scale distance descriptors are missing. The base model is abliterated, meaning it will comply with spatial generation requests for any environment description without refusal logic.

Recommendations

Users should provide clear relational anchors when prompting. Use vocabulary mapped to the engine's distance multipliers (e.g., "flush with", "slightly overlapping", "diagonally back-left") for the most predictable structural layouts.

How to Get Started with the Model

The model expects natural language input and outputs a sequence of primitive targets with bounding coordinates.

Example Prompt:

"Two stone pillars holding up a heavy steel crossbeam mesh across the top."

Expected Output:

[cylinder][-3fw,0,0,0,0,0,1.5][cylinder][3fw,0,0,0,0,0,1.5][mesh][@0,@1][0,0,fh,0,0,0,[6.2,0.4,0.2]]

Loading the Model (Python/Transformers):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "Goekdeniz-Guelmez/Josiefied-Qwen2.5-0.5B-Instruct-abliterated-v1"
adapter_id = "BrianJamesCullinan/illustrious-spatial-qwen2.5-0.5b"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base_model, adapter_id)

prompt = "A fuzzy sphere resting on top of a cube."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Training Details

Training Data

The training dataset comprises synthetic layout chains mapping descriptive English spatial positioning to the Illustrious dimension alias syntax. It heavily emphasizes:

  • Anchors: hw (half width), fd (full depth), fh (full height).
  • Global & Target Pointers: @0 (global root), @idx (iteration index).
  • Boolean Multipliers: -% for CSG subtractions / depressed geometry.
  • Modifiers: surf (raycast alignment), twist, taper, noise.

Training Procedure

Preprocessing

Data was formatted into conversational turns where the user describes an object or scene, and the assistant responds exclusively in the 7-parameter spatial array format ([X, Y, Z, Pitch, Yaw, Roll, Scale]).

Training Hyperparameters

  • Training regime: bf16 mixed precision
  • Target Modules: Q, K, V, O projections (Attention layers)
  • Hardware: Headless Google Cloud Spot instances (Nvidia Tesla T4) orchestrated via the Illustrious dynamic cluster manager.

Evaluation

Testing Data, Factors & Metrics

Testing Data

Validation sets consisted of complex structural alignments, such as mirrored facial features or architectural supports, to ensure the model correctly triggers symmetric (sym(X)) and interpolative (@0, @1) operators.

Results & Output Examples

The model reliably reproduces the following geometric behaviors:

Target Descriptor Token Output Geometric / Morph Behavior
Elongated Sphere (Anchor) [sphere][0,0,0,0,0,0,[1,1,1.5]] Base structure scaled taller on the Z-axis.
Fuzzy stuff on top [sphere][0,0,fh,0,0,0,[1,1,0.2],noise=0.3] Flattened canopy snapped to the full height with vertex displacement.
Two smaller flat spheres on side [sphere][sym(fw),0,hh,0,0,0,0.2] Symmetrically mirrored across the lateral profile at vertical midpoint.
Rounded cone in middle front [cone][0,hd,hh,0,0,0,0.25] Centered on X, pushed forward along the longitudinal face.

Summary

The model successfully decouples human descriptors like "peeking out from behind" into accurate math logic: offset back on Y, up on Z, shifted laterally on X, and scaled down by half ([hw, -hd, hh, 0, 0, 0, 0.5]).

Environmental Impact

  • Hardware Type: Nvidia Tesla T4
  • Cloud Provider: Google Cloud Platform (Dynamic Spot Instances)
  • Compute Region: us-central1-a / multi-region failover matrix

Technical Specifications

Model Architecture and Objective

A PEFT/LoRA adapter applied to a 0.5B parameter Qwen2.5 base model. The objective is deterministic syntax translation for rendering spatial environments on client-side GPUs or WebAssembly canvas layers.

Framework versions

  • PEFT 0.19.1
Downloads last month
10
GGUF
Model size
0.6B params
Architecture
qwen2
Hardware compatibility
Log In to add your hardware

4-bit

16-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for megamindbrian/josiefied-qwen-spatial-engine