Models & services
| Layer | Model / Service | Role |
| --- | --- | --- |
| Game logic | Codex / GPT-5.5 (OpenAI) | Stat block + behavior code generation |
| Concept art | FLUX.2 Klein 9B (Black Forest Labs) | Multi-angle reference image from prompt |
| 3D generation | Hunyuan3D-2.1 (Tencent, 32B) | PBR mesh from image-conditioned input |
| Compute | Modal (serverless GPU) | Autoscaling inference, no cold-start pain |
| Frontend | Three.js + Gradio | Browser game loop + real-time 3D viewer |
| Sandbox | Modal Sandboxes | Safe execution of Codex-generated game code |
Pipeline diagram
```
User Prompt
  │
  ├──► [Codex / GPT-5.5]
  │        │
  │        └──► Stat Block JSON ────────────────────────────┐
  │             Behavior Code ──► Modal Sandbox ────────────┤
  │                                                         │
  ├──► [FLUX.2 Klein 9B] ──► Concept Image (512×512)        │
  │                                 │                       │
  │                                 └──► [Hunyuan3D-2.1]    │
  │                                             │           │
  │                                      PBR Mesh (GLB)     │
  │                                             │           │
  └───────────────────────────┬─────────────────┴───────────┘
                              │
                     Three.js Game Scene
                  (orbit / inspect / play)
```
Architecture deep-dive
- Prompt ingestion & enrichment
The raw user prompt is passed through a lightweight preprocessing step that:
  - Injects RPG-specific context tokens (ITEM:, CHARACTER:, ENVIRONMENT:)
  - Detects asset type (weapon / character / prop / environment) via keyword classification
  - Expands vague descriptors using a small synonym/adjective bank (e.g. "magic sword" → "enchanted longsword with runic inscriptions and faint blue aura")
No separate LLM call is needed: this runs client-side in Python as a 200-line rule engine.
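The three preprocessing steps can be sketched in a few lines. This is a minimal illustration, not the project's actual rule engine: the keyword table, the descriptor bank, and the function names `detect_asset_type` and `enrich_prompt` are all assumptions, as is mapping props to the `ITEM:` token.

```python
import re

# Hypothetical keyword table; the real rule engine's vocabulary is not shown here.
ASSET_KEYWORDS = {
    "weapon": ("sword", "mace", "axe", "bow", "dagger", "staff"),
    "character": ("knight", "wizard", "goblin", "rogue", "dragon"),
    "environment": ("forest", "dungeon", "castle", "cave", "tavern"),
}

# Context tokens named in the text; props falling back to ITEM: is an assumption.
CONTEXT_TOKENS = {
    "weapon": "ITEM:",
    "prop": "ITEM:",
    "character": "CHARACTER:",
    "environment": "ENVIRONMENT:",
}

# Hypothetical descriptor bank: vague phrase -> richer phrase.
EXPANSIONS = {
    "magic sword": "enchanted longsword with runic inscriptions and faint blue aura",
}


def detect_asset_type(prompt: str) -> str:
    """Return the first asset type whose keywords appear as whole words."""
    words = set(re.findall(r"[a-z]+", prompt.lower()))
    for asset_type, keywords in ASSET_KEYWORDS.items():
        if words & set(keywords):
            return asset_type
    return "prop"  # nothing matched: treat it as a generic prop


def enrich_prompt(prompt: str) -> str:
    """Classify the prompt, expand vague descriptors, and prefix the context token."""
    asset_type = detect_asset_type(prompt)
    enriched = prompt.strip()
    for vague, rich in EXPANSIONS.items():
        enriched = re.sub(re.escape(vague), rich, enriched, flags=re.IGNORECASE)
    return f"{CONTEXT_TOKENS[asset_type]} {enriched}"


print(enrich_prompt("magic sword"))
```

Classification runs on the original wording before expansion, so adding entries to the descriptor bank cannot change which asset type is detected.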
- Codex / GPT-5.5: game logic generation
Endpoint: POST https://api.openai.com/v1/responses (Codex agent mode)
Plugin: Hugging Face plugin for asset lookup; GitHub plugin for stat template retrieval
The prompt is structured as:
```python
system = """
You are a tabletop RPG game designer.
Given an item/character description, output ONLY valid JSON:
{
  "name": str,
  "type": "weapon" | "character" | "environment",
  "stats": { "atk": int, "def": int, "spd": int, "mag": int },
  "abilities": [{"name": str, "description": str, "cost": int}],
  "lore": str (1 sentence),
  "behavior_code": str (JavaScript, Three.js compatible)
}
"""
```
behavior_code is a self-contained JS function that defines how the asset animates or responds to player interaction in the Three.js scene. It is validated inside a Modal Sandbox (isolated container) before being injected into the browser, so unchecked generated code never reaches the client.
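Because the model is asked for "ONLY valid JSON", it helps to check the reply against that schema before anything downstream uses it. The sketch below is one possible check, not the project's code; `parse_stat_block` and the two constants are hypothetical names.

```python
import json

ALLOWED_TYPES = {"weapon", "character", "environment"}
STAT_KEYS = ("atk", "def", "spd", "mag")


def parse_stat_block(raw: str) -> dict:
    """Parse the model reply and check it against the schema in the system prompt."""
    block = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on non-JSON
    if not isinstance(block, dict):
        raise ValueError("stat block must be a JSON object")

    required = (("name", str), ("lore", str), ("behavior_code", str),
                ("stats", dict), ("abilities", list))
    for key, expected in required:
        if not isinstance(block.get(key), expected):
            raise ValueError(f"field {key!r} is missing or not a {expected.__name__}")

    if block.get("type") not in ALLOWED_TYPES:
        raise ValueError(f"unexpected asset type: {block.get('type')!r}")

    for stat in STAT_KEYS:
        value = block["stats"].get(stat)
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(value, int) or isinstance(value, bool):
            raise ValueError(f"stat {stat!r} must be an integer")

    for ability in block["abilities"]:
        ok = (isinstance(ability, dict)
              and isinstance(ability.get("name"), str)
              and isinstance(ability.get("description"), str)
              and isinstance(ability.get("cost"), int))
        if not ok:
            raise ValueError(f"malformed ability entry: {ability!r}")
    return block
```

Every failure surfaces as a `ValueError`, so a caller can catch one exception type and re-prompt the model.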
- FLUX.2 Klein 9B: concept art
Model: black-forest-labs/FLUX.2-Klein-distilled-9B
Hosted on: Modal A10G GPU (cold start ~4s, inference ~8s)
```python
@app.function(gpu="A10G", image=flux_image)  # `app` is the module's modal.App
def generate_concept(prompt: str) -> bytes:
    import io
    import torch
    from diffusers import FluxPipeline

    pipe = FluxPipeline.from_pretrained(
        "black-forest-labs/FLUX.2-Klein-distilled-9B", torch_dtype=torch.bfloat16
    ).to("cuda")
    prompt_full = f"RPG game asset concept art, {prompt}, front view, clean white background, detailed, 4K"
    image = pipe(prompt_full, height=512, width=512, num_inference_steps=20, guidance_scale=3.5).images[0]
    buf = io.BytesIO()
    image.save(buf, format="PNG")  # encode the PIL image as PNG bytes
    return buf.getvalue()
```
The image is returned as a 512×512 PNG and displayed in the Gradio UI immediately, so the user sees concept art while 3D generation is still running.
- Hunyuan3D-2.1: 3D mesh generation
Model: tencent/Hunyuan3D-2.1 (32B, image-conditioned mode)
Hosted on: Modal A100 GPU (80GB); the image-conditioned path is faster than text-only
Output: GLB with PBR maps (albedo, roughness, metallic, normal)
```python
@app.function(gpu="A100-80GB", image=hunyuan_image, timeout=120)  # `app` is the module's modal.App
def generate_3d(concept_image_bytes: bytes, prompt: str) -> bytes:
    import io
    from PIL import Image
    from hy3dgen.shapegen import Hunyuan3DDiTFlowMatchingPipeline
    from hy3dgen.texgen import Hunyuan3DPaintPipeline

    shape_pipe = Hunyuan3DDiTFlowMatchingPipeline.from_pretrained("tencent/Hunyuan3D-2.1")
    tex_pipe = Hunyuan3DPaintPipeline.from_pretrained("tencent/Hunyuan3D-2.1")
    image = Image.open(io.BytesIO(concept_image_bytes)).convert("RGBA")  # decode the PNG bytes
    # Assumes the shape pipeline returns a list of meshes; take the first one.
    mesh = shape_pipe(image=image, prompt=prompt, num_inference_steps=30)[0]
    mesh = tex_pipe(mesh, image=image)  # applies PBR texture bake
    return mesh.export(file_type="glb")  # assumes a trimesh-style mesh; returns GLB bytes
```
Using the concept image as conditioning (rather than raw text) consistently produces cleaner topology and better texture alignment. This is the key quality unlock vs. text-only 3D generation.
- Three.js browser scene
The GLB is loaded via THREE.GLTFLoader into a minimal browser game loop:
```javascript
// Injected into the Gradio HTML component
const loader = new THREE.GLTFLoader();
loader.load(assetUrl, (gltf) => {
  scene.add(gltf.scene);
  // Run the behavior code that was validated in the Modal Sandbox
  const behaviorFn = new Function('scene', 'asset', behaviorCode);
  behaviorFn(scene, gltf.scene);
});
```
The scene includes:
  - Orbit controls (rotate / zoom / pan)
  - PBR environment lighting (HDR studio preset)
  - Stat card overlay (HTML positioned over the canvas)
  - GLB download button
  - "Add to party" button: persists the asset to session state for multi-asset scenes
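- Modal Sandboxes: safe code execution
Codex-generated behavior_code is never sent to the browser unchecked. It goes through a Modal Sandbox first:
```python
@app.function()  # `app` is the module's modal.App
def validate_behavior_code(code: str) -> dict:
    sandbox = modal.Sandbox.create(
        app=app,
        image=modal.Image.debian_slim().apt_install("nodejs"),  # assumes Node.js for the JS check
        timeout=10,
        block_network=True,  # no outbound calls
    )
    # Static analysis + dry-run: compile the JavaScript body without calling it.
    # behavior_code is JavaScript, so Python's ast.parse cannot check it.
    check = "new Function('scene', 'asset', process.argv[1])"
    result = sandbox.exec("node", "-e", check, code)
    result.wait()  # returncode is only set once the process has exited
    sandbox.terminate()
    return {"safe": result.returncode == 0, "code": code}
```
Only code that passes this check reaches the client. This keeps the sandbox prize track happy and stops malformed generated game logic before it is injected into the page.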
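A parse check confirms that the code is well formed but says nothing about what it does. One way to tighten the gate, offered here as an assumption and not as the project's method, is a cheap denylist scan run before the sandbox step. The pattern table and `scan_behavior_code` are hypothetical, and a regex denylist can be bypassed by obfuscated code, so it complements isolation and does not replace it.

```python
import re

# Hypothetical denylist: things animation code for a Three.js asset should not need.
FORBIDDEN_PATTERNS = {
    "network access": r"\b(fetch|XMLHttpRequest|WebSocket)\b",
    "dynamic evaluation": r"\b(eval|Function)\s*\(",
    "DOM or storage access": r"\b(document|localStorage|sessionStorage)\b",
    "script import": r"\bimport\s*\(|\bimportScripts\b",
}


def scan_behavior_code(code: str) -> list[str]:
    """Return the labels of every forbidden pattern found in the code."""
    return [
        label
        for label, pattern in FORBIDDEN_PATTERNS.items()
        if re.search(pattern, code)
    ]


# An empty list means nothing on the denylist was found.
print(scan_behavior_code("asset.rotation.y += 0.01;"))
print(scan_behavior_code("fetch('/steal?c=' + document.cookie)"))
```

Returning labels instead of a boolean lets the UI tell the user why a generated behavior was rejected.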
Repository structure
```
promptforge-rpg/
├── app.py                      # Gradio entrypoint
├── pipeline/
│   ├── prompt_enricher.py      # Rule-based prompt preprocessing
│   ├── codex_agent.py          # GPT-5.5 stat block + code generation
│   ├── flux_gen.py             # FLUX.2 Klein concept art (Modal)
│   ├── hunyuan_gen.py          # Hunyuan3D-2.1 mesh generation (Modal)
│   └── sandbox.py              # Modal Sandbox behavior code validation
├── frontend/
│   ├── scene.js                # Three.js game scene
│   ├── stat_card.js            # Stat block overlay component
│   └── index.html              # Injected into Gradio HTML block
├── modal_stubs/
│   ├── flux_stub.py            # Modal function definitions (FLUX)
│   └── hunyuan_stub.py         # Modal function definitions (Hunyuan3D)
├── tests/
│   ├── test_pipeline.py
│   └── test_sandbox.py
├── requirements.txt
└── README.md                   # ← you are here
```
Quickstart
- Clone and install
```bash
git clone https://huggingface.co/spaces//promptforge-rpg
cd promptforge-rpg
pip install -r requirements.txt
```
- Configure secrets
In your HF Space settings → Repository secrets, add:

| Secret | Value |
| --- | --- |
| OPENAI_API_KEY | Your OpenAI key (Codex / GPT-5.5) |
| MODAL_TOKEN_ID | Modal token ID |
| MODAL_TOKEN_SECRET | Modal token secret |

- Deploy Modal functions
```bash
modal deploy modal_stubs/flux_stub.py
modal deploy modal_stubs/hunyuan_stub.py
```
- Launch locally
```bash
python app.py
# → http://localhost:7860
```
- Push to HF Space
```bash
git add .
git commit -m "initial deploy"
git push
```
API reference
POST /generate: full pipeline
```json
{
  "prompt": "a rusted mace with bone spikes dripping black ichor",
  "asset_type": "weapon",      // optional, auto-detected if omitted
  "style": "dark fantasy",     // optional, defaults to "fantasy"
  "output_format": "glb"       // "glb" | "obj" | "usdz"
}
```
Response:
```json
{
  "name": "Bonecrusher's Blight",
  "stats": { "atk": 18, "def": 4, "spd": 6, "mag": 2 },
  "abilities": [
    { "name": "Ichor Burst", "description": "Poisons on hit for 3 turns", "cost": 2 }
  ],
  "lore": "Forged in the marrow pits beneath the Ashfeld Fortress.",
  "concept_art_url": "https://.../concept.png",
  "model_url": "https://.../asset.glb",
  "behavior_code": "function animate(scene, asset) { ... }"
}
```
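A client for this endpoint can be sketched with the standard library alone. The helper names `build_generate_request` and `generate_asset` are hypothetical, and the sketch assumes the app serves `/generate` as a plain JSON route on the base URL you pass in (for a local run, the address from the Quickstart).

```python
import json
import urllib.request


def build_generate_request(base_url: str, prompt: str, **options) -> urllib.request.Request:
    """Build the POST /generate request; options are the optional body fields."""
    payload = {"prompt": prompt, **options}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate_asset(base_url: str, prompt: str, timeout: float = 120.0, **options) -> dict:
    """Send the request and decode the JSON response body."""
    request = build_generate_request(base_url, prompt, **options)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Example call (needs a running server, so it is left commented out):
# asset = generate_asset("http://localhost:7860", "a rusted mace with bone spikes",
#                        asset_type="weapon", output_format="glb")
# print(asset["name"], asset["model_url"])
```

The timeout defaults to two minutes because the benchmarks below put the full pipeline at roughly a minute; a shorter client timeout would abort most requests before the mesh is ready.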
Performance benchmarks
| Step | GPU | Time |
| --- | --- | --- |
| Prompt enrichment | CPU | 0.1s |
| Codex stat block | API | 2–4s |
| FLUX.2 Klein concept art | A10G | 8–12s |
| Hunyuan3D-2.1 mesh | A100 (80GB) | 35–55s |
| Three.js scene load | Browser | 1–2s |
| End-to-end | - | 45–70s |

Codex and FLUX run in parallel on the enriched prompt, and Hunyuan3D starts as soon as the concept image is ready (it is conditioned on that image), so the user sees the concept art at ~12s and the 3D model arrives ~40s later.
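The end-to-end figure follows from the dependency structure in the pipeline diagram: the stat block and the concept image both need only the prompt, while the mesh needs the concept image. The sketch below models that with `asyncio` and sleeps standing in for the real calls. `stage`, `run_pipeline`, and `critical_path` are hypothetical names, and the orchestration shown is an assumption about how the stages could be scheduled.

```python
import asyncio
import time


async def stage(name: str, seconds: float, order: list) -> None:
    """Stand-in for a remote call: wait, then record that the stage finished."""
    await asyncio.sleep(seconds)
    order.append(name)


async def run_pipeline(t_codex: float, t_flux: float, t_hunyuan: float):
    """Run Codex alongside the FLUX -> Hunyuan3D chain; return (elapsed, finish order)."""
    order: list = []
    start = time.perf_counter()
    codex_task = asyncio.create_task(stage("codex", t_codex, order))
    await stage("flux", t_flux, order)            # concept image comes first
    await stage("hunyuan3d", t_hunyuan, order)    # the mesh is conditioned on that image
    await codex_task                              # the stat block is usually done by now
    return time.perf_counter() - start, order


def critical_path(t_enrich, t_codex, t_flux, t_hunyuan, t_load):
    """Latency when Codex overlaps the image and mesh stages."""
    return t_enrich + max(t_codex, t_flux + t_hunyuan) + t_load


if __name__ == "__main__":
    # Best and worst cases from the benchmark table, in seconds.
    print(critical_path(0.1, 2, 8, 35, 1))
    print(critical_path(0.1, 4, 12, 55, 2))
    # Scaled-down simulation of the same schedule.
    print(asyncio.run(run_pipeline(0.05, 0.1, 0.2)))
```

With the table's figures the critical path comes to roughly 44 s in the best case and 69 s in the worst, which lines up with the quoted 45–70s range.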
Prize eligibility
| Track | Partner | Qualifier |
| --- | --- | --- |
| Best Use of Modal | Modal | Inference + training + Sandboxes all used |
| Codex / OpenAI track | OpenAI | GPT-5.5 Codex agent with HF + GitHub plugins |
| Best FLUX Build (if nominated) | Black Forest Labs | FLUX.2 Klein 9B for concept image generation |

Known limitations & roadmap
Current limitations:
- Characters with complex rigs (humanoids) produce lower-quality topology than props/weapons; image-conditioned Hunyuan3D works best on objects
- Behavior code sandbox validation adds ~3s latency
- Multi-asset party scenes (3+ meshes) can drop below 30fps in-browser on integrated GPUs

Roadmap (post-hackathon):
- Fine-tune an RPG concept-art LoRA for FLUX.2 Klein (ai-toolkit)
- Add MiniCPM-V 4.6 for a sketch-to-3D input path
- Persist the party to IndexedDB for multi-session campaigns
- Export the full scene as .zip (GLBs + stat JSONs + behavior scripts)
- Multiplayer lobby via HF Spaces Persistent Storage
License
Apache 2.0. Models used are Apache 2.0 (FLUX.2 Klein, Hunyuan3D-2.1) or accessed via API (Codex/GPT-5.5, Modal).