PhillNet 2: Official AXIOM Multimodal GPT-OSS Runtime

Official Hugging Face Release

Created By Ayjays132 / Young Philly P. / Phillip A. Holland

A public custom-code multimodal model package built to run text, image, video, speech, audio, and routing from one loaded Transformers model object.

Transformers custom code · One model object · ImageGen route fixed · CUDA bf16 verified · KV cache enabled

Ayjays132/phillnet-2 · AutoTokenizer · AutoModelForCausalLM · trust_remote_code=True

Phillnet-2 generated image gallery

Phillnet-2 is an experimental AXIOM multimodal GPT-OSS runtime packaged as a Hugging Face transformers custom-code model. It is designed as a new-model competitor package: one public repository, one primary load path, and a single runtime object that exposes text generation, code guidance, image generation, short video generation, speech synthesis, audio listening, and route inspection.

The model card is written for builders who want to inspect and run the system locally. The examples below use the same public load path a Hugging Face user would use after upload:

repo_id = "Ayjays132/phillnet-2"
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

Phillnet-2 is not presented here as a closed leaderboard claim. It is presented as an integrated multimodal runtime with measured local validation, real generated image examples, bounded lm-eval slices, code smoke testing, public usage code, and a merged 1000-step text continued-training pass.

Release Identity: Official public model path Ayjays132/phillnet-2
Merged Training: 1000-step LoRA continuation merged into root model.safetensors
Route Fusion: Guidance, thinking, swarm, adapter, and multimodal state exposed for inspection
SmolAgent Mode: Opt-in tool workflow over retrieval, media routes, speech, and indicators
Creator Credit: Ayjays132 / Young Philly P. / Phillip A. Holland

100% HF folder validation
66.7% local code smoke
100% generated image artifacts
11 bounded lm-eval task metrics reported

Validated Loader

Loads through AutoTokenizer and AutoModelForCausalLM with trust_remote_code=True.

Multimodal Runtime

Text, code guidance, image, video, speech, audio listening, and route history are exposed from one model object.

Measured, Not Inflated

Benchmarks are reported as bounded local validation and are not presented as SOTA leaderboard claims.

Load Layer: AutoTokenizer + AutoModelForCausalLM
Guidance Layer: CodeGuidanceSystem available during generation
Media Layer: Image, video, speech, and audio routes
Inspection Layer: Runtime context and route history

This repository includes the local modality runtimes used by the model:

  • 🖼️ ImageGen/: packaged image route with local UNet/VAE assets.
  • 🎬 VideoGen/: fused/composer video route.
  • 🔊 Audio/: speech, audio encoding, and Whisper listening route.
  • 🧭 AgenticScaffold/, CodeGuidanceSystem/, Tools/: local planning, retrieval, SmolAgent, and guidance runtime pieces.
  • 🧩 adapters/: training provenance for the merged 1000-step text LoRA adapter; normal loading already uses the merged root weights.
  • 📂 examples/: public example outputs generated through the loaded Phillnet-2 model object.
⚠️ The benchmark numbers below are release-readiness and smoke validation results. They are not SOTA claims and are not official EvalPlus, SWE-bench, OSWorld, WebArena, BrowserGym, T2I-CompBench, or VBench results.

✨ Model Card Snapshot

Builder-First

Designed for people who want to load, inspect, generate, benchmark, and build from the same package instead of chasing separate demo folders.

Multimodal Surface

Text, code guidance, image generation, video artifacts, speech synthesis, audio listening, route history, and cleanup are all exposed from the loaded model object.

Workflow Agent

The opt-in SmolAgent runtime wraps Phillnet's retrieval, text, image, video, speech, and indicator tools without changing standard text generation.

Local Runtime Resolution

Remote-code runtime roots resolve back to the uploaded repository folder so packaged ImageGen, Audio, VideoGen, and guidance assets are found reliably.

Honest Evaluation

The card separates smoke validation, bounded lm-eval slices, local vending-style agent smoke, and skipped official suites so users know exactly what was run.

Phillnet-2 is best understood as a custom-code multimodal runtime package, not just a raw text checkpoint. Use the public model path, keep outputs outside the repo, and call the model methods directly.

🚀 What Makes Phillnet-2 Different

Phillnet-2 is built around runtime fusion. The public API is intentionally simple: load the model once, then call the modality methods on that same object. Internally, the package resolves local assets from the loaded repository folder, not from an accidental Transformers dynamic-module cache location.

Layer | Purpose
GPT-OSS text backbone | Primary causal generation, tokenizer path, KV cache, and Hugging Face model surface
CodeGuidanceSystem | Always-available internal coding and routing guidance; direct sidecar generation for code-only prompts
Retrieval path | Optional use_retrieval=True generation path plus SmolAgent retrieval_search tool for source-snippet grounding
ImageGen | Packaged diffusion route using local SDXL Turbo text encoders/tokenizers, UNet/VAE, scheduler config, and Phillnet adapter conditioning
VideoGen | Short fused/composer video artifact route exposed through model.generate_video(...)
Audio | Speech synthesis, audio encoding, and Whisper Large V3 Turbo listening assets packaged locally
SmolAgent workflow | Opt-in tool-calling workflow over retrieval, base generation, image/video generation, speech, and thinking indicators
Inspection/QOL | multimodal_context, route_fusion_status, adapter_status, verify_output, and cleanup helpers for debugging

This design makes Phillnet-2 feel less like a loose demo folder and more like a single experimental model runtime. The goal is not only to answer text prompts, but to expose a coherent local surface for building, testing, generating media, and inspecting multimodal routing.

One Load Path: Text, media routes, retrieval helpers, and inspection tools hang off the same model object.
Opt-In Power Tools: Agentic, retrieval, and deliberate multi-pass generation are available without polluting benchmark-clean default generation.
Artifact-First Media: Image, video, GIF, WAV, manifests, and route metadata are saved as inspectable files.
Honest Runtime State: Loaded modules, recent routes, audio/video state, adapter provenance, and thinking indicators are queryable.

⚙️ Install

Get the core runtime up in a single command:

pip install torch transformers safetensors pillow accelerate

Recommended extras for the packaged media routes:

pip install numpy soundfile imageio imageio-ffmpeg diffusers sentencepiece protobuf

Recommended extra for the opt-in workflow agent:

pip install smolagents

For the exact local benchmark environment used in this release pass, see the companion harness at benchmarks_phillnet2/. The validated local stack used Windows, CUDA, torch 2.10.0+cu130, transformers 5.3.0, and an NVIDIA RTX 3060.

1. Install Runtime: torch, transformers, accelerate, safetensors
2. Install Media Extras: diffusers, imageio, soundfile, sentencepiece
3. Load Once: AutoModelForCausalLM with trust_remote_code=True
4. Call Routes: model.generate_image, model.generate_video, model.synthesize_speech
5. Use Agent Mode: model.generate_agentic(..., return_details=True) for tool-aware workflows

🧠 Load One Multimodal Model

A single load gives you a model object that exposes every modality route:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "Ayjays132/phillnet-2"

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model.set_tokenizer(tokenizer)
model.eval()

Once loaded, the same model object exposes:

model.generate(...)          # text
model.generate_image(...)    # image
model.generate_video(...)    # video
model.synthesize_speech(...) # speech
model.transcribe_audio(...)  # listening / ASR
model.generate_agentic(...)  # opt-in SmolAgent workflow
model.multimodal_context()   # route history
model.adapter_status()       # merged training/adapters
model.route_fusion_status("build a website")  # guidance + thinking + swarm + media routing

Important: this repository uses custom modeling code. Only load with trust_remote_code=True in environments where you trust the repository contents.

Local clone example:

from pathlib import Path
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_dir = Path(".")

tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_dir,
    trust_remote_code=True,
    # Fall back to float32 on CPU-only machines
    torch_dtype=torch.bfloat16 if torch.cuda.is_available() else torch.float32,
    device_map="auto" if torch.cuda.is_available() else None,
)
model.set_tokenizer(tokenizer)
model.eval()

The user-facing model wrapper routes image generation to the packaged diffusion path by default. A plain call uses the cleaner public preset: 512x512, 1 step, guidance_scale=0.0, quality_strength=1.2, latent_refiner_strength=0.0, use_memory=False, and image_quality_polish=False for stable image conditioning.

image = model.generate_image(
    "premium studio photo of a transparent glass AI crystal sphere, sharp reflections",
    output_path="outputs/default_image.png",
)

print(image.metadata)

🖥️ Runtime Notes

Topic | Recommendation
Best local path | CUDA GPU with bf16 support for full multimodal usage
Validated GPU | NVIDIA GeForce RTX 3060, CUDA available, bf16 load verified
Text-only use | Load with AutoModelForCausalLM; keep use_cache=True for generation
Optional retrieval | Use model.generate(..., use_retrieval=True) when source snippets should be injected before generation
Agentic workflow | Use model.generate_agentic(...) for tool-aware workflow orchestration; keep it opt-in because it is slower than normal generation
Trained text adapter | The 1000-step LoRA adapter is merged into model.safetensors; normal users do not need a separate adapter load
Image/video/audio use | Use the model methods directly; do not separately publish or load internal route folders as standalone public models
Media previews | The README includes GIF, MP4, and WAV embeds plus direct links so Hugging Face users can inspect generated artifacts
Outputs | Write generated files to an external outputs/ or benchmark folder to keep the model repository clean
Security | This is a custom-code model; review code before using trust_remote_code=True in production

The package was cleaned for upload: local benchmark outputs, caches, training clutter, and test artifacts were kept outside the public model folder. The public examples remain in examples/ so users can verify what the loaded model route is expected to produce.
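The recommendation to keep generated files outside the model repository can be enforced with a small path helper. The sketch below is illustrative only: `external_output_path` is a hypothetical helper, not part of the Phillnet-2 API, and it assumes Python 3.9+ for `Path.is_relative_to`.

```python
from pathlib import Path


def external_output_path(repo_dir, out_root, filename):
    """Return out_root/filename, refusing any location inside repo_dir.

    Hypothetical helper for illustration; not part of the Phillnet-2 package.
    """
    repo = Path(repo_dir).resolve()
    target = (Path(out_root) / filename).resolve()
    # Reject targets that would land inside the model repository folder.
    if target.is_relative_to(repo):
        raise ValueError(f"{target} is inside the model repository {repo}")
    # Create the output folder on demand so route calls can write immediately.
    target.parent.mkdir(parents=True, exist_ok=True)
    return target


# Intended use (requires the loaded model, so it is shown as a comment):
# out = external_output_path(".", "../phillnet_outputs", "default_image.png")
# model.generate_image("a glass sphere", output_path=str(out))
```

This matters most for the local clone example, where a relative `outputs/` path would otherwise resolve inside the cloned repository.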

🧩 API Surface

Text: model.generate
Image: model.generate_image
Video: model.generate_video
Speech: model.synthesize_speech
Agent: model.generate_agentic
Retrieval: use_retrieval=True
Inspect: route_fusion_status

Call | Use | Output
model.generate(...) | Text generation and code guidance | Token IDs, to be decoded with the tokenizer
model.generate(..., use_retrieval=True) | Optional retrieval-augmented prompt injection | Token IDs with the internal retrieval prefix removed from the returned prompt span
model.generate_agentic(...) | Opt-in SmolAgent tool workflow | Token IDs, or {"text", "metadata"} with return_details=True
model.generate_image(...) | Diffusion image generation | PNG path plus route metadata
model.generate_video(...) | Short fused video generation | MP4 path, optional GIF, route metadata
model.synthesize_speech(...) | Text-to-speech | WAV path plus audio metadata
model.transcribe_audio(...) | Whisper listening route | Transcript payload
model.encode_audio(...) | Audio token/code route | Encoded audio payload
model.multimodal_context() | Runtime route inspection | Recent modality route history and loaded-state info
model.adapter_status() | Training provenance | Merged-weight status and packaged adapter metadata
model.route_fusion_status(prompt) | Router inspection | Guidance, thinking, swarm, adapter, and multimodal route state
model.generate_with_preset(prompt, preset=...) | Task-aware text generation | Decoded, repaired text with route-specific decoding defaults
model.deliberate_generate(prompt) | Multi-pass generation | Draft, critique/check, and revised final answer
model.verify_output(prompt, text) | Verifier | Lightweight format/code/JSON/source-risk issue report
model.clear_multimodal_runtime() | Cleanup | Unloads or resets heavy route state where supported

For the most polished outputs, use generate_with_preset or deliberate_generate. Raw generate stays available for standard Transformers compatibility.

Benchmark-Clean Default: generate stays close to the normal Transformers path unless you explicitly request retrieval or guidance.
Agentic Metadata: return_details=True returns tools, route, status, and visual/thinking indicators.
Route Cleanup: clear_multimodal_runtime unloads heavy media routes after demos or batch generation.

🛠️ Builder Workflows

Phillnet-2 can be used as a local creative/coding engine inside a broader engineering loop. It can draft web pages, components, game prototypes, utility scripts, documentation, prompts, and media-generation ideas. For production work, pair it with a real editor, tests, linting, and a human review pass.

Websites

Useful for first-pass layouts, HTML/CSS/JS drafts, component ideas, and copy. Verify with a browser and real lint/tests.

Games

Good for small prototypes, mechanics, UI scaffolds, and asset prompts. Larger games still need an engine loop and iteration.

Code Routing

Code prompts route through the guidance sidecar and keep the main model generation path intact.

Agentic QOL

SmolAgent mode can call retrieval, text, image, video, speech, and indicator tools when a workflow needs more than one route.

Websites

Use it to draft HTML/CSS/JS, React/Vite components, landing-page copy, model demos, gallery pages, and documentation sections.

Games

Useful for simple browser games, canvas prototypes, turn-based mechanics, UI states, asset prompt ideas, and gameplay scaffolding.

Cursor / Codex Loop

Use Phillnet-2 for local drafts and creative multimodal generation, then use Cursor/Codex-style tooling to patch files, run builds, inspect failures, and iterate.

Media-Backed Apps

Generate gallery images, short video artifacts, speech samples, and audio/listening route demos from the same repository users load for text.

Source-Grounded Answers

Use use_retrieval=True or the SmolAgent retrieval_search tool for questions that benefit from current snippets and links.

Release Demos

Use the generated image gallery, MP4/GIF preview, WAV narration, and benchmark report as a complete Hugging Face model-card demo set.

Prompt pattern for code:

Return only code.
Create a single-file HTML/CSS/JS browser game.
Theme: neon arcade maze.
Requirements: keyboard controls, score, restart button, responsive layout.

Prompt pattern for a website:

Return only code.
Build a premium responsive landing page in plain HTML and CSS.
Subject: local multimodal AI model demo.
Include: hero, feature grid, benchmark table, gallery strip, and footer.

Preset generation examples:

# Code preset with automatic code cleanup
html = model.generate_with_preset(
    "Return only code. Build a single-file HTML/CSS/JS dashboard.",
    preset="code",
)

# JSON preset with lightweight repair/checking
payload = model.generate_with_preset(
    "Return valid JSON with keys: model, status, strengths.",
    preset="json",
)

# Deliberate mode for harder planning prompts
answer = model.deliberate_generate(
    "Plan a small web game project with files, milestones, and test checks.",
    preset="planning",
    passes=2,
)

For best results, ask for one file or one feature at a time, run the generated code, then feed the error or desired refinement back into the model/tooling loop.
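Because the JSON preset is described as applying only lightweight repair, a caller-side check before using the payload can still be worthwhile. The following is a minimal sketch that does not call the model and is not Phillnet's own verifier: `check_json_payload` is a hypothetical helper, and it assumes the preset returns decoded text as a string.

```python
import json


def check_json_payload(text, required_keys):
    """Parse a model reply expected to be a JSON object.

    Returns (payload, issues). Hypothetical helper for illustration only.
    """
    try:
        payload = json.loads(text)
    except json.JSONDecodeError as exc:
        return None, [f"invalid JSON: {exc.msg}"]
    if not isinstance(payload, dict):
        return None, ["top-level value is not a JSON object"]
    issues = []
    # Report every requested key that the reply left out.
    missing = [key for key in required_keys if key not in payload]
    if missing:
        issues.append("missing keys: " + ", ".join(missing))
    return payload, issues


# Intended use with the preset call shown above (commented: needs the model):
# payload, issues = check_json_payload(payload_text, ["model", "status", "strengths"])
# if issues: feed them back into the next prompt
```

Any reported issues can be fed back into the next prompt, which fits the run-then-refine loop described above.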

📝 Text

Standard causal generation with KV cache enabled:

inputs = tokenizer(
    "Identify yourself as Phillnet-2 in one sentence.",
    return_tensors="pt",
).to(model.device)

ids = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=False,
    use_cache=True,
)

print(tokenizer.decode(ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))

Example output is saved at examples/text_identity_output.txt.

🖼️ Image Generation

Use the loaded Phillnet-2 model object. Do not load ImageGen/ as a separate public model unless you are debugging the internal image route.

result = model.generate_image(
    "single premium product photo, clear cut-crystal coffee mug on warm walnut table, golden window light, sharp glass reflections, clean background",
    output_path="outputs/crystal_mug.png",
    height=512,
    width=512,
    steps=1,
    seed=1320,
    generation_strategy="diffusion",
    guidance_scale=0.0,
    quality_strength=1.2,
    latent_refiner_strength=0.0,
    use_memory=False,
    image_quality_polish=False,
)

print(result.metadata)

The image route uses the packaged local ImageGen runtime under the hood, resolved relative to the loaded model repository instead of the Transformers dynamic-module cache. The public route uses the same fused path as the base ImageGen project: local Phillnet adapter conditioning plus the packaged SDXL Turbo text encoders/tokenizers, local UNet/VAE weights, and the SDXL Turbo scheduler config. For Turbo-style sampling, guidance_scale=0.0 is used and negative prompts are not required. The final public preset uses the clean one-step route with the extra image-polish pass disabled to avoid subtle latent texture noise.

Preset | Call Settings | Use Case
Default showcase | steps=1, guidance_scale=0.0, quality_strength=1.2, latent_refiner_strength=0.0, use_memory=False, image_quality_polish=False | Clean public examples and product-style images
Experimental polish | Optional multi-step or polish passes | Manual experiments only; not the packaged public preset
Stable sizing | height=512, width=512 | Most reliable route; larger outputs can work but cost more memory

For sharper images, prefer better prompt structure over heavy guidance: subject, material, lighting, composition, and background. The route already keeps the UNet canvas at a stable internal size and then returns the requested output size.
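When several scripts need the same settings, the default showcase values can be kept in one place and overridden per call. This is a sketch under assumptions: `DEFAULT_IMAGE_PRESET` and `image_kwargs` are hypothetical names, and the values are copied from the preset described in this card rather than read from the package.

```python
# Default showcase settings as documented in this model card.
DEFAULT_IMAGE_PRESET = {
    "height": 512,
    "width": 512,
    "steps": 1,
    "guidance_scale": 0.0,
    "quality_strength": 1.2,
    "latent_refiner_strength": 0.0,
    "use_memory": False,
    "image_quality_polish": False,
}


def image_kwargs(**overrides):
    """Return a fresh copy of the preset with per-call overrides applied."""
    merged = dict(DEFAULT_IMAGE_PRESET)
    merged.update(overrides)
    return merged


# Intended use (commented because it needs the loaded model):
# result = model.generate_image(
#     "clear cut-crystal coffee mug on warm walnut table",
#     output_path="outputs/crystal_mug.png",
#     **image_kwargs(seed=1320),
# )
```

Copying the dictionary on every call keeps an experimental override, such as a larger size, from leaking into later default runs.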

🎨 Multiple Images

The reproducible gallery script is included at examples/generate_image_gallery.py.

python examples/generate_image_gallery.py --model-dir . --out-dir examples/generated_image_gallery

The script loads Phillnet-2 once through AutoModelForCausalLM, calls model.generate_image(...) for each prompt, and writes a contact sheet plus a prompt manifest. The final public gallery was regenerated locally at 512x512 with 1 diffusion step, guidance_scale=0.0, quality_strength=1.2, latent_refiner_strength=0.0, use_memory=False, and image_quality_polish=False through the same packaged Phillnet image route a user calls from this repository. This follows the SDXL-Turbo one-step path and avoids the extra polish/refiner passes that made earlier drafts look noisier.

Current public image examples, generated through Phillnet-2 itself:

Phillnet-2 generated asset collage Phillnet-2 generated photoreal red mug

Generated gallery files:

🎬 Video

Generate short clips through the VideoGen route. Video keyframes use the same corrected ImageGen defaults as still images: 512x512, 1 image step, guidance_scale=0.0, quality_strength=1.2, latent_refiner_strength=0.0, use_memory=False, and image_quality_polish=False.

video = model.generate_video(
    "premium product advertisement for a transparent AI crystal sphere on a matte black plinth, cinematic studio lighting, slow camera push in, elegant reflective floor, no text, no logos",
    output_path="outputs/video_showcase_ad.mp4",
    backend="composer",
    seconds=3,
    fps=8,
    width=512,
    height=512,
    image_steps=1,
    guidance_scale=0.0,
    quality_strength=1.2,
    latent_refiner_strength=0.0,
    use_memory=False,
    image_quality_polish=False,
    min_frames=16,
    export_gif=True,
    keep_loaded=False,
)

print(video.metadata)

Optional sound is routed through the same package: pass audio_path="examples/speech_identity.wav" to mux an existing track, or pass audio_text="..." to synthesize narration through model.synthesize_speech(...) and attach it to the MP4 when FFmpeg is available.
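Since narration is attached only when FFmpeg is available, a caller may want to decide up front whether to request it. The sketch below is one possible pre-check and not the package's own detection logic: `narration_kwargs` is a hypothetical helper, and it only looks for an `ffmpeg` executable on PATH, so a binary bundled by imageio-ffmpeg would not be detected this way.

```python
import shutil


def narration_kwargs(narration, which=shutil.which):
    """Return audio keyword arguments only when an ffmpeg binary is on PATH.

    Hypothetical helper; `which` is injectable so the check can be tested.
    """
    if which("ffmpeg") is None:
        # No system ffmpeg found: request a silent video instead.
        return {}
    return {"audio_text": narration}


# Intended use (commented because it needs the loaded model):
# video = model.generate_video(
#     "slow camera push in on a crystal sphere",
#     output_path="outputs/video_showcase_ad.mp4",
#     **narration_kwargs("auto"),
# )
```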

Visual route

ImageGen keyframes use the same clean 512x512, one-step route as the asset gallery, then VideoGen interpolates and stabilizes the timeline.

Audio route

audio_text="auto" creates prompt-aware narration from the actual scene and camera motion; audio_path muxes an existing track.

SFX timeline

sfx_prompt and inferred ambience cues are stored in metadata for future sound-effect fine-tuning, without claiming speech synthesis is an SFX model.

With align_audio=True, VideoGen resolves or synthesizes audio before interpolation, expands effective_seconds to match the audio duration, and computes the final frame count from effective_seconds * fps. The composer backend also raises planned_keyframes from the aligned duration through keyframe_interval_seconds, so long narration or input audio gets multiple visual anchors instead of one stretched frame sequence. With condition_on_audio=True, input audio is routed through listening/encoding first and its summary conditions the visual storyboard.

video = model.generate_video(
    "premium product advertisement for a transparent AI crystal sphere on a matte black plinth, cinematic studio lighting, slow camera push in, elegant reflective floor, no text, no logos",
    output_path="outputs/video_showcase_ad_with_narration.mp4",
    seconds=3,
    fps=8,
    min_frames=16,
    audio_text="auto",
    sfx_prompt="quiet premium studio ambience, subtle glass shimmer, gentle cinematic fade",
    align_audio=True,
    condition_on_audio=True,
    keyframe_interval_seconds=4.0,
)

print(video.metadata["timeline"])
print(video.metadata["audio_sfx_cues"])
print(video.metadata["planned_keyframes"])
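The audio-alignment arithmetic described for align_audio=True can be approximated offline to estimate how long a clip will become. This is a rough sketch, not VideoGen's actual code: `plan_timeline` is a hypothetical function, and the rounding rules (ceiling for frames and keyframes, a floor of one keyframe, `min_frames` as a lower bound) are assumptions.

```python
import math


def plan_timeline(seconds, fps, min_frames, audio_seconds=None,
                  keyframe_interval_seconds=4.0):
    """Estimate duration, frame count, and keyframes for an aligned clip."""
    # Assumed: the clip is stretched to cover the audio, never shortened.
    effective_seconds = max(float(seconds), float(audio_seconds or 0.0))
    # Frame count follows effective_seconds * fps, bounded below by min_frames.
    frames = max(min_frames, math.ceil(effective_seconds * fps))
    # Assumed: one visual anchor per keyframe interval, at least one overall.
    planned_keyframes = max(
        1, math.ceil(effective_seconds / keyframe_interval_seconds)
    )
    return {
        "effective_seconds": effective_seconds,
        "frames": frames,
        "planned_keyframes": planned_keyframes,
    }


print(plan_timeline(3, 8, 16))
print(plan_timeline(3, 8, 16, audio_seconds=9.5))
```

Under these assumptions, a 3-second request with 9.5 seconds of narration grows to 76 frames at 8 fps and three planned keyframes. The values reported in video.metadata remain the authoritative ones.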

Final example output generated through model.generate_video(...). The GIF is included for model-card preview; the MP4 includes the aligned AAC audio stream for local playback.

Phillnet-2 generated product video GIF

Files: examples/video_showcase_ad.gif, examples/video_showcase_ad.mp4, and the generated narration stem examples/video_showcase_ad.wav.

🔊 Speech And Audio

Synthesize speech directly from the loaded model. The examples below are short demo lines; video narration can also be created automatically from the video prompt with audio_text="auto".

speech = model.synthesize_speech(
    "A warm studio close-up frames a glossy ceramic mug as the camera slowly pushes toward the handle and reflected window light.",
    output_path="outputs/speech_identity.wav",
)

speech = model.synthesize_speech(
    "The multimodal route can generate clean image keyframes, compose a short video timeline, synthesize narration, and mux audio into the final MP4.",
    output_path="outputs/speech_multimodal_surface.wav",
)

print(speech.metadata)

Audio examples generated through model.synthesize_speech(...):

Whisper listening assets are packaged under Audio/models/Phillnet-2-Whisper-Large-V3-Turbo.

🧩 SmolAgent Runtime

The opt-in agentic path now uses a smolagents adapter instead of the legacy custom loop. Normal model.generate(...) remains the benchmark-clean default. Use model.generate_agentic(...) when you want a tool-aware workflow over Phillnet's local retrieval, image, video, speech, and indicator routes.

agentic = model.generate_agentic(
    prompt="Plan a short product demo, generate the visuals, and report the route indicators.",
    max_steps=1,
    max_new_tokens=256,
    return_details=True,
)

print(agentic["text"])
print(agentic["metadata"]["tools"])
print(agentic["metadata"]["visual_indicators"])

Tools

retrieval_search, base_generate, generate_image, generate_video, synthesize_speech, and thinking_indicators are exposed as agent tools.

Indicators

The run metadata reports thinking budget, route state, recent multimodal calls, and loaded visual/audio/video modules for inspection.

Default Policy

Agentic mode is intentionally opt-in because tool loops are slower and can be brittle on a small local backbone. Use it for workflow orchestration, not closed-book benchmark scoring.
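Because generate_agentic returns token IDs by default and a {"text", "metadata"} dictionary only with return_details=True, downstream code may want one defensive reader for both shapes. The sketch below is illustrative: `summarize_agentic` is a hypothetical helper, the sample dictionary is a hand-written stand-in rather than real model output, and the assumption that metadata["tools"] is iterable is not confirmed by the package.

```python
def summarize_agentic(result):
    """Normalize a generate_agentic result into a small summary dict."""
    if not isinstance(result, dict):
        # Default call shape: token IDs without tool metadata.
        return {"has_details": False, "text": None, "tools": [],
                "visual_indicators": None}
    metadata = result.get("metadata") or {}
    return {
        "has_details": True,
        "text": result.get("text"),
        # Assumed iterable; an absent or empty value becomes an empty list.
        "tools": list(metadata.get("tools") or []),
        "visual_indicators": metadata.get("visual_indicators"),
    }


# Hand-written stand-in for a return_details=True result (not real output).
sample = {"text": "demo plan", "metadata": {"tools": ["generate_image"]}}
print(summarize_agentic(sample))
```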

🧭 Runtime Context

After modality calls, inspect the route history:

print(model.multimodal_context())
print(model.adapter_status())
print(model.route_fusion_status("Return only code. Build a website."))
print(model.pack_runtime_context("Summarize this repo and produce a JSON report."))
model.clear_multimodal_runtime()

This confirms which modality routes were used, whether heavy image/video/audio modules remain loaded, which adapter provenance is packaged, and which route/preset the orchestration layer would choose.

📂 Example Files

📊 Benchmark Summary

Text Smoke: 66.7% after long-form guidance routing fix
Code Smoke: 66.7% with CodeGuidanceSystem route active
Multimodal Smoke: 100.0% route-level public API validation
lm-eval Slice: Bounded local evaluator results, not leaderboard submission

Benchmark harness folder used locally: benchmarks_phillnet2/ beside this upload folder.

ℹ️ These are release-readiness smoke results, not SOTA claims. The root language weights received continued training through a 1000-step LoRA run on teknium/openhermes and nvidia/OpenCodeInstruct, and the adapter was then merged back into model.safetensors so the standard public load path uses the trained weights directly. The upload-facing benchmark script was rerun against the final package after the long-form sidecar-scaffold routing fix. Image, video, and audio artifact checks passed as route/artifact validation; they are not official media-quality leaderboard scores.

PhillNet-2 Benchmark Visuals

Release-readiness benchmark visuals for the experimental AXIOM multimodal GPT-OSS runtime. These charts summarize local smoke validation and bounded lm-evaluation-harness slices.

PhillNet-2 premium benchmark panel with headline scores

High-level benchmark panel for quick model-card scanning.

PhillNet-2 release readiness benchmark scorecard

Local release-readiness smoke results. These are not SOTA or official leaderboard claims.

PhillNet-2 lm-evaluation-harness benchmark slice

Bounded lm-evaluation-harness slice using primary metrics per task.

PhillNet-2 multimodal runtime validation table

Visual status card for text, code, image, video, audio, and multimodal route validation.

PhillNet-2 actual one-shot website artifact

Actual one-shot website artifact from test_oneshot_website.py, rendered from examples/oneshot_website_demo.html.

Suite | Score | Bar
HF folder validation | 100.0% | ██████████████████
Text local smoke | 66.7% | ████████████░░░░░░
Code local smoke | 66.7% | ████████████░░░░░░
Agent mock smoke | 100.0% | ██████████████████
Image artifacts | 100.0% | ██████████████████
Video artifacts | 100.0% | ██████████████████
Audio artifacts | 100.0% | ██████████████████
Multimodal smoke | 100.0% | ██████████████████

One-Shot Website Upload Test

The website test uses the packaged upload folder directly. The CodeGuidance sidecar is retained as an advisory scaffold, while the main model generates the final HTML artifact.

Check | Result
Command | py test_oneshot_website.py
Prompt recipe | Return only code; build a complete single-file responsive HTML/CSS landing page for a local AI startup called Phillnet; include sticky nav, hero, CTA, three feature cards, three pricing tiers, footer, dark theme, and blue accent; end with </html>.
Generated artifact | examples/oneshot_website_demo.html
Raw model output | examples/oneshot_website_raw_output.txt
Rendered screenshot | examples/oneshot_website_demo.png
Passes used | 1
HTML character count | 11,713
Deterministic repair used | No
Closed HTML document | Yes
Detected sections | head, style, body, nav, hero, features, pricing, and footer

Continued Training Pass

The root model weights already include this training merge. The packaged adapter folder is included for provenance and auditability, not as an extra user requirement.

Item | Result
Method | LoRA continued training, rank 16, alpha 32, merged into the root model weights
Datasets | teknium/openhermes + nvidia/OpenCodeInstruct
Steps | 1000 optimizer steps, max sequence length 768, gradient accumulation 8
Trainable adapter parameters | 10,822,656 during training before merge
Runtime | 12024.5 seconds on local RTX 3060
Final train loss | 0.9148
Local text smoke | 55.6% before training, 66.7% after merged training and long-form guidance routing fix
Local code smoke | 83.3% after merged training in the earlier code-only route pass; 66.7% in the latest rerun after guidance-sidecar prompt tightening
Adapter provenance | adapters/phillnet2-text-continuation-lora-r16-openhermes-opencode-1000steps/ included for inspection; not required for normal loading

Fused Routing Layer

Route fusion here means a coordinated runtime path for guidance, thinking, swarm inspection, adapter provenance, and multimodal dispatch. It is not an official MoE leaderboard claim.

Route | Status | Purpose
Guidance | enabled selectively by default and always available with use_guidance=True | Uses compact internal guidance for complex tasks; long-form website/game/frontend prompts use the sidecar as an advisory scaffold while the main model remains responsible for the final generated artifact
Thinking | configured through show_thinking, extended_thinking, and thinking_budget_tokens | Keeps planning budget visible and inspectable instead of hidden behind an undocumented route
Swarm | use_agentic_scaffold=True | Provides task-signal detection and discussion scaffolding without mutating KV cache state
Adapter | 1000-step LoRA merged into root weights | Normal model loading receives the trained text behavior directly
Inspection | model.route_fusion_status(prompt) | Reports guidance, thinking, swarm, adapter, and multimodal state for debugging and demos

Official lm-eval Benchmark Slice

Harness | Task | Limit | Metric | Result
lm-evaluation-harness | arc_easy | 10 examples | accuracy | 0.700
lm-evaluation-harness | arc_easy | 10 examples | normalized accuracy | 0.700
lm-evaluation-harness | hellaswag | 10 examples | accuracy | 0.300
lm-evaluation-harness | hellaswag | 10 examples | normalized accuracy | 0.400
lm-evaluation-harness | lambada_openai | 10 examples | accuracy | 0.300
lm-evaluation-harness | lambada_openai | 10 examples | perplexity | 13.093
lm-evaluation-harness | piqa | 10 examples | accuracy | 0.700
lm-evaluation-harness | piqa | 10 examples | normalized accuracy | 0.700
lm-evaluation-harness | winogrande | 10 examples | accuracy | 0.800
lm-evaluation-harness | gsm8k | 10 examples | exact match, strict | 0.300
lm-evaluation-harness | gsm8k | 10 examples | exact match, flexible | 0.300
lm-evaluation-harness | truthfulqa_mc1 | 10 examples | accuracy | 0.300

This final upload-facing slice was rerun on 2026-05-25 with the following command:

py run_lm_eval_compat.py --model-dir . --tasks arc_easy,hellaswag,lambada_openai,piqa,winogrande,gsm8k,truthfulqa_mc1 --limit 10 --device cuda:0 --out reports/final_lm_eval_slice

The JSON report is included at reports/final_lm_eval_slice/results.json. The compatibility wrapper bridges lm_eval 0.4.9.2 and transformers 5.3.0, because that lm-eval release still references AutoModelForVision2Seq while Transformers 5 exposes AutoModelForImageTextToText.
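One common way to bridge a renamed class like this is to install an alias on the module before the dependent library is imported. The sketch below shows that general technique on a stand-in namespace; it is an assumption about how such a shim could look, not a copy of run_lm_eval_compat.py, and `add_vision2seq_alias` is a hypothetical name.

```python
import types


def add_vision2seq_alias(module):
    """Alias AutoModelForVision2Seq to AutoModelForImageTextToText if missing.

    Returns True when an alias was installed, False otherwise.
    """
    if hasattr(module, "AutoModelForVision2Seq"):
        return False  # The old name still exists; nothing to do.
    replacement = getattr(module, "AutoModelForImageTextToText", None)
    if replacement is None:
        return False  # Neither name is available; leave the module untouched.
    module.AutoModelForVision2Seq = replacement
    return True


# Stand-in namespace so the sketch runs without transformers installed.
fake_transformers = types.SimpleNamespace(AutoModelForImageTextToText=object)
print(add_vision2seq_alias(fake_transformers))

# In a real wrapper this would presumably run before importing lm_eval:
# import transformers
# add_vision2seq_alias(transformers)
```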

How To Read These Scores

Release Readiness

HF validation, CUDA bf16 load, KV cache, runtime-root resolution, and route presence show that the package is structurally upload-ready.

Model Quality

The bounded lm-eval slices are real evaluator runs, but they use limits and should not be treated as full benchmark-suite leaderboards.

Media Routes

Image generation was rerun with real generated outputs. Video/audio numbers are artifact-route validation unless an official media-quality evaluator is installed and run.

Competition Framing

Phillnet-2 is positioned as a serious experimental multimodal runtime. It should be compared to other models only with identical tasks, limits, prompts, and hardware.
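To see why a 10-example limit supports only coarse comparisons, it helps to estimate the sampling uncertainty of an accuracy score. The sketch below uses a plain binomial approximation, which is an assumption made here for illustration; the standard errors reported by lm-evaluation-harness may be computed slightly differently.

```python
import math


def binomial_stderr(accuracy, n):
    """Approximate standard error of an accuracy measured on n examples."""
    if n <= 0:
        raise ValueError("n must be positive")
    return math.sqrt(accuracy * (1.0 - accuracy) / n)


# Accuracy values taken from the bounded slice above, each with n=10.
for task, acc in [("arc_easy", 0.700), ("hellaswag", 0.300), ("winogrande", 0.800)]:
    print(f"{task}: {acc:.3f} +/- {binomial_stderr(acc, 10):.3f}")
```

At n=10 the approximate standard error is in the range of 0.13 to 0.14 for these tasks, so differences of one or two examples between runs or models are well within noise.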

Code Guidance Route

| Check | Result |
| --- | --- |
| Local code smoke | 4 / 6 tasks passed in the latest rerun |
| Measured pass rate | 66.7% |
| Route | Small code-only prompts can use the CodeGuidanceSystem sidecar; long-form website/game/frontend prompts remain on the main model path with compact routing guidance |
| Language routing | Python, JavaScript, TypeScript, Java, C++, C#, Go, Rust, SQL, HTML, CSS, Bash, and PowerShell prompt detection |
| Optional coding suite status | EvalPlus installed; full HumanEval+/MBPP+ generation was not run in this pass |
| Skipped official coding suites | SWE-bench was not downloaded/run |
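The language routing row describes prompt detection, but this card does not show the detection rules. As a purely illustrative sketch, a keyword-based detector could look like the following. The pattern table and the detect_language function are hypothetical and cover only a few of the listed languages; they are not the CodeGuidanceSystem implementation.

```python
import re

# Hypothetical hint patterns; the real CodeGuidanceSystem rules are not shown in this card.
LANGUAGE_HINTS = {
    "Python": [r"\bpython\b", r"\.py\b"],
    "TypeScript": [r"\btypescript\b", r"\.ts\b"],
    "JavaScript": [r"\bjavascript\b", r"\.js\b"],
    "Rust": [r"\brust\b", r"\.rs\b"],
    "SQL": [r"\bsql\b", r"\bselect\b.+\bfrom\b"],
    "Bash": [r"\bbash\b", r"\.sh\b"],
}


def detect_language(prompt):
    """Return the first language whose hint pattern matches, or None."""
    text = prompt.lower()
    for language, patterns in LANGUAGE_HINTS.items():
        if any(re.search(pattern, text) for pattern in patterns):
            return language
    return None


print(detect_language("Write a Python function that reverses a list"))
print(detect_language("Fix the bug in utils.rs"))
print(detect_language("Tell me a story"))
```

A detector of this kind returns None for prompts with no language hint, which is the case where a router would presumably fall back to the main model path.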

Local Vending-Style Agent Smoke

| Metric | Result |
| --- | --- |
| Official Vending-Bench | No, local smoke only |
| Days completed | 10 / 10 |
| Valid JSON decision rate | 20.0% |
| Final cash | $521.30 |
| Bankrupt | No |
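A valid JSON decision rate like the one in the table can be computed by attempting to parse each daily model output. The sketch below assumes that a decision counts as valid only when it parses as a JSON object. The sample outputs and field names are made up for illustration, and the actual smoke script is not shown in this card.

```python
import json


def valid_json_rate(outputs):
    """Return the fraction of strings that parse as a JSON object."""
    if not outputs:
        return 0.0
    valid = 0
    for text in outputs:
        try:
            parsed = json.loads(text)
        except (json.JSONDecodeError, TypeError):
            continue  # not parseable, so it does not count as a valid decision
        if isinstance(parsed, dict):
            valid += 1
    return valid / len(outputs)


# Hypothetical daily outputs: 2 of 10 are well-formed JSON objects.
sample_outputs = [
    '{"action": "restock", "quantity": 12}',
    "I think we should restock soda today.",
    '{"action": "set_price", "price": 1.75}',
    "restock: 5",
    "{action: restock}",
    "Sure! Here is my decision:",
    '["restock", 3]',
    "",
    "price = 2.00",
    "{'action': 'wait'}",
]

rate = valid_json_rate(sample_outputs)
print(f"{rate:.1%}")
```

With 10 days completed, a 20.0% rate corresponds to 2 parseable decisions, so the final cash figure mostly reflects whatever fallback the harness applies on the other days.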

Image Generation Details (Real Generated Run)

| Prompt ID | Result |
| --- | --- |
| mug | generated 128x128 PNG |
| headphones | generated 128x128 PNG |
| city | generated 128x128 PNG |
| spatial | generated 128x128 PNG |
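The reported 128x128 size can be checked without an imaging library by reading the IHDR chunk at the start of a PNG file. The following is a minimal sketch using only the standard library. The helper names are assumptions, and the demo builds a header in memory instead of pointing at a specific file in the repository.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(data):
    """Return (width, height) read from the IHDR chunk of PNG bytes."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    if data[12:16] != b"IHDR":
        raise ValueError("IHDR chunk missing")
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def make_png_header(width, height):
    """Build the signature plus IHDR chunk only (enough for the size check)."""
    # 8-bit depth, color type 2 (RGB), default compression, filter, interlace.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    crc = zlib.crc32(b"IHDR" + ihdr)
    return (
        PNG_SIGNATURE
        + struct.pack(">I", len(ihdr))
        + b"IHDR"
        + ihdr
        + struct.pack(">I", crc)
    )


sample = make_png_header(128, 128)
print(png_size(sample))
```

For a generated file, passing the first 24 bytes read in binary mode to png_size is enough to confirm the dimensions.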

Validated Release Properties

| Property | Result |
| --- | --- |
| Model identity | Phillnet-2 |
| HF custom-code load | trust_remote_code=True |
| Public load path | AutoTokenizer / AutoModelForCausalLM |
| Runtime root | loaded repository folder, not dynamic-module cache |
| CUDA dtype smoke | torch.bfloat16 verified locally |
| KV cache | enabled for text generation |
| Image route | artifact present, packaged local ImageGen route |
| Image prompt route | SDXL text encoders/tokenizers + local UNet/VAE + Phillnet adapter conditioning |
| Video route | MP4/GIF artifact present in the upload smoke run |
| Audio route | WAV artifact present in the upload smoke run |
| Cleanup | image/video/audio runtimes can be unloaded after calls |

Skipped Or Partial Official Integrations

  • lm-evaluation-harness: bounded benchmark slices were run successfully after the 1000-step merged training pass. Full official task suites were not run.
  • EvalPlus HumanEval+/MBPP+: package installed; full generated-code suite not run in this pass.
  • BrowserGym / Inspect stack: install attempted, but Python 3.14 Windows native wheels for greenlet/lxml were not available and local compilation failed.
  • Vending-Bench / Vending-Bench 2: official suite not installed/run locally. A short local vending-style smoke was run and reported above, but it is not the Andon Labs benchmark.
  • SWE-bench, OSWorld, WebArena, BrowserGym, T2I-CompBench, VBench: heavy optional suites; install and run explicitly for leaderboard-style comparisons.

🚧 Limitations

Phillnet-2 is an experimental custom-code multimodal runtime. The included examples prove package loading, route wiring, artifact generation, CUDA bf16 loading, and local runtime resolution. They do not establish official leaderboard quality or SOTA status. Use trust_remote_code=True only in environments where you trust this repository.

  • Full official leaderboards such as SWE-bench, OSWorld, WebArena, BrowserGym, T2I-CompBench, VBench, and official Vending-Bench were not completed in this release pass.
  • The image route is validated through real local generation, but the bundled gallery is not an official T2I benchmark.
  • Video and audio routes expose packaged generation/listening APIs, but the current public numbers are route/artifact validation rather than official media-quality leaderboard scores.
  • Model quality should be compared using the exact benchmark command, task limit, hardware, dependency versions, and prompt format reported by each evaluator.

✅ Public Release Checklist

| Status | Release Item |
| --- | --- |
| Ready | Loads through AutoTokenizer and AutoModelForCausalLM with trust_remote_code=True |
| Ready | Runtime root resolves to the uploaded repository folder |
| Ready | CUDA bf16 load verified locally |
| Ready | KV cache remains enabled for text generation |
| Ready | Image route uses packaged local text encoders/tokenizers, UNet/VAE, scheduler config, and adapter conditioning |
| Ready | Examples and generated media are included under examples/ |
| Ready | Benchmark visuals and bounded benchmark tables are included with honest labels |
| Ready | Cache, local results, test clutter, and training clutter were kept out of the upload folder |
| Ready | Creator attribution and suggested citation are included |

📜 Attribution, Usage, And License

Phillnet-2 is released under the Apache 2.0 license. The official public model package is Ayjays132/phillnet-2.

Created by Ayjays132, also known as Young Philly P. and Phillip A. Holland. If you use Phillnet-2 in research, demos, apps, derivatives, benchmarks, screenshots, generated media showcases, public writeups, or downstream model packages, please credit the creator and link back to the official model repository.

Credit Line

Phillnet-2 by Ayjays132 / Young Philly P. / Phillip A. Holland.

Official Source

Ayjays132/phillnet-2 is the canonical public Hugging Face package for this release.

Derivative Use

If you modify, fine-tune, wrap, benchmark, or redistribute the model, keep clear attribution to the original Phillnet-2 work.

Generated Media

When showing examples generated through this package, note that they were generated with Phillnet-2 where practical.

Suggested citation text:

Phillnet-2, an AXIOM multimodal GPT-OSS runtime by Ayjays132
(Young Philly P. / Phillip A. Holland), available as Ayjays132/phillnet-2.

Suggested BibTeX:

@misc{phillnet2_2026,
  title        = {Phillnet-2: AXIOM Multimodal GPT-OSS Runtime},
  author       = {{Ayjays132 / Young Philly P. / Phillip A. Holland}},
  year         = {2026},
  howpublished = {Hugging Face model package: Ayjays132/phillnet-2},
  note         = {Custom-code multimodal runtime with text, code guidance, image, video, speech, and audio routes}
}
Public release note: this model card, examples, benchmark visuals, and usage snippets are prepared so users can load the official package, understand what was tested, credit the creator, and reproduce the same public API path.
Safetensors: model size 1B params, tensor type BF16