Instructions to use ayjays132/Phillnet-2 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use ayjays132/Phillnet-2 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="ayjays132/Phillnet-2", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("ayjays132/Phillnet-2", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("ayjays132/Phillnet-2", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use ayjays132/Phillnet-2 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "ayjays132/Phillnet-2"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ayjays132/Phillnet-2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/ayjays132/Phillnet-2

SGLang

How to use ayjays132/Phillnet-2 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "ayjays132/Phillnet-2" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ayjays132/Phillnet-2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "ayjays132/Phillnet-2" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ayjays132/Phillnet-2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use ayjays132/Phillnet-2 with Docker Model Runner:
```
docker model run hf.co/ayjays132/Phillnet-2
```

PhillNet 2: Official AXIOM Multimodal GPT-OSS Runtime

Official Hugging Face Release

Created By Ayjays132 / Young Philly P. / Phillip A. Holland

A public custom-code multimodal model package built to run text, image, video, speech, audio, and routing from one loaded Transformers model object.

Transformers custom code · One model object · ImageGen route fixed · CUDA bf16 verified · KV cache enabled

Ayjays132/phillnet-2 · AutoTokenizer · AutoModelForCausalLM · trust_remote_code=True

Phillnet-2 is an experimental AXIOM multimodal GPT-OSS runtime packaged as a Hugging Face transformers custom-code model. It is designed as a new-model competitor package: one public repository, one primary load path, and a single runtime object that exposes text generation, code guidance, image generation, short video generation, speech synthesis, audio listening, and route inspection.

The model card is written for builders who want to inspect and run the system locally. The examples below use the same public load path a Hugging Face user would use after upload:

repo_id = "Ayjays132/phillnet-2"
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

Phillnet-2 is not presented here as a closed leaderboard claim. It is presented as an integrated multimodal runtime with measured local validation, real generated image examples, bounded lm-eval slices, code smoke testing, public usage code, and a merged 1000-step text continued-training pass.

Release Identity: Official public model path: Ayjays132/phillnet-2

Merged Training: 1000-step LoRA continuation merged into root model.safetensors

Route Fusion: Guidance, thinking, swarm, adapter, and multimodal state exposed for inspection

SmolAgent Mode: Opt-in tool workflow over retrieval, media routes, speech, and indicators

Creator Credit: Ayjays132 / Young Philly P. / Phillip A. Holland

100% HF folder validation

66.7% local code smoke

100% generated image artifacts

11 bounded lm-eval task metrics reported

Validated Loader

Loads through AutoTokenizer and AutoModelForCausalLM with trust_remote_code=True.

Multimodal Runtime

Text, code guidance, image, video, speech, audio listening, and route history are exposed from one model object.

Measured, Not Inflated

Benchmarks are reported as bounded local validation and are not presented as SOTA leaderboard claims.

Load Layer: AutoTokenizer + AutoModelForCausalLM

Guidance Layer: CodeGuidanceSystem available during generation

Media Layer: Image, video, speech, and audio routes

Inspection Layer: Runtime context and route history

This repository includes the local modality runtimes used by the model:

🖼️ ImageGen/: packaged image route with local UNet/VAE assets.
🎬 VideoGen/: fused/composer video route.
🔊 Audio/: speech, audio encoding, and Whisper listening route.
🧭 AgenticScaffold/, CodeGuidanceSystem/, Tools/: local planning, retrieval, SmolAgent, and guidance runtime pieces.
🧩 adapters/: training provenance for the merged 1000-step text LoRA adapter; normal loading already uses the merged root weights.
📂 examples/: public example outputs generated through the loaded Phillnet-2 model object.

⚠️ The benchmark numbers below are release-readiness and smoke validation results. They are not SOTA claims and are not official EvalPlus, SWE-bench, OSWorld, WebArena, BrowserGym, T2I-CompBench, or VBench results.

✨ Model Card Snapshot

Builder-First

Designed for people who want to load, inspect, generate, benchmark, and build from the same package instead of chasing separate demo folders.

Multimodal Surface

Text, code guidance, image generation, video artifacts, speech synthesis, audio listening, route history, and cleanup are all exposed from the loaded model object.

Workflow Agent

The opt-in SmolAgent runtime wraps Phillnet's retrieval, text, image, video, speech, and indicator tools without changing standard text generation.

Local Runtime Resolution

Remote-code runtime roots resolve back to the uploaded repository folder so packaged ImageGen, Audio, VideoGen, and guidance assets are found reliably.

Honest Evaluation

The card separates smoke validation, bounded lm-eval slices, local vending-style agent smoke, and skipped official suites so users know exactly what was run.

Phillnet-2 is best understood as a custom-code multimodal runtime package, not just a raw text checkpoint. Use the public model path, keep outputs outside the repo, and call the model methods directly.

🚀 What Makes Phillnet-2 Different

Phillnet-2 is built around runtime fusion. The public API is intentionally simple: load the model once, then call the modality methods on that same object. Internally, the package resolves local assets from the loaded repository folder, not from an accidental Transformers dynamic-module cache location.
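One way such repository-relative asset resolution could work is sketched below. This is an illustration only, not the package's actual implementation: `resolve_runtime_root` and `route_dir` are hypothetical helper names, and the fallback assumes the `huggingface_hub` package, whose `snapshot_download` call returns the local snapshot folder for a Hub repository.

```python
from pathlib import Path


def resolve_runtime_root(name_or_path: str) -> Path:
    """Return the folder that holds the packaged route assets.

    Hypothetical helper: a local directory is used as-is; anything else is
    treated as a Hub repo id and resolved to its local snapshot folder.
    """
    candidate = Path(name_or_path).expanduser()
    if candidate.is_dir():
        return candidate.resolve()
    # Imported lazily so the local-folder case needs no extra dependency.
    from huggingface_hub import snapshot_download

    return Path(snapshot_download(repo_id=name_or_path))


def route_dir(root: Path, route: str) -> Path:
    """Locate a route folder (for example "ImageGen") under the resolved root."""
    path = root / route
    if not path.is_dir():
        raise FileNotFoundError(f"route folder not found: {path}")
    return path
```

Resolving from the repository folder rather than from the location of the imported module file is what keeps the lookup independent of where Transformers caches dynamic modules.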

Layer	Purpose
GPT-OSS text backbone	Primary causal generation, tokenizer path, KV cache, and Hugging Face model surface
CodeGuidanceSystem	Always-available internal coding and routing guidance; direct sidecar generation for code-only prompts
Retrieval path	Optional `use_retrieval=True` generation path plus SmolAgent `retrieval_search` tool for source-snippet grounding
ImageGen	Packaged diffusion route using local SDXL Turbo text encoders/tokenizers, UNet/VAE, scheduler config, and Phillnet adapter conditioning
VideoGen	Short fused/composer video artifact route exposed through `model.generate_video(...)`
Audio	Speech synthesis, audio encoding, and Whisper Large V3 Turbo listening assets packaged locally
SmolAgent workflow	Opt-in tool-calling workflow over retrieval, base generation, image/video generation, speech, and thinking indicators
Inspection/QOL	`multimodal_context`, `route_fusion_status`, `adapter_status`, `verify_output`, and cleanup helpers for debugging

This design makes Phillnet-2 feel less like a loose demo folder and more like a single experimental model runtime. The goal is not only to answer text prompts, but to expose a coherent local surface for building, testing, generating media, and inspecting multimodal routing.

One Load Path: Text, media routes, retrieval helpers, and inspection tools hang off the same model object.

Opt-In Power Tools: Agentic, retrieval, and deliberate multi-pass generation are available without polluting benchmark-clean default generation.

Artifact-First Media: Image, video, GIF, WAV, manifests, and route metadata are saved as inspectable files.

Honest Runtime State: Loaded modules, recent routes, audio/video state, adapter provenance, and thinking indicators are queryable.

⚙️ Install

Get the core runtime up in a single command:

pip install torch transformers safetensors pillow accelerate

Recommended extras for the packaged media routes:

pip install numpy soundfile imageio imageio-ffmpeg diffusers sentencepiece protobuf

Recommended extra for the opt-in workflow agent:

pip install smolagents

For the exact local benchmark environment used in this release pass, see the companion harness at benchmarks_phillnet2/. The validated local stack used Windows, CUDA, torch 2.10.0+cu130, transformers 5.3.0, and an NVIDIA RTX 3060.

1. Install Runtime: torch, transformers, accelerate, safetensors

2. Install Media Extras: diffusers, imageio, soundfile, sentencepiece

3. Load Once: AutoModelForCausalLM with trust_remote_code=True

4. Call Routes: model.generate_image, model.generate_video, model.synthesize_speech

5. Use Agent Mode: model.generate_agentic(..., return_details=True) for tool-aware workflows

🧠 Load One Multimodal Model

A single load gives you a model object that exposes every modality route:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "Ayjays132/phillnet-2"
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model.set_tokenizer(tokenizer)
model.eval()

Once loaded, the same model object exposes:

model.generate(...)          # text
model.generate_image(...)    # image
model.generate_video(...)    # video
model.synthesize_speech(...) # speech
model.transcribe_audio(...)  # listening / ASR
model.generate_agentic(...)  # opt-in SmolAgent workflow
model.multimodal_context()   # route history
model.adapter_status()       # merged training/adapters
model.route_fusion_status("build a website")  # guidance + thinking + swarm + media routing

Important: this repository uses custom modeling code. Only load with trust_remote_code=True in environments where you trust the repository contents.
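One common way to limit the risk of remote code changing underneath you is to pin the load to a commit you have already reviewed. The sketch below builds the keyword arguments for that; `pinned_load_kwargs` is a hypothetical helper, and the commit hash in the usage comment is a placeholder, not a real revision of this repository. `revision` is the standard `from_pretrained` parameter for selecting a branch, tag, or commit.

```python
import re


def pinned_load_kwargs(revision: str) -> dict:
    """Build from_pretrained keyword arguments pinned to one reviewed commit."""
    # A full Git commit hash is 40 lowercase hexadecimal characters.
    if not re.fullmatch(r"[0-9a-f]{40}", revision):
        raise ValueError("expected a full 40-character commit hash")
    return {"trust_remote_code": True, "revision": revision}


# Usage (placeholder hash; replace it with the commit you actually reviewed):
# kwargs = pinned_load_kwargs("0123456789abcdef0123456789abcdef01234567")
# tokenizer = AutoTokenizer.from_pretrained("Ayjays132/phillnet-2", **kwargs)
# model = AutoModelForCausalLM.from_pretrained("Ayjays132/phillnet-2", **kwargs)
```

Rejecting branch names such as `main` is a deliberate choice in this sketch: a branch can move, while a full commit hash cannot.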

Local clone example:

from pathlib import Path
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_dir = Path(".")
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_dir,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16 if torch.cuda.is_available() else torch.float32,
    device_map="auto" if torch.cuda.is_available() else None,
)
model.set_tokenizer(tokenizer)
model.eval()

The user-facing model wrapper routes image generation to the packaged diffusion path by default. A plain call uses the cleaner public preset: 512x512, 1 step, guidance_scale=0.0, quality_strength=1.2, latent_refiner_strength=0.0, use_memory=False, and image_quality_polish=False for stable image conditioning.

image = model.generate_image(
    "premium studio photo of a transparent glass AI crystal sphere, sharp reflections",
    output_path="outputs/default_image.png",
)

print(image.metadata)


🖥️ Runtime Notes

Topic	Recommendation
Best local path	CUDA GPU with bf16 support for full multimodal usage
Validated GPU	NVIDIA GeForce RTX 3060, CUDA available, bf16 load verified
Text-only use	Load with `AutoModelForCausalLM`; keep `use_cache=True` for generation
Optional retrieval	Use `model.generate(..., use_retrieval=True)` when source snippets should be injected before generation
Agentic workflow	Use `model.generate_agentic(...)` for tool-aware workflow orchestration; keep it opt-in because it is slower than normal generation
Trained text adapter	The 1000-step LoRA adapter is merged into `model.safetensors`; normal users do not need a separate adapter load
Image/video/audio use	Use the model methods directly; do not separately publish or load internal route folders as standalone public models
Media previews	The README includes GIF, MP4, and WAV embeds plus direct links so Hugging Face users can inspect generated artifacts
Outputs	Write generated files to an external `outputs/` or benchmark folder to keep the model repository clean
Security	This is a custom-code model; review code before using `trust_remote_code=True` in production

The package was cleaned for upload: local benchmark outputs, caches, training clutter, and test artifacts were kept outside the public model folder. The public examples remain in examples/ so users can verify what the loaded model route is expected to produce.
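To follow the recommendation of keeping generated files out of the model folder, a small guard like the one below can be used before passing `output_path` to a route. This is a minimal sketch: `external_output_path` is a hypothetical helper and is not part of the Phillnet-2 API.

```python
from pathlib import Path


def external_output_path(repo_dir: str, out_dir: str, filename: str) -> Path:
    """Return out_dir/filename, refusing output folders inside the model repo."""
    repo = Path(repo_dir).resolve()
    target_dir = Path(out_dir).resolve()
    # Reject the repo folder itself and any folder nested below it.
    if target_dir == repo or repo in target_dir.parents:
        raise ValueError(f"{target_dir} is inside the model repository {repo}")
    target_dir.mkdir(parents=True, exist_ok=True)
    return target_dir / filename


# Usage (paths are examples only):
# path = external_output_path("./phillnet-2", "./outputs", "crystal_mug.png")
# model.generate_image("...", output_path=str(path))
```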

🧩 API Surface

Text: model.generate

Image: model.generate_image

Video: model.generate_video

Speech: model.synthesize_speech

Agent: model.generate_agentic

Retrieval: use_retrieval=True

Inspect: route_fusion_status

Call	Use	Output
`model.generate(...)`	Text generation and code guidance	Token IDs decoded by the tokenizer
`model.generate(..., use_retrieval=True)`	Optional retrieval-augmented prompt injection	Token IDs with the internal retrieval prefix removed from the returned prompt span
`model.generate_agentic(...)`	Opt-in SmolAgent tool workflow	Token IDs, or `{"text", "metadata"}` with `return_details=True`
`model.generate_image(...)`	Diffusion image generation	PNG path plus route metadata
`model.generate_video(...)`	Short fused video generation	MP4 path, optional GIF, route metadata
`model.synthesize_speech(...)`	Text-to-speech	WAV path plus audio metadata
`model.transcribe_audio(...)`	Whisper listening route	Transcript payload
`model.encode_audio(...)`	Audio token/code route	Encoded audio payload
`model.multimodal_context()`	Runtime route inspection	Recent modality route history and loaded-state info
`model.adapter_status()`	Training provenance	Merged-weight status and packaged adapter metadata
`model.route_fusion_status(prompt)`	Router inspection	Guidance, thinking, swarm, adapter, and multimodal route state
`model.generate_with_preset(prompt, preset=...)`	Task-aware text generation	Decoded, repaired text with route-specific decoding defaults
`model.deliberate_generate(prompt)`	Multi-pass generation	Draft, critique/check, and revised final answer
`model.verify_output(prompt, text)`	Verifier	Lightweight format/code/JSON/source-risk issue report
`model.clear_multimodal_runtime()`	Cleanup	Unloads/reset heavy route state where supported

For the most polished outputs, use generate_with_preset or deliberate_generate. Raw generate stays available for standard Transformers compatibility.

Benchmark-Clean Default: generate stays close to the normal Transformers path unless you explicitly request retrieval or guidance.

Agentic Metadata: return_details=True returns tools, route, status, and visual/thinking indicators.

Route Cleanup: clear_multimodal_runtime unloads heavy media routes after demos or batch generation.

🛠️ Builder Workflows

Phillnet-2 can be used as a local creative/coding engine inside a broader engineering loop. It can draft web pages, components, game prototypes, utility scripts, documentation, prompts, and media-generation ideas. For production work, pair it with a real editor, tests, linting, and a human review pass.

Websites

Useful for first-pass layouts, HTML/CSS/JS drafts, React/Vite components, landing-page copy, model demos, gallery pages, and documentation sections. Verify with a browser and real lint/tests.

Games

Good for small prototypes such as simple browser games and canvas prototypes, turn-based mechanics, UI states and scaffolds, asset prompt ideas, and gameplay scaffolding. Larger games still need an engine loop and iteration.

Code Routing

Code prompts route through the guidance sidecar and keep the main model generation path intact.

Agentic QOL

SmolAgent mode can call retrieval, text, image, video, speech, and indicator tools when a workflow needs more than one route.

Cursor / Codex Loop

Use Phillnet-2 for local drafts and creative multimodal generation, then use Cursor/Codex-style tooling to patch files, run builds, inspect failures, and iterate.

Media-Backed Apps

Generate gallery images, short video artifacts, speech samples, and audio/listening route demos from the same repository users load for text.

Source-Grounded Answers

Use use_retrieval=True or the SmolAgent retrieval_search tool for questions that benefit from current snippets and links.

Release Demos

Use the generated image gallery, MP4/GIF preview, WAV narration, and benchmark report as a complete Hugging Face model-card demo set.

Prompt pattern for code:

Return only code.
Create a single-file HTML/CSS/JS browser game.
Theme: neon arcade maze.
Requirements: keyboard controls, score, restart button, responsive layout.

Prompt pattern for a website:

Return only code.
Build a premium responsive landing page in plain HTML and CSS.
Subject: local multimodal AI model demo.
Include: hero, feature grid, benchmark table, gallery strip, and footer.

Preset generation examples:

# Code preset with automatic code cleanup
html = model.generate_with_preset(
    "Return only code. Build a single-file HTML/CSS/JS dashboard.",
    preset="code",
)

# JSON preset with lightweight repair/checking
payload = model.generate_with_preset(
    "Return valid JSON with keys: model, status, strengths.",
    preset="json",
)

# Deliberate mode for harder planning prompts
answer = model.deliberate_generate(
    "Plan a small web game project with files, milestones, and test checks.",
    preset="planning",
    passes=2,
)

For best results, ask for one file or one feature at a time, run the generated code, then feed the error or desired refinement back into the model/tooling loop.
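When a prompt asks for JSON, it can help to check the returned text independently before feeding it into the rest of a pipeline. The sketch below uses only the Python standard library; `check_json_payload` is a hypothetical helper and is not the implementation behind `model.verify_output`.

```python
import json


def check_json_payload(text: str, required_keys: list[str]) -> list[str]:
    """Return a list of problems found in the text; an empty list means it passed."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]
    # Report every missing key so one retry prompt can list them all.
    return [f"missing key: {key}" for key in required_keys if key not in data]


# Usage with the JSON prompt above (assuming the preset returns a string):
# issues = check_json_payload(payload, ["model", "status", "strengths"])
```

Any reported issues can be pasted back into the next prompt as the "error or desired refinement" step of the loop.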

📝 Text

Standard causal generation with KV cache enabled:

inputs = tokenizer(
    "Identify yourself as Phillnet-2 in one sentence.",
    return_tensors="pt",
).to(model.device)

ids = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=False,
    use_cache=True,
)
print(tokenizer.decode(ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))

Example output is saved at examples/text_identity_output.txt.

🖼️ Image Generation

Use the loaded Phillnet-2 model object. Do not load ImageGen/ as a separate public model unless you are debugging the internal image route.

result = model.generate_image(
    "single premium product photo, clear cut-crystal coffee mug on warm walnut table, golden window light, sharp glass reflections, clean background",
    output_path="outputs/crystal_mug.png",
    height=512,
    width=512,
    steps=1,
    seed=1320,
    generation_strategy="diffusion",
    guidance_scale=0.0,
    quality_strength=1.2,
    latent_refiner_strength=0.0,
    use_memory=False,
    image_quality_polish=False,
)

print(result.metadata)

The image route uses the packaged local ImageGen runtime under the hood, resolved relative to the loaded model repository instead of the Transformers dynamic-module cache. The public route uses the same fused path as the base ImageGen project: local Phillnet adapter conditioning plus the packaged SDXL Turbo text encoders/tokenizers, local UNet/VAE weights, and the SDXL Turbo scheduler config. For Turbo-style sampling, guidance_scale=0.0 is used and negative prompts are not required. The final public preset uses the clean one-step route with the extra image-polish pass disabled to avoid subtle latent texture noise.

Preset	Call Settings	Use Case
Default showcase	`steps=1`, `guidance_scale=0.0`, `quality_strength=1.2`, `latent_refiner_strength=0.0`, `use_memory=False`, `image_quality_polish=False`	Clean public examples and product-style images
Experimental polish	Optional multi-step or polish passes	Manual experiments only; not the packaged public preset
Stable sizing	`height=512`, `width=512`	Most reliable route; larger outputs can work but cost more memory

For sharper images, prefer better prompt structure over heavy guidance: subject, material, lighting, composition, and background. The route already keeps the UNet canvas at a stable internal size and then returns the requested output size.
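The prompt structure described above can be kept consistent with a small builder. This is a sketch for illustration: `build_image_prompt` is a hypothetical helper, and the comma-joined ordering is simply one reasonable convention, not something the image route requires.

```python
def build_image_prompt(subject, material="", lighting="", composition="", background=""):
    """Join the non-empty prompt fields in a fixed order, separated by commas."""
    parts = [subject, material, lighting, composition, background]
    return ", ".join(part.strip() for part in parts if part and part.strip())


# Usage:
# prompt = build_image_prompt(
#     "cut-crystal coffee mug",
#     material="clear glass",
#     lighting="golden window light",
#     composition="single premium product photo",
#     background="clean background",
# )
# result = model.generate_image(prompt, output_path="outputs/crystal_mug.png")
```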

🎨 Multiple Images

The reproducible gallery script is included at examples/generate_image_gallery.py.

python examples/generate_image_gallery.py --model-dir . --out-dir examples/generated_image_gallery

The script loads Phillnet-2 once through AutoModelForCausalLM, calls model.generate_image(...) for each prompt, and writes a contact sheet plus a prompt manifest. The final public gallery was regenerated locally at 512x512 with 1 diffusion step, guidance_scale=0.0, quality_strength=1.2, latent_refiner_strength=0.0, use_memory=False, and image_quality_polish=False through the same packaged Phillnet image route a user calls from this repository. This follows the SDXL-Turbo one-step path and avoids the extra polish/refiner passes that made earlier drafts look noisier.

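Current public image examples, generated through Phillnet-2 itself, are packaged under examples/, including examples/contact_sheet.png and examples/image_mug.png. The prompts used for the gallery are listed in examples/generated_image_gallery/manifest.md.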

🎬 Video

Generate short clips through the VideoGen route. Video keyframes use the same corrected ImageGen defaults as still images: 512x512, 1 image step, guidance_scale=0.0, quality_strength=1.2, latent_refiner_strength=0.0, use_memory=False, and image_quality_polish=False.

video = model.generate_video(
    "premium product advertisement for a transparent AI crystal sphere on a matte black plinth, cinematic studio lighting, slow camera push in, elegant reflective floor, no text, no logos",
    output_path="outputs/video_showcase_ad.mp4",
    backend="composer",
    seconds=3,
    fps=8,
    width=512,
    height=512,
    image_steps=1,
    guidance_scale=0.0,
    quality_strength=1.2,
    latent_refiner_strength=0.0,
    use_memory=False,
    image_quality_polish=False,
    min_frames=16,
    export_gif=True,
    keep_loaded=False,
)

print(video.metadata)

Optional sound is routed through the same package: pass audio_path="examples/speech_identity.wav" to mux an existing track, or pass audio_text="..." to synthesize narration through model.synthesize_speech(...) and attach it to the MP4 when FFmpeg is available.

Visual route

ImageGen keyframes use the same clean 512x512, one-step route as the asset gallery, then VideoGen interpolates and stabilizes the timeline.

Audio route

audio_text="auto" creates prompt-aware narration from the actual scene and camera motion; audio_path muxes an existing track.

SFX timeline

sfx_prompt and inferred ambience cues are stored in metadata for future sound-effect fine-tuning, without claiming speech synthesis is an SFX model.

With align_audio=True, VideoGen resolves or synthesizes audio before interpolation, expands effective_seconds to match the audio duration, and computes the final frame count from effective_seconds * fps. The composer backend also raises planned_keyframes from the aligned duration through keyframe_interval_seconds, so long narration or input audio gets multiple visual anchors instead of one stretched frame sequence. With condition_on_audio=True, input audio is routed through listening/encoding first and its summary conditions the visual storyboard.
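The timeline arithmetic described above can be sketched as follows. The frame count follows the stated `effective_seconds * fps` rule; everything else is an assumption made for illustration: `plan_timeline` is a hypothetical function, and the rounding, the `min_frames` floor, and the exact keyframe formula may differ from what VideoGen does internally.

```python
import math


def plan_timeline(seconds, fps, audio_seconds=None, min_frames=16,
                  keyframe_interval_seconds=4.0):
    """Estimate aligned duration, frame count, and keyframe count."""
    # Assumption: the aligned duration is the longer of the requested
    # duration and the audio duration.
    effective_seconds = max(float(seconds), float(audio_seconds or 0.0))
    # Frame count from effective_seconds * fps, never below min_frames.
    frame_count = max(min_frames, math.ceil(effective_seconds * fps))
    # Assumption: one visual anchor per keyframe interval, at least one.
    planned_keyframes = max(
        1, math.ceil(effective_seconds / keyframe_interval_seconds)
    )
    return {
        "effective_seconds": effective_seconds,
        "frame_count": frame_count,
        "planned_keyframes": planned_keyframes,
    }
```

Under these assumptions, a 3-second request at 8 fps with 10 seconds of narration would be planned as 80 frames over 3 keyframes instead of 24 frames over 1.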

video = model.generate_video(
    "premium product advertisement for a transparent AI crystal sphere on a matte black plinth, cinematic studio lighting, slow camera push in, elegant reflective floor, no text, no logos",
    output_path="outputs/video_showcase_ad_with_narration.mp4",
    seconds=3,
    fps=8,
    min_frames=16,
    audio_text="auto",
    sfx_prompt="quiet premium studio ambience, subtle glass shimmer, gentle cinematic fade",
    align_audio=True,
    condition_on_audio=True,
    keyframe_interval_seconds=4.0,
)

print(video.metadata["timeline"])
print(video.metadata["audio_sfx_cues"])
print(video.metadata["planned_keyframes"])

Final example output generated through model.generate_video(...). The GIF is included for model-card preview; the MP4 includes the aligned AAC audio stream for local playback.

Files: examples/video_showcase_ad.gif, examples/video_showcase_ad.mp4, and the generated narration stem examples/video_showcase_ad.wav.

🔊 Speech And Audio

Synthesize speech directly from the loaded model. The examples below are short demo lines; video narration can also be created automatically from the video prompt with audio_text="auto".

speech = model.synthesize_speech(
    "A warm studio close-up frames a glossy ceramic mug as the camera slowly pushes toward the handle and reflected window light.",
    output_path="outputs/speech_identity.wav",
)

speech = model.synthesize_speech(
    "The multimodal route can generate clean image keyframes, compose a short video timeline, synthesize narration, and mux audio into the final MP4.",
    output_path="outputs/speech_multimodal_surface.wav",
)

print(speech.metadata)

Audio examples generated through model.synthesize_speech(...): examples/speech_identity.wav and examples/speech_multimodal_surface.wav.

Whisper listening assets are packaged under Audio/models/Phillnet-2-Whisper-Large-V3-Turbo.

🧩 SmolAgent Runtime

The opt-in agentic path now uses a smolagents adapter instead of the legacy custom loop. Normal model.generate(...) remains the benchmark-clean default. Use model.generate_agentic(...) when you want a tool-aware workflow over Phillnet's local retrieval, image, video, speech, and indicator routes.

agentic = model.generate_agentic(
    prompt="Plan a short product demo, generate the visuals, and report the route indicators.",
    max_steps=1,
    max_new_tokens=256,
    return_details=True,
)

print(agentic["text"])
print(agentic["metadata"]["tools"])
print(agentic["metadata"]["visual_indicators"])

Tools

retrieval_search, base_generate, generate_image, generate_video, synthesize_speech, and thinking_indicators are exposed as agent tools.

Indicators

The run metadata reports thinking budget, route state, recent multimodal calls, and loaded visual/audio/video modules for inspection.

Default Policy

Agentic mode is intentionally opt-in because tool loops are slower and can be brittle on a small local backbone. Use it for workflow orchestration, not closed-book benchmark scoring.

🧭 Runtime Context

After modality calls, inspect the route history:

print(model.multimodal_context())
print(model.adapter_status())
print(model.route_fusion_status("Return only code. Build a website."))
print(model.pack_runtime_context("Summarize this repo and produce a JSON report."))
model.clear_multimodal_runtime()

This confirms which modality routes were used, whether heavy image/video/audio modules remain loaded, which adapter provenance is packaged, and which route/preset the orchestration layer would choose.

📂 Example Files

examples/generate_image_gallery.py: loads the full model and generates multiple images through model.generate_image.
examples/run_multimodal_demo.py: one-file text, image, video, and speech smoke demo through the loaded model object.
examples/contact_sheet.png: public image contact sheet.
examples/generated_image_gallery/manifest.md: prompts used for the public image gallery.
examples/image_mug.png: generated image example.
examples/video_showcase_ad.gif: generated aligned video GIF preview.
examples/video_showcase_ad.mp4: generated aligned MP4 with AAC audio.
examples/video_showcase_ad.wav: generated narration stem used for the final MP4.
examples/speech_identity.wav: generated speech identity example.
examples/speech_multimodal_surface.wav: generated speech capability example.
examples/text_identity_output.txt: generated text example.
examples/oneshot_website_demo.html: actual one-shot website artifact generated by the upload package.
examples/oneshot_website_demo.png: browser screenshot of the generated website artifact.

📊 Benchmark Summary

Text Smoke: 66.7% after long-form guidance routing fix

Code Smoke: 66.7% with CodeGuidanceSystem route active

Multimodal Smoke: 100.0% route-level public API validation

lm-eval Slice: Bounded local evaluator results, not leaderboard submission

Benchmark harness folder used locally: benchmarks_phillnet2/ beside this upload folder.

ℹ️ These are release-readiness smoke results, not SOTA claims. The root language weights received continued training through a 1000-step LoRA run on teknium/openhermes and nvidia/OpenCodeInstruct, and the adapter was then merged back into model.safetensors so the standard public load path uses the trained weights directly. The upload-facing benchmark script was rerun against the final package after the long-form sidecar-scaffold routing fix. Image, video, and audio artifact checks passed as route/artifact validation; they are not official media-quality leaderboard scores.

PhillNet-2 Benchmark Visuals

Release-readiness benchmark visuals for the experimental AXIOM multimodal GPT-OSS runtime. These charts summarize local smoke validation and bounded lm-evaluation-harness slices.

High-level benchmark panel for quick model-card scanning.

Local release-readiness smoke results. These are not SOTA or official leaderboard claims.

Bounded lm-evaluation-harness slice using primary metrics per task.

Visual status card for text, code, image, video, audio, and multimodal route validation.

Actual one-shot website artifact from test_oneshot_website.py, rendered from examples/oneshot_website_demo.html.

Suite	Score	Bar
HF folder validation	100.0%	██████████████████
Text local smoke	66.7%	████████████░░░░░░
Code local smoke	66.7%	████████████░░░░░░
Agent mock smoke	100.0%	██████████████████
Image artifacts	100.0%	██████████████████
Video artifacts	100.0%	██████████████████
Audio artifacts	100.0%	██████████████████
Multimodal smoke	100.0%	██████████████████

One-Shot Website Upload Test

The website test uses the packaged upload folder directly. The CodeGuidance sidecar is retained as an advisory scaffold, while the main model generates the final HTML artifact.

Check	Result
Command	`py test_oneshot_website.py`
Prompt recipe	Return only code; build a complete single-file responsive HTML/CSS landing page for a local AI startup called Phillnet; include sticky nav, hero, CTA, three feature cards, three pricing tiers, footer, dark theme, and blue accent; end with `</html>`.
Generated artifact	`examples/oneshot_website_demo.html`
Raw model output	`examples/oneshot_website_raw_output.txt`
Rendered screenshot	`examples/oneshot_website_demo.png`
Passes used	1
HTML character count	11,713
Deterministic repair used	No
Closed HTML document	Yes
Detected sections	`head`, `style`, `body`, `nav`, `hero`, `features`, `pricing`, and `footer`

Continued Training Pass

The root model weights already include this training merge. The packaged adapter folder is included for provenance and auditability, not as an extra user requirement.

Item	Result
Method	LoRA continued training, rank 16, alpha 32, merged into the root model weights
Datasets	`teknium/openhermes` + `nvidia/OpenCodeInstruct`
Steps	1000 optimizer steps, max sequence length 768, gradient accumulation 8
Trainable adapter parameters	10,822,656 during training before merge
Runtime	12024.5 seconds (about 3.3 hours) on a local RTX 3060
Final train loss	0.9148
Local text smoke	55.6% before training, 66.7% after merged training and long-form guidance routing fix
Local code smoke	83.3% after merged training in the earlier code-only route pass; 66.7% in the latest rerun after guidance-sidecar prompt tightening
Adapter provenance	`adapters/phillnet2-text-continuation-lora-r16-openhermes-opencode-1000steps/` included for inspection; not required for normal loading
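
A few derived figures follow directly from the table above. The sketch below is plain arithmetic; the per-device micro-batch size is not stated in this card, so `MICRO_BATCH_SIZE = 1` is an assumption used only to illustrate an upper bound on tokens processed, and the scaling factor assumes the standard LoRA convention of alpha divided by rank.

```python
# Values copied from the table above.
LORA_RANK = 16
LORA_ALPHA = 32
OPTIMIZER_STEPS = 1000
GRAD_ACCUM = 8
MAX_SEQ_LEN = 768
RUNTIME_SECONDS = 12024.5

# ASSUMPTION: the micro-batch size is not reported in the model card.
MICRO_BATCH_SIZE = 1

# Standard LoRA scales the low-rank update by alpha / r.
lora_scaling = LORA_ALPHA / LORA_RANK

# Wall-clock cost per optimizer step and in hours.
seconds_per_step = RUNTIME_SECONDS / OPTIMIZER_STEPS
runtime_hours = RUNTIME_SECONDS / 3600

# Each optimizer step accumulates gradients over GRAD_ACCUM micro-batches.
micro_batches = OPTIMIZER_STEPS * GRAD_ACCUM

# Upper bound: every sequence padded or packed to the maximum length.
max_tokens_seen = micro_batches * MICRO_BATCH_SIZE * MAX_SEQ_LEN

if __name__ == "__main__":
    print(f"LoRA scaling (alpha / r): {lora_scaling}")
    print(f"Seconds per optimizer step: {seconds_per_step:.2f}")
    print(f"Runtime in hours: {runtime_hours:.2f}")
    print(f"Micro-batches: {micro_batches}")
    print(f"Token upper bound at micro-batch size {MICRO_BATCH_SIZE}: {max_tokens_seen:,}")
```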

Fused Routing Layer

Route fusion here means a coordinated runtime path for guidance, thinking, swarm inspection, adapter provenance, and multimodal dispatch. It is not an official MoE leaderboard claim.

Route	Status	Purpose
Guidance	enabled selectively by default and always available with `use_guidance=True`	Uses compact internal guidance for complex tasks; long-form website/game/frontend prompts use the sidecar as an advisory scaffold while the main model remains responsible for the final generated artifact
Thinking	configured through `show_thinking`, `extended_thinking`, and `thinking_budget_tokens`	Keeps planning budget visible and inspectable instead of hidden behind an undocumented route
Swarm	`use_agentic_scaffold=True`	Provides task-signal detection and discussion scaffolding without mutating KV cache state
Adapter	1000-step LoRA merged into root weights	Normal model loading receives the trained text behavior directly
Inspection	`model.route_fusion_status(prompt)`	Reports guidance, thinking, swarm, adapter, and multimodal state for debugging and demos
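
The inspection route can be wrapped in a small helper for debugging. This is a minimal sketch: `model.route_fusion_status(prompt)` is the call named in the table above, but its return type is not documented in this card, so the hypothetical `format_route_status` helper below assumes it is either a mapping of route names to states or some other printable object.

```python
from collections.abc import Mapping


def format_route_status(model, prompt: str) -> str:
    """Return a printable summary of the route state for a prompt."""
    status = model.route_fusion_status(prompt)
    # ASSUMPTION: the status may be a dict-like object keyed by route name.
    if isinstance(status, Mapping):
        return "\n".join(f"{name}: {value}" for name, value in status.items())
    # Fall back to the object's repr for any other return type.
    return repr(status)
```

With a model loaded through `AutoModelForCausalLM.from_pretrained(..., trust_remote_code=True)`, usage would be `print(format_route_status(model, "Build a landing page"))`.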

lm-evaluation-harness Benchmark Slice (Bounded, 10 Examples Per Task)

Harness	Task	Limit	Metric	Result
`lm-evaluation-harness`	`arc_easy`	10 examples	accuracy	0.700
`lm-evaluation-harness`	`arc_easy`	10 examples	normalized accuracy	0.700
`lm-evaluation-harness`	`hellaswag`	10 examples	accuracy	0.300
`lm-evaluation-harness`	`hellaswag`	10 examples	normalized accuracy	0.400
`lm-evaluation-harness`	`lambada_openai`	10 examples	accuracy	0.300
`lm-evaluation-harness`	`lambada_openai`	10 examples	perplexity	13.093
`lm-evaluation-harness`	`piqa`	10 examples	accuracy	0.700
`lm-evaluation-harness`	`piqa`	10 examples	normalized accuracy	0.700
`lm-evaluation-harness`	`winogrande`	10 examples	accuracy	0.800
`lm-evaluation-harness`	`gsm8k`	10 examples	exact match, strict	0.300
`lm-evaluation-harness`	`gsm8k`	10 examples	exact match, flexible	0.300
`lm-evaluation-harness`	`truthfulqa_mc1`	10 examples	accuracy	0.300

This final upload-facing slice was rerun on 2026-05-25 with `py run_lm_eval_compat.py --model-dir . --tasks arc_easy,hellaswag,lambada_openai,piqa,winogrande,gsm8k,truthfulqa_mc1 --limit 10 --device cuda:0 --out reports/final_lm_eval_slice`. The JSON report is included at `reports/final_lm_eval_slice/results.json`. The compatibility wrapper targets `lm_eval` 0.4.9.2 with `transformers` 5.3.0; it is needed because that lm-eval release still references `AutoModelForVision2Seq`, while Transformers 5 exposes `AutoModelForImageTextToText` instead.
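
One common way to bridge a renamed class like this is to alias the new name under the old one before the dependent library is imported. The sketch below shows that general technique; it is an assumption about how such a shim could look, not the contents of `run_lm_eval_compat.py`, and `alias_missing_attr` is a hypothetical helper.

```python
def alias_missing_attr(module, missing: str, existing: str) -> bool:
    """Expose `existing` under the name `missing` if the module lacks it.

    Returns True when an alias was added, False when nothing was changed.
    """
    if hasattr(module, missing):
        # The old name is still available, so no shim is needed.
        return False
    if not hasattr(module, existing):
        # Neither name is present; leave the module untouched.
        return False
    setattr(module, missing, getattr(module, existing))
    return True
```

Applied to this case, the call would be `alias_missing_attr(transformers, "AutoModelForVision2Seq", "AutoModelForImageTextToText")`, executed after `import transformers` and before importing `lm_eval`.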

How To Read These Scores

Release Readiness

HF validation, CUDA bf16 load, KV cache, runtime-root resolution, and route presence show that the package is structurally upload-ready.

Model Quality

The bounded lm-eval slices are real evaluator runs, but they use limits and should not be treated as full benchmark-suite leaderboards.
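
To see why a 10-example limit matters, it helps to estimate the sampling uncertainty of an accuracy score. The sketch below uses the binomial standard error and a normal-approximation 95% interval, which is itself crude at such a small sample size; the function names are illustrative and not part of any evaluator.

```python
import math


def proportion_standard_error(p: float, n: int) -> float:
    """Binomial standard error of an observed proportion p over n examples."""
    return math.sqrt(p * (1.0 - p) / n)


def approx_95_interval(p: float, n: int) -> tuple[float, float]:
    """Normal-approximation 95% interval, clipped to the range [0, 1]."""
    half_width = 1.96 * proportion_standard_error(p, n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


if __name__ == "__main__":
    # An accuracy of 0.700 on 10 examples, as in a bounded slice.
    low, high = approx_95_interval(0.7, 10)
    print(f"standard error: {proportion_standard_error(0.7, 10):.3f}")
    print(f"approximate 95% interval: {low:.2f} to {high:.2f}")
```

For an accuracy of 0.700 on 10 examples, the standard error is about 0.145 and the approximate interval spans roughly 0.42 to 0.98, which is far too wide for ranking models.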

Media Routes

Image generation was rerun with real generated outputs. Video/audio numbers are artifact-route validation unless an official media-quality evaluator is installed and run.

Competition Framing

Phillnet-2 is positioned as a serious experimental multimodal runtime. It should be compared to other models only with identical tasks, limits, prompts, and hardware.

Code Guidance Route

Check	Result
Local code smoke	4 / 6 tasks passed in the latest rerun
Measured pass rate	66.7%
Route	small code-only prompts can use the CodeGuidanceSystem sidecar; long-form website/game/frontend prompts remain on the main model path with compact routing guidance
Language routing	Python, JavaScript, TypeScript, Java, C++, C#, Go, Rust, SQL, HTML, CSS, Bash, and PowerShell prompt detection
Optional coding suite status	EvalPlus installed; full HumanEval+/MBPP+ generation was not run in this pass
Skipped official coding suites	SWE-bench was not downloaded/run

Local Vending-Style Agent Smoke

Metric	Result
Official Vending-Bench	No, local smoke only
Days completed	10 / 10
Valid JSON decision rate	20.0%
Final cash	$521.30
Bankrupt	No
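
A valid-JSON decision rate like the one above can be computed by attempting to parse each raw model decision. This is a minimal sketch, assuming a decision counts as valid when it parses as a JSON object; the actual criterion used by the local smoke run is not described in this card, and `valid_json_rate` is a hypothetical helper.

```python
import json


def valid_json_rate(raw_decisions: list[str]) -> float:
    """Fraction of raw decisions that parse as a JSON object."""
    if not raw_decisions:
        return 0.0
    valid = 0
    for text in raw_decisions:
        try:
            parsed = json.loads(text)
        except (json.JSONDecodeError, TypeError):
            # Unparseable output counts as an invalid decision.
            continue
        # ASSUMPTION: only JSON objects count as structured decisions.
        if isinstance(parsed, dict):
            valid += 1
    return valid / len(raw_decisions)
```

Under this definition, 2 parseable decisions out of 10 would yield a rate of 0.2.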

Image Generation Details (Real Generated Run)

Prompt ID	Result
`mug`	generated 128x128 PNG
`headphones`	generated 128x128 PNG
`city`	generated 128x128 PNG
`spatial`	generated 128x128 PNG
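
The reported 128x128 size can be checked without an imaging library by reading the PNG header, whose first chunk (`IHDR`) stores width and height as big-endian 32-bit integers. The sketch below is a standalone check under that format rule; `png_dimensions` is a hypothetical helper and the file names of the generated images are not assumed here.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(data: bytes) -> tuple[int, int]:
    """Return (width, height) read from the IHDR chunk of a PNG byte stream."""
    # Layout: 8-byte signature, 4-byte chunk length, 4-byte chunk type,
    # then 4-byte width and 4-byte height.
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG stream with a leading IHDR chunk")
    width, height = struct.unpack(">II", data[16:24])
    return width, height
```

For a file on disk, pass its bytes, for example `png_dimensions(pathlib.Path(path_to_png).read_bytes())`.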

Validated Release Properties

Property	Result
Model identity	Phillnet-2
HF custom-code load	`trust_remote_code=True`
Public load path	`AutoTokenizer` / `AutoModelForCausalLM`
Runtime root	loaded repository folder, not dynamic-module cache
CUDA dtype smoke	`torch.bfloat16` verified locally
KV cache	enabled for text generation
Image route	artifact present, packaged local ImageGen route
Image prompt route	SDXL text encoders/tokenizers + local UNet/VAE + Phillnet adapter conditioning
Video route	MP4/GIF artifact present in the upload smoke run
Audio route	WAV artifact present in the upload smoke run
Cleanup	image/video/audio runtimes can be unloaded after calls

Skipped Or Partial Official Integrations

lm-evaluation-harness: bounded benchmark slices were run successfully after the 1000-step merged training pass. Full official task suites were not run.
EvalPlus HumanEval+/MBPP+: package installed; full generated-code suite not run in this pass.
BrowserGym / Inspect stack: install attempted, but Python 3.14 Windows native wheels for greenlet/lxml were not available and local compilation failed.
Vending-Bench / Vending-Bench 2: official suite not installed/run locally. A short local vending-style smoke was run and reported above, but it is not the Andon Labs benchmark.
SWE-bench, OSWorld, WebArena, BrowserGym, T2I-CompBench, VBench: heavy optional suites; install and run explicitly for leaderboard-style comparisons.

🚧 Limitations

Phillnet-2 is an experimental custom-code multimodal runtime. The included examples demonstrate package loading, route wiring, artifact generation, CUDA bf16 loading, and local runtime resolution. They do not establish official leaderboard quality or SOTA status. Use `trust_remote_code=True` only in environments where you trust this repository.

Full official leaderboards such as SWE-bench, OSWorld, WebArena, BrowserGym, T2I-CompBench, VBench, and official Vending-Bench were not completed in this release pass.
The image route is validated through real local generation, but the bundled gallery is not an official T2I benchmark.
Video and audio routes expose packaged generation/listening APIs, but the current public numbers are route/artifact validation rather than official media-quality leaderboard scores.
Model quality should be compared using the exact benchmark command, task limit, hardware, dependency versions, and prompt format reported by each evaluator.

✅ Public Release Checklist

Status	Release Item
Ready	Loads through `AutoTokenizer` and `AutoModelForCausalLM` with `trust_remote_code=True`
Ready	Runtime root resolves to the uploaded repository folder
Ready	CUDA bf16 load verified locally
Ready	KV cache remains enabled for text generation
Ready	Image route uses packaged local text encoders/tokenizers, UNet/VAE, scheduler config, and adapter conditioning
Ready	Examples and generated media are included under `examples/`
Ready	Benchmark visuals and bounded benchmark tables are included with honest labels
Ready	Cache, local results, test clutter, and training clutter were kept out of the upload folder
Ready	Creator attribution and suggested citation are included

📜 Attribution, Usage, And License

Phillnet-2 is released under the Apache 2.0 license. The official public model package is Ayjays132/phillnet-2.

Created by Ayjays132, also known as Young Philly P. and Phillip A. Holland. If you use Phillnet-2 in research, demos, apps, derivatives, benchmarks, screenshots, generated media showcases, public writeups, or downstream model packages, please credit the creator and link back to the official model repository.

Credit Line

Phillnet-2 by Ayjays132 / Young Philly P. / Phillip A. Holland.

Official Source

Ayjays132/phillnet-2 is the canonical public Hugging Face package for this release.

Derivative Use

If you modify, fine-tune, wrap, benchmark, or redistribute the model, keep clear attribution to the original Phillnet-2 work.

Generated Media

When showing examples generated through this package, note that they were generated with Phillnet-2 where practical.

Suggested citation text:

Phillnet-2, an AXIOM multimodal GPT-OSS runtime by Ayjays132
(Young Philly P. / Phillip A. Holland), available as Ayjays132/phillnet-2.

Suggested BibTeX:

@misc{phillnet2_2026,
  title        = {Phillnet-2: AXIOM Multimodal GPT-OSS Runtime},
  author       = {Ayjays132 / Young Philly P. / Phillip A. Holland},
  year         = {2026},
  howpublished = {Hugging Face model package: Ayjays132/phillnet-2},
  note         = {Custom-code multimodal runtime with text, code guidance, image, video, speech, and audio routes}
}

Public release note: this model card, examples, benchmark visuals, and usage snippets are prepared so users can load the official package, understand what was tested, credit the creator, and reproduce the same public API path.

Safetensors
Model size: 1B params
Tensor type: BF16