Walkyrie-1.3B-v2.0 Core ML (Int8 Quantized)
This repository contains the first native Apple Silicon Core ML conversion of the Walkyrie-1.3B-v2.0 core transformer, an image model built on the Wan 2.1 Diffusion Transformer (DiT) framework.
The model weights have been quantized to Int8, reducing the transformer's footprint to ~1.5 GB. This lets the model run inside native Apple apps with ample memory headroom on standard 16GB Apple Silicon devices (M-series Macs, iPads).
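As a rough sanity check on the footprint figure, the sketch below estimates raw weight storage at different precisions. The parameter count is read off the "1.3B" in the model name and is an assumption, not a measured value; the helper name `weight_footprint_gb` is hypothetical.

```python
# Back-of-the-envelope weight footprint at different precisions.
# ASSUMPTION: the parameter count is the nominal 1.3B from the model name.
PARAMS = 1.3e9

BYTES_PER_WEIGHT = {"float32": 4, "float16": 2, "int8": 1}


def weight_footprint_gb(params: float, dtype: str) -> float:
    """Raw weight storage in decimal gigabytes, ignoring scales and metadata."""
    return params * BYTES_PER_WEIGHT[dtype] / 1e9


for dtype in BYTES_PER_WEIGHT:
    print(f"{dtype:>8}: {weight_footprint_gb(PARAMS, dtype):.1f} GB")
```

Under these assumptions Int8 halves the float16 footprint. Any gap between this estimate and the ~1.5 GB stated above would come from things the estimate ignores, such as quantization scales, package metadata, or a true parameter count above the nominal 1.3B.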
Repository Layout
Walkyrie_1.3B_v2.0_Int8.mlpackage: The complete 30-block core DiT transformer, optimized to execute on the Apple Neural Engine (ANE) and GPU.
Implementation & Pipeline Notes
This asset contains only the core transformer. To build a complete text-to-image pipeline inside a native Swift application, you will need to pair this core package with a text encoder and a VAE decoder:
- Text Encoder (UMT5-XXL): Compiling an 11B-parameter text encoder directly to a static Core ML graph incurs high memory overhead during compilation on 16GB machines, so it is recommended to run the UMT5 text encoder as a raw weight array processed on the CPU/GPU via libraries like swift-tokenizers or mlx-swift.
- VAE Decoder: Can be mapped natively via standard Core ML convolutional upsampling to translate the finished transformer latents into viewable RGB images.
🛠️ Replication & Conversion Process
If you want to re-compile or modify this setup from scratch using the silicon-alloy converter or direct coremltools tracing, you must work around several architectural mismatches hardcoded into older diffusion conversion scripts.
The original codebase must be patched with the following workflow modifications:
1. Alignment with Modern Diffusers Layer Naming
The newer Wan 2.1 architecture uses updated property names. Legacy scripts searching for sub-modules will throw immediate AttributeErrors unless mapped to the following properties:
- Change .transformer_blocks references to .blocks
- Change .patch_embed references to .patch_embedding
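One way to apply these renames without editing every call site is a small lookup helper. The sketch below is illustrative only: `resolve_submodule`, `LEGACY_TO_CURRENT`, and the stand-in classes are hypothetical names, not part of silicon-alloy or Diffusers, and a real model would be a torch.nn.Module rather than a plain object.

```python
# Hypothetical helper: resolve a sub-module by trying the newer attribute
# name first and falling back to the legacy one.
LEGACY_TO_CURRENT = {
    "transformer_blocks": "blocks",
    "patch_embed": "patch_embedding",
}


def resolve_submodule(model, legacy_name):
    """Return model.<current name> if present, else model.<legacy name>."""
    current_name = LEGACY_TO_CURRENT.get(legacy_name, legacy_name)
    for name in (current_name, legacy_name):
        if hasattr(model, name):
            return getattr(model, name)
    raise AttributeError(
        f"{type(model).__name__} has neither '{current_name}' nor '{legacy_name}'"
    )


# Stand-in objects for demonstration only.
class FakeWanTransformer:
    blocks = ["block_0", "block_1"]
    patch_embedding = "conv3d"


class FakeLegacyTransformer:
    transformer_blocks = ["block_0"]
    patch_embed = "conv2d"


print(resolve_submodule(FakeWanTransformer(), "transformer_blocks"))
print(resolve_submodule(FakeLegacyTransformer(), "patch_embed"))
```

Keeping the mapping in one dictionary means a conversion script can keep its legacy attribute names while still working against the newer layout.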
2. Migrating to the Unified Condition Embedder
Older models process prompt token arrays and timesteps via isolated .text_embed() and .time_embed() functions. Wan 2.1 consolidates these into a single unified block.
- Remove the standalone text and time embedding calls.
- Call the unified module directly:
temb, timestep_proj, encoder_hidden_states, _ = self.model.condition_embedder(timestep, encoder_hidden_states, None)
- Unflatten the resulting projection into six components along dimension 1 before passing it along:
timestep_proj = timestep_proj.unflatten(1, (6, -1))
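To make the effect of unflatten(1, (6, -1)) concrete, the sketch below reproduces only the shape arithmetic in plain Python, so it runs without PyTorch. `unflatten_shape` is a hypothetical helper, and the hidden size of 1536 is an assumed value chosen for illustration.

```python
def unflatten_shape(shape, dim, sizes):
    """Shape after splitting shape[dim] into `sizes`; at most one entry may be -1."""
    sizes = list(sizes)
    if sizes.count(-1) > 1:
        raise ValueError("only one dimension can be inferred")
    known = 1
    for size in sizes:
        if size != -1:
            known *= size
    if -1 in sizes:
        if shape[dim] % known != 0:
            raise ValueError(f"{shape[dim]} is not divisible by {known}")
        sizes[sizes.index(-1)] = shape[dim] // known
    elif known != shape[dim]:
        raise ValueError(f"sizes multiply to {known}, expected {shape[dim]}")
    return tuple(shape[:dim]) + tuple(sizes) + tuple(shape[dim + 1:])


HIDDEN_DIM = 1536  # ASSUMPTION: illustrative hidden size, not read from the model

# A [batch, 6 * hidden] projection becomes [batch, 6, hidden].
print(unflatten_shape((1, 6 * HIDDEN_DIM), 1, (6, -1)))  # (1, 6, 1536)
```

The -1 entry is inferred from the remaining size, which is why the call does not need to hardcode the hidden dimension.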
3. Spatial Tensor Flattening vs. 5D RoPE Tracking
The patch embedding layer outputs a 5D spatial video tensor structured as [Batch, Hidden_Dim, Frames, Height, Width]. The transformer blocks, however, expect a flattened 3D token sequence [Batch, Sequence_Length, Hidden_Dim]. Crucially, the Rotary Position Embedding (.rope) module still requires the 5D spatial layout to calculate coordinates.
- The correct execution sequence: Pass the 5D spatial tensor into the .rope() module first to extract your rotary embedding parameters:
image_rotary_emb = self.model.rope(hidden_states_5d)
- Flatten and transpose the spatial tensor into sequence tokens second, right before launching your core transformer blocks loop:
hidden_states = hidden_states_5d.flatten(2).transpose(1, 2)
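The ordering above can be checked with shape arithmetic alone. The sketch below is plain Python and does not call PyTorch; `rope_then_flatten_shapes` is a hypothetical helper and the example sizes are made up for illustration.

```python
def rope_then_flatten_shapes(shape_5d):
    """Given [B, C, F, H, W], return (shape seen by rope, shape seen by the blocks)."""
    if len(shape_5d) != 5:
        raise ValueError("expected a 5D shape [Batch, Hidden_Dim, Frames, Height, Width]")
    batch, hidden_dim, frames, height, width = shape_5d

    # Step 1: the rope module is given the tensor while it is still 5D.
    rope_input_shape = tuple(shape_5d)

    # Step 2: flatten(2) merges Frames, Height, Width -> [B, C, F*H*W];
    # transpose(1, 2) then swaps the last two axes -> [B, F*H*W, C].
    sequence_length = frames * height * width
    token_shape = (batch, sequence_length, hidden_dim)
    return rope_input_shape, token_shape


# ASSUMPTION: illustrative sizes only (single frame, 32x32 patch grid).
rope_shape, token_shape = rope_then_flatten_shapes((1, 1536, 1, 32, 32))
print(rope_shape)   # (1, 1536, 1, 32, 32)
print(token_shape)  # (1, 1024, 1536)
```

Once the tensor has been flattened, the Frames, Height, and Width sizes can no longer be recovered from its shape, which is why the rope call has to come first.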
Acknowledgements
- Original model weights trained and released by kpsss34.
- Core ML compilation achieved via the silicon-alloy framework.