---
library_name: coreml
pipeline_tag: image-to-image
tags:
- super-resolution
- apple-silicon
- neural-engine
- ane
- coreml
- real-time
- video-upscaling
- macos
license: apache-2.0
datasets:
- eugenesiow/Div2k
metrics:
- psnr
- ssim
model-index:
- name: PiperSR-2x
  results:
  - task:
      type: image-super-resolution
      name: Image Super-Resolution
    dataset:
      type: Set5
      name: Set5
    metrics:
    - type: psnr
      value: 37.54
      name: PSNR
  - task:
      type: image-super-resolution
      name: Image Super-Resolution
    dataset:
      type: Set14
      name: Set14
    metrics:
    - type: psnr
      value: 33.21
      name: PSNR
  - task:
      type: image-super-resolution
      name: Image Super-Resolution
    dataset:
      type: BSD100
      name: BSD100
    metrics:
    - type: psnr
      value: 31.98
      name: PSNR
  - task:
      type: image-super-resolution
      name: Image Super-Resolution
    dataset:
      type: Urban100
      name: Urban100
    metrics:
    - type: psnr
      value: 31.38
      name: PSNR
---

# PiperSR-2x: ANE-Native Super Resolution for Apple Silicon

Real-time 2x AI upscaling on Apple's Neural Engine: 44.4 FPS at 720p on an M2 Max, a 928 KB model, and every op running natively on the ANE with zero CPU/GPU fallback.

Not a converted PyTorch model, but an architecture designed from ANE hardware measurements: every dimension, operation, and data type is dictated by Neural Engine characteristics.

## Key Results

PSNR (dB) on standard 2× benchmarks:

| Model | Params | Set5 | Set14 | BSD100 | Urban100 |
|-------|--------|------|-------|--------|----------|
| Bicubic | – | 33.66 | 30.24 | 29.56 | 26.88 |
| FSRCNN | 13K | 37.05 | 32.66 | 31.53 | 29.88 |
| **PiperSR** | **453K** | **37.54** | **33.21** | **31.98** | **31.38** |
| SAFMN | 228K | 38.00 | ~33.7 | ~32.2 | – |

PiperSR beats FSRCNN across all four benchmarks and comes within 0.46 dB of SAFMN on Set5, below the perceptual threshold for most content.
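
The PSNR values above follow the standard definition, 10 · log10(MAX² / MSE). A minimal sketch of that metric (note: published super-resolution benchmarks often evaluate PSNR on the luma channel of border-cropped images; this generic version uses all channels and is not the exact evaluation script):

```python
import numpy as np

def psnr(ref, test, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two images in [0, max_val]."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)

a = np.zeros((4, 4))
print(psnr(a, a + 0.1))  # uniform 0.1 error -> MSE 0.01 -> 20.0 dB
```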

## Performance

| Configuration | FPS | Hardware | Notes |
|---------------|-----|----------|-------|
| Full-frame 640×360 → 1280×720 | 44.4 | M2 Max | ANE predict 20.8 ms |
| 128×128 tiles (static weights) | 125.6 | M2 | Baked weights, 2.82× faster than dynamic |
| 128×128 tiles (dynamic weights) | 44.5 | M2 | CoreML default |

Real-time 2× upscaling at 30+ FPS on any Mac with Apple Silicon. The ANE sits idle during video playback; PiperSR puts it to work.

## Architecture

A 453K-parameter network: 6 residual blocks at 64 channels with BatchNorm and SiLU activations, upscaling via PixelShuffle.

```
Input (128×128×3, FP16)
→ Head: Conv 3×3 (3 → 64)
→ Body: 6× ResBlock [Conv 3×3 → BatchNorm → SiLU → Conv 3×3 → BatchNorm → Residual Add]
→ Tail: Conv 3×3 (64 → 12) → PixelShuffle(2)
Output (256×256×3)
```

Compiles to 5 MIL ops: `conv`, `add`, `silu`, `pixel_shuffle`, `const`. All verified ANE-native.
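
The 453K figure can be checked against the layer shapes above with simple arithmetic (counting learnable parameters only; BatchNorm contributes a per-channel scale and shift, while its running statistics are buffers):

```python
def conv_params(c_in, c_out, k=3):
    # k x k convolution: weights + one bias per output channel
    return c_in * c_out * k * k + c_out

head = conv_params(3, 64)                      # 1,792
block = 2 * conv_params(64, 64) + 2 * 2 * 64   # two convs + two BatchNorms per ResBlock
body = 6 * block                               # 444,672
tail = conv_params(64, 12)                     # 6,924; 12 channels = 3 * 2^2 for PixelShuffle(2)
print(head + body + tail)  # 453388 -> "453K"
```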

### Why ANE-native matters

Off-the-shelf super-resolution models (SPAN, Real-ESRGAN) were designed for CUDA GPUs and converted to CoreML after the fact. They waste the ANE:

- **Misaligned channels** (48 instead of 64) waste 25%+ of each ANE tile
- **Monolithic full-frame** tensors serialize the ANE's parallel compute lanes
- **Silent CPU fallback** from unsupported ops can inflate latency 5-10×
- **No tile batching** means 60× dispatch overhead

PiperSR addresses every one of these by designing around ANE constraints.

## Model Variants

| File | Use Case | Input → Output |
|------|----------|----------------|
| `PiperSR_2x.mlpackage` | Static images (128px tiles) | 128×128 → 256×256 |
| `PiperSR_2x_video_720p.mlpackage` | Video (full-frame, BN-fused) | 640×360 → 1280×720 |
| `PiperSR_2x_256.mlpackage` | Static images (256px tiles) | 256×256 → 512×512 |

## Usage

### With ToolPiper (recommended)

PiperSR is integrated into [ToolPiper](https://modelpiper.com), a local macOS AI toolkit. Install ToolPiper, enable the MediaPiper browser extension, and every 720p video on the web is upscaled to 1440p in real time.

```bash
# Via MCP tool
mcp__toolpiper__image_upscale image=/path/to/image.png

# Via REST API
curl -X POST http://127.0.0.1:9998/v1/images/upscale \
  -F "image=@input.png" \
  -o upscaled.png
```

### With CoreML (Swift)

```swift
import CoreML

let config = MLModelConfiguration()
config.computeUnits = .cpuAndNeuralEngine // NOT .all (23.6% slower; see note below)

let model = try PiperSR_2x(configuration: config)
let input = try PiperSR_2xInput(x: pixelBuffer)
let output = try model.prediction(input: input)
// output.var_185 contains the 2x upscaled image
```

> **Important:** Use `.cpuAndNeuralEngine`, not `.all`. CoreML's `.all` silently misroutes pure-ANE ops onto the GPU, causing a 23.6% slowdown for this model.

### With coremltools (Python)

```python
import coremltools as ct
from PIL import Image
import numpy as np

# Pin compute units to CPU + ANE, mirroring the Swift configuration above.
model = ct.models.MLModel("PiperSR_2x.mlpackage",
                          compute_units=ct.ComputeUnit.CPU_AND_NE)

img = Image.open("input.png").convert("RGB").resize((128, 128))
arr = np.array(img).astype(np.float32) / 255.0
arr = np.transpose(arr, (2, 0, 1))[np.newaxis]  # HWC -> NCHW, values in [0, 1]

result = model.predict({"x": arr})
out = result["var_185"][0]  # CHW, values in [0, 1]
Image.fromarray((np.clip(out.transpose(1, 2, 0), 0, 1) * 255).astype(np.uint8)).save("output.png")
```
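
The tiled benchmarks above run the model over 128×128 patches rather than the full frame. A minimal sketch of that tile/reassemble loop, with a hypothetical `upscale_tile` callback standing in for the CoreML `predict` call (assumes the image dimensions are exact multiples of the tile size; real tiling would also pad the edges):

```python
import numpy as np

TILE, SCALE = 128, 2

def upscale_tiled(img, upscale_tile):
    """Upscale an HWC image by running upscale_tile over each TILE x TILE patch."""
    h, w, c = img.shape
    out = np.empty((h * SCALE, w * SCALE, c), dtype=img.dtype)
    for y in range(0, h, TILE):
        for x in range(0, w, TILE):
            out[y * SCALE:(y + TILE) * SCALE,
                x * SCALE:(x + TILE) * SCALE] = upscale_tile(img[y:y + TILE, x:x + TILE])
    return out

# Stand-in for the model call: nearest-neighbor 2x.
nearest_2x = lambda t: t.repeat(SCALE, axis=0).repeat(SCALE, axis=1)
frame = np.random.rand(256, 384, 3).astype(np.float32)
print(upscale_tiled(frame, nearest_2x).shape)  # (512, 768, 3)
```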

## Training

Trained on DIV2K (800 training images) with L1 loss and random augmentation (flips, rotations). Total training cost: ~$6 on RunPod A6000 instances. The full training journey, from 33.46 dB to 37.54 dB, is documented across 12 experiment findings.

## Technical Details

- **Compute units:** `.cpuAndNeuralEngine` (ANE primary, CPU for I/O only)
- **Precision:** Float16
- **Input format:** NCHW, normalized to [0, 1]
- **Output format:** NCHW, [0, 1]
- **Model size:** 928 KB (compiled .mlmodelc)
- **Parameters:** 453K
- **ANE ops used:** conv, batch_norm (fused at inference), silu, add, pixel_shuffle, const
- **CPU fallback ops:** none

## License

Apache 2.0

## Citation

```bibtex
@software{pipersr2025,
  title={PiperSR: ANE-Native Super Resolution for Apple Silicon},
  author={ModelPiper},
  year={2025},
  url={https://huggingface.co/ModelPiper/PiperSR-2x}
}
```