
SwiGLU Activation Benchmarks - Aggregated Results


This document combines benchmark results from multiple SwiGLU activation implementations.
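For readers unfamiliar with the operation being timed, the following is a minimal pure-Python sketch of a SwiGLU activation on a single row. It is an illustration only, not the code of either benchmarked implementation. The function names `silu` and `swiglu_row` are hypothetical, and the choice of which half of the input is gated is an assumption, since implementations differ on that ordering.

```python
import math


def silu(x: float) -> float:
    """SiLU (swish) activation, x * sigmoid(x), written to avoid overflow."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)  # safe: x < 0, so e is in (0, 1)
    return x * e / (1.0 + e)


def swiglu_row(row: list[float]) -> list[float]:
    """Apply SwiGLU to one row of even length 2*d and return d values.

    Assumption: the first half is passed through SiLU and the second
    half is the linear multiplier. Some implementations use the
    opposite order.
    """
    if len(row) % 2 != 0:
        raise ValueError("row length must be even")
    d = len(row) // 2
    gate, up = row[:d], row[d:]
    return [silu(g) * u for g, u in zip(gate, up)]


if __name__ == "__main__":
    print(swiglu_row([0.0, 1.0, 2.0, 3.0]))
```

The output has half as many elements as the input, which is why such kernels are usually described by their input shape.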


Combined Summary and Visualization

[Figure: "Attention Implementation Latency". x-axis: Workload (cuda_T128_D768, cuda_T128_D1024, cuda_T128_D2048, cuda_T256_D768, cuda_T256_D1024, cuda_T256_D2048, cuda_T512_D768, cuda_T512_D1024, cuda_T512_D2048); y-axis: Latency P50 (ms), ticks from 0.025 to 0.050; series: hf_kernels_swiglu, torch_eager. Generated 2025-10-29 with Matplotlib v3.10.7.]
Cell: combine | 4.29s

======================================================================
LOADING BENCHMARK DATA
======================================================================
✓ HF Kernels SwiGLU             : /__w/kernels-benchmarks/kernels-benchmarks/benches/activation/impls/.uvnote/cache/2775e6386f1caf1fda935a997130c06dcaf7641efb0db21560c35301fdabfd9b
✓ PyTorch SwiGLU                : /__w/kernels-benchmarks/kernels-benchmarks/benches/activation/impls/.uvnote/cache/661ca38adec8893d7c284140e922da661f0afcea4aaff6a3bf48a6494ce7c6eb

  ✓ Found HF Kernels SwiGLU
     Path: /__w/kernels-benchmarks/kernels-benchmarks/benches/activation/impls/.uvnote/cache/2775e6386f1caf1fda935a997130c06dcaf7641efb0db21560c35301fdabfd9b/activation.jsonl
  ✓ Found PyTorch SwiGLU
     Path: /__w/kernels-benchmarks/kernels-benchmarks/benches/activation/impls/.uvnote/cache/661ca38adec8893d7c284140e922da661f0afcea4aaff6a3bf48a6494ce7c6eb/activation.jsonl

======================================================================
Summary: 2 found, 0 skipped, 0 missing
======================================================================

COMBINED BENCHMARK SUMMARY

impl                     wl                  p50(ms)  ok
hf_kernels_swiglu        cuda_T128_D1024        0.03  True
hf_kernels_swiglu        cuda_T128_D2048        0.03  True
hf_kernels_swiglu        cuda_T128_D768         0.02  True
hf_kernels_swiglu        cuda_T256_D1024        0.03  True
hf_kernels_swiglu        cuda_T256_D2048        0.03  True
hf_kernels_swiglu        cuda_T256_D768         0.03  True
hf_kernels_swiglu        cuda_T512_D1024        0.03  True
hf_kernels_swiglu        cuda_T512_D2048        0.03  True
hf_kernels_swiglu        cuda_T512_D768         0.03  True
torch_eager              cuda_T128_D1024        0.05  True
torch_eager              cuda_T128_D2048        0.05  True
torch_eager              cuda_T128_D768         0.04  True
torch_eager              cuda_T256_D1024        0.05  True
torch_eager              cuda_T256_D2048        0.05  True
torch_eager              cuda_T256_D768         0.05  True
torch_eager              cuda_T512_D1024        0.05  True
torch_eager              cuda_T512_D2048        0.05  True
torch_eager              cuda_T512_D768         0.05  True

GENERATING COMBINED VISUALIZATION

Loaded 18 records
✓ Visualization saved as latency.svg
Saved latency.png
✓ Visualization saved as latency.svg
✓ SVG visualization ready!

ANALYSIS COMPLETE
Total implementations analyzed: 2

Implementations included:
  ✓ HF Kernels SwiGLU
  ✓ PyTorch SwiGLU
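
The summary table above can be turned into per-workload ratios with a few lines of Python. The sketch below copies the table verbatim and divides the `torch_eager` p50 by the `hf_kernels_swiglu` p50 for each workload. The helper names (`parse_summary`, `parse_workload`, `speedups`) are hypothetical and not part of the benchmark tooling. Reading `T` and `D` in the workload names as two size parameters is an assumption based only on the naming pattern. Because the logged p50 values are rounded to two decimals, the resulting ratios are coarse and should be treated as rough indications only.

```python
import re

SUMMARY = """\
impl                     wl                  p50(ms)  ok
hf_kernels_swiglu        cuda_T128_D1024        0.03  True
hf_kernels_swiglu        cuda_T128_D2048        0.03  True
hf_kernels_swiglu        cuda_T128_D768         0.02  True
hf_kernels_swiglu        cuda_T256_D1024        0.03  True
hf_kernels_swiglu        cuda_T256_D2048        0.03  True
hf_kernels_swiglu        cuda_T256_D768         0.03  True
hf_kernels_swiglu        cuda_T512_D1024        0.03  True
hf_kernels_swiglu        cuda_T512_D2048        0.03  True
hf_kernels_swiglu        cuda_T512_D768         0.03  True
torch_eager              cuda_T128_D1024        0.05  True
torch_eager              cuda_T128_D2048        0.05  True
torch_eager              cuda_T128_D768         0.04  True
torch_eager              cuda_T256_D1024        0.05  True
torch_eager              cuda_T256_D2048        0.05  True
torch_eager              cuda_T256_D768         0.05  True
torch_eager              cuda_T512_D1024        0.05  True
torch_eager              cuda_T512_D2048        0.05  True
torch_eager              cuda_T512_D768         0.05  True
"""


def parse_summary(text: str) -> dict[tuple[str, str], float]:
    """Return {(impl, workload): p50_ms} for rows whose 'ok' column is True."""
    rows = {}
    for line in text.strip().splitlines()[1:]:  # skip the header row
        impl, wl, p50, ok = line.split()
        if ok == "True":
            rows[(impl, wl)] = float(p50)
    return rows


def parse_workload(wl: str) -> tuple[str, int, int]:
    """Split a name like 'cuda_T128_D768' into (device, T, D)."""
    m = re.fullmatch(r"([a-z]+)_T(\d+)_D(\d+)", wl)
    if m is None:
        raise ValueError(f"unrecognized workload name: {wl!r}")
    return m.group(1), int(m.group(2)), int(m.group(3))


def speedups(rows, baseline="torch_eager", candidate="hf_kernels_swiglu"):
    """Return {workload: baseline_p50 / candidate_p50} where both exist."""
    out = {}
    for (impl, wl), p50 in rows.items():
        if impl == candidate and (baseline, wl) in rows:
            out[wl] = rows[(baseline, wl)] / p50
    return out


rows = parse_summary(SUMMARY)
ordered = sorted(speedups(rows).items(), key=lambda kv: parse_workload(kv[0])[1:])
for wl, ratio in ordered:
    print(f"{wl:<18} {ratio:.2f}x")
```

For finer-grained ratios, the same calculation would need the unrounded values from the `activation.jsonl` files listed in the log. Their record schema is not shown here, so the sketch does not attempt to read them.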

Artifacts:

latency.svg