
Causal Conv1D Benchmarks - Aggregated Results


This document combines benchmark results from multiple Causal Conv1D implementations.
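Both implementations compute a causal Conv1D: a depthwise convolution over the sequence in which output position t depends only on inputs at positions up to t. The pure-Python sketch below shows that computation for one batch element. It assumes the common convention of one filter per channel with shape (dim, width) and implicit left zero-padding of width - 1. It is a reference for the math only, not the code of either benchmarked implementation, and it omits any fused activation those kernels may apply.

```python
from typing import List, Optional

Matrix = List[List[float]]


def causal_conv1d_ref(x: Matrix, weight: Matrix, bias: Optional[List[float]] = None) -> Matrix:
    """Depthwise causal 1D convolution for a single batch element.

    x:      [dim][seqlen] input channels
    weight: [dim][width]  one filter per channel (assumed layout)
    bias:   optional [dim] per-channel bias
    Returns [dim][seqlen]; output t depends only on x[d][0..t].
    """
    dim, seqlen = len(x), len(x[0])
    width = len(weight[0])
    out: Matrix = []
    for d in range(dim):
        row: List[float] = []
        for t in range(seqlen):
            acc = bias[d] if bias is not None else 0.0
            for k in range(width):
                # Tap k reads position t - (width - 1) + k; negative positions act as zero padding.
                src = t - (width - 1) + k
                if src >= 0:
                    acc += weight[d][k] * x[d][src]
            row.append(acc)
        out.append(row)
    return out


# Width-2 example: the last tap weights the current step, the first tap the previous step.
y = causal_conv1d_ref([[1.0, 2.0, 3.0]], [[0.5, 1.0]])
print(y)  # [[1.0, 2.5, 4.0]]
```

Changing a later input leaves earlier outputs untouched, which is the causality property that distinguishes this operation from an ordinary padded convolution.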


Combined Summary and Visualization

[Figure: Latency P50 (ms) by workload for hf_kernels_causal_conv1d and torch_eager. X-axis: Workload (cuda_B2_D64_S128_W2 through cuda_B4_D2048_S2048_W4); y-axis: Latency P50 (ms), 0.1 to 0.5. Generated 2025-10-29 with Matplotlib v3.10.7.]
======================================================================
LOADING BENCHMARK DATA
======================================================================
✓ HF Kernels Causal Conv1D      : /__w/kernels-benchmarks/kernels-benchmarks/benches/causal_conv1d/impls/.uvnote/cache/7a691bd653e23c412c5d29fbc92ea1454823ea437864cf9473fc561b116ef3d9
✓ PyTorch Causal Conv1D         : /__w/kernels-benchmarks/kernels-benchmarks/benches/causal_conv1d/impls/.uvnote/cache/70757e27f2df1dfde4905a24527bb4ca6f0f8df7dac2e2ecaa0ddc359c7d5e64

  ✓ Found HF Kernels Causal Conv1D
     Path: /__w/kernels-benchmarks/kernels-benchmarks/benches/causal_conv1d/impls/.uvnote/cache/7a691bd653e23c412c5d29fbc92ea1454823ea437864cf9473fc561b116ef3d9/causal_conv1d.jsonl
  ✓ Found PyTorch Causal Conv1D
     Path: /__w/kernels-benchmarks/kernels-benchmarks/benches/causal_conv1d/impls/.uvnote/cache/70757e27f2df1dfde4905a24527bb4ca6f0f8df7dac2e2ecaa0ddc359c7d5e64/causal_conv1d.jsonl

======================================================================
Summary: 2 found, 0 skipped, 0 missing
======================================================================
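The loading step reads one causal_conv1d.jsonl file per implementation and merges the records into a single summary. The sketch below shows one way such a merge could work. The record schema is not shown in this report, so the fields `wl`, `p50_ms` and `ok` are hypothetical (`wl` and `ok` only echo the summary table's column names), and the inline records stand in for the real files.

```python
import io
import json
from typing import Dict, List, TextIO


def load_jsonl(stream: TextIO) -> List[dict]:
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in stream if line.strip()]


def combine(sources: Dict[str, TextIO]) -> List[dict]:
    """Merge records from several implementations, tagging each with its source name."""
    combined: List[dict] = []
    for impl, stream in sources.items():
        for rec in load_jsonl(stream):
            # The source name wins over any 'impl' field already present in the record.
            combined.append({**rec, "impl": impl})
    # Order like the summary table: by implementation, then by workload name.
    combined.sort(key=lambda r: (r["impl"], r["wl"]))
    return combined


# Hypothetical in-memory records; field names are assumptions, not the real file schema.
hf = io.StringIO(
    '{"wl": "cuda_B2_D64_S128_W4", "p50_ms": 0.05, "ok": true}\n'
    '{"wl": "cuda_B2_D64_S128_W2", "p50_ms": 0.05, "ok": true}\n'
)
eager = io.StringIO('{"wl": "cuda_B2_D64_S128_W2", "p50_ms": 0.07, "ok": true}\n')

rows = combine({"hf_kernels_causal_conv1d": hf, "torch_eager": eager})
for r in rows:
    print(f"{r['impl']:<24} {r['wl']:<22} {r['p50_ms']:>8.2f}  {r['ok']}")
```

With real files, each `io.StringIO` would be replaced by an open file handle, for example `open(path)` on one of the paths listed in the log.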

COMBINED BENCHMARK SUMMARY

impl                     workload               p50 (ms)  ok
hf_kernels_causal_conv1d cuda_B2_D2048_S128_W2      0.05  True
hf_kernels_causal_conv1d cuda_B2_D2048_S128_W4      0.05  True
hf_kernels_causal_conv1d cuda_B2_D2048_S2048_W2     0.05  True
hf_kernels_causal_conv1d cuda_B2_D2048_S2048_W4     0.05  True
hf_kernels_causal_conv1d cuda_B2_D2048_S512_W2      0.05  True
hf_kernels_causal_conv1d cuda_B2_D2048_S512_W4      0.05  True
hf_kernels_causal_conv1d cuda_B2_D64_S128_W2        0.05  True
hf_kernels_causal_conv1d cuda_B2_D64_S128_W4        0.05  True
hf_kernels_causal_conv1d cuda_B2_D64_S2048_W2       0.05  True
hf_kernels_causal_conv1d cuda_B2_D64_S2048_W4       0.05  True
hf_kernels_causal_conv1d cuda_B2_D64_S512_W2        0.05  True
hf_kernels_causal_conv1d cuda_B2_D64_S512_W4        0.05  True
hf_kernels_causal_conv1d cuda_B4_D2048_S128_W2      0.05  True
hf_kernels_causal_conv1d cuda_B4_D2048_S128_W4      0.05  True
hf_kernels_causal_conv1d cuda_B4_D2048_S2048_W2     0.05  True
hf_kernels_causal_conv1d cuda_B4_D2048_S2048_W4     0.05  True
hf_kernels_causal_conv1d cuda_B4_D2048_S512_W2      0.05  True
hf_kernels_causal_conv1d cuda_B4_D2048_S512_W4      0.05  True
hf_kernels_causal_conv1d cuda_B4_D64_S128_W2        0.05  True
hf_kernels_causal_conv1d cuda_B4_D64_S128_W4        0.05  True
hf_kernels_causal_conv1d cuda_B4_D64_S2048_W2       0.05  True
hf_kernels_causal_conv1d cuda_B4_D64_S2048_W4       0.05  True
hf_kernels_causal_conv1d cuda_B4_D64_S512_W2        0.05  True
hf_kernels_causal_conv1d cuda_B4_D64_S512_W4        0.05  True
torch_eager              cuda_B2_D2048_S128_W2      0.08  True
torch_eager              cuda_B2_D2048_S128_W4      0.09  True
torch_eager              cuda_B2_D2048_S2048_W2     0.15  True
torch_eager              cuda_B2_D2048_S2048_W4     0.16  True
torch_eager              cuda_B2_D2048_S512_W2      0.08  True
torch_eager              cuda_B2_D2048_S512_W4      0.08  True
torch_eager              cuda_B2_D64_S128_W2        0.07  True
torch_eager              cuda_B2_D64_S128_W4        0.09  True
torch_eager              cuda_B2_D64_S2048_W2       0.09  True
torch_eager              cuda_B2_D64_S2048_W4       0.08  True
torch_eager              cuda_B2_D64_S512_W2        0.09  True
torch_eager              cuda_B2_D64_S512_W4        0.09  True
torch_eager              cuda_B4_D2048_S128_W2      0.09  True
torch_eager              cuda_B4_D2048_S128_W4      0.08  True
torch_eager              cuda_B4_D2048_S2048_W2     0.49  True
torch_eager              cuda_B4_D2048_S2048_W4     0.50  True
torch_eager              cuda_B4_D2048_S512_W2      0.09  True
torch_eager              cuda_B4_D2048_S512_W4      0.10  True
torch_eager              cuda_B4_D64_S128_W2        0.08  True
torch_eager              cuda_B4_D64_S128_W4        0.08  True
torch_eager              cuda_B4_D64_S2048_W2       0.08  True
torch_eager              cuda_B4_D64_S2048_W4       0.09  True
torch_eager              cuda_B4_D64_S512_W2        0.08  True
torch_eager              cuda_B4_D64_S512_W4        0.08  True
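The summary table can be turned into per-workload speedups by dividing the torch_eager P50 by the hf_kernels_causal_conv1d P50. The sketch below does that for three rows copied from the table. It also decodes the workload names, assuming `cuda_B<batch>_D<dim>_S<seqlen>_W<width>` means batch size, channel dimension, sequence length and kernel width; that reading of the labels is an assumption, not something this report states.

```python
import re
from typing import Dict, NamedTuple


class Workload(NamedTuple):
    batch: int
    dim: int
    seqlen: int
    width: int


# Assumed naming scheme: cuda_B<batch>_D<dim>_S<seqlen>_W<width>
WL_RE = re.compile(r"^cuda_B(\d+)_D(\d+)_S(\d+)_W(\d+)$")


def parse_workload(name: str) -> Workload:
    m = WL_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized workload name: {name}")
    return Workload(*(int(g) for g in m.groups()))


def speedups(baseline: Dict[str, float], candidate: Dict[str, float]) -> Dict[str, float]:
    """baseline_p50 / candidate_p50 per shared workload; > 1 means the candidate is faster."""
    return {wl: baseline[wl] / candidate[wl] for wl in baseline if wl in candidate}


# P50 values (ms) copied from the summary table.
torch_eager = {
    "cuda_B2_D64_S128_W2": 0.07,
    "cuda_B2_D2048_S2048_W2": 0.15,
    "cuda_B4_D2048_S2048_W2": 0.49,
}
hf_kernels = {wl: 0.05 for wl in torch_eager}

for wl, s in speedups(torch_eager, hf_kernels).items():
    w = parse_workload(wl)
    print(f"B={w.batch} D={w.dim} S={w.seqlen} W={w.width}: {s:.1f}x")
```

This gives roughly 1.4x, 3.0x and 9.8x. Because the table rounds latencies to 0.01 ms, and the kernel rows all read 0.05 ms, these ratios are coarse; the unrounded measurements would be needed for precise speedups.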

GENERATING COMBINED VISUALIZATION

Loaded 48 records
✓ Visualization saved as latency.svg
Saved latency.png
✓ Visualization saved as latency.svg
✓ SVG visualization ready!

ANALYSIS COMPLETE
Total implementations analyzed: 2

Implementations included:
  ✓ HF Kernels Causal Conv1D
  ✓ PyTorch Causal Conv1D
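The record count reported above follows from the workload grid visible in the summary table. Assuming the workloads form a full Cartesian product of 2 batch sizes, 2 dims, 3 sequence lengths and 2 widths, there are 2 × 2 × 3 × 2 = 24 workloads, and 24 workloads × 2 implementations = 48 records. The short check below enumerates that grid; the constant names are illustrative.

```python
from itertools import product

# Grid values read off the workload names; their meaning (batch, dim, seqlen, width) is assumed.
BATCHES = (2, 4)
DIMS = (64, 2048)
SEQLENS = (128, 512, 2048)
WIDTHS = (2, 4)
IMPLS = ("hf_kernels_causal_conv1d", "torch_eager")

workloads = [f"cuda_B{b}_D{d}_S{s}_W{w}" for b, d, s, w in product(BATCHES, DIMS, SEQLENS, WIDTHS)]
records = [(impl, wl) for impl in IMPLS for wl in workloads]
print(len(workloads), len(records))  # 24 48
```

Iterating `product` in this order also reproduces the x-axis order of the latency plot, from cuda_B2_D64_S128_W2 to cuda_B4_D2048_S2048_W4.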

Artifacts:

latency.svg