PyTorch 2.12.0 + CUDA 13.3 Prebuilt Wheels (cp312)
Prebuilt Linux x86_64 Python wheels for the PyTorch 2.12.0 / CUDA 13.3 GPU stack,
compiled against CPython 3.12 (cp312). These let you skip multi-hour source builds of
flash-attn, causal-conv1d, triton, torchao, and friends on CUDA 13.3.
⚠️ These are not a model: this repo hosts Python `.whl` files. The HF "model" repo type is used purely as LFS-backed storage.
Build matrix
| Field | Value |
|---|---|
| PyTorch | 2.12.0+cu133 |
| CUDA toolkit | 13.3 |
| Python ABI | CPython 3.12 (cp312) |
| Platform | Linux x86_64 |
| CUDA arch list | 8.6; 8.9; 9.0; 10.0; 12.0 |
| Build date | 2026-06-14 |
Build environment
| Field | Value |
|---|---|
| Distro | Ubuntu 24.04.4 LTS (Noble Numbat) |
| Kernel | 6.17.0-35-generic (x86_64) |
| glibc | 2.39 |
| Compiler | gcc 13.3.0 |
| nvcc | CUDA 13.3, V13.3.33 |
Runtime requirement: the `linux_x86_64` wheels (torch, flash-attn, causal-conv1d, bitsandbytes, torchao, torch{audio,vision}, mslk) are glibc ≥ 2.39 builds, so they need a distro at least as new as Ubuntu 24.04 / Debian 13 / RHEL 10. Older glibc will fail to load them. `triton` is `manylinux_2_27` / `manylinux_2_28` and is more portable (glibc ≥ 2.28).
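To check the glibc requirement before downloading anything, a small script along these lines may help. It is a minimal sketch: the helper names `parse_version` and `glibc_is_new_enough` are made up for illustration, and it relies on the standard-library `platform.libc_ver()` call, which can return empty strings on non-glibc systems.

```python
import platform

REQUIRED_GLIBC = (2, 39)  # minimum stated in the runtime requirement above


def parse_version(text):
    """Turn a dotted version such as '2.39' into a tuple of ints: (2, 39)."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def glibc_is_new_enough(version_text, required=REQUIRED_GLIBC):
    """Return True when version_text (e.g. '2.39') is >= required."""
    return parse_version(version_text) >= required


if __name__ == "__main__":
    libc_name, libc_version = platform.libc_ver()
    needed = ".".join(str(n) for n in REQUIRED_GLIBC)
    if libc_name != "glibc" or not libc_version:
        print("Could not detect glibc on this system.")
    elif glibc_is_new_enough(libc_version):
        print(f"glibc {libc_version}: OK for the linux_x86_64 wheels")
    else:
        print(f"glibc {libc_version}: too old, {needed} or newer is needed")
```

For the more portable wheels, the same helper can be called with `required=(2, 28)`.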
Target GPU architectures
| SM | Arch | Example GPUs |
|---|---|---|
| 8.6 | Ampere | RTX 30-series, A10 / A40 |
| 8.9 | Ada Lovelace | RTX 40-series, L4 / L40(S) |
| 9.0 | Hopper | H100 / H200 |
| 10.0 | Blackwell (datacenter) | B100 / B200 / GB200 |
| 12.0 | Blackwell (consumer) | RTX 50-series |
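Once the torch wheel is installed, the local GPU can be compared against the arch list with something like the sketch below. `SUPPORTED_SM` and `is_built_for` are illustrative names, not part of any library; the exact-match test is a deliberate simplification and does not model CUDA's forward-compatibility rules. `torch.cuda.get_device_capability` and `torch.cuda.get_device_name` are standard PyTorch calls.

```python
# SM targets from the CUDA arch list, as (major, minor) tuples.
SUPPORTED_SM = {(8, 6), (8, 9), (9, 0), (10, 0), (12, 0)}


def is_built_for(capability, supported=SUPPORTED_SM):
    """Return True if the (major, minor) capability is in the build list."""
    return tuple(capability) in supported


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed; install the torch wheel first")
    else:
        if not torch.cuda.is_available():
            print("No CUDA device visible")
        else:
            for index in range(torch.cuda.device_count()):
                cap = torch.cuda.get_device_capability(index)
                name = torch.cuda.get_device_name(index)
                status = "listed" if is_built_for(cap) else "NOT listed"
                print(f"GPU {index}: {name} sm_{cap[0]}{cap[1]} -> {status}")
```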
Wheels
CUDA arches below are the SM targets verified from the compiled binaries (cuobjdump),
not just the requested build list.
| Package | Version | File | Size | CUDA arches (SM) |
|---|---|---|---|---|
| torch | 2.12.0+cu133 | `torch-2.12.0+cu133-cp312-cp312-linux_x86_64.whl` | 653M | 8.6 / 8.9 / 9.0 / 10.0 / 12.0 (+ 90a 100a/f 103a 120a 121a variants) |
| vllm | 0.1.dev1+gc621af169 | `vllm-0.1.dev1+gc621af169.cu133.torch2.12.0.cu133-cp312-cp312-linux_x86_64.whl` | 548M | 8.6 / 8.9 / 9.0 / 10.0 / 12.0 (+ 100f) |
| flashinfer-jit-cache | 0.6.12 | `flashinfer_jit_cache-0.6.12+torch2.12.0.cu133-cp39-abi3-manylinux_2_28_x86_64.whl` | 1.0G | 8.6 / 8.9 / 9.0 / 10.0 / 12.0 (90a 100a 120f variants) |
| flashinfer-python | 0.6.12 | `flashinfer_python-0.6.12+torch2.12.0.cu133-py3-none-any.whl` | 14M | pure Python (uses flashinfer-jit-cache / runtime JIT for kernels) |
| triton | 3.7.0 | `triton-3.7.0+torch2.12.0.cu133-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl` | 193M | JIT: bundles ptxas + ptxas-blackwell (compiles per-GPU at runtime) |
| flash-attn | 2.8.3 | `flash_attn-2.8.3+torch2.12.0.cu133-cp312-cp312-linux_x86_64.whl` | 234M | 8.6 / 8.9 / 9.0 / 10.0 / 12.0 |
| causal-conv1d | 1.6.2.post1 | `causal_conv1d-1.6.2.post1+torch2.12.0.cu133-cp312-cp312-linux_x86_64.whl` | 201M | 8.6 / 8.9 / 9.0 / 10.0 / 12.0 |
| bitsandbytes | 0.49.1 | `bitsandbytes-0.49.1+torch2.12.0.cu133-cp312-cp312-linux_x86_64.whl` | 2.4M | 8.6 / 8.9 / 9.0 / 10.0 / 12.0 |
| torchao | 0.18.0+gitc92676e | `torchao-0.18.0+gitc92676e.torch2.12.0.cu133-cp310-abi3-linux_x86_64.whl` | 3.6M | 8.6 / 8.9 / 9.0 / 10.0 / 12.0 |
| torchvision | 0.27.0+cu133 | `torchvision-0.27.0+cu133-cp312-cp312-linux_x86_64.whl` | 2.6M | 8.6 / 8.9 / 9.0 / 10.0 / 12.0 |
| torchaudio | 2.11.0+cu133 | `torchaudio-2.11.0+cu133-cp312-cp312-linux_x86_64.whl` | 2.9M | 8.6 / 8.9 / 9.0 / 10.0 / 12.0 |
| mslk-cuda-nightly | 2026.6.14 | `mslk_cuda_nightly-2026.6.14+torch2.12.0.cu133-cp312-cp312-linux_x86_64.whl` | 19M | 10.0 / 12.0 (Blackwell only) |
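The filenames in this table follow the standard wheel naming scheme (`name-version-python-abi-platform.whl`), so the tags can be read off mechanically. The sketch below is a simplified parser written for illustration: `WheelInfo` and `parse_wheel_filename` are hypothetical names defined here, and optional build tags are not handled. For production use, `packaging.utils.parse_wheel_filename` is likely the more robust choice.

```python
from typing import NamedTuple


class WheelInfo(NamedTuple):
    name: str
    version: str       # public version, e.g. "2.8.3"
    local: str         # local version tag after "+", or "" if none
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename):
    """Split a wheel filename into its tags (build tags are not handled)."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) != 5:
        raise ValueError(f"unexpected wheel filename layout: {filename}")
    name, full_version, python_tag, abi_tag, platform_tag = parts
    version, _, local = full_version.partition("+")
    return WheelInfo(name, version, local, python_tag, abi_tag, platform_tag)


if __name__ == "__main__":
    info = parse_wheel_filename(
        "flash_attn-2.8.3+torch2.12.0.cu133-cp312-cp312-linux_x86_64.whl"
    )
    print(info.name, info.version, info.local, info.platform_tag)
```

The `local` field is where the torch/CUDA combo shows up, and `abi_tag` distinguishes the `cp312` builds from the `abi3` ones.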
Installation
Requires Python 3.12 on Linux x86_64 with an NVIDIA driver supporting CUDA 13.3.
Install torch first so the other wheels resolve against it:
# 1. Grab the wheels
pip install huggingface_hub
hf download thad0ctor/torch2.12-cu133-cp312-wheels --local-dir wheels --repo-type model
# 2. Install torch first, then the rest
pip install wheels/torch-2.12.0+cu133-cp312-cp312-linux_x86_64.whl
pip install wheels/*.whl
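If you prefer to script the two-step install from Python, the ordering can be sketched as below. This is an illustrative helper, not something shipped in this repo: `build_pip_commands` and `main` are hypothetical names, and the script only prints the commands unless `run=True` is passed. It differs slightly from the shell commands above in that the second pip call omits the already-installed torch wheel.

```python
import subprocess
import sys
from pathlib import Path


def build_pip_commands(wheel_paths, python=sys.executable):
    """Return pip argv lists: the core torch wheel first, then the rest."""
    paths = sorted(Path(p) for p in wheel_paths)
    # "torch-" matches only the core wheel, not torchvision/torchaudio/torchao.
    torch_wheels = [p for p in paths if p.name.startswith("torch-")]
    other_wheels = [p for p in paths if not p.name.startswith("torch-")]
    commands = []
    if torch_wheels:
        commands.append([python, "-m", "pip", "install", *map(str, torch_wheels)])
    if other_wheels:
        commands.append([python, "-m", "pip", "install", *map(str, other_wheels)])
    return commands


def main(wheel_dir="wheels", run=False):
    """Print the install commands; execute them only when run=True."""
    for command in build_pip_commands(Path(wheel_dir).glob("*.whl")):
        print(" ".join(command))
        if run:
            subprocess.run(command, check=True)


if __name__ == "__main__":
    main()  # dry run by default
```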
Or install a single wheel straight from the Hub:
pip install \
https://huggingface.co/thad0ctor/torch2.12-cu133-cp312-wheels/resolve/main/flash_attn-2.8.3+torch2.12.0.cu133-cp312-cp312-linux_x86_64.whl
Notes
- Local version tags (e.g. `+torch2.12.0.cu133`) encode the exact torch/CUDA combo the wheel was built against; keep the whole stack on the same combo to avoid ABI mismatches.
- `torchao` is built as `cp310-abi3` (stable ABI), so it loads under cp312.
- Most wheels cover 8.6 through 12.0; GPUs below sm_86 are not supported. That excludes all pre-Ampere cards as well as sm_80 Ampere parts such as the A100.
- `mslk-cuda-nightly` is Blackwell-only (sm_100 / sm_120); it will not run on Ampere/Ada/Hopper.
- `triton` is a JIT compiler: it bundles `ptxas` (incl. a Blackwell build) and compiles kernels for your actual GPU at runtime, so it has no fixed baked-in arch set.
- `vllm` is a from-source build off vLLM `main` at commit `c621af169` (vcs version `0.1.dev1`, i.e. an untagged dev build). Pin the exact commit if you need to reproduce this wheel; `pip install vllm==0.1.dev1+gc621af169.cu133` is not resolvable from PyPI.
- flashinfer ships as two wheels; install both:
  - `flashinfer-python` is the pure-Python frontend (`py3-none-any`, portable).
  - `flashinfer-jit-cache` is the matching ahead-of-time precompiled kernel cache (1 GB of cubins), so you don't pay for runtime JIT compilation. It is `cp39-abi3` / `manylinux_2_28` (glibc ≥ 2.28) and its cubins target 8.6 / 8.9 / 9.0a / 10.0a / 12.0f.
  - The two versions must match (0.6.12).
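After installation, the "same combo" and "versions must match" rules from these notes can be spot-checked from the installed package metadata. The following is a rough sketch under a few assumptions: `PACKAGES`, `find_mismatches`, and `installed_versions` are names invented here, the distribution names are taken from the Package column of the wheel table, and the check simply looks for a `cu133` component in each local version tag. `importlib.metadata.version` is the standard-library lookup.

```python
from importlib import metadata

# Distribution names as listed in the wheel table (assumed to match metadata).
PACKAGES = [
    "torch", "torchvision", "torchaudio", "triton", "flash-attn",
    "causal-conv1d", "flashinfer-python", "flashinfer-jit-cache",
]


def find_mismatches(versions, cuda_tag="cu133"):
    """Return human-readable problems for a {package: version} mapping."""
    problems = []
    for name, version in versions.items():
        _, _, local = version.partition("+")
        if cuda_tag not in local.split("."):
            problems.append(f"{name} {version}: local tag lacks {cuda_tag}")
    frontend = versions.get("flashinfer-python")
    cache = versions.get("flashinfer-jit-cache")
    if frontend and cache:
        # Compare only the public part of the version (before the "+").
        if frontend.partition("+")[0] != cache.partition("+")[0]:
            problems.append(f"flashinfer versions differ: {frontend} vs {cache}")
    return problems


def installed_versions(names=PACKAGES):
    """Look up installed versions, skipping packages that are absent."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            continue
    return found


if __name__ == "__main__":
    issues = find_mismatches(installed_versions())
    print("\n".join(issues) if issues else "no mismatches found")
```

An empty result only means the tags look consistent; it does not prove the binaries load on your GPU.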