DGX Spark FlashAttention wheel

This repository hosts a DGX Spark compatible FlashAttention wheel used by the PII fine-tuning tutorial stack.

The wheel was taken from the known-good unsloth-bench environment on DGX Spark Alice and is intended for:

NVIDIA DGX Spark / Linux aarch64
Python 3.12 / CPython cp312
CUDA 13 PyTorch stack
flash_attn==2.8.3.post1

Wheel URL:

https://huggingface.co/N8Programs/dgx-flash-attn-wheel/resolve/main/flash_attn-2.8.3.post1-cp312-cp312-linux_aarch64.whl
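Before installing, it can help to confirm that the interpreter you are about to use matches the tags encoded in the wheel filename. The following is a minimal sketch, not part of the original tutorial: the helper names `wheel_tags`, `host_tags`, and `wheel_matches` are hypothetical, and `host_tags` assumes CPython on Linux, where the ABI tag equals the Python tag.

```python
import platform
import sys

WHEEL = "flash_attn-2.8.3.post1-cp312-cp312-linux_aarch64.whl"


def wheel_tags(filename):
    """Return (python_tag, abi_tag, platform_tag) from a wheel filename."""
    stem = filename[: -len(".whl")] if filename.endswith(".whl") else filename
    # The last three dash-separated fields of a wheel name are its tags.
    python_tag, abi_tag, platform_tag = stem.split("-")[-3:]
    return python_tag, abi_tag, platform_tag


def host_tags():
    """Approximate the running interpreter's tags (assumes CPython on Linux)."""
    python_tag = f"cp{sys.version_info.major}{sys.version_info.minor}"
    platform_tag = f"{sys.platform}_{platform.machine()}"
    return python_tag, python_tag, platform_tag


def wheel_matches(filename, host=None):
    """True when the wheel's tags equal the host's tags."""
    return wheel_tags(filename) == (host or host_tags())


if __name__ == "__main__":
    print("wheel tags:", wheel_tags(WHEEL))
    print("host tags: ", host_tags())
    print("compatible:", wheel_matches(WHEEL))
```

Pip performs its own, more thorough tag check at install time. This sketch only surfaces a mismatch earlier and in a readable form.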

1. Create a clean conda environment

On the DGX Spark host:

conda create -n piiEnv python=3.12 pip -y
conda activate piiEnv
python -m pip install --upgrade pip

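Always invoke pip as python -m pip from inside the activated conda environment, so that packages are installed into that environment's interpreter. If pip install reports an externally-managed-environment error, the command is most likely running against the system Python: the environment is probably not activated, or it was created without its own Python and pip.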
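A quick way to see which interpreter is actually running is to compare its prefix with the `CONDA_PREFIX` variable that `conda activate` sets. This is a minimal sketch under that assumption; the function name `interpreter_in_env` is hypothetical.

```python
import os
import sys


def interpreter_in_env(prefix, conda_prefix):
    """True when the interpreter prefix is the active conda environment."""
    if not conda_prefix:
        # No conda environment is active in this shell.
        return False
    return os.path.realpath(prefix) == os.path.realpath(conda_prefix)


if __name__ == "__main__":
    print("python executable:", sys.executable)
    print("CONDA_PREFIX:", os.environ.get("CONDA_PREFIX"))
    print(
        "inside active conda env:",
        interpreter_in_env(sys.prefix, os.environ.get("CONDA_PREFIX")),
    )
```

If the last line prints False, `python -m pip` would install into a different interpreter than the one conda reports as active.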

2. Install the CUDA 13 PyTorch base

python -m pip install \
  --extra-index-url https://download.pytorch.org/whl/cu130 \
  torch==2.12.0+cu130 \
  triton==3.7.0 \
  packaging==26.2 \
  ninja==1.13.0 \
  psutil==7.2.2

3. Install the DGX Spark FlashAttention wheel

python -m pip install \
  https://huggingface.co/N8Programs/dgx-flash-attn-wheel/resolve/main/flash_attn-2.8.3.post1-cp312-cp312-linux_aarch64.whl

This avoids rebuilding FlashAttention from source. On DGX Spark, source builds are slow and easy to misconfigure, because the FlashAttention build may compile kernels for several CUDA architectures unless the target architecture list is carefully constrained.

4. Install the training stack

python -m pip install \
  transformers==5.5.0 \
  accelerate==1.13.0 \
  peft==0.19.1 \
  trl==0.24.0 \
  datasets==5.0.0 \
  bitsandbytes==0.49.2 \
  cut-cross-entropy==25.1.1 \
  sentencepiece==0.2.1 \
  tokenizers==0.22.2 \
  safetensors==0.8.0 \
  protobuf==7.35.0 \
  numpy==2.4.6 \
  pandas==3.0.3 \
  Pillow==12.2.0 \
  PyYAML==6.0.3 \
  hf_transfer==0.1.9 \
  msgspec==0.21.1 \
  tqdm==4.68.2 \
  tyro==1.0.13 \
  wandb==0.27.2

5. Install Unsloth without dependency resolution

The known-good DGX Spark stack uses torch==2.12.0+cu130, but the current Unsloth package metadata may declare an older Torch upper bound. Preserve the working CUDA stack by installing Unsloth without dependencies:

python -m pip install --no-deps unsloth_zoo==2026.6.2 unsloth==2026.6.2

Pip may warn about dependency metadata mismatches if you later inspect the environment, for example with python -m pip check. Those warnings are expected here. What matters is that Torch, FlashAttention, Transformers, Unsloth, and CUDA import together successfully.
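Because `--no-deps` bypasses the resolver, it can be useful to confirm afterwards that the pinned versions are still the ones installed. The sketch below uses the standard library's `importlib.metadata`. The `PINS` subset and the helper names `installed_version` and `find_mismatches` are illustrative assumptions, not part of the tutorial stack.

```python
from importlib import metadata

# Illustrative subset of the pins used in the steps above.
PINS = {
    "torch": "2.12.0+cu130",
    "flash_attn": "2.8.3.post1",
    "transformers": "5.5.0",
    "unsloth": "2026.6.2",
    "unsloth_zoo": "2026.6.2",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Map each package whose installed version differs to (wanted, found)."""
    mismatches = {}
    for name, wanted in pins.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(PINS).items():
        print(f"{name}: wanted {wanted}, found {found}")
```

An empty result means every listed pin matches. Anything printed points at a package that was upgraded, downgraded, or never installed.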

6. Smoke test

python - <<'PY'
import torch
import transformers
import flash_attn
import unsloth
import unsloth_zoo

print("torch", torch.__version__, "cuda", torch.version.cuda, "available", torch.cuda.is_available())
print("transformers", transformers.__version__)
print("flash_attn", flash_attn.__version__)
print("unsloth", unsloth.__version__)
print("unsloth_zoo", unsloth_zoo.__version__)
PY

Expected output on a healthy DGX Spark setup:

torch 2.12.0+cu130 cuda 13.0 available True
transformers 5.5.0
flash_attn 2.8.3.post1
unsloth 2026.6.2
unsloth_zoo 2026.6.2
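Successful imports do not prove that the FlashAttention kernels actually run on the GPU. As an optional extra check, the sketch below runs one causal forward pass through `flash_attn_func` on small random bfloat16 tensors. The function name `flash_attn_forward_check` and the tensor sizes are arbitrary choices for illustration, and the check skips itself on machines without the stack or without CUDA.

```python
import importlib.util


def flash_attn_forward_check(seqlen=128, nheads=4, headdim=64):
    """Run one causal attention forward pass.

    Returns the output shape as a tuple, or None when the check is skipped
    because torch, flash_attn, or a CUDA device is not available.
    """
    for module in ("torch", "flash_attn"):
        if importlib.util.find_spec(module) is None:
            return None
    import torch

    if not torch.cuda.is_available():
        return None
    from flash_attn import flash_attn_func

    # flash_attn_func expects (batch, seqlen, nheads, headdim) in fp16 or bf16.
    q, k, v = (
        torch.randn(1, seqlen, nheads, headdim, device="cuda", dtype=torch.bfloat16)
        for _ in range(3)
    )
    out = flash_attn_func(q, k, v, causal=True)
    torch.cuda.synchronize()  # surface any asynchronous kernel error here
    return tuple(out.shape)


if __name__ == "__main__":
    print("flash_attn forward output shape:", flash_attn_forward_check())
```

The output shape should equal the query shape. A result of None means the check was skipped, not that it passed.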

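Optional one-shot installer

Run this from inside the activated piiEnv environment. It repeats steps 2 through 5 in a single script:

cat > install_dgx_spark_env.sh <<'SH'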
#!/usr/bin/env bash
set -euo pipefail

python -m pip install --upgrade pip
python -m pip install \
  --extra-index-url https://download.pytorch.org/whl/cu130 \
  torch==2.12.0+cu130 triton==3.7.0 packaging==26.2 ninja==1.13.0 psutil==7.2.2
python -m pip install \
  https://huggingface.co/N8Programs/dgx-flash-attn-wheel/resolve/main/flash_attn-2.8.3.post1-cp312-cp312-linux_aarch64.whl
python -m pip install \
  transformers==5.5.0 accelerate==1.13.0 peft==0.19.1 trl==0.24.0 datasets==5.0.0 \
  bitsandbytes==0.49.2 cut-cross-entropy==25.1.1 sentencepiece==0.2.1 tokenizers==0.22.2 \
  safetensors==0.8.0 protobuf==7.35.0 numpy==2.4.6 pandas==3.0.3 Pillow==12.2.0 \
  PyYAML==6.0.3 hf_transfer==0.1.9 msgspec==0.21.1 tqdm==4.68.2 tyro==1.0.13 wandb==0.27.2
python -m pip install --no-deps unsloth_zoo==2026.6.2 unsloth==2026.6.2
SH
bash install_dgx_spark_env.sh
