DGX Spark FlashAttention wheel
This repository hosts a DGX Spark compatible FlashAttention wheel used by the PII fine-tuning tutorial stack.
The wheel was taken from the known-good unsloth-bench environment on DGX Spark Alice and is intended for:
- NVIDIA DGX Spark / Linux aarch64
- Python 3.12 / CPython cp312
- CUDA 13 PyTorch stack
- flash_attn==2.8.3.post1
Wheel URL:
https://huggingface.co/N8Programs/dgx-flash-attn-wheel/resolve/main/flash_attn-2.8.3.post1-cp312-cp312-linux_aarch64.whl
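The wheel filename encodes the compatibility tags listed above. As a rough pre-flight check, the sketch below splits the filename into its fields and compares them with the running interpreter. The helper names `parse_wheel_filename` and `interpreter_matches` are hypothetical, and the comparison is deliberately simpler than pip's own tag matching, so treat a mismatch as a hint rather than a verdict.

```python
import sys
import sysconfig


def parse_wheel_filename(filename: str) -> dict:
    """Split a wheel filename into its name, version, and compatibility tags."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    # A wheel filename has 5 fields, or 6 when an optional build tag is present.
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename layout: {filename!r}")
    return {
        "name": parts[0],
        "version": parts[1],
        "python_tag": parts[-3],
        "abi_tag": parts[-2],
        "platform_tag": parts[-1],
    }


def interpreter_matches(tags: dict) -> bool:
    """Rough check: does the running interpreter match the wheel's tags?"""
    # Simplified on purpose: pip's real tag matching covers more cases.
    py_tag = f"cp{sys.version_info.major}{sys.version_info.minor}"
    platform_tag = sysconfig.get_platform().replace("-", "_").replace(".", "_")
    return (
        sys.implementation.name == "cpython"
        and tags["python_tag"] == py_tag
        and tags["platform_tag"] == platform_tag
    )


WHEEL = "flash_attn-2.8.3.post1-cp312-cp312-linux_aarch64.whl"
tags = parse_wheel_filename(WHEEL)
print(tags)
print("matches this interpreter:", interpreter_matches(tags))
```

Run it with the Python you intend to install into; on any machine other than an aarch64 Linux host with CPython 3.12 the last line should report `False`.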
1. Create a clean conda environment
On the DGX Spark host:
conda create -n piiEnv python=3.12 pip -y
conda activate piiEnv
python -m pip install --upgrade pip
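Always invoke pip as python -m pip from inside the activated conda environment, so that packages install into that environment's interpreter. If pip install reports an externally-managed-environment error, pip is most likely running against the system Python, either because the environment is not activated or because it is missing its own Python and pip.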
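To see which interpreter pip would actually use, a small diagnostic like the following can help. It is a sketch, not part of the tutorial stack: the function names are hypothetical, the marker-file lookup follows the location described in PEP 668, and the conda check assumes that `conda activate` has set `CONDA_PREFIX`.

```python
import os
import sys
import sysconfig
from pathlib import Path


def is_externally_managed() -> bool:
    """True if this interpreter carries the PEP 668 EXTERNALLY-MANAGED marker."""
    stdlib_dir = sysconfig.get_path("stdlib")
    return stdlib_dir is not None and (Path(stdlib_dir) / "EXTERNALLY-MANAGED").exists()


def runs_inside_active_conda_env() -> bool:
    """True if sys.prefix is the environment named by CONDA_PREFIX."""
    conda_prefix = os.environ.get("CONDA_PREFIX")
    if not conda_prefix:
        return False  # no conda environment is active in this shell
    return Path(sys.prefix).resolve() == Path(conda_prefix).resolve()


print("python executable:", sys.executable)
print("externally managed:", is_externally_managed())
print("inside active conda env:", runs_inside_active_conda_env())
```

In a correctly activated piiEnv, the executable path should point inside the environment, the marker should be absent, and the last line should print `True`.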
2. Install the CUDA 13 PyTorch base
python -m pip install \
--extra-index-url https://download.pytorch.org/whl/cu130 \
torch==2.12.0+cu130 \
triton==3.7.0 \
packaging==26.2 \
ninja==1.13.0 \
psutil==7.2.2
3. Install the DGX Spark FlashAttention wheel
python -m pip install \
https://huggingface.co/N8Programs/dgx-flash-attn-wheel/resolve/main/flash_attn-2.8.3.post1-cp312-cp312-linux_aarch64.whl
Installing the prebuilt wheel avoids rebuilding FlashAttention from source. On DGX Spark, source builds can be slow and easy to misconfigure, because the FlashAttention build may compile kernels for several CUDA architectures unless it is carefully constrained to the one the GPU needs.
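If you ever do need a source build, it helps to know which architecture the GPU reports before constraining anything. The sketch below queries it through PyTorch; `detect_compute_capability` is a hypothetical helper, and how that value is passed to the FlashAttention build is outside the scope of this example.

```python
def detect_compute_capability():
    """Return (major, minor) for GPU 0, or None if torch or CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return None  # torch is not installed in this environment
    if not torch.cuda.is_available():
        return None  # torch is installed but sees no CUDA device
    return torch.cuda.get_device_capability(0)


cap = detect_compute_capability()
if cap is None:
    print("No CUDA device visible (or torch is not installed).")
else:
    major, minor = cap
    print(f"GPU 0 compute capability: {major}.{minor} (sm_{major}{minor})")
```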
4. Install the training stack
python -m pip install \
transformers==5.5.0 \
accelerate==1.13.0 \
peft==0.19.1 \
trl==0.24.0 \
datasets==5.0.0 \
bitsandbytes==0.49.2 \
cut-cross-entropy==25.1.1 \
sentencepiece==0.2.1 \
tokenizers==0.22.2 \
safetensors==0.8.0 \
protobuf==7.35.0 \
numpy==2.4.6 \
pandas==3.0.3 \
Pillow==12.2.0 \
PyYAML==6.0.3 \
hf_transfer==0.1.9 \
msgspec==0.21.1 \
tqdm==4.68.2 \
tyro==1.0.13 \
wandb==0.27.2
5. Install Unsloth without dependency resolution
The known-good DGX Spark stack uses torch==2.12.0+cu130, but the current Unsloth package metadata may declare an older Torch upper bound, in which case normal dependency resolution could replace the pinned Torch build. Preserve the working CUDA stack by installing Unsloth without dependencies:
python -m pip install --no-deps unsloth_zoo==2026.6.2 unsloth==2026.6.2
Because of --no-deps, pip may report dependency metadata mismatches if you later inspect the environment, for example with python -m pip check. What matters is that Torch, FlashAttention, Transformers, Unsloth, and CUDA import together successfully, which the smoke test in step 6 verifies.
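Since --no-deps bypasses the resolver, it can be useful to confirm that the key pins from steps 2 to 5 are still in place. The following sketch reads installed versions from package metadata without importing the packages themselves, so it complements the import-based smoke test rather than replacing it. The `PINS` dict and `find_pin_mismatches` are illustrative names, and the lookup assumes the distribution names resolve as written.

```python
from importlib import metadata

# Pins copied from the install steps; extend the dict as needed.
PINS = {
    "torch": "2.12.0+cu130",
    "flash_attn": "2.8.3.post1",
    "transformers": "5.5.0",
    "unsloth": "2026.6.2",
    "unsloth_zoo": "2026.6.2",
}


def find_pin_mismatches(pins: dict) -> dict:
    """Map each package whose installed version differs from its pin to (expected, found)."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed in this environment
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


for name, (expected, found) in find_pin_mismatches(PINS).items():
    print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

An empty result means every listed package is installed at its pinned version; it says nothing about whether the packages import cleanly together.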
6. Smoke test
python - <<'PY'
import torch
import transformers
import flash_attn
import unsloth
import unsloth_zoo
print("torch", torch.__version__, "cuda", torch.version.cuda, "available", torch.cuda.is_available())
print("transformers", transformers.__version__)
print("flash_attn", flash_attn.__version__)
print("unsloth", unsloth.__version__)
print("unsloth_zoo", unsloth_zoo.__version__)
PY
Expected shape of a healthy DGX Spark setup:
torch 2.12.0+cu130 cuda 13.0 available True
transformers 5.5.0
flash_attn 2.8.3.post1
unsloth 2026.6.2
unsloth_zoo 2026.6.2
Optional one-shot installer
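cat > install_dgx_spark_env.sh <<'SH'
#!/usr/bin/env bash
# Runs steps 2-5 in the currently active conda environment; create and activate it first (step 1).
set -euo pipefail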
python -m pip install --upgrade pip
python -m pip install \
--extra-index-url https://download.pytorch.org/whl/cu130 \
torch==2.12.0+cu130 triton==3.7.0 packaging==26.2 ninja==1.13.0 psutil==7.2.2
python -m pip install \
https://huggingface.co/N8Programs/dgx-flash-attn-wheel/resolve/main/flash_attn-2.8.3.post1-cp312-cp312-linux_aarch64.whl
python -m pip install \
transformers==5.5.0 accelerate==1.13.0 peft==0.19.1 trl==0.24.0 datasets==5.0.0 \
bitsandbytes==0.49.2 cut-cross-entropy==25.1.1 sentencepiece==0.2.1 tokenizers==0.22.2 \
safetensors==0.8.0 protobuf==7.35.0 numpy==2.4.6 pandas==3.0.3 Pillow==12.2.0 \
PyYAML==6.0.3 hf_transfer==0.1.9 msgspec==0.21.1 tqdm==4.68.2 tyro==1.0.13 wandb==0.27.2
python -m pip install --no-deps unsloth_zoo==2026.6.2 unsloth==2026.6.2
SH
bash install_dgx_spark_env.sh