You are assessing GPU driver status and AI/ML workload capabilities.

Your Task

Evaluate the GPU's driver configuration and suitability for AI/ML workloads, including deep learning frameworks, compute capabilities, and performance optimization.

1. Driver Status Assessment

  • Installed driver: Type (proprietary/open-source) and version
  • Driver source: Distribution package, vendor installer, or compiled
  • Driver status: Loaded, functioning, errors
  • Kernel module: Module name and status
  • Driver age: Release date and recency
  • Latest driver: Compare installed vs. available
  • Driver compatibility: Kernel version compatibility
  • Secure boot status: Impact on driver loading
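
The checks above can be scripted. A minimal sketch for the NVIDIA case on Linux, using the /proc path and nvidia-smi query flags listed later under "Commands to Use":

# Minimal driver-status probe for an NVIDIA GPU on Linux. Assumes the
# proprietary driver exposes /proc/driver/nvidia/version when loaded.
import pathlib
import subprocess

version_file = pathlib.Path("/proc/driver/nvidia/version")
if version_file.exists():
    print("Kernel module loaded:", version_file.read_text().splitlines()[0])
else:
    print("NVIDIA kernel module not loaded")

try:
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    print("Driver version:", result.stdout.strip())
except (FileNotFoundError, subprocess.CalledProcessError):
    print("nvidia-smi missing or failing; driver likely not functional")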

2. Compute Framework Support

  • CUDA availability: CUDA Toolkit installation status
  • CUDA version: Installed CUDA version
  • CUDA compatibility: GPU compute capability vs. CUDA requirements
  • ROCm availability: For AMD GPUs
  • ROCm version: Installed ROCm version
  • OpenCL support: OpenCL runtime and version
  • oneAPI: Intel oneAPI toolkit status
  • Framework libraries: cuDNN, cuBLAS, TensorRT, etc.
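
To cross-check toolkit presence against what Python actually sees, a quick probe helps. A sketch assuming an NVIDIA system; it falls back to nvcc when PyTorch is absent:

# Sketch: report the CUDA version the Python stack sees, falling back to
# the standalone toolkit compiler. Assumes an NVIDIA system.
import shutil
import subprocess

try:
    import torch
    print(f"torch {torch.__version__}, built for CUDA {torch.version.cuda}")
    print("CUDA usable at runtime:", torch.cuda.is_available())
except ImportError:
    nvcc = shutil.which("nvcc")
    if nvcc:
        print(subprocess.run([nvcc, "--version"], capture_output=True,
                             text=True).stdout.strip())
    else:
        print("Neither PyTorch nor nvcc found; CUDA toolkit likely absent")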

3. GPU Compute Capabilities

  • Compute capability: NVIDIA CUDA compute version (e.g., 8.6, 8.9)
  • Architecture suitability: Architecture generation for AI/ML
  • Tensor cores: Presence and version (Gen 1/2/3/4)
  • RT cores: Ray tracing acceleration (less relevant for ML)
  • Memory bandwidth: Critical for ML workloads
  • VRAM capacity: Memory size for model loading
  • FP64/FP32/FP16/INT8: Precision support
  • TF32: Tensor Float 32 support (Ampere+)
  • Mixed precision: Automatic mixed precision capability
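
Compute capability and the key specs above are queryable from PyTorch. A sketch, assuming torch with CUDA is installed; the tensor-core test is a heuristic, since some capability-7.5 parts (the GTX 16 series) lack tensor cores:

# Sketch: compute capability and key specs via PyTorch (assumes torch with
# CUDA). Capability 7.0+ generally implies tensor cores, with exceptions
# such as the GTX 16 series at 7.5.
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    major, minor = torch.cuda.get_device_capability(0)
    print(f"{props.name}: compute capability {major}.{minor}")
    print(f"VRAM: {props.total_memory / 2**30:.1f} GiB, "
          f"SMs: {props.multi_processor_count}")
    print("Tensor cores (heuristic):", (major, minor) >= (7, 0))
else:
    print("No CUDA device visible to PyTorch")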

4. Deep Learning Framework Compatibility

  • PyTorch: Installation status and CUDA/ROCm support
  • TensorFlow: Installation and GPU backend
  • JAX: Google JAX framework support
  • ONNX Runtime: ONNX with GPU acceleration
  • MXNet: Apache MXNet support (note: MXNet is retired upstream)
  • Hugging Face: Transformers library GPU support
  • Framework versions: Installed versions and compatibility
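
These per-framework checks mirror the one-liners under "Commands to Use"; a consolidated probe might look like this sketch (the framework set is illustrative):

# Sketch: probe which frameworks are importable and whether each sees a GPU.
import importlib

def probe(name, gpu_check):
    try:
        mod = importlib.import_module(name)
    except ImportError:
        print(f"{name}: not installed")
        return
    version = getattr(mod, "__version__", "unknown")
    try:
        gpu = gpu_check(mod)
    except Exception as exc:  # backend probing can itself fail
        gpu = f"probe failed ({exc})"
    print(f"{name} {version}: GPU -> {gpu}")

probe("torch", lambda m: m.cuda.is_available())
probe("tensorflow", lambda m: bool(m.config.list_physical_devices("GPU")))
probe("jax", lambda m: [d.platform for d in m.devices()])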

5. AI/ML Library Ecosystem

  • cuDNN: NVIDIA Deep Neural Network library
  • cuBLAS: CUDA Basic Linear Algebra Subprograms
  • TensorRT: High-performance deep learning inference
  • NCCL: NVIDIA Collective Communications Library (multi-GPU)
  • MIOpen: AMD GPU-accelerated primitives
  • rocBLAS: AMD GPU BLAS library
  • oneDNN: Intel oneAPI Deep Neural Network Library
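
Framework-bundled library versions can be read from PyTorch directly. A sketch, assuming torch with CUDA; the bundled copies may differ from the system-wide libraries that ldconfig reports:

# Sketch: library versions bundled with PyTorch (assumes torch with CUDA).
# These may differ from the system-wide copies that ldconfig finds.
import torch

print("cuDNN:", torch.backends.cudnn.version())
if torch.cuda.is_available():
    print("NCCL:", ".".join(map(str, torch.cuda.nccl.version())))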

6. Performance Characteristics

  • Memory bandwidth: GB/s for data transfer
  • Compute throughput: TFLOPS for different precisions
    • FP64 (double precision)
    • FP32 (single precision)
    • FP16 (half precision)
    • INT8 (integer quantization)
    • TF32 (Tensor Float 32)
  • Tensor core performance: Dedicated AI acceleration
  • Sparse tensor support: Structured sparsity acceleration
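
Spec-sheet TFLOPS figures can be sanity-checked with a crude matmul timing. A rough sketch, not a calibrated benchmark; it assumes torch with CUDA, and the matrix size n=4096 is an arbitrary choice:

# Rough FP16 matmul throughput probe (a sketch, not a calibrated benchmark;
# assumes torch with CUDA). A matmul of two n x n matrices costs ~2*n^3 FLOPs.
import time
import torch

n = 4096  # arbitrary size, large enough to keep the GPU busy
a = torch.randn(n, n, dtype=torch.float16, device="cuda")
b = torch.randn(n, n, dtype=torch.float16, device="cuda")
for _ in range(3):  # warm-up to exclude one-time initialization cost
    a @ b
torch.cuda.synchronize()

iters = 20
t0 = time.perf_counter()
for _ in range(iters):
    a @ b
torch.cuda.synchronize()
dt = (time.perf_counter() - t0) / iters
print(f"~{2 * n**3 / dt / 1e12:.1f} TFLOPS (FP16 matmul, n={n})")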

7. Model Size Compatibility

  • VRAM capacity: Total GPU memory
  • Practical model sizes: Estimated model capacity
    • Small models: < 1B parameters
    • Medium models: 1B-7B parameters
    • Large models: 7B-70B parameters
    • Very large models: > 70B parameters
  • Batch size implications: VRAM for different batch sizes
  • Multi-GPU potential: Scaling across GPUs
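
A common rule of thumb for the capacity estimates above: inference needs roughly parameter count times bytes per parameter, plus overhead; training needs several times more for gradients, optimizer state, and activations. A heuristic sketch (the 1.2x overhead factor is an assumption, not a measured constant):

# Heuristic VRAM estimate for inference: params * bytes-per-param * overhead.
# The 1.2x overhead factor (activations, KV cache, fragmentation) is a rough
# assumption; training requires several times more memory.
def inference_vram_gib(params_billion, bytes_per_param=2, overhead=1.2):
    return params_billion * 1e9 * bytes_per_param * overhead / 2**30

for size in (1, 7, 13, 70):
    print(f"{size:>3}B params @ FP16: ~{inference_vram_gib(size):.0f} GiB")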

8. Container and Virtualization Support

  • Docker NVIDIA runtime: NVIDIA Container Toolkit (successor to the deprecated nvidia-docker)
  • Docker ROCm runtime: ROCm Docker support
  • Podman GPU support: GPU passthrough capability
  • Kubernetes GPU: Device plugin support
  • GPU passthrough: VM GPU assignment capability
  • vGPU support: Virtual GPU for multi-tenancy
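
A quick presence check for the container toolkit before running the Docker smoke test listed under "Commands to Use" — a minimal sketch:

# Sketch: check for the NVIDIA Container Toolkit CLI before attempting the
# full Docker GPU smoke test.
import shutil
import subprocess

if shutil.which("nvidia-container-cli"):
    subprocess.run(["nvidia-container-cli", "info"], check=False)
else:
    print("nvidia-container-cli not found; NVIDIA Container Toolkit missing")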

9. Monitoring and Profiling Tools

  • nvidia-smi: Real-time monitoring (NVIDIA)
  • rocm-smi: ROCm system management (AMD)
  • Nsight Systems: NVIDIA profiling suite
  • Nsight Compute: CUDA kernel profiler
  • nvtop/radeontop: Terminal GPU monitoring
  • PyTorch profiler: Framework-level profiling
  • TensorBoard: Training visualization
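
Framework-level profiling can be smoke-tested in a few lines. A sketch using torch.profiler, assuming torch with a CUDA device:

# Sketch: minimal torch.profiler session (assumes torch with a CUDA device).
import torch
from torch.profiler import ProfilerActivity, profile

x = torch.randn(1024, 1024, device="cuda")
with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    y = (x @ x).relu()
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=5))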

10. Optimization Features

  • Automatic mixed precision: AMP support
  • Gradient checkpointing: Memory optimization
  • Flash Attention: Optimized attention mechanisms
  • Quantization support: INT8, INT4 inference
  • Model compilation: TorchScript, XLA, TensorRT
  • Distributed training: Multi-GPU training support
  • CUDA graphs: Kernel launch optimization
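
AMP support is easiest to verify by running one training step under autocast. A sketch assuming torch with CUDA; the linear model and batch sizes are illustrative placeholders:

# Sketch: one optimizer step under automatic mixed precision (assumes torch
# with CUDA; model and dimensions are placeholders).
import torch

model = torch.nn.Linear(512, 512).cuda()
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()  # scales loss to avoid FP16 underflow

x = torch.randn(64, 512, device="cuda")
with torch.autocast(device_type="cuda", dtype=torch.float16):
    loss = model(x).square().mean()
scaler.scale(loss).backward()
scaler.step(opt)
scaler.update()
print("AMP step completed; loss:", loss.item())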

11. Workload Suitability Assessment

  • Training capability: Suitable for training workloads
  • Inference capability: Suitable for inference
  • Model type suitability:
    • Computer vision (CNNs)
    • Natural language processing (Transformers)
    • Generative AI (Diffusion models, LLMs)
    • Reinforcement learning
  • Performance tier: Consumer, Professional, Data Center

12. Bottleneck and Limitation Analysis

  • Memory bottlenecks: VRAM limitations for large models
  • Compute bottlenecks: Raw compute throughput limiting training speed
  • PCIe bandwidth: Data transfer limitations
  • Driver limitations: Missing features or bugs
  • Power throttling: Thermal or power constraints
  • Multi-GPU scaling: Efficiency of multi-GPU setup
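
PCIe transfer limits can be probed with a pinned-memory copy. A rough sketch assuming torch with CUDA; a result far below the link's rated bandwidth suggests a transfer bottleneck:

# Sketch: rough host-to-device bandwidth probe with a pinned 1 GiB buffer
# (assumes torch with CUDA).
import time
import torch

host = torch.empty(1024**3 // 4, dtype=torch.float32, pin_memory=True)
dev = torch.empty_like(host, device="cuda")
torch.cuda.synchronize()
t0 = time.perf_counter()
dev.copy_(host, non_blocking=True)
torch.cuda.synchronize()
gbs = host.numel() * 4 / (time.perf_counter() - t0) / 1e9
print(f"Host-to-device: ~{gbs:.1f} GB/s")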

Commands to Use

GPU and driver detection:

  • nvidia-smi (NVIDIA)
  • rocm-smi (AMD)
  • lspci | grep -iE 'vga|3d|display' (discrete NVIDIA GPUs often enumerate as "3D controller", not VGA)
  • lspci -v | grep -iEA 20 'vga|3d'

NVIDIA driver details:

  • nvidia-smi -q
  • cat /proc/driver/nvidia/version
  • modinfo nvidia
  • nvidia-smi --query-gpu=driver_version --format=csv,noheader

AMD driver details:

  • modinfo amdgpu
  • rocminfo
  • /opt/rocm/bin/rocm-smi --showdriverversion

CUDA/ROCm installation:

  • nvcc --version (CUDA compiler)
  • which nvcc
  • ls /usr/local/cuda*/
  • echo $CUDA_HOME
  • hipcc --version (ROCm)
  • ls /opt/rocm/

Compute capability:

  • nvidia-smi --query-gpu=compute_cap --format=csv,noheader
  • nvidia-smi -q | grep "Compute Capability"

Libraries check:

  • ldconfig -p | grep cudnn
  • ldconfig -p | grep cublas
  • ldconfig -p | grep tensorrt
  • ldconfig -p | grep nccl
  • ls /usr/lib/x86_64-linux-gnu/ | grep -i cuda

Python framework check:

  • python3 -c "import torch; print(f'PyTorch: {torch.__version__}, CUDA: {torch.cuda.is_available()}, Version: {torch.version.cuda}')"
  • python3 -c "import tensorflow as tf; print(f'TensorFlow: {tf.__version__}, GPU: {tf.config.list_physical_devices(\"GPU\")}')"
  • python3 -c "import torch; print(f'Compute Capability: {torch.cuda.get_device_capability()}')"

Container runtime:

  • docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi (adjust the image tag to a CUDA version the installed driver supports)
  • which nvidia-container-cli
  • nvidia-container-cli info

OpenCL:

  • clinfo
  • clinfo | grep "Device Name"

System libraries:

  • dpkg -l | grep -i cuda
  • dpkg -l | grep -i nvidia
  • dpkg -l | grep -i rocm
  • (Debian/Ubuntu; on RPM-based distros use rpm -qa | grep -i <pattern>)

Performance info:

  • nvidia-smi --query-gpu=name,memory.total,memory.free,driver_version,compute_cap --format=csv
  • nvidia-smi dmon -s pucvmet (dynamic monitoring)

Output Format

Executive Summary

GPU: [model]
Driver: [proprietary/open] v[version] ([status])
Compute: [CUDA/ROCm] v[version] (Compute [capability])
AI/ML Readiness: [Ready/Partial/Not Ready]
Best For: [Training/Inference/Both]
Recommended Frameworks: [PyTorch, TensorFlow, etc.]

Detailed AI/ML Assessment

Driver Status:

  • Type: [Proprietary/Open Source]
  • Version: [version number]
  • Release Date: [date]
  • Status: [Loaded/Error]
  • Kernel Module: [module] ([loaded/not loaded])
  • Latest Available: [version]
  • Update Recommended: [Yes/No]
  • Secure Boot: [Compatible/Issue]

Compute Framework Availability:

  • CUDA Toolkit: [Installed/Not Installed] - v[version]
  • CUDA Driver API: v[version]
  • ROCm: [Installed/Not Installed] - v[version]
  • OpenCL: [Available/Not Available] - v[version]
  • Compute Capability: [X.X] ([architecture name])

GPU Compute Specifications:

  • Architecture: [Turing/Ampere/Ada/RDNA3/Xe]
  • Tensor Cores: [Yes/No] - [Generation]
  • CUDA Cores / SPs: [count]
  • VRAM: [GB] [memory type]
  • Memory Bandwidth: [GB/s]
  • Precision Support:
    • FP64: [TFLOPS]
    • FP32: [TFLOPS]
    • FP16: [TFLOPS]
    • INT8: [TOPS]
    • TF32: [Yes/No]

AI/ML Libraries:

  • cuDNN: [version] ([installed/missing])
  • cuBLAS: [version] ([installed/missing])
  • TensorRT: [version] ([installed/missing])
  • NCCL: [version] ([installed/missing])
  • MIOpen: [version] (AMD only)
  • rocBLAS: [version] (AMD only)

Deep Learning Framework Support:

  • PyTorch: [version]
    • CUDA Enabled: [Yes/No]
    • CUDA Version: [version]
    • cuDNN Version: [version]
  • TensorFlow: [version]
    • GPU Support: [Yes/No]
    • CUDA Version: [version]
  • JAX: [installed/not installed]
  • ONNX Runtime: [GPU backend available]

Container Support:

  • NVIDIA Container Toolkit: [installed/not installed]
  • Docker GPU Access: [working/not working]
  • Podman GPU Support: [available]

Model Capacity Estimates:

  • Small Models (< 1B params): [batch size X]
  • Medium Models (1B-7B params): [batch size X]
  • Large Models (7B-70B params): [batch size X]
  • Very Large Models (> 70B params): [requires multi-GPU or not possible]

Example workload estimates based on [GB] VRAM:

  • LLaMA 7B: [inference only/training possible]
  • Stable Diffusion: [batch size X]
  • BERT Base: [batch size X]
  • GPT-2: [batch size X]

Workload Suitability:

  • Training:
    • Small models: [Excellent/Good/Fair/Poor]
    • Medium models: [rating]
    • Large models: [rating]
  • Inference:
    • Real-time: [Excellent/Good/Fair/Poor]
    • Batch: [rating]
    • Low-latency: [rating]

Use Case Recommendations:

  • Computer Vision (CNNs): [Excellent/Good/Fair/Poor]
  • NLP (Transformers): [rating]
  • Generative AI (LLMs): [rating]
  • Diffusion Models: [rating]
  • Reinforcement Learning: [rating]

Performance Tier:

  • Category: [Consumer/Professional/Data Center]
  • Training Performance: [rating]
  • Inference Performance: [rating]
  • Multi-GPU Scaling: [available/not available]

Optimization Features Available:

  • Automatic Mixed Precision: [Yes/No]
  • Tensor Core Utilization: [Yes/No]
  • TensorRT Optimization: [Available]
  • Flash Attention: [Supported]
  • INT8 Quantization: [Supported]
  • Multi-GPU Training: [Possible with [count] GPUs]

Limitations and Bottlenecks:

  • VRAM Constraint: [assessment]
  • Memory Bandwidth: [adequate/limited]
  • Compute Throughput: [assessment]
  • PCIe Bottleneck: [yes/no]
  • Driver Limitations: [any known issues]
  • Power/Thermal: [throttling concerns]

Recommendations:

  1. [Driver update/optimization suggestions]
  2. [Framework installation recommendations]
  3. [Workload optimization suggestions]
  4. [Hardware upgrade path if applicable]
  5. [Container/virtualization setup if beneficial]

AI/ML Readiness Scorecard

Driver Setup:        [βœ“/βœ—/⚠] [details]
CUDA/ROCm Install:   [βœ“/βœ—/⚠] [details]
Framework Support:   [βœ“/βœ—/⚠] [details]
Library Ecosystem:   [βœ“/βœ—/⚠] [details]
Container Runtime:   [βœ“/βœ—/⚠] [details]
VRAM Capacity:       [βœ“/βœ—/⚠] [details]
Compute Performance: [βœ“/βœ—/⚠] [details]

Overall Readiness: [Ready/Needs Setup/Limited/Not Suitable]

AI-Readable JSON

{
  "driver": {
    "type": "proprietary|open_source",
    "version": "",
    "status": "loaded|error",
    "latest_available": "",
    "update_recommended": false
  },
  "compute_platform": {
    "cuda": {
      "installed": false,
      "version": "",
      "compute_capability": ""
    },
    "rocm": {
      "installed": false,
      "version": ""
    },
    "opencl": {
      "available": false,
      "version": ""
    }
  },
  "gpu_specs": {
    "architecture": "",
    "tensor_cores": false,
    "vram_gb": 0,
    "memory_bandwidth_gbs": 0,
    "fp32_tflops": 0,
    "fp16_tflops": 0,
    "int8_tops": 0,
    "tf32_support": false
  },
  "libraries": {
    "cudnn": "",
    "cublas": "",
    "tensorrt": "",
    "nccl": ""
  },
  "frameworks": {
    "pytorch": {
      "installed": false,
      "version": "",
      "cuda_available": false
    },
    "tensorflow": {
      "installed": false,
      "version": "",
      "gpu_available": false
    }
  },
  "container_support": {
    "nvidia_container_toolkit": false,
    "docker_gpu_working": false
  },
  "workload_suitability": {
    "training": {
      "small_models": "excellent|good|fair|poor",
      "medium_models": "",
      "large_models": ""
    },
    "inference": {
      "real_time": "",
      "batch": ""
    }
  },
  "model_capacity": {
    "vram_gb": 0,
    "small_model_batch_size": 0,
    "llama_7b_possible": false,
    "stable_diffusion_batch": 0
  },
  "optimization_features": {
    "amp_support": false,
    "tensor_core_utilization": false,
    "tensorrt_available": false,
    "int8_quantization": false
  },
  "bottlenecks": {
    "vram_limited": false,
    "compute_limited": false,
    "pcie_bottleneck": false
  },
  "ai_ml_readiness": "ready|needs_setup|limited|not_suitable"
}

Execution Guidelines

  1. Identify GPU vendor first: NVIDIA, AMD, or Intel
  2. Check driver installation: Verify driver is loaded and working
  3. Assess compute platform: CUDA for NVIDIA, ROCm for AMD
  4. Query compute capability: Critical for framework compatibility
  5. Check library installation: cuDNN, TensorRT, etc.
  6. Test framework access: Try importing PyTorch/TensorFlow with GPU
  7. Evaluate VRAM capacity: Estimate model sizes
  8. Check container support: Important for ML workflows
  9. Identify bottlenecks: VRAM, compute, or driver issues
  10. Provide actionable recommendations: Setup steps or optimizations

Important Notes

  • NVIDIA GPUs have the most mature AI/ML ecosystem
  • CUDA compute capability determines supported features
  • cuDNN is critical for deep learning performance
  • VRAM is often the primary bottleneck for large models
  • Container runtimes simplify framework management
  • AMD ROCm support is improving but less mature than CUDA
  • Intel GPUs are emerging in AI/ML space
  • Tensor cores provide significant speedup for mixed precision
  • Driver version must match CUDA toolkit requirements
  • Some features require specific GPU generations
  • Multi-GPU setups require additional configuration
  • Consumer GPUs can be effective for smaller workloads
  • Professional/datacenter GPUs offer better reliability and support

Be thorough and practical - provide a clear assessment of AI/ML readiness and actionable next steps.