You are assessing GPU driver status and AI/ML workload capabilities.
## Your Task
Evaluate the GPU's driver configuration and suitability for AI/ML workloads, including deep learning frameworks, compute capabilities, and performance optimization.
### 1. Driver Status Assessment
- **Installed driver**: Type (proprietary/open-source) and version
- **Driver source**: Distribution package, vendor installer, or compiled
- **Driver status**: Loaded, functioning, errors
- **Kernel module**: Module name and status
- **Driver age**: Release date and recency
- **Latest driver**: Compare installed vs. available
- **Driver compatibility**: Kernel version compatibility
- **Secure boot status**: Impact on driver loading
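Where scripted checks are preferred over raw commands, the driver items above can be collected programmatically. A minimal sketch, assuming an NVIDIA card with `nvidia-smi` on the PATH:
```python
import subprocess
from pathlib import Path

def nvidia_driver_status() -> dict:
    """Collect basic NVIDIA driver facts; fields stay empty if the driver is absent."""
    status = {"version": None, "module_loaded": False}
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        )
        status["version"] = out.stdout.strip()
    except (FileNotFoundError, subprocess.CalledProcessError):
        pass  # nvidia-smi missing or the driver failed to load
    # /proc/modules lists loaded kernel modules; "nvidia" appears when the module is up
    modules = Path("/proc/modules")
    if modules.exists():
        status["module_loaded"] = any(
            line.split()[0] == "nvidia" for line in modules.read_text().splitlines()
        )
    return status

print(nvidia_driver_status())
```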
### 2. Compute Framework Support
- **CUDA availability**: CUDA Toolkit installation status
- **CUDA version**: Installed CUDA version
- **CUDA compatibility**: GPU compute capability vs. CUDA requirements
- **ROCm availability**: For AMD GPUs
- **ROCm version**: Installed ROCm version
- **OpenCL support**: OpenCL runtime and version
- **oneAPI**: Intel oneAPI toolkit status
- **Framework libraries**: cuDNN, cuBLAS, TensorRT, etc.
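The toolkit checks above can be scripted the same way. A best-effort sketch; the paths `/usr/local/cuda*` and `/opt/rocm` are conventional defaults and may differ on a given install:
```python
import shutil
import subprocess
from pathlib import Path

def compute_platforms() -> dict:
    """Best-effort CUDA/ROCm detection from standard tools and paths."""
    cuda_version = None
    nvcc = shutil.which("nvcc")
    if nvcc:
        # nvcc prints e.g. "Cuda compilation tools, release 12.1, V12.1.105"
        out = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
        for token in out.stdout.split(","):
            if "release" in token:
                cuda_version = token.split("release")[-1].strip()
    return {
        "cuda_toolkit": cuda_version,
        "cuda_dirs": [str(p) for p in Path("/usr/local").glob("cuda*")],
        "rocm_present": Path("/opt/rocm").exists(),
    }

print(compute_platforms())
```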
### 3. GPU Compute Capabilities
- **Compute capability**: NVIDIA CUDA compute version (e.g., 8.6, 8.9)
- **Architecture suitability**: Architecture generation for AI/ML
- **Tensor cores**: Presence and version (Gen 1/2/3/4)
- **RT cores**: Ray tracing acceleration (less relevant for ML)
- **Memory bandwidth**: Critical for ML workloads
- **VRAM capacity**: Memory size for model loading
- **FP64/FP32/FP16/INT8**: Precision support
- **TF32**: Tensor Float 32 support (Ampere+)
- **Mixed precision**: Automatic mixed precision capability
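If PyTorch is already installed, most of these specifications can be read straight from the device properties. A short sketch, assuming a single CUDA device at index 0:
```python
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    major, minor = torch.cuda.get_device_capability(0)
    print(f"GPU: {props.name}")
    print(f"Compute capability: {major}.{minor}")  # e.g. 8.6 on consumer Ampere
    print(f"VRAM: {props.total_memory / 1024**3:.1f} GiB")
    print(f"Multiprocessors: {props.multi_processor_count}")
    # TF32 and modern tensor cores require compute capability >= 8.0 (Ampere or newer)
    print(f"TF32-capable: {major >= 8}")
else:
    print("No CUDA device visible to PyTorch")
```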
### 4. Deep Learning Framework Compatibility
- **PyTorch**: Installation status and CUDA/ROCm support
- **TensorFlow**: Installation and GPU backend
- **JAX**: Google JAX framework support
- **ONNX Runtime**: ONNX with GPU acceleration
- **MXNet**: Apache MXNet support
- **Hugging Face**: Transformers library GPU support
- **Framework versions**: Installed versions and compatibility
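One way to cover all of these in a single pass is an import-probe loop. The sketch below assumes the standard package names and simply reports anything that is not installed:
```python
import importlib

def probe(module_name: str, gpu_check) -> str:
    """Import a framework and run its GPU-visibility check, tolerating absence."""
    try:
        mod = importlib.import_module(module_name)
    except ImportError:
        return "not installed"
    try:
        return f"{mod.__version__}, GPU: {gpu_check(mod)}"
    except Exception as exc:
        return f"{mod.__version__}, GPU check failed: {exc}"

checks = {
    "torch": lambda m: m.cuda.is_available(),
    "tensorflow": lambda m: bool(m.config.list_physical_devices("GPU")),
    "jax": lambda m: [d.platform for d in m.devices()],
    "onnxruntime": lambda m: m.get_available_providers(),
}
for name, check in checks.items():
    print(f"{name}: {probe(name, check)}")
```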
### 5. AI/ML Library Ecosystem
- **cuDNN**: NVIDIA CUDA Deep Neural Network library
- **cuBLAS**: CUDA Basic Linear Algebra Subprograms
- **TensorRT**: High-performance deep learning inference
- **NCCL**: NVIDIA Collective Communications Library (multi-GPU)
- **MIOpen**: AMD's deep learning primitives library (ROCm counterpart to cuDNN)
- **rocBLAS**: AMD GPU BLAS library
- **oneDNN**: Intel oneAPI Deep Neural Network Library
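These libraries can be located without root by asking the dynamic loader; on Linux, `find_library` consults `ldconfig`, mirroring the `ldconfig -p` checks later in this prompt. Note that the TensorRT runtime ships as `libnvinfer`, and exact soname spellings vary by version:
```python
from ctypes.util import find_library

# Probe the loader cache for each library's shared object
for lib in ("cudnn", "cublas", "nvinfer", "nccl", "MIOpen", "rocblas"):
    path = find_library(lib)
    print(f"{lib}: {path or 'not found'}")
```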
### 6. Performance Characteristics
- **Memory bandwidth**: GB/s for data transfer
- **Compute throughput**: TFLOPS for different precisions
- FP64 (double precision)
- FP32 (single precision)
- FP16 (half precision)
- INT8 (integer quantization)
- TF32 (Tensor Float 32)
- **Tensor core performance**: Dedicated AI acceleration
- **Sparse tensor support**: Structured sparsity acceleration
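Peak throughput can be sanity-checked with the usual back-of-envelope formula: FP32 TFLOPS ≈ shader cores × 2 (one fused multiply-add counts as two ops per clock) × boost clock in GHz ÷ 1000. The numbers below are placeholders, not the specs of any particular card:
```python
# Illustrative values only; substitute the real core count and boost clock
cuda_cores = 10240
boost_clock_ghz = 1.70

# FMA = 2 floating-point operations per core per clock
fp32_tflops = cuda_cores * 2 * boost_clock_ghz / 1000
print(f"Peak FP32 ~= {fp32_tflops:.1f} TFLOPS")
# Tensor-core FP16 throughput is typically several times higher; use vendor specs
```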
### 7. Model Size Compatibility
- **VRAM capacity**: Total GPU memory
- **Practical model sizes**: Estimated model capacity
- Small models: < 1B parameters
- Medium models: 1B-7B parameters
- Large models: 7B-70B parameters
- Very large models: > 70B parameters
- **Batch size implications**: VRAM for different batch sizes
- **Multi-GPU potential**: Scaling across GPUs
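A rough sizing heuristic for these buckets: FP16 inference needs about 2 bytes per parameter plus activation headroom, while full training with Adam under mixed precision is commonly estimated at around 16 bytes per parameter (weights, gradients, FP32 master copy, and optimizer states). The multipliers below are rules of thumb, not measurements:
```python
def vram_needed_gb(params_billion: float, bytes_per_param: float = 2.0,
                   training: bool = False) -> float:
    """Back-of-envelope VRAM estimate in GB."""
    weights_gb = params_billion * bytes_per_param  # 1B params * 2 bytes ~= 2 GB
    if training:
        # weights + grads + FP32 master + Adam moments ~= 8x the FP16 footprint
        return weights_gb * 8
    return weights_gb * 1.2  # ~20% headroom for activations and KV cache

print(f"7B FP16 inference: ~{vram_needed_gb(7):.0f} GB")
print(f"7B FP16 training:  ~{vram_needed_gb(7, training=True):.0f} GB")
```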
### 8. Container and Virtualization Support
- **Docker NVIDIA runtime**: NVIDIA Container Toolkit (successor to the deprecated nvidia-docker)
- **Docker ROCm runtime**: ROCm Docker support
- **Podman GPU support**: GPU passthrough capability
- **Kubernetes GPU**: Device plugin support
- **GPU passthrough**: VM GPU assignment capability
- **vGPU support**: Virtual GPU for multi-tenancy
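A scripted version of the Docker smoke test listed under "Commands to Use"; the image tag is illustrative, and the probe may pull the image on first run:
```python
import shutil
import subprocess

def container_gpu_check() -> dict:
    """Check for the NVIDIA container CLI and try a GPU-enabled container run."""
    result = {
        "nvidia_container_cli": bool(shutil.which("nvidia-container-cli")),
        "docker_gpu_working": False,
    }
    if shutil.which("docker"):
        probe = subprocess.run(
            ["docker", "run", "--rm", "--gpus", "all",
             "nvidia/cuda:11.8.0-base-ubuntu22.04", "nvidia-smi"],
            capture_output=True, text=True,
        )
        result["docker_gpu_working"] = probe.returncode == 0
    return result

print(container_gpu_check())
```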
### 9. Monitoring and Profiling Tools
- **nvidia-smi**: Real-time monitoring (NVIDIA)
- **rocm-smi**: ROCm system management (AMD)
- **Nsight Systems**: NVIDIA profiling suite
- **Nsight Compute**: CUDA kernel profiler
- **nvtop/radeontop**: Terminal GPU monitoring
- **PyTorch profiler**: Framework-level profiling
- **TensorBoard**: Training visualization
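At the framework level, a minimal `torch.profiler` run shows per-kernel timings; this sketch assumes PyTorch with a working CUDA device:
```python
import torch
from torch.profiler import profile, ProfilerActivity

# Profile one large matmul and print the top kernels by GPU time
x = torch.randn(4096, 4096, device="cuda")
with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    y = x @ x
    torch.cuda.synchronize()  # make sure the kernel finishes inside the profile
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=5))
```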
### 10. Optimization Features
- **Automatic mixed precision**: AMP support
- **Gradient checkpointing**: Memory optimization
- **Flash Attention**: Optimized attention mechanisms
- **Quantization support**: INT8, INT4 inference
- **Model compilation**: TorchScript, XLA, TensorRT
- **Distributed training**: Multi-GPU training support
- **CUDA graphs**: Kernel launch optimization
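As a concrete check that mixed precision actually engages, here is a minimal AMP training step; the model, optimizer, and data are placeholders:
```python
import torch

model = torch.nn.Linear(512, 512).cuda()
optimizer = torch.optim.AdamW(model.parameters())
scaler = torch.cuda.amp.GradScaler()  # guards FP16 gradients against underflow

inputs = torch.randn(64, 512, device="cuda")
targets = torch.randn(64, 512, device="cuda")

# autocast selects FP16/BF16 per op where it is safe to do so
with torch.autocast(device_type="cuda", dtype=torch.float16):
    loss = torch.nn.functional.mse_loss(model(inputs), targets)

scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
print(f"loss: {loss.item():.4f}")
```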
### 11. Workload Suitability Assessment
- **Training capability**: Suitable for training workloads
- **Inference capability**: Suitable for inference
- **Model type suitability**:
- Computer vision (CNNs)
- Natural language processing (Transformers)
- Generative AI (Diffusion models, LLMs)
- Reinforcement learning
- **Performance tier**: Consumer, Professional, Data Center
### 12. Bottleneck and Limitation Analysis
- **Memory bottlenecks**: VRAM limitations for large models
- **Compute bottlenecks**: GPU power for training speed
- **PCIe bandwidth**: Data transfer limitations
- **Driver limitations**: Missing features or bugs
- **Power throttling**: Thermal or power constraints
- **Multi-GPU scaling**: Efficiency of multi-GPU setup
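PCIe constraints are easy to spot by comparing the current link against the card's maximum; a card capable of Gen4 x16 but negotiated down to x4 can starve data loading. These are standard `nvidia-smi` query fields:
```python
import subprocess

fields = ("pcie.link.gen.current,pcie.link.gen.max,"
          "pcie.link.width.current,pcie.link.width.max")
out = subprocess.run(
    ["nvidia-smi", f"--query-gpu={fields}", "--format=csv,noheader"],
    capture_output=True, text=True,
)
print(out.stdout.strip())  # e.g. "4, 4, 16, 16" for a healthy Gen4 x16 link
```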
## Commands to Use
**GPU and driver detection:**
- `nvidia-smi` (NVIDIA)
- `rocm-smi` (AMD)
- `lspci | grep -iE 'vga|3d|display'` (headless compute GPUs enumerate as "3D controller", not VGA)
- `lspci -v | grep -A 20 -iE 'vga|3d'`
**NVIDIA driver details:**
- `nvidia-smi -q`
- `cat /proc/driver/nvidia/version`
- `modinfo nvidia`
- `nvidia-smi --query-gpu=driver_version --format=csv,noheader`
**AMD driver details:**
- `modinfo amdgpu`
- `rocminfo`
- `/opt/rocm/bin/rocm-smi --showdriverversion`
**CUDA/ROCm installation:**
- `nvcc --version` (CUDA compiler)
- `which nvcc`
- `ls /usr/local/cuda*/`
- `echo $CUDA_HOME`
- `hipcc --version` (ROCm)
- `ls /opt/rocm/`
**Compute capability:**
- `nvidia-smi --query-gpu=compute_cap --format=csv,noheader`
- `nvidia-smi -q | grep "Compute Capability"`
**Libraries check:**
- `ldconfig -p | grep cudnn`
- `ldconfig -p | grep cublas`
- `ldconfig -p | grep tensorrt`
- `ldconfig -p | grep nccl`
- `ls /usr/lib/x86_64-linux-gnu/ | grep -i cuda`
**Python framework check:**
- `python3 -c "import torch; print(f'PyTorch: {torch.__version__}, CUDA: {torch.cuda.is_available()}, Version: {torch.version.cuda}')"`
- `python3 -c "import tensorflow as tf; print(f'TensorFlow: {tf.__version__}, GPU: {tf.config.list_physical_devices(\"GPU\")}')"`
- `python3 -c "import torch; print(f'Tensor Cores: {torch.cuda.get_device_capability()}')"`
**Container runtime:**
- `docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi`
- `which nvidia-container-cli`
- `nvidia-container-cli info`
**OpenCL:**
- `clinfo`
- `clinfo | grep "Device Name"`
**System packages (Debian/Ubuntu; use `rpm -qa` on RPM-based distros):**
- `dpkg -l | grep -i cuda`
- `dpkg -l | grep -i nvidia`
- `dpkg -l | grep -i rocm`
**Performance info:**
- `nvidia-smi --query-gpu=name,memory.total,memory.free,driver_version,compute_cap --format=csv`
- `nvidia-smi dmon -s pucvmet` (rolling device monitoring: power, utilization, clocks, violations, memory, ECC, PCIe throughput)
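To feed the output formats below, these probes can be wrapped so that missing tools degrade to `null` instead of aborting the run. A sketch:
```python
import json
import shutil
import subprocess
from typing import Optional

def run(cmd: list) -> Optional[str]:
    """Run a probe command, returning stdout or None if unavailable or failed."""
    if not shutil.which(cmd[0]):
        return None
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.stdout.strip() if proc.returncode == 0 else None

report = {
    "gpu": run(["nvidia-smi",
                "--query-gpu=name,memory.total,driver_version,compute_cap",
                "--format=csv,noheader"]),
    "nvcc": run(["nvcc", "--version"]),
    "opencl": run(["clinfo", "--list"]),
}
print(json.dumps(report, indent=2))
```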
## Output Format
### Executive Summary
```
GPU: [model]
Driver: [proprietary/open] v[version] ([status])
Compute: [CUDA/ROCm] v[version] (Compute [capability])
AI/ML Readiness: [Ready/Partial/Not Ready]
Best For: [Training/Inference/Both]
Recommended Frameworks: [PyTorch, TensorFlow, etc.]
```
### Detailed AI/ML Assessment
**Driver Status:**
- Type: [Proprietary/Open Source]
- Version: [version number]
- Release Date: [date]
- Status: [Loaded/Error]
- Kernel Module: [module] ([loaded/not loaded])
- Latest Available: [version]
- Update Recommended: [Yes/No]
- Secure Boot: [Compatible/Issue]
**Compute Framework Availability:**
- CUDA Toolkit: [Installed/Not Installed] - v[version]
- CUDA Driver API: v[version]
- ROCm: [Installed/Not Installed] - v[version]
- OpenCL: [Available/Not Available] - v[version]
- Compute Capability: [X.X] ([architecture name])
**GPU Compute Specifications:**
- Architecture: [Turing/Ampere/Ada/RDNA3/Xe]
- Tensor Cores: [Yes/No] - [Generation]
- CUDA Cores / SPs: [count]
- VRAM: [GB] [memory type]
- Memory Bandwidth: [GB/s]
- Precision Support:
- FP64: [TFLOPS]
- FP32: [TFLOPS]
- FP16: [TFLOPS]
- INT8: [TOPS]
- TF32: [Yes/No]
**AI/ML Libraries:**
- cuDNN: [version] ([installed/missing])
- cuBLAS: [version] ([installed/missing])
- TensorRT: [version] ([installed/missing])
- NCCL: [version] ([installed/missing])
- MIOpen: [version] (AMD only)
- rocBLAS: [version] (AMD only)
**Deep Learning Framework Support:**
- PyTorch: [version]
- CUDA Enabled: [Yes/No]
- CUDA Version: [version]
- cuDNN Version: [version]
- TensorFlow: [version]
- GPU Support: [Yes/No]
- CUDA Version: [version]
- JAX: [installed/not installed]
- ONNX Runtime: [GPU backend available]
**Container Support:**
- NVIDIA Container Toolkit: [installed/not installed]
- Docker GPU Access: [working/not working]
- Podman GPU Support: [available/not available]
**Model Capacity Estimates:**
- Small Models (< 1B params): [batch size X]
- Medium Models (1B-7B params): [batch size X]
- Large Models (7B-70B params): [batch size X]
- Very Large Models (> 70B params): [requires multi-GPU or not feasible]
Example workload estimates based on [GB] VRAM:
- LLaMA 7B: [inference only/training possible]
- Stable Diffusion: [batch size X]
- BERT Base: [batch size X]
- GPT-2: [batch size X]
**Workload Suitability:**
- Training:
- Small models: [Excellent/Good/Fair/Poor]
- Medium models: [rating]
- Large models: [rating]
- Inference:
- Real-time: [Excellent/Good/Fair/Poor]
- Batch: [rating]
- Low-latency: [rating]
**Use Case Recommendations:**
- Computer Vision (CNNs): [Excellent/Good/Fair/Poor]
- NLP (Transformers): [rating]
- Generative AI (LLMs): [rating]
- Diffusion Models: [rating]
- Reinforcement Learning: [rating]
**Performance Tier:**
- Category: [Consumer/Professional/Data Center]
- Training Performance: [rating]
- Inference Performance: [rating]
- Multi-GPU Scaling: [available/not available]
**Optimization Features Available:**
- Automatic Mixed Precision: [Yes/No]
- Tensor Core Utilization: [Yes/No]
- TensorRT Optimization: [Available/Not available]
- Flash Attention: [Supported/Not supported]
- INT8 Quantization: [Supported/Not supported]
- Multi-GPU Training: [Possible with [count] GPUs]
**Limitations and Bottlenecks:**
- VRAM Constraint: [assessment]
- Memory Bandwidth: [adequate/limited]
- Compute Throughput: [assessment]
- PCIe Bottleneck: [yes/no]
- Driver Limitations: [any known issues]
- Power/Thermal: [throttling concerns]
**Recommendations:**
1. [Driver update/optimization suggestions]
2. [Framework installation recommendations]
3. [Workload optimization suggestions]
4. [Hardware upgrade path if applicable]
5. [Container/virtualization setup if beneficial]
### AI/ML Readiness Scorecard
```
Driver Setup: [✓/✗/⚠] [details]
CUDA/ROCm Install: [✓/✗/⚠] [details]
Framework Support: [✓/✗/⚠] [details]
Library Ecosystem: [✓/✗/⚠] [details]
Container Runtime: [✓/✗/⚠] [details]
VRAM Capacity: [✓/✗/⚠] [details]
Compute Performance: [✓/✗/⚠] [details]
Overall Readiness: [Ready/Needs Setup/Limited/Not Suitable]
```
### AI-Readable JSON
```json
{
"driver": {
"type": "proprietary|open_source",
"version": "",
"status": "loaded|error",
"latest_available": "",
"update_recommended": false
},
"compute_platform": {
"cuda": {
"installed": false,
"version": "",
"compute_capability": ""
},
"rocm": {
"installed": false,
"version": ""
},
"opencl": {
"available": false,
"version": ""
}
},
"gpu_specs": {
"architecture": "",
"tensor_cores": false,
"vram_gb": 0,
"memory_bandwidth_gbs": 0,
"fp32_tflops": 0,
"fp16_tflops": 0,
"int8_tops": 0,
"tf32_support": false
},
"libraries": {
"cudnn": "",
"cublas": "",
"tensorrt": "",
"nccl": ""
},
"frameworks": {
"pytorch": {
"installed": false,
"version": "",
"cuda_available": false
},
"tensorflow": {
"installed": false,
"version": "",
"gpu_available": false
}
},
"container_support": {
"nvidia_container_toolkit": false,
"docker_gpu_working": false
},
"workload_suitability": {
"training": {
"small_models": "excellent|good|fair|poor",
"medium_models": "",
"large_models": ""
},
"inference": {
"real_time": "",
"batch": ""
}
},
"model_capacity": {
"vram_gb": 0,
"small_model_batch_size": 0,
"llama_7b_possible": false,
"stable_diffusion_batch": 0
},
"optimization_features": {
"amp_support": false,
"tensor_core_utilization": false,
"tensorrt_available": false,
"int8_quantization": false
},
"bottlenecks": {
"vram_limited": false,
"compute_limited": false,
"pcie_bottleneck": false
},
"ai_ml_readiness": "ready|needs_setup|limited|not_suitable"
}
```
## Execution Guidelines
1. **Identify GPU vendor first**: NVIDIA, AMD, or Intel
2. **Check driver installation**: Verify driver is loaded and working
3. **Assess compute platform**: CUDA for NVIDIA, ROCm for AMD
4. **Query compute capability**: Critical for framework compatibility
5. **Check library installation**: cuDNN, TensorRT, etc.
6. **Test framework access**: Try importing PyTorch/TensorFlow with GPU
7. **Evaluate VRAM capacity**: Estimate model sizes
8. **Check container support**: Important for ML workflows
9. **Identify bottlenecks**: VRAM, compute, or driver issues
10. **Provide actionable recommendations**: Setup steps or optimizations
## Important Notes
- NVIDIA GPUs have the most mature AI/ML ecosystem
- CUDA compute capability determines supported features
- cuDNN is critical for deep learning performance
- VRAM is often the primary bottleneck for large models
- Container runtimes simplify framework management
- AMD ROCm support is improving but remains less mature than the CUDA ecosystem
- Intel GPUs are an emerging option in the AI/ML space
- Tensor cores provide significant speedup for mixed precision
- Driver version must match CUDA toolkit requirements
- Some features require specific GPU generations
- Multi-GPU setups require additional configuration
- Consumer GPUs can be effective for smaller workloads
- Professional/datacenter GPUs offer better reliability and support
Be thorough and practical: provide a clear assessment of AI/ML readiness and actionable next steps.