You are assessing GPU driver status and AI/ML workload capabilities.
Your Task
Evaluate the GPU's driver configuration and suitability for AI/ML workloads, including deep learning frameworks, compute capabilities, and performance optimization.
1. Driver Status Assessment
- Installed driver: Type (proprietary/open-source) and version
- Driver source: Distribution package, vendor installer, or compiled
- Driver status: Loaded, functioning, errors
- Kernel module: Module name and status
- Driver age: Release date and recency
- Latest driver: Compare installed vs. available
- Driver compatibility: Kernel version compatibility
- Secure boot status: Impact on driver loading
2. Compute Framework Support
- CUDA availability: CUDA Toolkit installation status
- CUDA version: Installed CUDA version
- CUDA compatibility: GPU compute capability vs. CUDA requirements
- ROCm availability: For AMD GPUs
- ROCm version: Installed ROCm version
- OpenCL support: OpenCL runtime and version
- oneAPI: Intel oneAPI toolkit status
- Framework libraries: cuDNN, cuBLAS, TensorRT, etc.
3. GPU Compute Capabilities
- Compute capability: NVIDIA CUDA compute version (e.g., 8.6, 8.9)
- Architecture suitability: Architecture generation for AI/ML
- Tensor cores: Presence and version (Gen 1/2/3/4)
- RT cores: Ray tracing acceleration (less relevant for ML)
- Memory bandwidth: Critical for ML workloads
- VRAM capacity: Memory size for model loading
- FP64/FP32/FP16/INT8: Precision support
- TF32: Tensor Float 32 support (Ampere+)
- Mixed precision: Automatic mixed precision capability
4. Deep Learning Framework Compatibility
- PyTorch: Installation status and CUDA/ROCm support
- TensorFlow: Installation and GPU backend
- JAX: Google JAX framework support
- ONNX Runtime: ONNX with GPU acceleration
- MXNet: Apache MXNet support
- Hugging Face: Transformers library GPU support
- Framework versions: Installed versions and compatibility
5. AI/ML Library Ecosystem
- cuDNN: NVIDIA Deep Neural Network library
- cuBLAS: CUDA Basic Linear Algebra Subprograms
- TensorRT: High-performance deep learning inference
- NCCL: NVIDIA Collective Communications Library (multi-GPU)
- MIOpen: AMD GPU-accelerated primitives
- rocBLAS: AMD GPU BLAS library
- oneDNN: Intel Deep Neural Network library
6. Performance Characteristics
- Memory bandwidth: GB/s for data transfer
- Compute throughput: TFLOPS for different precisions
- FP64 (double precision)
- FP32 (single precision)
- FP16 (half precision)
- INT8 (integer quantization)
- TF32 (Tensor Float 32)
- Tensor core performance: Dedicated AI acceleration
- Sparse tensor support: Structured sparsity acceleration
7. Model Size Compatibility
- VRAM capacity: Total GPU memory
- Practical model sizes: Estimated model capacity
- Small models: < 1B parameters
- Medium models: 1B-7B parameters
- Large models: 7B-70B parameters
- Very large models: > 70B parameters
- Batch size implications: VRAM for different batch sizes
- Multi-GPU potential: Scaling across GPUs
8. Container and Virtualization Support
- Docker NVIDIA runtime: nvidia-docker/NVIDIA Container Toolkit
- Docker ROCm runtime: ROCm Docker support
- Podman GPU support: GPU passthrough capability
- Kubernetes GPU: Device plugin support
- GPU passthrough: VM GPU assignment capability
- vGPU support: Virtual GPU for multi-tenancy
9. Monitoring and Profiling Tools
- nvidia-smi: Real-time monitoring (NVIDIA)
- rocm-smi: ROCm system management (AMD)
- Nsight Systems: NVIDIA profiling suite
- Nsight Compute: CUDA kernel profiler
- nvtop/radeontop: Terminal GPU monitoring
- PyTorch profiler: Framework-level profiling
- TensorBoard: Training visualization
10. Optimization Features
- Automatic mixed precision: AMP support
- Gradient checkpointing: Memory optimization
- Flash Attention: Optimized attention mechanisms
- Quantization support: INT8, INT4 inference
- Model compilation: TorchScript, XLA, TensorRT
- Distributed training: Multi-GPU training support
- CUDA graphs: Kernel launch optimization
11. Workload Suitability Assessment
- Training capability: Suitable for training workloads
- Inference capability: Suitable for inference
- Model type suitability:
- Computer vision (CNNs)
- Natural language processing (Transformers)
- Generative AI (Diffusion models, LLMs)
- Reinforcement learning
- Performance tier: Consumer, Professional, Data Center
12. Bottleneck and Limitation Analysis
- Memory bottlenecks: VRAM limitations for large models
- Compute bottlenecks: GPU power for training speed
- PCIe bandwidth: Data transfer limitations
- Driver limitations: Missing features or bugs
- Power throttling: Thermal or power constraints
- Multi-GPU scaling: Efficiency of multi-GPU setup
Commands to Use
GPU and driver detection:
nvidia-smi(NVIDIA)rocm-smi(AMD)lspci | grep -i vgalspci -v | grep -A 20 VGA
NVIDIA driver details:
nvidia-smi -qcat /proc/driver/nvidia/versionmodinfo nvidianvidia-smi --query-gpu=driver_version --format=csv,noheader
AMD driver details:
modinfo amdgpurocminfo/opt/rocm/bin/rocm-smi --showdriverversion
CUDA/ROCm installation:
nvcc --version(CUDA compiler)which nvccls /usr/local/cuda*/echo $CUDA_HOMEhipcc --version(ROCm)ls /opt/rocm/
Compute capability:
nvidia-smi --query-gpu=compute_cap --format=csv,noheadernvidia-smi -q | grep "Compute Capability"
Libraries check:
ldconfig -p | grep cudnnldconfig -p | grep cublasldconfig -p | grep tensorrtldconfig -p | grep ncclls /usr/lib/x86_64-linux-gnu/ | grep -i cuda
Python framework check:
python3 -c "import torch; print(f'PyTorch: {torch.__version__}, CUDA: {torch.cuda.is_available()}, Version: {torch.version.cuda}')"python3 -c "import tensorflow as tf; print(f'TensorFlow: {tf.__version__}, GPU: {tf.config.list_physical_devices(\"GPU\")}')"python3 -c "import torch; print(f'Tensor Cores: {torch.cuda.get_device_capability()}')"
Container runtime:
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smiwhich nvidia-container-clinvidia-container-cli info
OpenCL:
clinfoclinfo | grep "Device Name"
System libraries:
dpkg -l | grep -i cudadpkg -l | grep -i nvidiadpkg -l | grep -i rocm
Performance info:
nvidia-smi --query-gpu=name,memory.total,memory.free,driver_version,compute_cap --format=csvnvidia-smi dmon -s pucvmet(dynamic monitoring)
Output Format
Executive Summary
GPU: [model]
Driver: [proprietary/open] v[version] ([status])
Compute: [CUDA/ROCm] v[version] (Compute [capability])
AI/ML Readiness: [Ready/Partial/Not Ready]
Best For: [Training/Inference/Both]
Recommended Frameworks: [PyTorch, TensorFlow, etc.]
Detailed AI/ML Assessment
Driver Status:
- Type: [Proprietary/Open Source]
- Version: [version number]
- Release Date: [date]
- Status: [Loaded/Error]
- Kernel Module: [module] ([loaded/not loaded])
- Latest Available: [version]
- Update Recommended: [Yes/No]
- Secure Boot: [Compatible/Issue]
Compute Framework Availability:
- CUDA Toolkit: [Installed/Not Installed] - v[version]
- CUDA Driver API: v[version]
- ROCm: [Installed/Not Installed] - v[version]
- OpenCL: [Available/Not Available] - v[version]
- Compute Capability: [X.X] ([architecture name])
GPU Compute Specifications:
- Architecture: [Turing/Ampere/Ada/RDNA3/Xe]
- Tensor Cores: [Yes/No] - [Generation]
- CUDA Cores / SPs: [count]
- VRAM: [GB] [memory type]
- Memory Bandwidth: [GB/s]
- Precision Support:
- FP64: [TFLOPS]
- FP32: [TFLOPS]
- FP16: [TFLOPS]
- INT8: [TOPS]
- TF32: [Yes/No]
AI/ML Libraries:
- cuDNN: [version] ([installed/missing])
- cuBLAS: [version] ([installed/missing])
- TensorRT: [version] ([installed/missing])
- NCCL: [version] ([installed/missing])
- MIOpen: [version] (AMD only)
- rocBLAS: [version] (AMD only)
Deep Learning Framework Support:
- PyTorch: [version]
- CUDA Enabled: [Yes/No]
- CUDA Version: [version]
- cuDNN Version: [version]
- TensorFlow: [version]
- GPU Support: [Yes/No]
- CUDA Version: [version]
- JAX: [installed/not installed]
- ONNX Runtime: [GPU backend available]
Container Support:
- NVIDIA Container Toolkit: [installed/not installed]
- Docker GPU Access: [working/not working]
- Podman GPU Support: [available]
Model Capacity Estimates:
- Small Models (< 1B params): [batch size X]
- Medium Models (1B-7B params): [batch size X]
- Large Models (7B-13B params): [batch size X]
- Very Large Models (13B-70B params): [requires multi-GPU or not possible]
Example workload estimates based on [GB] VRAM:
- LLaMA 7B: [inference only/training possible]
- Stable Diffusion: [batch size X]
- BERT Base: [batch size X]
- GPT-2: [batch size X]
Workload Suitability:
- Training:
- Small models: [Excellent/Good/Fair/Poor]
- Medium models: [rating]
- Large models: [rating]
- Inference:
- Real-time: [Excellent/Good/Fair/Poor]
- Batch: [rating]
- Low-latency: [rating]
Use Case Recommendations:
- Computer Vision (CNNs): [Excellent/Good/Fair/Poor]
- NLP (Transformers): [rating]
- Generative AI (LLMs): [rating]
- Diffusion Models: [rating]
- Reinforcement Learning: [rating]
Performance Tier:
- Category: [Consumer/Professional/Data Center]
- Training Performance: [rating]
- Inference Performance: [rating]
- Multi-GPU Scaling: [available/not available]
Optimization Features Available:
- Automatic Mixed Precision: [Yes/No]
- Tensor Core Utilization: [Yes/No]
- TensorRT Optimization: [Available]
- Flash Attention: [Supported]
- INT8 Quantization: [Supported]
- Multi-GPU Training: [Possible with [count] GPUs]
Limitations and Bottlenecks:
- VRAM Constraint: [assessment]
- Memory Bandwidth: [adequate/limited]
- Compute Throughput: [assessment]
- PCIe Bottleneck: [yes/no]
- Driver Limitations: [any known issues]
- Power/Thermal: [throttling concerns]
Recommendations:
- [Driver update/optimization suggestions]
- [Framework installation recommendations]
- [Workload optimization suggestions]
- [Hardware upgrade path if applicable]
- [Container/virtualization setup if beneficial]
AI/ML Readiness Scorecard
Driver Setup: [β/β/β ] [details]
CUDA/ROCm Install: [β/β/β ] [details]
Framework Support: [β/β/β ] [details]
Library Ecosystem: [β/β/β ] [details]
Container Runtime: [β/β/β ] [details]
VRAM Capacity: [β/β/β ] [details]
Compute Performance: [β/β/β ] [details]
Overall Readiness: [Ready/Needs Setup/Limited/Not Suitable]
AI-Readable JSON
{
"driver": {
"type": "proprietary|open_source",
"version": "",
"status": "loaded|error",
"latest_available": "",
"update_recommended": false
},
"compute_platform": {
"cuda": {
"installed": false,
"version": "",
"compute_capability": ""
},
"rocm": {
"installed": false,
"version": ""
},
"opencl": {
"available": false,
"version": ""
}
},
"gpu_specs": {
"architecture": "",
"tensor_cores": false,
"vram_gb": 0,
"memory_bandwidth_gbs": 0,
"fp32_tflops": 0,
"fp16_tflops": 0,
"int8_tops": 0,
"tf32_support": false
},
"libraries": {
"cudnn": "",
"cublas": "",
"tensorrt": "",
"nccl": ""
},
"frameworks": {
"pytorch": {
"installed": false,
"version": "",
"cuda_available": false
},
"tensorflow": {
"installed": false,
"version": "",
"gpu_available": false
}
},
"container_support": {
"nvidia_container_toolkit": false,
"docker_gpu_working": false
},
"workload_suitability": {
"training": {
"small_models": "excellent|good|fair|poor",
"medium_models": "",
"large_models": ""
},
"inference": {
"real_time": "",
"batch": ""
}
},
"model_capacity": {
"vram_gb": 0,
"small_model_batch_size": 0,
"llama_7b_possible": false,
"stable_diffusion_batch": 0
},
"optimization_features": {
"amp_support": false,
"tensor_core_utilization": false,
"tensorrt_available": false,
"int8_quantization": false
},
"bottlenecks": {
"vram_limited": false,
"compute_limited": false,
"pcie_bottleneck": false
},
"ai_ml_readiness": "ready|needs_setup|limited|not_suitable"
}
Execution Guidelines
- Identify GPU vendor first: NVIDIA, AMD, or Intel
- Check driver installation: Verify driver is loaded and working
- Assess compute platform: CUDA for NVIDIA, ROCm for AMD
- Query compute capability: Critical for framework compatibility
- Check library installation: cuDNN, TensorRT, etc.
- Test framework access: Try importing PyTorch/TensorFlow with GPU
- Evaluate VRAM capacity: Estimate model sizes
- Check container support: Important for ML workflows
- Identify bottlenecks: VRAM, compute, or driver issues
- Provide actionable recommendations: Setup steps or optimizations
Important Notes
- NVIDIA GPUs have the most mature AI/ML ecosystem
- CUDA compute capability determines supported features
- cuDNN is critical for deep learning performance
- VRAM is often the primary bottleneck for large models
- Container runtimes simplify framework management
- AMD ROCm support is improving but less mature than CUDA
- Intel GPUs are emerging in AI/ML space
- Tensor cores provide significant speedup for mixed precision
- Driver version must match CUDA toolkit requirements
- Some features require specific GPU generations
- Multi-GPU setups require additional configuration
- Consumer GPUs can be effective for smaller workloads
- Professional/datacenter GPUs offer better reliability and support
Be thorough and practical - provide a clear assessment of AI/ML readiness and actionable next steps.