You are assessing GPU driver status and AI/ML workload capabilities.

## Your Task

Evaluate the GPU's driver configuration and suitability for AI/ML workloads, including deep learning frameworks, compute capabilities, and performance optimization.

### 1. Driver Status Assessment

- **Installed driver**: Type (proprietary/open-source) and version
- **Driver source**: Distribution package, vendor installer, or compiled
- **Driver status**: Loaded, functioning, errors
- **Kernel module**: Module name and status
- **Driver age**: Release date and recency
- **Latest driver**: Compare installed vs. available
- **Driver compatibility**: Kernel version compatibility
- **Secure Boot status**: Impact on driver loading

### 2. Compute Framework Support

- **CUDA availability**: CUDA Toolkit installation status
- **CUDA version**: Installed CUDA version
- **CUDA compatibility**: GPU compute capability vs. CUDA requirements
- **ROCm availability**: For AMD GPUs
- **ROCm version**: Installed ROCm version
- **OpenCL support**: OpenCL runtime and version
- **oneAPI**: Intel oneAPI toolkit status
- **Framework libraries**: cuDNN, cuBLAS, TensorRT, etc.

### 3. GPU Compute Capabilities

- **Compute capability**: NVIDIA CUDA compute version (e.g., 8.6, 8.9)
- **Architecture suitability**: Architecture generation for AI/ML
- **Tensor cores**: Presence and generation (e.g., Gen 1-4)
- **RT cores**: Ray tracing acceleration (less relevant for ML)
- **Memory bandwidth**: Critical for ML workloads
- **VRAM capacity**: Memory size for model loading
- **FP64/FP32/FP16/INT8**: Precision support
- **TF32**: Tensor Float 32 support (Ampere+)
- **Mixed precision**: Automatic mixed precision capability

### 4. Deep Learning Framework Compatibility

- **PyTorch**: Installation status and CUDA/ROCm support
- **TensorFlow**: Installation and GPU backend
- **JAX**: Google JAX framework support
- **ONNX Runtime**: ONNX with GPU acceleration
- **MXNet**: Apache MXNet support
- **Hugging Face**: Transformers library GPU support
- **Framework versions**: Installed versions and compatibility

### 5. AI/ML Library Ecosystem

- **cuDNN**: NVIDIA Deep Neural Network library
- **cuBLAS**: CUDA Basic Linear Algebra Subprograms
- **TensorRT**: High-performance deep learning inference
- **NCCL**: NVIDIA Collective Communications Library (multi-GPU)
- **MIOpen**: AMD GPU-accelerated primitives
- **rocBLAS**: AMD GPU BLAS library
- **oneDNN**: Intel Deep Neural Network library

### 6. Performance Characteristics

- **Memory bandwidth**: GB/s for data transfer
- **Compute throughput**: TFLOPS for different precisions
  - FP64 (double precision)
  - FP32 (single precision)
  - FP16 (half precision)
  - INT8 (integer quantization)
  - TF32 (Tensor Float 32)
- **Tensor core performance**: Dedicated AI acceleration
- **Sparse tensor support**: Structured sparsity acceleration

### 7. Model Size Compatibility

- **VRAM capacity**: Total GPU memory
- **Practical model sizes**: Estimated model capacity (see the sizing sketch after this list)
  - Small models: < 1B parameters
  - Medium models: 1B-7B parameters
  - Large models: 7B-70B parameters
  - Very large models: > 70B parameters
- **Batch size implications**: VRAM for different batch sizes
- **Multi-GPU potential**: Scaling across GPUs
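For the practical model-size estimates, a back-of-the-envelope calculation is usually sufficient. A minimal sketch, assuming common rules of thumb (roughly 2 bytes/parameter for FP16 inference; roughly 16 bytes/parameter for mixed-precision Adam training, counting FP16 weights and gradients, FP32 master weights, and two optimizer moments); `fits` is a hypothetical helper, and real usage varies with batch size, sequence length, and KV cache:

```python
# Rule-of-thumb constants, not measurements: bytes of VRAM per parameter
# at a given inference precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1, "int4": 0.5}

def fits(params_billion: float, vram_gb: float, precision: str = "fp16",
         training: bool = False, overhead: float = 1.2) -> bool:
    """Rough check whether a model fits in VRAM. Training assumes
    mixed-precision Adam (~16 bytes/param); `overhead` pads for
    activations and framework allocations. Treat the result as an
    estimate, not a guarantee."""
    per_param = 16 if training else BYTES_PER_PARAM[precision]
    needed_gb = params_billion * 1e9 * per_param * overhead / 1e9
    return needed_gb <= vram_gb

# Example: a 7B-parameter model on a 24 GB card
print(fits(7, 24, "fp16"))         # True  -> FP16 inference fits (~16.8 GB)
print(fits(7, 24, training=True))  # False -> full fine-tuning does not (~134 GB)
```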
### 8. Container and Virtualization Support

- **Docker NVIDIA runtime**: nvidia-docker/NVIDIA Container Toolkit
- **Docker ROCm runtime**: ROCm Docker support
- **Podman GPU support**: GPU passthrough capability
- **Kubernetes GPU**: Device plugin support
- **GPU passthrough**: VM GPU assignment capability
- **vGPU support**: Virtual GPU for multi-tenancy

### 9. Monitoring and Profiling Tools

- **nvidia-smi**: Real-time monitoring (NVIDIA)
- **rocm-smi**: ROCm system management (AMD)
- **Nsight Systems**: NVIDIA profiling suite
- **Nsight Compute**: CUDA kernel profiler
- **nvtop/radeontop**: Terminal GPU monitoring
- **PyTorch profiler**: Framework-level profiling
- **TensorBoard**: Training visualization

### 10. Optimization Features

- **Automatic mixed precision**: AMP support
- **Gradient checkpointing**: Memory optimization
- **Flash Attention**: Optimized attention mechanisms
- **Quantization support**: INT8, INT4 inference
- **Model compilation**: TorchScript, XLA, TensorRT
- **Distributed training**: Multi-GPU training support
- **CUDA graphs**: Kernel launch optimization

### 11. Workload Suitability Assessment

- **Training capability**: Suitable for training workloads
- **Inference capability**: Suitable for inference
- **Model type suitability**:
  - Computer vision (CNNs)
  - Natural language processing (Transformers)
  - Generative AI (diffusion models, LLMs)
  - Reinforcement learning
- **Performance tier**: Consumer, Professional, Data Center

### 12. Bottleneck and Limitation Analysis

- **Memory bottlenecks**: VRAM limitations for large models
- **Compute bottlenecks**: GPU power for training speed
- **PCIe bandwidth**: Data transfer limitations
- **Driver limitations**: Missing features or bugs
- **Power throttling**: Thermal or power constraints
- **Multi-GPU scaling**: Efficiency of multi-GPU setup

## Commands to Use

**GPU and driver detection:**
- `nvidia-smi` (NVIDIA)
- `rocm-smi` (AMD)
- `lspci | grep -iE 'vga|3d|display'` (some GPUs enumerate as 3D controllers, not VGA)
- `lspci -v | grep -A 20 VGA`

**NVIDIA driver details:**
- `nvidia-smi -q`
- `cat /proc/driver/nvidia/version`
- `modinfo nvidia`
- `nvidia-smi --query-gpu=driver_version --format=csv,noheader`

**AMD driver details:**
- `modinfo amdgpu`
- `rocminfo`
- `/opt/rocm/bin/rocm-smi --showdriverversion`

**CUDA/ROCm installation:**
- `nvcc --version` (CUDA compiler)
- `which nvcc`
- `ls /usr/local/cuda*/`
- `echo $CUDA_HOME`
- `hipcc --version` (ROCm)
- `ls /opt/rocm/`

**Compute capability:**
- `nvidia-smi --query-gpu=compute_cap --format=csv,noheader`
- `nvidia-smi -q | grep "Compute Capability"`

**Libraries check:**
- `ldconfig -p | grep cudnn`
- `ldconfig -p | grep cublas`
- `ldconfig -p | grep tensorrt`
- `ldconfig -p | grep nccl`
- `ls /usr/lib/x86_64-linux-gnu/ | grep -i cuda`

**Python framework check** (a consolidated probe sketch follows this command list):
- `python3 -c "import torch; print(f'PyTorch: {torch.__version__}, CUDA: {torch.cuda.is_available()}, Version: {torch.version.cuda}')"`
- `python3 -c "import tensorflow as tf; print(f'TensorFlow: {tf.__version__}, GPU: {tf.config.list_physical_devices(\"GPU\")}')"`
- `python3 -c "import torch; print(f'Compute capability: {torch.cuda.get_device_capability()}')"` (reports the compute capability tuple; tensor cores are present from capability 7.0 onward)

**Container runtime:**
- `docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi`
- `which nvidia-container-cli`
- `nvidia-container-cli info`

**OpenCL:**
- `clinfo`
- `clinfo | grep "Device Name"`

**System libraries:**
- `dpkg -l | grep -i cuda`
- `dpkg -l | grep -i nvidia`
- `dpkg -l | grep -i rocm`

**Performance info:**
- `nvidia-smi --query-gpu=name,memory.total,memory.free,driver_version,compute_cap --format=csv`
- `nvidia-smi dmon -s pucvmet` (dynamic monitoring)
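The per-framework one-liners above can be rolled into a single probe that degrades gracefully when a framework is missing. A minimal sketch using only standard PyTorch/TensorFlow APIs; the output keys are illustrative and are not the schema from the AI-Readable JSON section below:

```python
import json

report = {}

try:
    import torch
    report["pytorch"] = {
        "version": torch.__version__,
        "cuda_available": torch.cuda.is_available(),
        "cuda_version": torch.version.cuda,            # None on CPU-only builds
        "cudnn_version": torch.backends.cudnn.version(),
    }
    if torch.cuda.is_available():
        props = torch.cuda.get_device_properties(0)
        report["gpu"] = {
            "name": props.name,
            "vram_gb": round(props.total_memory / 1024**3, 1),
            "compute_capability": f"{props.major}.{props.minor}",
            "tensor_cores": props.major >= 7,  # tensor cores since Volta (7.0)
        }
except ImportError:
    report["pytorch"] = None

try:
    import tensorflow as tf
    report["tensorflow"] = {
        "version": tf.__version__,
        "gpus": [d.name for d in tf.config.list_physical_devices("GPU")],
    }
except ImportError:
    report["tensorflow"] = None

print(json.dumps(report, indent=2))
```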
## Output Format

### Executive Summary

```
GPU: [model]
Driver: [proprietary/open] v[version] ([status])
Compute: [CUDA/ROCm] v[version] (Compute [capability])
AI/ML Readiness: [Ready/Partial/Not Ready]
Best For: [Training/Inference/Both]
Recommended Frameworks: [PyTorch, TensorFlow, etc.]
```

### Detailed AI/ML Assessment

**Driver Status:**
- Type: [Proprietary/Open Source]
- Version: [version number]
- Release Date: [date]
- Status: [Loaded/Error]
- Kernel Module: [module] ([loaded/not loaded])
- Latest Available: [version]
- Update Recommended: [Yes/No]
- Secure Boot: [Compatible/Issue]

**Compute Framework Availability:**
- CUDA Toolkit: [Installed/Not Installed] - v[version]
- CUDA Driver API: v[version]
- ROCm: [Installed/Not Installed] - v[version]
- OpenCL: [Available/Not Available] - v[version]
- Compute Capability: [X.X] ([architecture name])

**GPU Compute Specifications:**
- Architecture: [Turing/Ampere/Ada/RDNA3/Xe]
- Tensor Cores: [Yes/No] - [Generation]
- CUDA Cores / SPs: [count]
- VRAM: [GB] [memory type]
- Memory Bandwidth: [GB/s]
- Precision Support (a rough measurement sketch follows the Recommendations list):
  - FP64: [TFLOPS]
  - FP32: [TFLOPS]
  - FP16: [TFLOPS]
  - INT8: [TOPS]
  - TF32: [Yes/No]

**AI/ML Libraries:**
- cuDNN: [version] ([installed/missing])
- cuBLAS: [version] ([installed/missing])
- TensorRT: [version] ([installed/missing])
- NCCL: [version] ([installed/missing])
- MIOpen: [version] (AMD only)
- rocBLAS: [version] (AMD only)

**Deep Learning Framework Support:**
- PyTorch: [version]
  - CUDA Enabled: [Yes/No]
  - CUDA Version: [version]
  - cuDNN Version: [version]
- TensorFlow: [version]
  - GPU Support: [Yes/No]
  - CUDA Version: [version]
- JAX: [installed/not installed]
- ONNX Runtime: [GPU backend available]

**Container Support:**
- NVIDIA Container Toolkit: [installed/not installed]
- Docker GPU Access: [working/not working]
- Podman GPU Support: [available]

**Model Capacity Estimates:**
- Small Models (< 1B params): [batch size X]
- Medium Models (1B-7B params): [batch size X]
- Large Models (7B-70B params): [batch size X]
- Very Large Models (> 70B params): [requires multi-GPU or not possible]

Example workload estimates based on [GB] VRAM:
- LLaMA 7B: [inference only/training possible]
- Stable Diffusion: [batch size X]
- BERT Base: [batch size X]
- GPT-2: [batch size X]

**Workload Suitability:**
- Training:
  - Small models: [Excellent/Good/Fair/Poor]
  - Medium models: [rating]
  - Large models: [rating]
- Inference:
  - Real-time: [Excellent/Good/Fair/Poor]
  - Batch: [rating]
  - Low-latency: [rating]

**Use Case Recommendations:**
- Computer Vision (CNNs): [Excellent/Good/Fair/Poor]
- NLP (Transformers): [rating]
- Generative AI (LLMs): [rating]
- Diffusion Models: [rating]
- Reinforcement Learning: [rating]

**Performance Tier:**
- Category: [Consumer/Professional/Data Center]
- Training Performance: [rating]
- Inference Performance: [rating]
- Multi-GPU Scaling: [available/not available]

**Optimization Features Available:**
- Automatic Mixed Precision: [Yes/No]
- Tensor Core Utilization: [Yes/No]
- TensorRT Optimization: [Available]
- Flash Attention: [Supported]
- INT8 Quantization: [Supported]
- Multi-GPU Training: [Possible with [count] GPUs]

**Limitations and Bottlenecks:**
- VRAM Constraint: [assessment]
- Memory Bandwidth: [adequate/limited]
- Compute Throughput: [assessment]
- PCIe Bottleneck: [yes/no]
- Driver Limitations: [any known issues]
- Power/Thermal: [throttling concerns]

**Recommendations:**
1. [Driver update/optimization suggestions]
2. [Framework installation recommendations]
3. [Workload optimization suggestions]
4. [Hardware upgrade path if applicable]
5. [Container/virtualization setup if beneficial]
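When vendor spec sheets are unavailable for the TFLOPS fields in the template above, a rough measurement can stand in. A minimal sketch, assuming PyTorch with a working CUDA device; `measured_tflops` is a hypothetical helper, and measured matmul throughput approximates rather than matches vendor peak figures:

```python
import time
import torch

def measured_tflops(dtype=torch.float16, n=4096, iters=50):
    """Rough matmul throughput estimate; real kernels and spec sheets differ."""
    a = torch.randn(n, n, device="cuda", dtype=dtype)
    b = torch.randn(n, n, device="cuda", dtype=dtype)
    for _ in range(5):       # warm-up so timing excludes allocation/JIT cost
        a @ b
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        a @ b
    torch.cuda.synchronize()
    elapsed = time.perf_counter() - start
    flops = 2 * n**3 * iters  # an n x n matmul costs ~2·n^3 FLOPs
    return flops / elapsed / 1e12

if torch.cuda.is_available():
    print(f"FP16: ~{measured_tflops(torch.float16):.1f} TFLOPS")
    print(f"FP32: ~{measured_tflops(torch.float32):.1f} TFLOPS")
```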
### AI/ML Readiness Scorecard

```
Driver Setup:        [✓/✗/⚠] [details]
CUDA/ROCm Install:   [✓/✗/⚠] [details]
Framework Support:   [✓/✗/⚠] [details]
Library Ecosystem:   [✓/✗/⚠] [details]
Container Runtime:   [✓/✗/⚠] [details]
VRAM Capacity:       [✓/✗/⚠] [details]
Compute Performance: [✓/✗/⚠] [details]

Overall Readiness: [Ready/Needs Setup/Limited/Not Suitable]
```

### AI-Readable JSON

```json
{
  "driver": {
    "type": "proprietary|open_source",
    "version": "",
    "status": "loaded|error",
    "latest_available": "",
    "update_recommended": false
  },
  "compute_platform": {
    "cuda": { "installed": false, "version": "", "compute_capability": "" },
    "rocm": { "installed": false, "version": "" },
    "opencl": { "available": false, "version": "" }
  },
  "gpu_specs": {
    "architecture": "",
    "tensor_cores": false,
    "vram_gb": 0,
    "memory_bandwidth_gbs": 0,
    "fp32_tflops": 0,
    "fp16_tflops": 0,
    "int8_tops": 0,
    "tf32_support": false
  },
  "libraries": {
    "cudnn": "",
    "cublas": "",
    "tensorrt": "",
    "nccl": ""
  },
  "frameworks": {
    "pytorch": { "installed": false, "version": "", "cuda_available": false },
    "tensorflow": { "installed": false, "version": "", "gpu_available": false }
  },
  "container_support": {
    "nvidia_container_toolkit": false,
    "docker_gpu_working": false
  },
  "workload_suitability": {
    "training": {
      "small_models": "excellent|good|fair|poor",
      "medium_models": "",
      "large_models": ""
    },
    "inference": { "real_time": "", "batch": "" }
  },
  "model_capacity": {
    "vram_gb": 0,
    "small_model_batch_size": 0,
    "llama_7b_possible": false,
    "stable_diffusion_batch": 0
  },
  "optimization_features": {
    "amp_support": false,
    "tensor_core_utilization": false,
    "tensorrt_available": false,
    "int8_quantization": false
  },
  "bottlenecks": {
    "vram_limited": false,
    "compute_limited": false,
    "pcie_bottleneck": false
  },
  "ai_ml_readiness": "ready|needs_setup|limited|not_suitable"
}
```

## Execution Guidelines

1. **Identify GPU vendor first**: NVIDIA, AMD, or Intel
2. **Check driver installation**: Verify the driver is loaded and working
3. **Assess compute platform**: CUDA for NVIDIA, ROCm for AMD
4. **Query compute capability**: Critical for framework compatibility
5. **Check library installation**: cuDNN, TensorRT, etc.
6. **Test framework access**: Try importing PyTorch/TensorFlow with GPU
7. **Evaluate VRAM capacity**: Estimate supportable model sizes
8. **Check container support**: Important for ML workflows
9. **Identify bottlenecks**: VRAM, compute, or driver issues
10. **Provide actionable recommendations**: Setup steps or optimizations

## Important Notes

- NVIDIA GPUs have the most mature AI/ML ecosystem
- CUDA compute capability determines supported features
- cuDNN is critical for deep learning performance
- VRAM is often the primary bottleneck for large models
- Container runtimes simplify framework management
- AMD ROCm support is improving but remains less mature than CUDA
- Intel GPUs are emerging in the AI/ML space
- Tensor cores provide significant speedups for mixed precision
- The driver version must satisfy the CUDA toolkit's minimum requirement
- Some features require specific GPU generations
- Multi-GPU setups require additional configuration
- Consumer GPUs can be effective for smaller workloads
- Professional/data center GPUs offer better reliability and support

Be thorough and practical - provide a clear assessment of AI/ML readiness and actionable next steps.