You are assessing GPU driver status and AI/ML workload capabilities.

## Your Task

Evaluate the GPU's driver configuration and suitability for AI/ML workloads, including deep learning frameworks, compute capabilities, and performance optimization.

### 1. Driver Status Assessment

- **Installed driver**: Type (proprietary/open-source) and version
- **Driver source**: Distribution package, vendor installer, or compiled
- **Driver status**: Loaded, functioning, errors
- **Kernel module**: Module name and status
- **Driver age**: Release date and recency
- **Latest driver**: Compare installed vs. available
- **Driver compatibility**: Kernel version compatibility
- **Secure Boot status**: Impact on driver loading

### 2. Compute Framework Support

- **CUDA availability**: CUDA Toolkit installation status
- **CUDA version**: Installed CUDA version
- **CUDA compatibility**: GPU compute capability vs. CUDA requirements
- **ROCm availability**: For AMD GPUs
- **ROCm version**: Installed ROCm version
- **OpenCL support**: OpenCL runtime and version
- **oneAPI**: Intel oneAPI toolkit status
- **Framework libraries**: cuDNN, cuBLAS, TensorRT, etc.

### 3. GPU Compute Capabilities

- **Compute capability**: NVIDIA CUDA compute version (e.g., 8.6, 8.9)
- **Architecture suitability**: Architecture generation for AI/ML
- **Tensor cores**: Presence and generation (e.g., Gen 1-4)
- **RT cores**: Ray tracing acceleration (less relevant for ML)
- **Memory bandwidth**: Critical for ML workloads
- **VRAM capacity**: Memory size for model loading
- **FP64/FP32/FP16/INT8**: Precision support
- **TF32**: Tensor Float 32 support (Ampere+)
- **Mixed precision**: Automatic mixed precision capability

### 4. Deep Learning Framework Compatibility

- **PyTorch**: Installation status and CUDA/ROCm support
- **TensorFlow**: Installation and GPU backend
- **JAX**: Google JAX framework support
- **ONNX Runtime**: ONNX with GPU acceleration
- **MXNet**: Apache MXNet support
- **Hugging Face**: Transformers library GPU support
- **Framework versions**: Installed versions and compatibility

### 5. AI/ML Library Ecosystem

- **cuDNN**: NVIDIA Deep Neural Network library
- **cuBLAS**: CUDA Basic Linear Algebra Subprograms
- **TensorRT**: High-performance deep learning inference
- **NCCL**: NVIDIA Collective Communications Library (multi-GPU)
- **MIOpen**: AMD GPU-accelerated primitives
- **rocBLAS**: AMD GPU BLAS library
- **oneDNN**: Intel Deep Neural Network library

### 6. Performance Characteristics

- **Memory bandwidth**: GB/s for data transfer
- **Compute throughput**: TFLOPS for different precisions
  - FP64 (double precision)
  - FP32 (single precision)
  - FP16 (half precision)
  - INT8 (integer quantization)
  - TF32 (Tensor Float 32)
- **Tensor core performance**: Dedicated AI acceleration
- **Sparse tensor support**: Structured sparsity acceleration

### 7. Model Size Compatibility

- **VRAM capacity**: Total GPU memory
- **Practical model sizes**: Estimated model capacity (see the sizing sketch after this list)
  - Small models: < 1B parameters
  - Medium models: 1B-7B parameters
  - Large models: 7B-70B parameters
  - Very large models: > 70B parameters
- **Batch size implications**: VRAM for different batch sizes
- **Multi-GPU potential**: Scaling across GPUs
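For the practical model-size estimates, a back-of-the-envelope calculation is usually sufficient. A minimal sketch, assuming common rules of thumb (roughly 2 bytes/parameter for FP16 inference; roughly 16 bytes/parameter for mixed-precision Adam training, counting FP16 weights and gradients, FP32 master weights, and two optimizer moments); `fits` is a hypothetical helper, and real usage varies with batch size, sequence length, and KV cache:

```python
# Rule-of-thumb constants, not measurements: bytes of VRAM per parameter
# at a given inference precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1, "int4": 0.5}

def fits(params_billion: float, vram_gb: float, precision: str = "fp16",
         training: bool = False, overhead: float = 1.2) -> bool:
    """Rough check whether a model fits in VRAM. Training assumes
    mixed-precision Adam (~16 bytes/param); `overhead` pads for
    activations and framework allocations. Treat the result as an
    estimate, not a guarantee."""
    per_param = 16 if training else BYTES_PER_PARAM[precision]
    needed_gb = params_billion * 1e9 * per_param * overhead / 1e9
    return needed_gb <= vram_gb

# Example: a 7B-parameter model on a 24 GB card
print(fits(7, 24, "fp16"))         # True  -> FP16 inference fits (~16.8 GB)
print(fits(7, 24, training=True))  # False -> full fine-tuning does not (~134 GB)
```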
### 8. Container and Virtualization Support

- **Docker NVIDIA runtime**: nvidia-docker/NVIDIA Container Toolkit
- **Docker ROCm runtime**: ROCm Docker support
- **Podman GPU support**: GPU passthrough capability
- **Kubernetes GPU**: Device plugin support
- **GPU passthrough**: VM GPU assignment capability
- **vGPU support**: Virtual GPU for multi-tenancy

### 9. Monitoring and Profiling Tools

- **nvidia-smi**: Real-time monitoring (NVIDIA)
- **rocm-smi**: ROCm system management (AMD)
- **Nsight Systems**: NVIDIA profiling suite
- **Nsight Compute**: CUDA kernel profiler
- **nvtop/radeontop**: Terminal GPU monitoring
- **PyTorch profiler**: Framework-level profiling
- **TensorBoard**: Training visualization

### 10. Optimization Features

- **Automatic mixed precision**: AMP support
- **Gradient checkpointing**: Memory optimization
- **Flash Attention**: Optimized attention mechanisms
- **Quantization support**: INT8, INT4 inference
- **Model compilation**: TorchScript, XLA, TensorRT
- **Distributed training**: Multi-GPU training support
- **CUDA graphs**: Kernel launch optimization

### 11. Workload Suitability Assessment

- **Training capability**: Suitable for training workloads
- **Inference capability**: Suitable for inference
- **Model type suitability**:
  - Computer vision (CNNs)
  - Natural language processing (Transformers)
  - Generative AI (diffusion models, LLMs)
  - Reinforcement learning
- **Performance tier**: Consumer, Professional, Data Center

### 12. Bottleneck and Limitation Analysis

- **Memory bottlenecks**: VRAM limitations for large models
- **Compute bottlenecks**: GPU power for training speed
- **PCIe bandwidth**: Data transfer limitations
- **Driver limitations**: Missing features or bugs
- **Power throttling**: Thermal or power constraints
- **Multi-GPU scaling**: Efficiency of multi-GPU setup

## Commands to Use

**GPU and driver detection:**
- `nvidia-smi` (NVIDIA)
- `rocm-smi` (AMD)
- `lspci | grep -iE 'vga|3d|display'` (some GPUs enumerate as 3D controllers, not VGA)
- `lspci -v | grep -A 20 VGA`

**NVIDIA driver details:**
- `nvidia-smi -q`
- `cat /proc/driver/nvidia/version`
- `modinfo nvidia`
- `nvidia-smi --query-gpu=driver_version --format=csv,noheader`

**AMD driver details:**
- `modinfo amdgpu`
- `rocminfo`
- `/opt/rocm/bin/rocm-smi --showdriverversion`

**CUDA/ROCm installation:**
- `nvcc --version` (CUDA compiler)
- `which nvcc`
- `ls /usr/local/cuda*/`
- `echo $CUDA_HOME`
- `hipcc --version` (ROCm)
- `ls /opt/rocm/`

**Compute capability:**
- `nvidia-smi --query-gpu=compute_cap --format=csv,noheader`
- `nvidia-smi -q | grep "Compute Capability"`

**Libraries check:**
- `ldconfig -p | grep cudnn`
- `ldconfig -p | grep cublas`
- `ldconfig -p | grep tensorrt`
- `ldconfig -p | grep nccl`
- `ls /usr/lib/x86_64-linux-gnu/ | grep -i cuda`

**Python framework check** (a consolidated probe sketch follows this command list):
- `python3 -c "import torch; print(f'PyTorch: {torch.__version__}, CUDA: {torch.cuda.is_available()}, Version: {torch.version.cuda}')"`
- `python3 -c "import tensorflow as tf; print(f'TensorFlow: {tf.__version__}, GPU: {tf.config.list_physical_devices(\"GPU\")}')"`
- `python3 -c "import torch; print(f'Compute capability: {torch.cuda.get_device_capability()}')"` (reports the compute capability tuple; tensor cores are present from capability 7.0 onward)

**Container runtime:**
- `docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi`
- `which nvidia-container-cli`
- `nvidia-container-cli info`

**OpenCL:**
- `clinfo`
- `clinfo | grep "Device Name"`

**System libraries:**
- `dpkg -l | grep -i cuda`
- `dpkg -l | grep -i nvidia`
- `dpkg -l | grep -i rocm`

**Performance info:**
- `nvidia-smi --query-gpu=name,memory.total,memory.free,driver_version,compute_cap --format=csv`
- `nvidia-smi dmon -s pucvmet` (dynamic monitoring)
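The per-framework one-liners above can be rolled into a single probe that degrades gracefully when a framework is missing. A minimal sketch using only standard PyTorch/TensorFlow APIs; the output keys are illustrative and are not the schema from the AI-Readable JSON section below:

```python
import json

report = {}

try:
    import torch
    report["pytorch"] = {
        "version": torch.__version__,
        "cuda_available": torch.cuda.is_available(),
        "cuda_version": torch.version.cuda,            # None on CPU-only builds
        "cudnn_version": torch.backends.cudnn.version(),
    }
    if torch.cuda.is_available():
        props = torch.cuda.get_device_properties(0)
        report["gpu"] = {
            "name": props.name,
            "vram_gb": round(props.total_memory / 1024**3, 1),
            "compute_capability": f"{props.major}.{props.minor}",
            "tensor_cores": props.major >= 7,  # tensor cores since Volta (7.0)
        }
except ImportError:
    report["pytorch"] = None

try:
    import tensorflow as tf
    report["tensorflow"] = {
        "version": tf.__version__,
        "gpus": [d.name for d in tf.config.list_physical_devices("GPU")],
    }
except ImportError:
    report["tensorflow"] = None

print(json.dumps(report, indent=2))
```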
## Output Format

### Executive Summary

```
GPU: [model]
Driver: [proprietary/open] v[version] ([status])
Compute: [CUDA/ROCm] v[version] (Compute [capability])
AI/ML Readiness: [Ready/Partial/Not Ready]
Best For: [Training/Inference/Both]
Recommended Frameworks: [PyTorch, TensorFlow, etc.]
```

### Detailed AI/ML Assessment

**Driver Status:**
- Type: [Proprietary/Open Source]
- Version: [version number]
- Release Date: [date]
- Status: [Loaded/Error]
- Kernel Module: [module] ([loaded/not loaded])
- Latest Available: [version]
- Update Recommended: [Yes/No]
- Secure Boot: [Compatible/Issue]

**Compute Framework Availability:**
- CUDA Toolkit: [Installed/Not Installed] - v[version]
- CUDA Driver API: v[version]
- ROCm: [Installed/Not Installed] - v[version]
- OpenCL: [Available/Not Available] - v[version]
- Compute Capability: [X.X] ([architecture name])

**GPU Compute Specifications:**
- Architecture: [Turing/Ampere/Ada/RDNA3/Xe]
- Tensor Cores: [Yes/No] - [Generation]
- CUDA Cores / SPs: [count]
- VRAM: [GB] [memory type]
- Memory Bandwidth: [GB/s]
- Precision Support (a rough measurement sketch follows the Recommendations list):
  - FP64: [TFLOPS]
  - FP32: [TFLOPS]
  - FP16: [TFLOPS]
  - INT8: [TOPS]
  - TF32: [Yes/No]

**AI/ML Libraries:**
- cuDNN: [version] ([installed/missing])
- cuBLAS: [version] ([installed/missing])
- TensorRT: [version] ([installed/missing])
- NCCL: [version] ([installed/missing])
- MIOpen: [version] (AMD only)
- rocBLAS: [version] (AMD only)

**Deep Learning Framework Support:**
- PyTorch: [version]
  - CUDA Enabled: [Yes/No]
  - CUDA Version: [version]
  - cuDNN Version: [version]
- TensorFlow: [version]
  - GPU Support: [Yes/No]
  - CUDA Version: [version]
- JAX: [installed/not installed]
- ONNX Runtime: [GPU backend available]

**Container Support:**
- NVIDIA Container Toolkit: [installed/not installed]
- Docker GPU Access: [working/not working]
- Podman GPU Support: [available]

**Model Capacity Estimates:**
- Small Models (< 1B params): [batch size X]
- Medium Models (1B-7B params): [batch size X]
- Large Models (7B-70B params): [batch size X]
- Very Large Models (> 70B params): [requires multi-GPU or not possible]

Example workload estimates based on [GB] VRAM:
- LLaMA 7B: [inference only/training possible]
- Stable Diffusion: [batch size X]
- BERT Base: [batch size X]
- GPT-2: [batch size X]

**Workload Suitability:**
- Training:
  - Small models: [Excellent/Good/Fair/Poor]
  - Medium models: [rating]
  - Large models: [rating]
- Inference:
  - Real-time: [Excellent/Good/Fair/Poor]
  - Batch: [rating]
  - Low-latency: [rating]

**Use Case Recommendations:**
- Computer Vision (CNNs): [Excellent/Good/Fair/Poor]
- NLP (Transformers): [rating]
- Generative AI (LLMs): [rating]
- Diffusion Models: [rating]
- Reinforcement Learning: [rating]

**Performance Tier:**
- Category: [Consumer/Professional/Data Center]
- Training Performance: [rating]
- Inference Performance: [rating]
- Multi-GPU Scaling: [available/not available]

**Optimization Features Available:**
- Automatic Mixed Precision: [Yes/No]
- Tensor Core Utilization: [Yes/No]
- TensorRT Optimization: [Available]
- Flash Attention: [Supported]
- INT8 Quantization: [Supported]
- Multi-GPU Training: [Possible with [count] GPUs]

**Limitations and Bottlenecks:**
- VRAM Constraint: [assessment]
- Memory Bandwidth: [adequate/limited]
- Compute Throughput: [assessment]
- PCIe Bottleneck: [yes/no]
- Driver Limitations: [any known issues]
- Power/Thermal: [throttling concerns]

**Recommendations:**
1. [Driver update/optimization suggestions]
2. [Framework installation recommendations]
3. [Workload optimization suggestions]
4. [Hardware upgrade path if applicable]
5. [Container/virtualization setup if beneficial]
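When vendor spec sheets are unavailable for the TFLOPS fields in the template above, a rough measurement can stand in. A minimal sketch, assuming PyTorch with a working CUDA device; `measured_tflops` is a hypothetical helper, and measured matmul throughput approximates rather than matches vendor peak figures:

```python
import time
import torch

def measured_tflops(dtype=torch.float16, n=4096, iters=50):
    """Rough matmul throughput estimate; real kernels and spec sheets differ."""
    a = torch.randn(n, n, device="cuda", dtype=dtype)
    b = torch.randn(n, n, device="cuda", dtype=dtype)
    for _ in range(5):       # warm-up so timing excludes allocation/JIT cost
        a @ b
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        a @ b
    torch.cuda.synchronize()
    elapsed = time.perf_counter() - start
    flops = 2 * n**3 * iters  # an n x n matmul costs ~2·n^3 FLOPs
    return flops / elapsed / 1e12

if torch.cuda.is_available():
    print(f"FP16: ~{measured_tflops(torch.float16):.1f} TFLOPS")
    print(f"FP32: ~{measured_tflops(torch.float32):.1f} TFLOPS")
```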
### AI/ML Readiness Scorecard

```
Driver Setup:        [✓/✗/⚠] [details]
CUDA/ROCm Install:   [✓/✗/⚠] [details]
Framework Support:   [✓/✗/⚠] [details]
Library Ecosystem:   [✓/✗/⚠] [details]
Container Runtime:   [✓/✗/⚠] [details]
VRAM Capacity:       [✓/✗/⚠] [details]
Compute Performance: [✓/✗/⚠] [details]

Overall Readiness: [Ready/Needs Setup/Limited/Not Suitable]
```

### AI-Readable JSON

```json
{
  "driver": {
    "type": "proprietary|open_source",
    "version": "",
    "status": "loaded|error",
    "latest_available": "",
    "update_recommended": false
  },
  "compute_platform": {
    "cuda": { "installed": false, "version": "", "compute_capability": "" },
    "rocm": { "installed": false, "version": "" },
    "opencl": { "available": false, "version": "" }
  },
  "gpu_specs": {
    "architecture": "",
    "tensor_cores": false,
    "vram_gb": 0,
    "memory_bandwidth_gbs": 0,
    "fp32_tflops": 0,
    "fp16_tflops": 0,
    "int8_tops": 0,
    "tf32_support": false
  },
  "libraries": {
    "cudnn": "",
    "cublas": "",
    "tensorrt": "",
    "nccl": ""
  },
  "frameworks": {
    "pytorch": { "installed": false, "version": "", "cuda_available": false },
    "tensorflow": { "installed": false, "version": "", "gpu_available": false }
  },
  "container_support": {
    "nvidia_container_toolkit": false,
    "docker_gpu_working": false
  },
  "workload_suitability": {
    "training": {
      "small_models": "excellent|good|fair|poor",
      "medium_models": "",
      "large_models": ""
    },
    "inference": { "real_time": "", "batch": "" }
  },
  "model_capacity": {
    "vram_gb": 0,
    "small_model_batch_size": 0,
    "llama_7b_possible": false,
    "stable_diffusion_batch": 0
  },
  "optimization_features": {
    "amp_support": false,
    "tensor_core_utilization": false,
    "tensorrt_available": false,
    "int8_quantization": false
  },
  "bottlenecks": {
    "vram_limited": false,
    "compute_limited": false,
    "pcie_bottleneck": false
  },
  "ai_ml_readiness": "ready|needs_setup|limited|not_suitable"
}
```

## Execution Guidelines

1. **Identify GPU vendor first**: NVIDIA, AMD, or Intel
2. **Check driver installation**: Verify the driver is loaded and working
3. **Assess compute platform**: CUDA for NVIDIA, ROCm for AMD
4. **Query compute capability**: Critical for framework compatibility
5. **Check library installation**: cuDNN, TensorRT, etc.
6. **Test framework access**: Try importing PyTorch/TensorFlow with GPU
7. **Evaluate VRAM capacity**: Estimate supportable model sizes
8. **Check container support**: Important for ML workflows
9. **Identify bottlenecks**: VRAM, compute, or driver issues
10. **Provide actionable recommendations**: Setup steps or optimizations

## Important Notes

- NVIDIA GPUs have the most mature AI/ML ecosystem
- CUDA compute capability determines supported features
- cuDNN is critical for deep learning performance
- VRAM is often the primary bottleneck for large models
- Container runtimes simplify framework management
- AMD ROCm support is improving but remains less mature than CUDA
- Intel GPUs are emerging in the AI/ML space
- Tensor cores provide significant speedups for mixed precision
- The driver version must satisfy the CUDA toolkit's minimum requirement
- Some features require specific GPU generations
- Multi-GPU setups require additional configuration
- Consumer GPUs can be effective for smaller workloads
- Professional/data center GPUs offer better reliability and support

Be thorough and practical - provide a clear assessment of AI/ML readiness and actionable next steps.