RajBhope committed 2edd63e (verified, parent 18f6a27): Upload README.md with huggingface_hub
---
tags:
- gpu-runtime-prediction
- code-understanding
- regression
- performance-modeling
datasets:
- RajBhope/gpu-runtime-prediction-dataset
language:
- code
library_name: scikit-learn
pipeline_tag: tabular-regression
---

# GPU Runtime Predictor 🚀⚡

Predicts GPU kernel/operation **runtime in milliseconds** given **source code** + **GPU hardware specifications**.

## How It Works

1. **Code Feature Extraction**: Analyzes source code to extract 48 features (tensor dimensions, operation types, complexity indicators).
2. **GPU Feature Encoding**: Uses 12 hardware specs (CUDA cores, memory bandwidth, compute capability, etc.).
3. **ML Prediction**: Ensemble of Gradient Boosted Trees + Random Forest + Neural Network.

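The three steps above can be sketched end-to-end with scikit-learn (the card's stated library). This is a minimal illustration on synthetic data: the feature values, estimator hyperparameters, and the simple prediction-averaging ensemble are assumptions, not the published training recipe.

```python
# Sketch of the 3-step pipeline: 48 code features + 12 GPU features -> ensemble.
# Synthetic data and hyperparameters are illustrative assumptions.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Steps 1 + 2: 48 code features and 12 GPU-spec features per sample.
code_feats = rng.random((200, 48))
gpu_feats = rng.random((200, 12))
X = np.hstack([code_feats, gpu_feats])  # shape (200, 60)
y = rng.random(200)                     # synthetic runtimes (ms)

# Step 3: fit the three regressors, then average their predictions.
models = [
    GradientBoostingRegressor(n_estimators=20, random_state=0),
    RandomForestRegressor(n_estimators=20, random_state=0),
    MLPRegressor(hidden_layer_sizes=(32,), max_iter=500, random_state=0),
]
for m in models:
    m.fit(X, y)

ensemble_pred = np.mean([m.predict(X) for m in models], axis=0)
print(ensemble_pred.shape)
```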
### Model Comparison

| Model | R² | RMSE | Spearman ρ | MAPE % |
|-------|-----|------|------------|--------|
| **GBR** | 0.9923 | 0.0728 | 0.9264 | 16.5% |
| **RF** | 0.9924 | 0.0724 | 0.9277 | 16.3% |
| **NN** | 0.9932 | 0.0687 | 0.9187 | 17.0% |
| **Ensemble** | 0.9930 | 0.0693 | 0.9272 | 16.3% |

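For readers unfamiliar with the four columns, they are standard regression metrics; a numpy-only sketch of their definitions (on toy data, with a rank-based Spearman that ignores ties for brevity):

```python
# Definitions of the metrics reported in the table, computed on toy data.
import numpy as np

def r2(y, p):
    # Coefficient of determination: 1 - residual SS / total SS.
    return 1.0 - np.sum((y - p) ** 2) / np.sum((y - y.mean()) ** 2)

def rmse(y, p):
    return float(np.sqrt(np.mean((y - p) ** 2)))

def spearman(y, p):
    # Spearman rho = Pearson correlation of the ranks (no tie handling).
    ry, rp = np.argsort(np.argsort(y)), np.argsort(np.argsort(p))
    return float(np.corrcoef(ry, rp)[0, 1])

def mape(y, p):
    return float(np.mean(np.abs((y - p) / y)) * 100)

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.8])
print(r2(y_true, y_pred), rmse(y_true, y_pred), spearman(y_true, y_pred))
```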
### GPU Catalog (12 GPUs)

| GPU | FP32 TFLOPS | Memory BW | VRAM |
|-----|------------|-----------|------|
| NVIDIA T4 | 8.1 | 320 GB/s | 16 GB |
| NVIDIA V100 | 15.7 | 900 GB/s | 32 GB |
| NVIDIA A10G | 31.2 | 600 GB/s | 24 GB |
| NVIDIA A100 40GB | 19.5 | 1555 GB/s | 40 GB |
| NVIDIA A100 80GB | 19.5 | 2039 GB/s | 80 GB |
| NVIDIA L4 | 30.3 | 300 GB/s | 24 GB |
| NVIDIA L40S | 91.6 | 864 GB/s | 48 GB |
| NVIDIA RTX 3090 | 35.6 | 936 GB/s | 24 GB |
| NVIDIA RTX 4090 | 82.6 | 1008 GB/s | 24 GB |
| NVIDIA H100 SXM | 67.0 | 3350 GB/s | 80 GB |
| NVIDIA H100 PCIe | 48.0 | 2039 GB/s | 80 GB |
| NVIDIA RTX A6000 | 38.7 | 768 GB/s | 48 GB |

### 15 Supported Workload Types

matmul, conv2d, attention, transformer_block, linear, layernorm, batchnorm,
softmax, embedding, elementwise, reduction, pooling, FFT, sort, loss+backward

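One of the 48 code features is presumably the workload type itself. The real extractor is not published here, so as a purely hypothetical illustration, mapping source code to a workload bucket could look like simple keyword matching (the patterns and fallback are assumptions):

```python
# Hypothetical keyword-based workload classifier; patterns are illustrative
# assumptions, not the model's actual feature extractor.
WORKLOAD_KEYWORDS = {
    "matmul": ["matmul", "gemm", ".mm("],
    "conv2d": ["conv2d", "convolution"],
    "attention": ["attention", "scaled_dot_product"],
    "softmax": ["softmax"],
    "layernorm": ["layer_norm", "layernorm"],
}

def classify_workload(source: str) -> str:
    src = source.lower()
    for workload, keywords in WORKLOAD_KEYWORDS.items():
        if any(k in src for k in keywords):
            return workload
    return "elementwise"  # fallback bucket for plain tensor arithmetic

print(classify_workload("out = torch.matmul(a, b)"))
```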
## Usage

```python
# See the Gradio demo for interactive use,
# or load models directly:
import pickle

with open('model_gbr.pkl', 'rb') as f:
    model = pickle.load(f)
```

## Training

- **Dataset**: [RajBhope/gpu-runtime-prediction-dataset](https://hf.co/datasets/RajBhope/gpu-runtime-prediction-dataset)
- **51,900 samples** = 4,325 workloads × 12 GPUs
- Runtimes generated via a physics-based roofline performance model
- Based on research from [Regression Language Models](https://arxiv.org/abs/2509.26476) and [HELP](https://arxiv.org/abs/2106.08630)
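The roofline model behind the runtime labels bounds a kernel by whichever resource saturates first: compute (FLOPs / peak TFLOPS) or memory (bytes moved / bandwidth). A minimal sketch using the peak numbers from the GPU catalog above; the matmul shape and the assumption of perfect hardware efficiency are illustrative:

```python
# Roofline sketch: runtime = max(compute time, memory time).
# Peak TFLOPS / bandwidth come from the GPU catalog; perfect efficiency
# and the 4096^3 matmul shape are illustrative assumptions.
def roofline_ms(flops: float, bytes_moved: float,
                peak_tflops: float, bw_gbs: float) -> float:
    compute_s = flops / (peak_tflops * 1e12)
    memory_s = bytes_moved / (bw_gbs * 1e9)
    return max(compute_s, memory_s) * 1e3  # seconds -> milliseconds

# FP32 matmul of two 4096x4096 matrices: 2*N^3 FLOPs; read A and B,
# write C -> 3*N^2 elements * 4 bytes.
N = 4096
flops = 2 * N**3
bytes_moved = 3 * N * N * 4

t4 = roofline_ms(flops, bytes_moved, peak_tflops=8.1, bw_gbs=320)
h100 = roofline_ms(flops, bytes_moved, peak_tflops=67.0, bw_gbs=3350)
print(f"T4: {t4:.2f} ms  H100 SXM: {h100:.2f} ms")
```

At this size both GPUs are compute-bound, so the predicted speedup tracks the TFLOPS ratio; smaller or lower-arithmetic-intensity workloads flip to the memory term, which is what makes bandwidth a useful input feature.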