### Training Data

<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

The model was fine-tuned on the hot paths dataset: [zhaojer/compiler_hot_paths](https://huggingface.co/datasets/zhaojer/compiler_hot_paths).

The dataset is already split into train, validation, and test sets and contains all columns needed for fine-tuning, so no further preprocessing was performed.

The data (in the `path` column) were tokenized with the standard `BertTokenizer` for the `bert-base-uncased` model.
### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

We used accuracy and AUROC as the evaluation metrics for the model.

The model was fine-tuned for 3 epochs with standard hyperparameters, which took about 10 minutes on an NVIDIA T4 GPU.
#### Detailed Training Hyperparameters

- `evaluation_strategy="epoch"`
- `logging_strategy="epoch"`
- `save_strategy="epoch"`
- `num_train_epochs=3`
- `per_device_train_batch_size=16`
- `per_device_eval_batch_size=16`
- `learning_rate=5e-5`
- `load_best_model_at_end=True`
- `metric_for_best_model="accuracy"`

Note: any hyperparameter not listed above uses its default value.
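Assuming the standard Hugging Face `Trainer` setup (the training script itself is not shown in this card), the hyperparameters above map onto a `TrainingArguments` configuration along these lines; the `output_dir` value is a placeholder of our own:

```python
from transformers import TrainingArguments

# Sketch only: the hyperparameters listed above, expressed as a
# TrainingArguments configuration. output_dir is a placeholder name.
training_args = TrainingArguments(
    output_dir="bert-hot-path-classifier",  # hypothetical
    evaluation_strategy="epoch",
    logging_strategy="epoch",
    save_strategy="epoch",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    learning_rate=5e-5,
    load_best_model_at_end=True,
    metric_for_best_model="accuracy",
)
```

With `load_best_model_at_end=True`, all three `*_strategy` values must agree (here, `"epoch"`) so the trainer can match checkpoints to evaluations.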
## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

### Testing Data

<!-- This should link to a Dataset Card if possible. -->
The testing data consist of 68 hot paths and 92 cold paths generated from 4 distinct C programs. They also come from [zhaojer/compiler_hot_paths](https://huggingface.co/datasets/zhaojer/compiler_hot_paths); see its dataset card for how the testing data were created. The model never saw these examples during training.
### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->

We evaluated the model on the testing data using the following metrics:
- Loss (reported by default)
- Accuracy
- AUROC
- Precision, recall, F1 score
- Confusion matrix
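The count-based metrics in this list can be computed from binary predictions alone; loss and AUROC additionally require the model's raw scores. A minimal pure-Python sketch (the `classification_metrics` helper name is our own, not part of the training code):

```python
def classification_metrics(y_true, y_pred):
    """Count-based metrics for binary labels (1 = hot, 0 = cold)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1,
            # rows: predicted hot / predicted cold; cols: actually hot / cold
            "confusion": [[tp, fp], [fn, tn]]}
```

In practice the same numbers come from `sklearn.metrics`, but the definitions above are all that is involved.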
### Results

| Loss   | Accuracy | AUROC  | Precision | Recall | F1   |
| ------ | -------- | ------ | --------- | ------ | ---- |
| 0.0620 | 0.9875   | 0.9952 | 1.0000    | 0.9706 | 0.99 |

Confusion matrix:

|                | Actually Hot | Actually Cold |
| -------------- | ------------ | ------------- |
| Predicted Hot  | 66           | 0             |
| Predicted Cold | 2            | 92            |
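As a quick sanity check, the summary metrics follow directly from the confusion-matrix counts (loss and AUROC cannot be recomputed from counts alone, since they depend on the model's raw scores):

```python
# Confusion-matrix counts from the table above.
tp, fp = 66, 0   # predicted hot
fn, tn = 2, 92   # predicted cold

accuracy = (tp + tn) / (tp + fp + fn + tn)   # 158/160
precision = tp / (tp + fp)                   # 66/66
recall = tp / (tp + fn)                      # 66/68
f1 = 2 * precision * recall / (precision + recall)

print(round(accuracy, 4), round(precision, 4), round(recall, 4), round(f1, 2))
# 0.9875 1.0 0.9706 0.99
```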
|