zhaojer committed on
Commit 6f754e8 · verified · 1 Parent(s): 94863ac

Added model evaluation results

Files changed (1): README.md (+35 −27)
README.md CHANGED
### Training Data

<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
The model was fine-tuned on the hot paths dataset: [zhaojer/compiler_hot_paths](https://huggingface.co/datasets/zhaojer/compiler_hot_paths).

The dataset is already split into train, validation, and test sets, with all the columns needed for fine-tuning; no further preprocessing was performed on the data.

The data (in the `path` column) were tokenized using the standard `BertTokenizer` for the `bert-base-uncased` model.
 
### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

We defined accuracy and AUROC as evaluation metrics for the model.
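These two metrics can be supplied to the `transformers` `Trainer` through a metrics callback. The following is a sketch using `scikit-learn`; the function name and the use of scikit-learn are assumptions, not details from the model card:

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

def compute_metrics(eval_pred):
    """Compute accuracy and AUROC from (logits, labels) pairs."""
    logits, labels = eval_pred
    # Numerically stable softmax over the two classes -> hot-path probability
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs = exp / exp.sum(axis=1, keepdims=True)
    preds = np.argmax(logits, axis=1)
    return {
        "accuracy": accuracy_score(labels, preds),
        "auroc": roc_auc_score(labels, probs[:, 1]),
    }
```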
The model was fine-tuned for 3 epochs with standard hyperparameters, which took about 10 minutes to complete on an NVIDIA T4 GPU.

#### Detailed Training Hyperparameters

- `evaluation_strategy="epoch"`
- `logging_strategy="epoch"`
- `save_strategy="epoch"`
- `num_train_epochs=3`
- `per_device_train_batch_size=16`
- `per_device_eval_batch_size=16`
- `learning_rate=5e-5`
- `load_best_model_at_end=True`
- `metric_for_best_model="accuracy"`

Note: any hyperparameter not explicitly stated above used its default value.
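Collected into a `transformers.TrainingArguments` call, the listed settings would look like the configuration sketch below. The `output_dir` value is a placeholder, and the argument names follow the `transformers` versions in which `evaluation_strategy` is the accepted spelling (newer releases rename it `eval_strategy`):

```python
from transformers import TrainingArguments

# Configuration sketch mirroring the hyperparameters listed above;
# output_dir is a placeholder, everything else uses the stated values.
training_args = TrainingArguments(
    output_dir="./results",  # placeholder path
    evaluation_strategy="epoch",
    logging_strategy="epoch",
    save_strategy="epoch",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    learning_rate=5e-5,
    load_best_model_at_end=True,
    metric_for_best_model="accuracy",
)
```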
 
 
## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

### Testing Data

<!-- This should link to a Dataset Card if possible. -->
The testing data consist of 68 hot paths and 92 cold paths generated from 4 distinct C programs.
They also come from [zhaojer/compiler_hot_paths](https://huggingface.co/datasets/zhaojer/compiler_hot_paths); please see its dataset card for how the testing data were created.
The model never saw these testing data during training.
### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->
We evaluated the model on the testing data using the following metrics:
- Loss (reported by default during evaluation)
- Accuracy
- AUROC
- Precision, recall, and F1 score
- Confusion matrix
### Results

| Loss   | Accuracy | AUROC  | Precision | Recall | F1   |
| ------ | -------- | ------ | --------- | ------ | ---- |
| 0.0620 | 0.9875   | 0.9952 | 1.0000    | 0.9706 | 0.99 |

|                | Actually Hot | Actually Cold |
| -------------- | ------------ | ------------- |
| Predicted Hot  | 66           | 0             |
| Predicted Cold | 2            | 92            |
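As a sanity check, the scalar metrics above can be re-derived from the confusion matrix, treating hot paths as the positive class (the reported F1 of 0.99 is the value rounded to two decimals):

```python
# Confusion-matrix entries from the table above (hot = positive class)
tp, fp = 66, 0   # predicted hot:  actually hot / actually cold
fn, tn = 2, 92   # predicted cold: actually hot / actually cold

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(round(accuracy, 4), round(precision, 4), round(recall, 4), round(f1, 2))
# 0.9875 1.0 0.9706 0.99
```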