shng2025 committed on
Commit
a3247b9
1 Parent(s): 07981f6

Update README.md

Files changed (1)
  1. README.md +24 -54
README.md CHANGED
@@ -78,103 +78,73 @@ Use the code below to get started with the model.
78
  #### Speeds, Sizes, Times [optional]
79
 
80
  <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
81
-
82
- [More Information Needed]
83
 
84
  ## Evaluation
85
 
86
  <!-- This section describes the evaluation protocols and provides the results. -->
 
87
 
88
  ### Testing Data, Factors & Metrics
89
 
90
  #### Testing Data
91
 
92
  <!-- This should link to a Dataset Card if possible. -->
 
93
 
94
- [More Information Needed]
95
 
96
  #### Factors
97
 
98
  <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
99
-
100
- [More Information Needed]
101
-
102
- #### Metrics
103
-
104
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
105
-
106
- [More Information Needed]
107
 
108
  ### Results
109
 
110
- [More Information Needed]
 
111
 
112
  #### Summary
113
 
114
 
115
 
116
- ## Model Examination [optional]
117
-
118
- <!-- Relevant interpretability work for the model goes here -->
119
-
120
- [More Information Needed]
121
-
122
  ## Environmental Impact
123
 
124
  <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
125
 
126
  Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
127
 
128
- - **Hardware Type:** [More Information Needed]
129
- - **Hours used:** [More Information Needed]
130
- - **Cloud Provider:** [More Information Needed]
131
- - **Compute Region:** [More Information Needed]
132
  - **Carbon Emitted:** [More Information Needed]
133
 
134
- ## Technical Specifications [optional]
135
-
136
  ### Model Architecture and Objective
137
 
138
- [More Information Needed]
139
 
140
  ### Compute Infrastructure
141
 
142
- [More Information Needed]
 
 
 
143
 
144
  #### Hardware
145
 
146
- [More Information Needed]
 
 
 
147
 
148
  #### Software
149
 
150
- [More Information Needed]
 
 
151
 
152
  ## Citation [optional]
153
 
154
  <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
155
-
156
- **BibTeX:**
157
-
158
- [More Information Needed]
159
-
160
- **APA:**
161
-
162
- [More Information Needed]
163
-
164
- ## Glossary [optional]
165
-
166
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
167
-
168
- [More Information Needed]
169
-
170
- ## More Information [optional]
171
-
172
- [More Information Needed]
173
-
174
- ## Model Card Authors [optional]
175
-
176
- [More Information Needed]
177
-
178
- ## Model Card Contact
179
-
180
- [More Information Needed]
 
78
  #### Speeds, Sizes, Times [optional]
79
 
80
  <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
81
+ - 111 million parameters. FP16, 444 megabytes.
82
+ - Fast and lightweight when run on a T4 GPU.
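
The size figures in the added lines can be sanity-checked with simple arithmetic: raw weight size is roughly the parameter count times bytes per parameter. The sketch below is not part of the commit; it assumes the standard 2 bytes for FP16 and 4 bytes for FP32, and it ignores optimizer state and file-format overhead.

```python
# Rough weight-size estimate: parameters x bytes per parameter.
# The parameter count is taken from the model card; dtype sizes are standard.
BYTES_PER_PARAM = {"fp16": 2, "fp32": 4}


def weight_size_mb(num_params: int, dtype: str) -> float:
    """Return the size of the raw weights in megabytes (10**6 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e6


NUM_PARAMS = 111_000_000  # "111 million parameters" from the card

for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_size_mb(NUM_PARAMS, dtype):.0f} MB")
```

Under these assumptions, 111 million parameters take about 222 MB at 2 bytes each and about 444 MB at 4 bytes each, so the stated 444 MB is what an FP32 checkpoint would occupy. That is an inference from the arithmetic, not something the commit states.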
83
 
84
  ## Evaluation
85
 
86
  <!-- This section describes the evaluation protocols and provides the results. -->
87
+ - https://huggingface.co/datasets/shng2025/gptesla-valid
88
 
89
  ### Testing Data, Factors & Metrics
90
 
91
  #### Testing Data
92
 
93
  <!-- This should link to a Dataset Card if possible. -->
94
+ - https://huggingface.co/datasets/shng2025/gptesla-train
95
 
 
96
 
97
  #### Factors
98
 
99
  <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
100
+ - The evaluation may not be accurate because I'm expecting a one-to-one match with the reference code. In reality there are many ways to write code that implements the same logic, so an exact match is not required.
 
 
 
 
 
 
 
101
 
102
  ### Results
103
 
104
+ - Final training loss (loss/train) of 1.1. The model converged after 150,000 steps.
105
+ - Weights & Biases run: https://wandb.ai/marlborough-college-malaysia/gptesla-small/runs/m9sqzqo3?nw=nwusershng2025
106
 
107
  #### Summary
108
 
109
 
110
 
 
 
 
 
 
 
111
  ## Environmental Impact
112
 
113
  <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
114
 
115
  Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
116
 
117
+ - **Hardware Type:** 4x Nvidia A100 PCIe + 96x AMD CPU
118
+ - **Hours used:** 15 hours
119
+ - **Cloud Provider:** Azure
120
+ - **Compute Region:** Unclear
121
  - **Carbon Emitted:** [More Information Needed]
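
The "Carbon Emitted" field is still unfilled. As a minimal sketch of how it could be estimated from the figures above, the function below multiplies GPU power, GPU count, hours, and grid carbon intensity, which is the general approach behind calculators such as the one linked above. The 250 W per-GPU draw and the carbon-intensity value are assumptions for illustration only: the commit gives neither, and the compute region is listed as unclear. CPU power and datacenter overhead (PUE) are ignored.

```python
def estimate_emissions_kg(
    gpu_count: int,
    hours: float,
    gpu_power_watts: float,
    carbon_intensity_kg_per_kwh: float,
) -> float:
    """Estimate GPU-only emissions in kg CO2eq.

    energy (kWh) = gpu_count * power (kW) * hours
    emissions    = energy * grid carbon intensity
    """
    energy_kwh = gpu_count * (gpu_power_watts / 1000) * hours
    return energy_kwh * carbon_intensity_kg_per_kwh


# From the card: 4 GPUs for 15 hours, i.e. 60 GPU-hours.
GPU_COUNT = 4
HOURS = 15

# Assumptions, NOT from the card:
ASSUMED_GPU_POWER_WATTS = 250  # assumed per-GPU draw for an A100 PCIe card
ASSUMED_INTENSITY = 0.4        # placeholder kg CO2eq per kWh; depends on region

estimate = estimate_emissions_kg(
    GPU_COUNT, HOURS, ASSUMED_GPU_POWER_WATTS, ASSUMED_INTENSITY
)
print(f"Estimated emissions: {estimate:.1f} kg CO2eq (placeholder inputs)")
```

The result is only as good as the carbon-intensity input, so the placeholder should be replaced with the figure for the actual Azure region once it is known.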
122
 
 
 
123
  ### Model Architecture and Objective
124
 
125
+ - Based on codeparrot. It uses the GPT-2 architecture, but its weights are randomly initialised.
126
 
127
  ### Compute Infrastructure
128
 
129
+ - NVMe Link
130
+ - 4x Nvidia A100 PCIe
131
+ - 96x AMD CPU from Azure
132
+ - 900 GB RAM
133
 
134
  #### Hardware
135
 
136
+ - NVMe Link
137
+ - 4x Nvidia A100 PCIe
138
+ - 96x AMD CPU from Azure
139
+ - 900 GB RAM
140
 
141
  #### Software
142
 
143
+ - Python 3.10.14
144
+ - Latest versions of PyTorch, transformers, wandb, and other libraries installed. Refer to the GitHub repo for exact versions.
145
+ - Accelerate
146
 
147
  ## Citation [optional]
148
 
149
  <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
150
+ - codeparrot was used.