shng2025/gptesla-small

@@ -78,103 +78,73 @@ Use the code below to get started with the model.
 #### Speeds, Sizes, Times [optional]
 <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-[More Information Needed]
 ## Evaluation
 <!-- This section describes the evaluation protocols and provides the results. -->
 ### Testing Data, Factors & Metrics
 #### Testing Data
 <!-- This should link to a Dataset Card if possible. -->
-[More Information Needed]
 #### Factors
 <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-[More Information Needed]
-#### Metrics
-<!-- These are the evaluation metrics being used, ideally with a description of why. -->
-[More Information Needed]
 ### Results
-[More Information Needed]
 #### Summary
-## Model Examination [optional]
-<!-- Relevant interpretability work for the model goes here -->
-[More Information Needed]
 ## Environmental Impact
 <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
 Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-- **Hardware Type:** [More Information Needed]
-- **Hours used:** [More Information Needed]
-- **Cloud Provider:** [More Information Needed]
-- **Compute Region:** [More Information Needed]
 - **Carbon Emitted:** [More Information Needed]
-## Technical Specifications [optional]
 ### Model Architecture and Objective
-[More Information Needed]
 ### Compute Infrastructure
-[More Information Needed]
 #### Hardware
-[More Information Needed]
 #### Software
-[More Information Needed]
 ## Citation [optional]
 <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-**BibTeX:**
-[More Information Needed]
-**APA:**
-[More Information Needed]
-## Glossary [optional]
-<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-[More Information Needed]
-## More Information [optional]
-[More Information Needed]
-## Model Card Authors [optional]
-[More Information Needed]
-## Model Card Contact
-[More Information Needed]

 #### Speeds, Sizes, Times [optional]
 <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+- 111 million parameters. FP16, 444 MB.
+- Fast and lightweight when run on a T4 GPU.
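As a rough check on the size figures above, the sketch below multiplies the stated parameter count by the bytes per parameter for each precision. The `checkpoint_megabytes` helper is hypothetical and ignores any file-format overhead. Under that assumption, 444 MB corresponds to 4 bytes per parameter (FP32), while the same weights stored in FP16 would take about 222 MB.

```python
# Rough checkpoint-size estimate: parameters x bytes per parameter.
# Assumption: only raw weights are counted, with no file-format overhead.
BYTES_PER_PARAM = {"fp16": 2, "fp32": 4}


def checkpoint_megabytes(n_params: int, dtype: str) -> float:
    """Return the approximate weight size in megabytes (1 MB = 10**6 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e6


N_PARAMS = 111_000_000  # parameter count stated in this card

for dtype in ("fp16", "fp32"):
    print(f"{dtype}: {checkpoint_megabytes(N_PARAMS, dtype):.0f} MB")
```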
 ## Evaluation
 <!-- This section describes the evaluation protocols and provides the results. -->
+- https://huggingface.co/datasets/shng2025/gptesla-valid
 ### Testing Data, Factors & Metrics
 #### Testing Data
 <!-- This should link to a Dataset Card if possible. -->
+- https://huggingface.co/datasets/shng2025/gptesla-train
 #### Factors
 <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+- The evaluation is perhaps not accurate, because I'm expecting a one-to-one match with the reference code. In reality there are many ways to write code that reaches the same logic, so matching one precise way of coding is not required.
 ### Results
+- Final training loss (`loss/train`) of 1.1. The model converged after 150,000 steps.
+- Weights & Biases run: https://wandb.ai/marlborough-college-malaysia/gptesla-small/runs/m9sqzqo3?nw=nwusershng2025
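For reference, a cross-entropy loss can be turned into a perplexity by exponentiating it. The snippet below is a minimal sketch that assumes the reported training loss is a mean per-token cross-entropy in nats. That is the usual PyTorch convention, but it is not stated in this card.

```python
import math


def perplexity(cross_entropy_nats: float) -> float:
    """Perplexity is exp(loss) when the loss is mean per-token cross-entropy in nats."""
    return math.exp(cross_entropy_nats)


final_train_loss = 1.1  # training loss reported in this card
print(f"perplexity ~ {perplexity(final_train_loss):.2f}")
```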
 #### Summary
 ## Environmental Impact
 <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
 Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** 4x Nvidia A100 PCIe + 96x AMD CPU
+- **Hours used:** 15 hours
+- **Cloud Provider:** Azure
+- **Compute Region:** Unclear
 - **Carbon Emitted:** [More Information Needed]
 ### Model Architecture and Objective
+- Based on CodeParrot. Uses the GPT-2 architecture, but its weights are randomly initialised.
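The parameter count of a GPT-2 style decoder can be worked out from its hyperparameters. The sketch below assumes the GPT-2 small layout (12 layers, hidden size 768, 1,024 positions) and a 32,768-token vocabulary as used by the CodeParrot tokenizer. None of these values are stated in this card, so treat them as assumptions. With tied input and output embeddings they give roughly 111 million parameters, in line with the figure listed under Speeds, Sizes, Times.

```python
def gpt2_param_count(vocab_size: int, n_positions: int, n_embd: int, n_layer: int) -> int:
    """Count parameters of a GPT-2 style decoder with tied input/output embeddings."""
    # Token and position embedding tables.
    embeddings = vocab_size * n_embd + n_positions * n_embd
    # One layer norm has a scale and a bias vector.
    layer_norm = 2 * n_embd
    # Fused QKV projection plus the attention output projection (weights + biases).
    attention = (n_embd * 3 * n_embd + 3 * n_embd) + (n_embd * n_embd + n_embd)
    # Feed-forward block: up-projection to 4x width, then down-projection.
    mlp = (n_embd * 4 * n_embd + 4 * n_embd) + (4 * n_embd * n_embd + n_embd)
    per_layer = 2 * layer_norm + attention + mlp
    # The final layer norm is added once after the last block.
    return embeddings + n_layer * per_layer + layer_norm


# Assumed hyperparameters (not taken from this card).
total = gpt2_param_count(vocab_size=32_768, n_positions=1_024, n_embd=768, n_layer=12)
print(f"{total:,} parameters")
```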
 ### Compute Infrastructure
+- NVMe Link
+- 4x Nvidia A100 PCIe
+- 96x AMD CPU from Azure
+- 900 GB RAM
 #### Hardware
+- NVMe Link
+- 4x Nvidia A100 PCIe
+- 96x AMD CPU from Azure
+- 900 GB RAM
 #### Software
+- Python 3.10.14
+- Latest versions of the PyTorch, Transformers, and wandb libraries, etc. installed. Refer to the GitHub repo for exact versions.
+- Accelerate
 ## Citation [optional]
 <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+- CodeParrot was used.