---
library_name: transformers
tags: []
---

# Model Card for GPTesla

<!-- Provide a quick summary of what the model is/does. -->

GPTesla is a small (111M-parameter) decoder-only transformer based on the GPT-2 architecture, trained from scratch on Python code following the CodeParrot recipe, for completing partially written Python code.

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->

This is the model card of a 🤗 Transformers model that has been pushed to the Hub. This model card was generated automatically and then partially filled in.

- **Developed by:** Shi Hao Ng, IB DP Student, Marlborough College Malaysia
- **Model type:** Decoder-only transformer (GPT-2 architecture)
- **Language(s) (NLP):** Programming languages, primarily Python (with some HTML and others)
- **License:** MIT
- **Finetuned from model [optional]:** Not applicable; trained from scratch with randomly initialised weights

### Model Sources [optional]

<!-- Provide the basic links for the model. -->

- **Repository:** https://github.com/Ice-Citron/GPTesla

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
- Given a half-finished snippet of Python code as a prompt, the model generates a plausible continuation (see the sketch below).
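
As an illustration (not an official usage recipe), here is a minimal sketch using the 🤗 `transformers` pipeline API. The Hub repository ID `shng2025/gptesla-small` is an assumption inferred from the linked datasets and Weights & Biases run, and may need to be adjusted:

```python
from transformers import pipeline

# "shng2025/gptesla-small" is an assumed repository ID -- replace with the actual one.
generator = pipeline("text-generation", model="shng2025/gptesla-small")

# Give the model half-finished Python code; it proposes a continuation.
prompt = "def area_of_circle(radius):\n    "
print(generator(prompt, max_new_tokens=32)[0]["generated_text"])
```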

### Direct Use

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
- Some level of fine-tuning is likely needed or preferred for serious downstream use, although I will not be working on this myself; a hedged fine-tuning sketch is included below.
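
For readers who do want to fine-tune, a hedged sketch using the 🤗 `Trainer` API follows. The repository ID, dataset, split name, and the `content` column are all assumptions for illustration, not details confirmed by this card:

```python
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "shng2025/gptesla-small"           # assumed Hub repository ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token     # GPT-2-style tokenizers have no pad token
model = AutoModelForCausalLM.from_pretrained(model_id)

# Any dataset of code/text works; the split and column names below are assumptions.
raw = load_dataset("shng2025/gptesla-valid", split="train")

def tokenize(batch):
    return tokenizer(batch["content"], truncation=True, max_length=512)

tokenized = raw.map(tokenize, batched=True, remove_columns=raw.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="gptesla-finetuned",
        per_device_train_batch_size=4,
        num_train_epochs=1,
        fp16=True,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```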

### Downstream Use [optional]

<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
- The model could in principle back an IDE code-completion plugin, but it is rarely correct on its own, largely because it is a fairly small model.
- Even at this size, training was demanding: a full run on 4x NVIDIA A100 PCIe 80GB GPUs took about 15 hours.

## How to Get Started with the Model

Use the code below to get started with the model, or follow the "Use this model" instructions on the Hugging Face model page. If something does not work, feel free to contact me.
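
A minimal sketch using `AutoTokenizer` and `AutoModelForCausalLM`; as above, the repository ID `shng2025/gptesla-small` is an assumption rather than something confirmed by this card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "shng2025/gptesla-small"           # assumed Hub repository ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

# Complete a half-finished Python snippet.
prompt = "def fibonacci(n):\n    "
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=64,
        do_sample=True,
        temperature=0.4,
        pad_token_id=tokenizer.eos_token_id,
    )
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```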

## Training Details

### Training Data

<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

- Training split: https://huggingface.co/datasets/shng2025/gptesla-train

### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

#### Preprocessing [optional]

[More Information Needed]


#### Training Hyperparameters

- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->

#### Speeds, Sizes, Times [optional]

<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
- 111 million parameters; FP16; roughly 444 MB on disk (see the sanity-check sketch below).
- Inference is fast and lightweight, even on a single T4 GPU.
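
For a rough sanity check of these figures, the parameter count (and the implied weight size at different precisions) can be computed directly; the repository ID is again an assumption:

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("shng2025/gptesla-small")  # assumed repo ID
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.0f}M parameters")
print(f"~{2 * n_params / 1e6:.0f} MB of weights in fp16, ~{4 * n_params / 1e6:.0f} MB in fp32")
```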

## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->
- Evaluation uses the held-out validation set: https://huggingface.co/datasets/shng2025/gptesla-valid

### Testing Data, Factors & Metrics

#### Testing Data

<!-- This should link to a Dataset Card if possible. -->
- https://huggingface.co/datasets/shng2025/gptesla-valid (a hedged evaluation sketch follows below)
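
A hedged sketch of computing loss/perplexity on this split; the repository IDs, split name, and `content` column are assumptions about the dataset layout:

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "shng2025/gptesla-small"           # assumed Hub repository ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).eval()

# Split and column names are assumptions.
valid = load_dataset("shng2025/gptesla-valid", split="train")

losses = []
for sample in valid.select(range(100)):        # small subset for a quick estimate
    enc = tokenizer(sample["content"], return_tensors="pt",
                    truncation=True, max_length=1024)
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    losses.append(out.loss.item())

mean_loss = sum(losses) / len(losses)
print(f"mean loss {mean_loss:.2f}, perplexity {torch.exp(torch.tensor(mean_loss)).item():.1f}")
```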


#### Factors

<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
- Loss-based evaluation may understate the model's practical usefulness: it implicitly expects a one-to-one target for each completion, whereas in reality many different programs implement the same logic, so an exact match is not required for a completion to be acceptable.

### Results

- Final training loss of roughly 1.1; the model converged after about 150,000 steps.
- Weights & Biases run: https://wandb.ai/marlborough-college-malaysia/gptesla-small/runs/m9sqzqo3?nw=nwusershng2025

#### Summary



## Environmental Impact

<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** 4x NVIDIA A100 PCIe (80 GB) + 96 AMD CPU cores
- **Hours used:** 15 hours
- **Cloud Provider:** Azure
- **Compute Region:** Unclear
- **Carbon Emitted:** [More Information Needed]

### Model Architecture and Objective

- Based on the CodeParrot project, using the GPT-2 architecture with randomly initialised weights (no pretrained weights are loaded) and a causal language-modelling objective; a sketch of this initialisation is shown below.
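
A hedged sketch of this from-scratch setup, following the public CodeParrot recipe; the tokenizer repository (`codeparrot/codeparrot`) and the base `gpt2` config are assumptions about how GPTesla was configured:

```python
from transformers import AutoConfig, AutoTokenizer, GPT2LMHeadModel

# Assumed tokenizer: the one published by the CodeParrot project.
tokenizer = AutoTokenizer.from_pretrained("codeparrot/codeparrot")

# Start from the small GPT-2 configuration and swap in the code tokenizer's vocabulary;
# with this vocabulary the model comes out at roughly 111M parameters.
config = AutoConfig.from_pretrained("gpt2", vocab_size=len(tokenizer))
model = GPT2LMHeadModel(config)               # random init -- no pretrained weights loaded
print(f"{model.num_parameters() / 1e6:.1f}M parameters")
```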

### Compute Infrastructure


#### Hardware

- NVMe Link
- 4x NVIDIA A100 PCIe (80 GB)
- 96 AMD CPU cores (Azure)
- 900 GB RAM

#### Software

- Python 3.10.14
- Recent versions of PyTorch, 🤗 Transformers, Weights & Biases (wandb), and related libraries; see the GitHub repository for the exact versions
- 🤗 Accelerate

## Citation [optional]

<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
- Built on the CodeParrot project (https://huggingface.co/codeparrot/codeparrot).