---
library_name: transformers
tags: []
---
# Model Card for GPTesla
<!-- Provide a quick summary of what the model is/does. -->
## Model Details
### Model Description
<!-- Provide a longer summary of what this model is. -->
This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card has been automatically generated.
- **Developed by:** Shi Hao Ng, IB DP Student, Marlborough College Malaysia
- **Model type:** Decoder-only Transformer
- **Language(s) (NLP):** Primarily Python code; other languages (e.g., HTML) also appear in the training data
- **License:** MIT
- **Finetuned from model [optional]:** No; trained from scratch with randomly initialised weights
### Model Sources [optional]
<!-- Provide the basic links for the model. -->
- **Repository:** https://github.com/Ice-Citron/GPTesla
## Uses
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
- Code completion: given a partially written Python snippet, the model generates a continuation.
### Direct Use
<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
- Some level of fine-tuning is likely needed, or at least preferable, for practical use. However, I won't be working on this.
### Downstream Use [optional]
<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
- The model could be plugged into an IDE for code completion. This is not ideal in practice, though, as its suggestions are rarely correct, which is mainly attributable to its small size.
- Even at this scale, training took 15 hours on 4x NVIDIA A100 (80 GB, PCIe).
## How to Get Started with the Model
Use the code below to get started with the model.
- Follow the "Use this model" instructions on the model's Hugging Face page. It should work; if not, feel free to contact me.
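As a minimal sketch, the model can be loaded with the standard transformers auto classes. Note that the Hub repo id below is an assumption inferred from this card's dataset links (`shng2025/gptesla-*`); replace it with the actual repo id if it differs.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer


def complete_code(prompt: str,
                  model_id: str = "shng2025/gptesla-small",
                  max_new_tokens: int = 64) -> str:
    """Generate a completion for a partial Python snippet.

    NOTE: the default model_id is an assumption based on this card's
    dataset links; adjust it if the Hub repo is named differently.
    """
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)


if __name__ == "__main__":
    print(complete_code("def fibonacci(n):"))
```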
## Training Details
### Training Data
<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
[More Information Needed]
### Training Procedure
<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
#### Preprocessing [optional]
[More Information Needed]
#### Training Hyperparameters
- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
#### Speeds, Sizes, Times [optional]
<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
- 111 million parameters; FP16; 444 MB checkpoint.
- Fast and lightweight enough for inference on a single T4 GPU.
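As a sanity check on these numbers, checkpoint size follows from parameter count times bytes per weight. The 444 MB figure corresponds to 4 bytes per parameter (i.e., fp32 storage); a pure fp16 copy of the weights would be roughly half that:

```python
# Rough checkpoint-size arithmetic for a 111M-parameter model.
n_params = 111_000_000

size_fp32_mb = n_params * 4 / 1_000_000  # 4 bytes per fp32 weight
size_fp16_mb = n_params * 2 / 1_000_000  # 2 bytes per fp16 weight

print(size_fp32_mb)  # 444.0
print(size_fp16_mb)  # 222.0
```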
## Evaluation
<!-- This section describes the evaluation protocols and provides the results. -->
- Validation set: https://huggingface.co/datasets/shng2025/gptesla-valid
### Testing Data, Factors & Metrics
#### Testing Data
<!-- This should link to a Dataset Card if possible. -->
- https://huggingface.co/datasets/shng2025/gptesla-train
#### Factors
<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
- Loss-based evaluation may understate the model's quality, because it implicitly expects a one-to-one match with the reference code. In reality, many different programs implement the same logic, so exact token-level agreement is not required for a completion to be correct.
### Results
- Final training loss of ~1.1; the model converged after 150,000 steps.
- Weights & Biases run: https://wandb.ai/marlborough-college-malaysia/gptesla-small/runs/m9sqzqo3?nw=nwusershng2025
#### Summary
## Environmental Impact
<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
- **Hardware Type:** 4x NVIDIA A100 (80 GB, PCIe) + 96 AMD CPU cores
- **Hours used:** 15 hours
- **Cloud Provider:** Azure
- **Compute Region:** Unclear
- **Carbon Emitted:** [More Information Needed]
### Model Architecture and Objective
- Based on CodeParrot: the GPT-2 architecture with randomly initialised weights, trained from scratch on a causal language modelling objective.
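As a hypothetical sketch, a GPT-2-style config sized to roughly 111M parameters can be instantiated with random weights as below. The exact hyperparameters here are assumptions for illustration (a GPT-2-small layout with a reduced, code-oriented vocabulary), not necessarily the values used for GPTesla.

```python
from transformers import GPT2Config, GPT2LMHeadModel

# Assumed hyperparameters: GPT-2-small shape, smaller vocabulary.
config = GPT2Config(
    vocab_size=32_768,  # code-oriented vocabulary size (assumption)
    n_positions=1024,
    n_embd=768,
    n_layer=12,
    n_head=12,
)

# Instantiating from a config (rather than from_pretrained) gives
# randomly initialised weights.
model = GPT2LMHeadModel(config)

n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.0f}M parameters")
```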
### Compute Infrastructure
#### Hardware
- 4x NVIDIA A100 (80 GB, PCIe), with NVMe link
- 96 AMD CPU cores (Azure)
- 900 GB RAM
#### Software
- Python 3.10.14
- PyTorch, Transformers, wandb, Accelerate, etc.; refer to the GitHub repo for exact versions
## Citation [optional]
<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
- This model is based on the CodeParrot project.