Model Card for Model ID
Model Details
Model Description
This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
- Developed by: Shi Hao Ng, IB DP Student, Marlborough College Malaysia
- Model type: Decoder-only Transformer
- Language(s) (NLP): Python, HTML, etc.
- License: MIT
- Finetuned from model [optional]: None (weights were randomly initialised and trained from scratch)
Model Sources [optional]
- Repository: https://github.com/Ice-Citron/GPTesla
Uses
- Given partially written Python code as input, the model generates a Python continuation.
Direct Use
- Some fine-tuning is likely needed, or at least preferable, before the model is useful for a specific task. I do not plan to work on this myself.
Downstream Use [optional]
- The model can easily be plugged into an IDE as a code-completion backend. It is not ideal for this, though, because its completions are rarely correct, most likely because it is a fairly small model.
- Even at this size, training is already a struggle on my hardware: it took 15 hours on 4x Nvidia A100 PCIe 80GB.
How to Get Started with the Model
Use the code below to get started with the model.
- Follow the instructions under "Use this model" on the model's Hugging Face page. This should work; if it does not, please contact me.
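A minimal sketch of what loading the model for code completion could look like with the transformers `pipeline` API. `MODEL_ID` is a hypothetical placeholder, since this card does not state the Hub ID, and `trim_completion` with its stop sequences is an illustrative helper, not part of this repository.

```python
# Hypothetical placeholder: replace with the actual Hub ID of this model.
MODEL_ID = "your-username/your-model-id"

# Illustrative stop sequences: cut the completion when a new top-level block starts.
STOP_SEQUENCES = ["\nclass ", "\ndef ", "\nif __name__"]


def trim_completion(completion: str, stops=STOP_SEQUENCES) -> str:
    """Return the completion cut at the earliest stop sequence, if any."""
    cut = len(completion)
    for stop in stops:
        idx = completion.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return completion[:cut]


def complete_code(prompt: str, model_id: str = MODEL_ID, max_new_tokens: int = 64) -> str:
    """Generate a continuation for half-finished Python code."""
    # Imported lazily so the helper above can be used without transformers installed.
    from transformers import pipeline

    generator = pipeline("text-generation", model=model_id)
    outputs = generator(prompt, max_new_tokens=max_new_tokens, do_sample=False)
    # By default the pipeline returns the prompt followed by the generated text.
    generated = outputs[0]["generated_text"][len(prompt):]
    return prompt + trim_completion(generated)
```

With a real model ID filled in, `complete_code("def fibonacci(n):\n")` would return the prompt plus the model's continuation, cut before any following top-level definition.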
Training Details
Training Data
[More Information Needed]
Training Procedure
Preprocessing [optional]
[More Information Needed]
Training Hyperparameters
- Training regime: [More Information Needed]
Speeds, Sizes, Times [optional]
- 111 million parameters. FP16, 444 megabytes.
- Inference is fast and lightweight on a T4 GPU.
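As a rough check on the size figure, weight storage is approximately the number of parameters times the bytes per parameter. The sketch below ignores file metadata and any non-weight buffers. Under that simplification, 444 MB corresponds to 4 bytes per parameter, while 2 bytes per parameter would give about 222 MB. One possible reading, which is an assumption and not something this card confirms, is that FP16 refers to mixed-precision training while the saved weights are 32-bit.

```python
def checkpoint_size_mb(n_params: int, bytes_per_param: int) -> float:
    """Approximate weight storage in megabytes (1 MB = 10**6 bytes)."""
    return n_params * bytes_per_param / 1e6


N_PARAMS = 111_000_000  # parameter count stated in this card

print(checkpoint_size_mb(N_PARAMS, 2))  # 2 bytes per parameter (16-bit weights)
print(checkpoint_size_mb(N_PARAMS, 4))  # 4 bytes per parameter (32-bit weights)
```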
Evaluation
Testing Data, Factors & Metrics
Testing Data
Factors
- The evaluation may not be accurate, because it expects a one-to-one match with the reference code. In reality there are many ways to write code that implements the same logic, so an exact match is not required for a completion to be correct.
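To illustrate the point, the sketch below compares a reference snippet with a differently written candidate. The two snippets, the `add` function, and the test cases are made up for illustration and are not taken from the evaluation data. An exact-match comparison rejects the candidate, while a functional check accepts it.

```python
REFERENCE = "def add(a, b):\n    return a + b\n"
CANDIDATE = "def add(x, y):\n    result = x + y\n    return result\n"

# Illustrative test cases: (arguments, expected result).
CASES = [((1, 2), 3), ((-1, 1), 0), ((0, 0), 0)]


def exact_match(reference: str, candidate: str) -> bool:
    """One-to-one comparison of the source text."""
    return reference.strip() == candidate.strip()


def passes_tests(source: str, cases, func_name: str = "add") -> bool:
    """Run the source and check the named function against the test cases."""
    namespace = {}
    exec(source, namespace)  # only safe for trusted snippets
    func = namespace[func_name]
    return all(func(*args) == expected for args, expected in cases)


print(exact_match(REFERENCE, CANDIDATE))  # False: the text differs
print(passes_tests(CANDIDATE, CASES))     # True: the behaviour is the same
```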
Results
- Final training loss (loss/train): 1.1. The model converged after 150,000 steps.
- Weights & Biases run: https://wandb.ai/marlborough-college-malaysia/gptesla-small/runs/m9sqzqo3?nw=nwusershng2025
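For intuition, a loss can be converted to perplexity. The sketch assumes that loss/train is the mean cross-entropy per token in nats, which is the usual convention for causal language models but is not confirmed by this card.

```python
import math


def perplexity(mean_cross_entropy: float) -> float:
    """Perplexity for a mean per-token cross-entropy given in nats."""
    return math.exp(mean_cross_entropy)


FINAL_TRAIN_LOSS = 1.1  # value reported in this card

print(round(perplexity(FINAL_TRAIN_LOSS), 2))  # about 3.0
```

Under that assumption, a training loss of 1.1 means the model is on average about as uncertain as a uniform choice between three tokens on its training data.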
Summary
Environmental Impact
Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
- Hardware Type: 4x Nvidia A100 PCIe + 96x AMD CPU
- Hours used: 15 hours
- Cloud Provider: Azure
- Compute Region: Unclear
- Carbon Emitted: [More Information Needed]
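A rough sketch of the kind of estimate the calculator performs: energy is power times time, and emissions are energy times the carbon intensity of the grid. The 300 W per GPU is an assumed nominal board power for an A100 PCIe 80GB, and the carbon intensity is a hypothetical placeholder because the compute region is unclear. CPU, memory, and datacenter overhead are ignored. None of these numbers are measurements from this training run.

```python
def energy_kwh(n_gpus: int, watts_per_gpu: float, hours: float) -> float:
    """GPU energy use in kWh, assuming the GPUs draw a constant power."""
    return n_gpus * watts_per_gpu * hours / 1000


def emissions_kg(kwh: float, kg_co2_per_kwh: float) -> float:
    """Emissions in kg CO2-equivalent for a given grid carbon intensity."""
    return kwh * kg_co2_per_kwh


ASSUMED_WATTS_PER_GPU = 300  # assumption: nominal power, not a measured draw
PLACEHOLDER_INTENSITY = 0.4  # hypothetical kg CO2eq per kWh; depends on the region

kwh = energy_kwh(n_gpus=4, watts_per_gpu=ASSUMED_WATTS_PER_GPU, hours=15)
print(kwh)  # 18.0 kWh under the assumptions above
print(emissions_kg(kwh, PLACEHOLDER_INTENSITY))
```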
Model Architecture and Objective
- Based on codeparrot. The model uses GPT-2's architecture, but its weights are randomly initialised.
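A sketch of how a GPT-2 model with random weights can be created in transformers, together with a parameter count for the architecture. The hyperparameters (vocabulary of 32,768 tokens, 1,024 positions, 768-dimensional embeddings, 12 layers, 12 heads) are assumptions chosen to resemble a small codeparrot-style configuration. They are not taken from this repository, although they happen to give roughly 111 million parameters.

```python
def gpt2_param_count(vocab_size: int, n_positions: int, n_embd: int, n_layer: int) -> int:
    """Parameter count of a GPT-2 style model with a tied output embedding."""
    d = n_embd
    embeddings = vocab_size * d + n_positions * d
    # Per block: attention (4*d*d + 4*d), MLP (8*d*d + 5*d), two layer norms (4*d).
    per_block = 12 * d * d + 13 * d
    final_layer_norm = 2 * d
    return embeddings + n_layer * per_block + final_layer_norm


def build_random_gpt2(vocab_size=32768, n_positions=1024, n_embd=768, n_layer=12, n_head=12):
    """Build a GPT-2 language model from a config, so the weights are random."""
    # Imported lazily so the parameter count works without transformers installed.
    from transformers import GPT2Config, GPT2LMHeadModel

    config = GPT2Config(
        vocab_size=vocab_size,
        n_positions=n_positions,
        n_embd=n_embd,
        n_layer=n_layer,
        n_head=n_head,
    )
    return GPT2LMHeadModel(config)  # no pretrained weights are loaded


# Assumed configuration, see the note above.
print(gpt2_param_count(vocab_size=32768, n_positions=1024, n_embd=768, n_layer=12))
```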
Compute Infrastructure
Hardware
- NVMe Link
- 4x Nvidia A100 PCIe
- 96x AMD CPU from Azure
- 900 GB RAM
Software
- Python 3.10.14
- Latest versions of PyTorch, transformers, wandb, and other libraries. Refer to the GitHub repository for the exact versions.
- Accelerate
Citation [optional]
- codeparrot used