---
library_name: transformers
tags: []
---

Model Card for gptesla-small

Model Details

Model Description

This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card was automatically generated.

  • Developed by: Shi Hao Ng, IB DP Student, Marlborough College Malaysia
  • Model type: Decoder-only Transformer (GPT-2 architecture)
  • Language(s): Python, HTML, and other code
  • License: MIT
  • Finetuned from model [optional]: none; trained from scratch with randomly initialised weights

Model Sources [optional]

Uses

  • Given partially written Python code as input, the model generates a continuation (Python code completion).

Direct Use

  • Some fine-tuning is likely needed, or at least preferable, before direct use; however, I will not be working on this.

[More Information Needed]

Downstream Use [optional]

  • The model could serve as a code-completion backend for IDEs, though it is not ideal for that: its suggestions are rarely correct, most likely because it is a fairly small model.
  • Even at this size, training was a struggle on 4x NVIDIA A100 PCIe 80GB GPUs, taking 15 hours!

How to Get Started with the Model

Use the code below to get started with the model.

  • Follow the instructions under "Use this model" on the Hugging Face model page; that should work. If not, feel free to contact me. A minimal loading sketch follows below.
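
Here is a minimal sketch, assuming the model's Hub id is shng2025/gptesla-small (inferred from this repository's name) and using the standard transformers text-generation pipeline; adjust the id and generation settings as needed.

```python
# Minimal sketch: load the model and complete a partially written Python snippet.
# Assumption: the Hub id is shng2025/gptesla-small.
from transformers import pipeline

generator = pipeline("text-generation", model="shng2025/gptesla-small")

prompt = "def fibonacci(n):\n    "
completion = generator(prompt, max_new_tokens=64)[0]["generated_text"]
print(completion)
```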


Training Details

Training Data

[More Information Needed]

Training Procedure

Preprocessing [optional]

[More Information Needed]

Training Hyperparameters

  • Training regime: [More Information Needed]

Speeds, Sizes, Times [optional]

  • 111 million parameters, trained in FP16. The checkpoint is roughly 444 MB (note: that is about 4 bytes per parameter, i.e. FP32-sized storage).
  • Inference is fast and lightweight, even on a single T4 GPU.

Evaluation

Testing Data, Factors & Metrics

Testing Data

Factors

  • The evaluation may understate the model's ability, because it expects a one-to-one textual match for code. In reality there are many ways of writing code that reach the same logic, and no single precise phrasing is required; see the sketch below.
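
To make this concrete, here is an illustrative sketch of judging a completion by its behaviour rather than its exact text; the add task and its test are hypothetical, not taken from this model's evaluation. Two textually different completions that implement the same logic both pass.

```python
# Illustrative sketch: functional-correctness check instead of exact string match.
# The task ("write add(a, b)") and its test are hypothetical examples.
def passes_test(completion_src: str) -> bool:
    namespace = {}
    try:
        exec(completion_src, namespace)      # define the candidate function
        return namespace["add"](2, 3) == 5   # check behaviour, not text
    except Exception:
        return False

candidate_a = "def add(a, b):\n    return a + b"
candidate_b = "def add(x, y):\n    return sum([x, y])"  # different text, same logic

print(passes_test(candidate_a), passes_test(candidate_b))  # True True
```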

Results

Summary

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019); the rough shape of its estimate is sketched after the list below.

  • Hardware Type: 4x NVIDIA A100 PCIe GPUs + 96 AMD CPU cores
  • Hours used: 15 hours
  • Cloud Provider: Azure
  • Compute Region: Unclear
  • Carbon Emitted: [More Information Needed]
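
As an illustration only: the calculator's estimate is roughly power x time x grid carbon intensity. The compute region is unclear, so the grid intensity below is a placeholder, and the A100 power draw is an assumption; the printed number is not a reported result.

```python
# Illustrative only: rough shape of the ML Impact calculator's estimate.
# emissions (kgCO2eq) ~= power (kW) x hours x grid intensity (kgCO2eq/kWh)
gpu_power_kw = 4 * 0.300   # assumption: 4x A100 PCIe at ~300 W TDP each
hours = 15                 # "Hours used" above
grid_kg_per_kwh = 0.4      # PLACEHOLDER grid intensity; region is unclear

print(f"~{gpu_power_kw * hours * grid_kg_per_kwh:.0f} kgCO2eq (illustrative)")
```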

Model Architecture and Objective

  • Based on codeparrot: the model uses GPT-2's architecture, but its weights are randomly initialised (trained from scratch). A sketch follows below.
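
For reference, here is a minimal sketch of how a randomly initialised GPT-2-architecture model can be created, roughly following the codeparrot recipe; the GPT-2 tokenizer and config values are assumptions, not necessarily the exact ones used here.

```python
# Sketch: instantiate a GPT-2-architecture model with random weights
# (from_config gives random initialisation; no pretrained weights are loaded).
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # assumed GPT-2 BPE tokenizer
config = AutoConfig.from_pretrained("gpt2", vocab_size=len(tokenizer))
model = AutoModelForCausalLM.from_config(config)

print(f"{model.num_parameters() / 1e6:.0f}M parameters")
```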

Compute Infrastructure


Hardware

  • NVMe Link
  • 4x NVIDIA A100 PCIe (80 GB)
  • 96 AMD CPU cores (on Azure)
  • 900 GB RAM

Software

  • Python 3.10.14
  • Latest versions of the PyTorch, transformers, wandb libraries, etc.; refer to the GitHub repo for exact versions
  • Accelerate

Citation [optional]

  • codeparrot, on which this model's training recipe is based.