---
license: bigscience-openrail-m
datasets:
  - apcl/so13m
  - apcl/jm52m
---

# Jam_sojm

Jam_sojm is a GPT2-like model for research in fine-grained Java analysis. It operates on Java source code at the level of methods, statements, and variables, and is intended as a foundation for downstream tasks such as code completion, comment generation, and automated bug repair.
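
The sketch below shows one way to load the released weights. It assumes a nanoGPT-style checkpoint (a `ckpt.pt` file whose dictionary holds `model_args` and `model` keys) and the `GPTConfig`/`GPT` classes from the `model.py` in our GitHub repo; the filename and checkpoint layout are assumptions, not guaranteed by this card.

```python
# Minimal loading sketch for jam_sojm (nanoGPT-style checkpoint assumed).
import torch
from huggingface_hub import hf_hub_download
from model import GPTConfig, GPT  # model.py from https://github.com/apcl-research/jam

# "ckpt.pt" is a hypothetical filename; check the repo's file listing.
ckpt_path = hf_hub_download(repo_id="apcl/jam_sojm", filename="ckpt.pt")
checkpoint = torch.load(ckpt_path, map_location="cpu")

# nanoGPT checkpoints store the config under "model_args" and weights under "model".
config = GPTConfig(**checkpoint["model_args"])
model = GPT(config)

state_dict = checkpoint["model"]
# Strip the prefix that torch.compile adds to parameter names, if present.
state_dict = {k.removeprefix("_orig_mod."): v for k, v in state_dict.items()}
model.load_state_dict(state_dict)
model.eval()
```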


## Jam_sojm Training Details

- We trained the jam_sojm model using the training procedures from Daniel Grittner's NanoGPT-LoRA.

- The model is trained on our own so13m and jm52m datasets.

- First, we train the model on the so13m training set for 1 epoch, roughly 300,000 training iterations.

- We then reset the learning rate and weight decay and train on the jm52m training set for 1 more epoch, roughly 300,000 more iterations, for a total of 600,000 iterations (see the sketch after this list).

- Our GitHub repo contains the code for re-training using the raw data.
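
As a rough illustration of this two-stage schedule only: the helper names (`load_tokens`, `train_one_epoch`) are hypothetical stand-ins, and the optimizer call follows nanoGPT conventions rather than being copied from the jam training code.

```python
# Sketch of the two-stage pretraining schedule (so13m, then jm52m).
from model import GPTConfig, GPT          # nanoGPT-style model classes
from data import load_tokens              # hypothetical dataset loader
from train_loop import train_one_epoch    # hypothetical training-loop helper

model = GPT(GPTConfig(n_layer=24, n_head=16, n_embd=1024, block_size=256, dropout=0.2))
model = model.to("cuda")

# Stage 1: one epoch (~300,000 iterations) on the so13m training set.
opt = model.configure_optimizers(weight_decay=1e-1, learning_rate=3e-5,
                                 betas=(0.9, 0.95), device_type="cuda")
train_one_epoch(model, opt, load_tokens("so13m"))

# Stage 2: reset the optimizer (fresh learning rate and weight decay),
# then one more epoch (~300,000 iterations) on the jm52m training set.
opt = model.configure_optimizers(weight_decay=1e-1, learning_rate=3e-5,
                                 betas=(0.9, 0.95), device_type="cuda")
train_one_epoch(model, opt, load_tokens("jm52m"))
```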

| Hyperparameter | Description | Value |
|----------------|-------------|-------|
| e | embedding dimensions | 1024 |
| L | number of layers | 24 |
| h | attention heads | 16 |
| c | block size / context length | 256 |
| b | batch size | 4 |
| a | accumulation steps | 32 |
| d | dropout | 0.20 |
| r | learning rate | 3e-5 |
| y | weight decay | 1e-1 |

We train our models using a single NVIDIA A5000 GPU.
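
For reference, the table above maps onto a nanoGPT-style configuration roughly as follows; the variable names follow nanoGPT conventions and are an assumption about the jam training code, not copied from it.

```python
# nanoGPT-style config sketch for jam_sojm (names assumed, values from the table above).
n_embd = 1024                      # e: embedding dimensions
n_layer = 24                       # L: number of layers
n_head = 16                        # h: attention heads
block_size = 256                   # c: block size / context length
batch_size = 4                     # b: batch size per step
gradient_accumulation_steps = 32   # a: accumulation steps
dropout = 0.20                     # d: dropout
learning_rate = 3e-5               # r: learning rate
weight_decay = 1e-1                # y: weight decay
```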


## Jam Projects

Current projects using the jam_sojm pre-trained model can be found at our GitHub repository:

https://github.com/apcl-research/jam