CodeTrans transfer learning pre-trained model
Pretrained model on programming languages using the t5-large model architecture. It was first released in this repository.
Model description
This CodeTrans model is based on the t5-large model and has its own SentencePiece vocabulary model. It was pre-trained with transfer learning on 7 unsupervised datasets in the software development domain.
The model was trained on a single TPU Pod V3-8 for 240,000 steps in total, using a sequence length of 512 and a batch size of 4096. It uses the encoder-decoder architecture and has approximately 770M parameters in total, the size of t5-large. The optimizer was AdaFactor with an inverse square root learning rate schedule for pre-training.
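To make the learning rate schedule concrete, here is a minimal sketch of an inverse square root schedule. The function name and the `warmup_steps` value of 10,000 are assumptions for illustration (10,000 is the value used in the original T5 setup); this card does not state the warm-up length that was actually used.

```python
import math


def inverse_sqrt_lr(step, warmup_steps=10_000):
    """Inverse square root schedule: lr = 1 / sqrt(max(step, warmup_steps)).

    warmup_steps is an assumed value, not one taken from this model card.
    The rate is constant during warm-up and decays as 1/sqrt(step) afterwards.
    """
    return 1.0 / math.sqrt(max(step, warmup_steps))


if __name__ == "__main__":
    # Print the rate at a few points up to the 240,000 pre-training steps.
    for step in (0, 10_000, 40_000, 240_000):
        print(step, inverse_sqrt_lr(step))
```

Under these assumptions the rate stays flat for the first 10,000 steps and then falls slowly, so the late pre-training steps still make meaningful updates.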
It can be fine-tuned on other tasks in the software development domain.
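A fine-tuning loop could look roughly like the sketch below, which assumes the Hugging Face `transformers` library and PyTorch. The helper names `load_checkpoint` and `training_step` are hypothetical, and `checkpoint` must be supplied by the reader as the hub ID or local path of this model; none of this is the authors' own fine-tuning code.

```python
def load_checkpoint(checkpoint):
    """Load tokenizer and model from a hub ID or local path supplied by the caller."""
    # Imported lazily so the sketch can be imported without transformers installed.
    from transformers import AutoTokenizer, T5ForConditionalGeneration

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = T5ForConditionalGeneration.from_pretrained(checkpoint)
    return tokenizer, model


def training_step(model, tokenizer, source, target, optimizer, max_length=512):
    """Run one gradient step on a single (source, target) text pair."""
    inputs = tokenizer(source, return_tensors="pt",
                       truncation=True, max_length=max_length)
    labels = tokenizer(target, return_tensors="pt",
                       truncation=True, max_length=max_length).input_ids
    # T5 computes the sequence-to-sequence loss itself when labels are passed.
    loss = model(**inputs, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```

In use, one would call `load_checkpoint` once, build an optimizer over `model.parameters()`, and call `training_step` for each pair in the downstream dataset; batching, padding, and evaluation are left out to keep the sketch short.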
Created by Ahmed Elnaggar and Wei Ding