
ctrltokyo/llm_prompt_mask_fill_model

This model is a fine-tuned version of distilbert-base-uncased on the code_instructions_120k dataset. It achieves the following results on the evaluation set:

  • Train Loss: 2.1215
  • Validation Loss: 1.5672
  • Epoch: 0

Model description

The model is distilbert-base-uncased with no architectural changes, fine-tuned for masked-token prediction (fill-mask), as the model name indicates.
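The card does not say how the fine-tuning inputs were masked. As an assumption, fill-mask models like this are usually trained with the BERT recipe that Transformers' DataCollatorForLanguageModeling applies by default: select about 15% of non-special tokens, replace 80% of those with [MASK], replace 10% with a random token, leave 10% unchanged, and compute the loss only at the selected positions. The plain-Python sketch below illustrates that recipe. `mask_tokens` is a hypothetical helper, not the collator's actual code.

```python
import random
from typing import List, Optional, Sequence, Set, Tuple

IGNORE_INDEX = -100  # label value that Transformers' masked-LM losses ignore


def mask_tokens(
    token_ids: Sequence[int],
    mask_id: int,
    vocab_size: int,
    special_ids: Set[int],
    mlm_probability: float = 0.15,
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], List[int]]:
    """BERT-style masking (assumed recipe): returns (model inputs, labels)."""
    if rng is None:
        rng = random.Random()
    inputs = list(token_ids)
    labels = [IGNORE_INDEX] * len(inputs)
    for i, tok in enumerate(token_ids):
        # Never mask special tokens; otherwise select with probability mlm_probability.
        if tok in special_ids or rng.random() >= mlm_probability:
            continue
        labels[i] = tok  # loss is computed only where a position was selected
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = mask_id  # 80%: replace with [MASK]
        elif roll < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # 10%: random token
        # remaining 10%: keep the original token unchanged
    return inputs, labels
```

For distilbert-base-uncased, [PAD], [CLS], [SEP] and [MASK] have IDs 0, 101, 102 and 103, and the vocabulary has 30522 entries, so a call would look like `mask_tokens(ids, mask_id=103, vocab_size=30522, special_ids={0, 101, 102})`.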

Intended uses & limitations

This model is intended for live autocompletion of prompts (natural-language requests) in a coding-focused chatbot. Do not use it to complete source code; it does not work for that.
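A minimal sketch of prompt autocompletion with this model, assuming it is used through the Transformers fill-mask pipeline with a [MASK] token placed where the next word would go. `suggest_next_words` and `load_fill_mask` are hypothetical helpers and the filtering rules are illustrative choices. The pipeline is passed in as an argument so the ranking logic can be tried without downloading the model.

```python
from typing import Callable, Dict, List

MASK_TOKEN = "[MASK]"  # mask token of the distilbert-base-uncased tokenizer


def suggest_next_words(
    partial_prompt: str,
    fill_mask: Callable[..., List[Dict]],
    top_k: int = 5,
) -> List[str]:
    """Rank candidate next words for a prompt the user is still typing."""
    # Put a mask where the next word would go and let the model fill it.
    candidates = fill_mask(f"{partial_prompt.rstrip()} {MASK_TOKEN}", top_k=top_k)
    suggestions = []
    for cand in candidates:  # the pipeline returns candidates sorted by score
        word = cand["token_str"].strip()
        # Drop WordPiece continuations ("##ing") and pure punctuation.
        if word.startswith("##") or not any(ch.isalnum() for ch in word):
            continue
        suggestions.append(word)
    return suggestions


def load_fill_mask():
    """Load the model as a Transformers fill-mask pipeline (needs network access)."""
    from transformers import pipeline  # lazy import: suggest_next_words works without it

    # framework="tf" is an assumption based on the TensorFlow version listed in this card.
    return pipeline("fill-mask", model="ctrltokyo/llm_prompt_mask_fill_model", framework="tf")
```

With the real model, a call would look like `suggest_next_words("write a python function that", load_fill_mask())`. Because a trailing mask often attracts punctuation predictions, filtering those out keeps the suggestion list useful for autocompletion.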

Training and evaluation data

The model was evaluated on 5% of the training data; no further evaluation has been performed so far. Training ran on an NVIDIA V100 GPU.
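The card does not say how the 5% evaluation subset was drawn. Assuming a random hold-out, which is what `Dataset.train_test_split(test_size=0.05)` in the Hugging Face datasets library performs, the split amounts to the library-free sketch below. The function name and seed are illustrative.

```python
import math
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def holdout_split(
    examples: Sequence[T], eval_fraction: float = 0.05, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Shuffle, then hold out eval_fraction of the examples for evaluation."""
    indices = list(range(len(examples)))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split reproducible
    n_eval = math.ceil(eval_fraction * len(examples))
    train = [examples[i] for i in indices[n_eval:]]
    evaluation = [examples[i] for i in indices[:n_eval]]
    return train, evaluation
```

Fixing the seed matters here: without it, every run would evaluate on a different 5% subset, and validation losses from separate runs would not be comparable.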

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • optimizer: AdamWeightDecay (decay: 0.0, beta_1: 0.9, beta_2: 0.999, epsilon: 1e-08, amsgrad: False, weight_decay_rate: 0.01), wrapped in a dynamic loss-scale optimizer (dynamic: True, initial_scale: 32768.0, dynamic_growth_steps: 2000)
  • learning_rate: WarmUp (initial_learning_rate: 2e-05, warmup_steps: 1000, power: 1.0), then PolynomialDecay (initial_learning_rate: 2e-05, decay_steps: 108, end_learning_rate: 0.0, power: 1.0, cycle: False)
  • training_precision: mixed_float16
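The serialized optimizer config is easier to reason about as the schedule it encodes. The plain-Python sketch below re-implements, for illustration only, the WarmUp + PolynomialDecay learning rate with the listed values, plus a simplified model of the dynamic loss scaling that the outer wrapper (initial_scale 32768, dynamic_growth_steps 2000) performs under mixed_float16. Neither function is the actual Transformers or Keras code. The names `learning_rate` and `update_loss_scale` are hypothetical.

```python
def learning_rate(
    step: int,
    peak_lr: float = 2e-05,
    warmup_steps: int = 1000,
    decay_steps: int = 108,
    end_lr: float = 0.0,
    power: float = 1.0,
) -> float:
    """Learning rate at an optimizer step under WarmUp followed by PolynomialDecay."""
    if step < warmup_steps:
        # Warm-up: ramp from 0 toward peak_lr.
        return peak_lr * (step / warmup_steps) ** power
    # Decay: the decay schedule counts steps from the end of warm-up and,
    # with cycle=False, stays at end_lr once decay_steps have elapsed.
    decay_step = min(step - warmup_steps, decay_steps)
    return (peak_lr - end_lr) * (1 - decay_step / decay_steps) ** power + end_lr


def update_loss_scale(
    scale: float,
    good_steps: int,
    grads_finite: bool,
    growth_steps: int = 2000,
    multiplier: float = 2.0,
):
    """One step of dynamic loss scaling: returns (new_scale, new_good_steps, apply_update)."""
    if not grads_finite:
        # float16 overflow: skip this weight update and halve the scale (never below 1).
        return max(scale / multiplier, 1.0), 0, False
    if good_steps + 1 >= growth_steps:
        # growth_steps consecutive finite steps: try a larger scale.
        return scale * multiplier, 0, True
    return scale, good_steps + 1, True
```

For example, `learning_rate(500)` is 1e-05 (halfway through warm-up) and `learning_rate(1054)` is 1e-05 again (halfway through decay). From step 1108 onward the rate is 0.0, so 1000 of the first 1108 steps are warm-up.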

Training results

Train Loss    Validation Loss    Epoch
2.1215        1.5672             0
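Cross-entropy losses are often easier to interpret as perplexities. Assuming the reported values are mean cross-entropy over the masked tokens in nats (the card does not say), exponentiating them gives a masked-token perplexity. Under that assumption, on the evaluation set the model is roughly as uncertain as a uniform choice among about 4.8 tokens. The helper name is illustrative.

```python
import math


def perplexity(mean_cross_entropy: float) -> float:
    """Perplexity implied by a mean per-token cross-entropy loss (in nats)."""
    return math.exp(mean_cross_entropy)


train_loss, val_loss = 2.1215, 1.5672  # values from the training results table
print(f"train perplexity ~ {perplexity(train_loss):.2f}")  # ~8.34
print(f"validation perplexity ~ {perplexity(val_loss):.2f}")  # ~4.79
```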

Framework versions

  • Transformers 4.31.0
  • TensorFlow 2.12.0
  • Datasets 2.14.1
  • Tokenizers 0.13.3