---
license: bigcode-openrail-m
library_name: peft
tags:
- trl
- sft
- generated_from_trainer
base_model: bigcode/starcoder2-3b
model-index:
- name: finetune_starcoder2
  results: []
datasets:
- bigcode/the-stack-smol
---
# finetune_starcoder2

This model is a fine-tuned version of [bigcode/starcoder2-3b](https://huggingface.co/bigcode/starcoder2-3b) on the [bigcode/the-stack-smol](https://huggingface.co/datasets/bigcode/the-stack-smol) dataset.
## Model description

This fine-tuned model builds upon the `bigcode/starcoder2-3b` base model, further specializing it for code completion tasks using the SQL subset (`data/sql`) of the `bigcode/the-stack-smol` dataset. This subset consists of SQL code snippets and solutions, allowing the model to suggest relevant completions and potentially generate SQL code from your prompts.
## Intended uses & limitations

More information needed
## Training and evaluation data

More information needed
## Training procedure
1. Load Dataset and Model:
- Load the `bigcode/the-stack-smol` dataset using the Hugging Face Datasets library.
- Filter for the specified subset (`data/sql`) and split (`train`).
- Load the `bigcode/starcoder2-3b` model from the Hugging Face Hub with 4-bit quantization.
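The loading step above might look like the following sketch. The model and dataset identifiers come from this card; the quantization settings and the helper's name are illustrative assumptions, and actually calling it requires a GPU plus the `datasets`, `transformers`, and `bitsandbytes` packages:

```python
def load_model_and_dataset():
    """Sketch of step 1: load the SQL subset and the 4-bit quantized base model."""
    from datasets import load_dataset
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    # SQL subset of the-stack-smol, train split.
    dataset = load_dataset("bigcode/the-stack-smol", data_dir="data/sql", split="train")

    # 4-bit quantization; the NF4 setting here is an assumption, not from the card.
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
    )

    tokenizer = AutoTokenizer.from_pretrained("bigcode/starcoder2-3b")
    model = AutoModelForCausalLM.from_pretrained(
        "bigcode/starcoder2-3b",
        quantization_config=bnb_config,
        device_map="auto",
    )
    return model, tokenizer, dataset
```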
2. Preprocess Data:
- Tokenize the code text using the appropriate tokenizer for the chosen model.
- Apply necessary cleaning or normalization (e.g., removing comments, handling indentation).
- Create input examples suitable for the model's architecture (a causal language modeling objective, since StarCoder2 is a decoder-only model).
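StarCoder2 is a decoder-only model, so fine-tuning uses a next-token (causal language modeling) objective: each position's target is simply the token that follows it. A minimal pure-Python illustration of how inputs and targets pair up (the token ids are invented):

```python
# Toy token ids standing in for a tokenized SQL snippet (values are made up).
token_ids = [101, 7, 12, 55, 102]

# For causal LM, the model sees tokens up to position t and must predict t+1,
# so inputs and targets are the same sequence shifted by one position.
inputs = token_ids[:-1]   # [101, 7, 12, 55]
targets = token_ids[1:]   # [7, 12, 55, 102]

pairs = list(zip(inputs, targets))  # (context token, next token to predict)
```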
3. Configure Training:
- Initialize a `Trainer` object (likely the `SFTTrainer` from TRL, given the `trl`/`sft` tags above).
- Set training arguments based on the provided `args`:
- Learning rate, optimizer, scheduler
- Gradient accumulation steps
- Weight decay
- Loss function (likely cross-entropy)
- Evaluation metrics (e.g., accuracy, perplexity)
- Device placement (GPU/TPU)
- Number of processes for potential distributed training
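Step 3 might be configured roughly as below with `TrainingArguments`, filling in the hyperparameter values listed later on this card (the `output_dir` name is a placeholder, and the Adam betas/epsilon shown later are the Transformers defaults):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="finetune_starcoder2",  # placeholder name
    learning_rate=2e-4,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    fp16=True,  # Native AMP mixed precision
    seed=0,
)
```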
4. Train the Model:
- Start the training loop for the specified `max_steps`.
- Iterate through batches of preprocessed code examples.
- Forward pass through the model to generate predictions.
- Calculate loss based on ground truth and predictions.
- Backpropagate gradients to update model parameters.
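The forward / loss / backward / update cycle of step 4 is handled internally by the trainer, but its mechanics can be sketched with a toy one-parameter gradient-descent example in plain Python (unrelated to the actual model):

```python
# Fit y = 2x with a single weight via gradient descent on squared error,
# mirroring the forward / loss / backward / update cycle of each training step.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w = 0.0
lr = 0.05

for step in range(100):
    for x, y in data:
        pred = w * x               # forward pass
        loss = (pred - y) ** 2     # loss against ground truth
        grad = 2 * (pred - y) * x  # gradient of the loss w.r.t. the weight
        w -= lr * grad             # parameter update

# w converges toward 2.0
```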
5. Evaluation (Optional):
- Periodically evaluate model performance on a validation or test set.
- Calculate relevant metrics (accuracy, perplexity, code completion accuracy).
- Monitor training progress and adjust hyperparameters as needed.
6. Save the Fine-tuned Model:
- Save the model's weights and configuration to the `output_dir`.
7. Push to Hugging Face Hub (Optional):
- If `push_to_hub` is True, create a model card and push the model to the Hugging Face Hub for sharing and use.
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 1
- eval_batch_size: 8
- seed: 0
- gradient_accumulation_steps: 4
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
- mixed_precision_training: Native AMP
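Note that `total_train_batch_size` above is derived rather than set directly: it is the per-device train batch size multiplied by the gradient accumulation steps.

```python
train_batch_size = 1
gradient_accumulation_steps = 4

# Effective batch size per optimizer step.
total_train_batch_size = train_batch_size * gradient_accumulation_steps
assert total_train_batch_size == 4  # matches the value listed above
```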
### Training results
### Framework versions
- PEFT 0.8.2
- Transformers 4.40.0.dev0
- Pytorch 2.2.1+cu121
- Datasets 2.18.0
- Tokenizers 0.15.2