finetune_starcoder2 / README.md
metadata
license: bigcode-openrail-m
library_name: peft
tags:
  - trl
  - sft
  - generated_from_trainer
base_model: bigcode/starcoder2-3b
model-index:
  - name: finetune_starcoder2
    results: []
datasets:
  - bigcode/the-stack-smol

finetune_starcoder2

This model is a fine-tuned version of bigcode/starcoder2-3b on bigcode/the-stack-smol.

Model description

This model fine-tunes the bigcode/starcoder2-3b base model with PEFT for code completion, using the SQL subset (data/sql) of the bigcode/the-stack-smol dataset. The training data consists of SQL source files, so the model is meant to suggest SQL completions and to generate SQL code from a prompt.

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

1. Load Dataset and Model:

  • Load the bigcode/the-stack-smol dataset using the Hugging Face Datasets library.
  • Filter for the specified subset (data/sql) and split (train).
  • Load the bigcode/starcoder2-3b model from the Hugging Face Hub with 4-bit quantization.
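
The loading step above can be sketched as follows. This is a minimal sketch, not the exact script used for this model: the NF4 quantization type, the bfloat16 compute dtype, and `device_map="auto"` are assumptions, since the card only says "4-bit". The function name `load_dataset_and_model` is hypothetical. Heavy imports are kept inside the function so the constants can be inspected without downloading anything.

```python
# Identifiers taken from this model card.
DATASET_ID = "bigcode/the-stack-smol"
DATA_DIR = "data/sql"
SPLIT = "train"
BASE_MODEL_ID = "bigcode/starcoder2-3b"


def load_dataset_and_model():
    """Load the SQL subset and the base model in 4-bit (hypothetical helper)."""
    import torch
    from datasets import load_dataset
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        BitsAndBytesConfig,
    )

    # Select the SQL subset and the train split.
    dataset = load_dataset(DATASET_ID, data_dir=DATA_DIR, split=SPLIT)

    # Assumption: NF4 with bfloat16 compute; the card does not specify these.
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        BASE_MODEL_ID,
        quantization_config=bnb_config,
        device_map="auto",  # assumption: let Accelerate place the weights
    )
    return dataset, tokenizer, model
```

Calling `load_dataset_and_model()` needs network access, a GPU supported by bitsandbytes, and possibly a Hub login if the dataset or model terms have to be accepted first.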

2. Preprocess Data:

  • Tokenize the code text using the appropriate tokenizer for the chosen model.
  • Apply necessary cleaning or normalization (e.g., removing comments, handling indentation).
  • Create input examples suitable for the model's architecture (StarCoder2 is a causal language model, so examples are token sequences used for next-token prediction rather than masked language modeling).
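
To make the last preprocessing step concrete, here is a minimal sketch of packing tokenized files into fixed-length blocks for a causal language model. Packing is an assumption; the card does not say whether it was used. The helper `pack_into_blocks` and its parameters are hypothetical, and the token IDs in the usage line are made up.

```python
def pack_into_blocks(tokenized_files, block_size, eos_token_id):
    """Concatenate token ID lists and cut them into equal-sized blocks.

    Each file is followed by an end-of-sequence token so the model can
    see file boundaries. Trailing tokens that do not fill a whole block
    are dropped.
    """
    stream = []
    for ids in tokenized_files:
        stream.extend(ids)
        stream.append(eos_token_id)

    examples = []
    for start in range(0, len(stream) - block_size + 1, block_size):
        block = stream[start:start + block_size]
        # For causal LM training the labels are a copy of the inputs;
        # Hugging Face causal LM models shift them by one internally.
        examples.append({"input_ids": block, "labels": list(block)})
    return examples


# Two toy "files" with invented token IDs, packed into blocks of 3.
packed = pack_into_blocks([[1, 2, 3], [4, 5]], block_size=3, eos_token_id=0)
```

Here `packed` holds two examples, `[1, 2, 3]` and `[0, 4, 5]`; the final end-of-sequence token is dropped because it does not fill a block.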

3. Configure Training:

  • Initialize a trainer object (the trl and sft tags point to TRL's SFTTrainer, which extends the Transformers Trainer).
  • Set training arguments based on the provided args:
    • Learning rate, optimizer, scheduler
    • Gradient accumulation steps
    • Weight decay
    • Loss function (likely cross-entropy)
    • Evaluation metrics (e.g., accuracy, perplexity)
    • Device placement (GPU/TPU)
    • Number of processes for potential distributed training
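
As a rough illustration, the values listed under "Training hyperparameters" map onto Transformers `TrainingArguments` keyword arguments roughly as below. This is a sketch under assumptions: the helper `build_training_arguments` is hypothetical, and settings the card does not report (weight decay, and whether mixed precision was fp16 or bf16) are deliberately left out so they can be passed as overrides.

```python
# Values copied from the "Training hyperparameters" section of this card.
TRAINING_KWARGS = {
    "learning_rate": 2e-4,
    "per_device_train_batch_size": 1,
    "per_device_eval_batch_size": 8,
    "seed": 0,
    "gradient_accumulation_steps": 4,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "cosine",
    "warmup_steps": 100,
    "max_steps": 1000,
}


def build_training_arguments(output_dir, **overrides):
    """Create TrainingArguments from the card's values (hypothetical helper).

    Pass fp16=True or bf16=True as an override to enable mixed precision;
    the card only says "Native AMP" and does not name the dtype.
    """
    from transformers import TrainingArguments

    kwargs = {**TRAINING_KWARGS, **overrides}
    return TrainingArguments(output_dir=output_dir, **kwargs)
```

The resulting object can be handed to a Transformers `Trainer` or a TRL `SFTTrainer` through its `args` parameter.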

4. Train the Model:

  • Start the training loop for the specified max_steps.
  • Iterate through batches of preprocessed code examples.
  • Forward pass through the model to generate predictions.
  • Calculate loss based on ground truth and predictions.
  • Backpropagate gradients to update model parameters.
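
The loss in the loop above is, for a causal language model, typically the mean cross-entropy between the prediction at position t and the actual token at position t+1. The following dependency-free sketch shows that computation on plain Python lists. It illustrates the objective only and is not the training code used here; the function name and the toy inputs are hypothetical.

```python
import math


def next_token_loss(logits, token_ids):
    """Mean cross-entropy for next-token prediction.

    logits[t] is a list of unnormalized scores over the vocabulary at
    position t; it is scored against token_ids[t + 1].
    """
    if len(token_ids) < 2:
        raise ValueError("need at least two tokens to form a target")

    total = 0.0
    for t in range(len(token_ids) - 1):
        row = logits[t]
        target = token_ids[t + 1]
        # Numerically stable log-sum-exp for the softmax normalizer.
        peak = max(row)
        log_z = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_z - row[target]
    return total / (len(token_ids) - 1)


# Toy case: uniform scores over a 4-token vocabulary.
uniform_loss = next_token_loss([[0.0] * 4] * 3, [0, 1, 2])
```

With uniform scores every token has probability 1/4, so `uniform_loss` equals ln 4, the loss of a model that has learned nothing about a 4-token vocabulary.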

5. Evaluation (Optional):

  • Periodically evaluate model performance on a validation or test set.
  • Calculate relevant metrics (accuracy, perplexity, code completion accuracy).
  • Monitor training progress and adjust hyperparameters as needed.
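
Of the metrics named above, perplexity follows directly from the evaluation loss: it is the exponential of the mean per-token cross-entropy. A minimal sketch, with a hypothetical helper name and invented loss values:

```python
import math


def perplexity(per_token_losses):
    """Perplexity = exp(mean cross-entropy), with losses in nats."""
    if not per_token_losses:
        raise ValueError("no losses given")
    return math.exp(sum(per_token_losses) / len(per_token_losses))


# Invented example: a constant loss of ln(4) nats per token.
example_ppl = perplexity([math.log(4)] * 3)
```

A loss of ln 4 per token gives a perplexity of about 4, meaning the model is as uncertain as a uniform choice among four tokens.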

6. Save the Fine-tuned Model:

  • Save the model's weights and configuration to the output_dir.

7. Push to Hugging Face Hub (Optional):

  • If push_to_hub is True, create a model card and push the model to the Hugging Face Hub for sharing and use.
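
Steps 6 and 7 can be sketched with the `save_model` and `push_to_hub` methods that the Transformers `Trainer` provides. The wrapper `save_and_maybe_push` is hypothetical, and it assumes a trainer object that has already finished training.

```python
def save_and_maybe_push(trainer, output_dir, push_to_hub=False):
    """Save the trained model and optionally upload it (hypothetical helper).

    `trainer` is expected to behave like transformers.Trainer:
    save_model() writes weights and config to output_dir, and
    push_to_hub() uploads them together with a generated model card.
    """
    trainer.save_model(output_dir)
    if push_to_hub:
        # Requires a prior Hub login (e.g. via `huggingface-cli login`).
        trainer.push_to_hub()
```

With a PEFT model, saving typically writes only the adapter weights, so the base model is still needed to load the result.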

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 1
  • eval_batch_size: 8
  • seed: 0
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1000
  • mixed_precision_training: Native AMP
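
Two of these values are derived from the others, which the short sketch below works through. It assumes a single device (consistent with 1 × 4 = 4) and a cosine schedule with linear warmup that decays to zero over half a cosine period, which mirrors the shape of the default cosine schedule in Transformers. Both function names are hypothetical.

```python
import math


def total_train_batch_size(per_device, accumulation_steps, num_devices=1):
    """Number of examples contributing to one optimizer step."""
    return per_device * accumulation_steps * num_devices


def cosine_lr_with_warmup(step, base_lr=2e-4, warmup_steps=100,
                          total_steps=1000):
    """Learning rate at a given optimizer step.

    Linear warmup from 0 to base_lr, then cosine decay from base_lr to 0.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


effective_batch = total_train_batch_size(1, 4)
lr_at_peak = cosine_lr_with_warmup(100)
lr_at_midpoint = cosine_lr_with_warmup(550)
```

Under these assumptions the effective batch is 4, the learning rate peaks at 0.0002 at step 100, is about half that at step 550 (the midpoint of the decay), and reaches zero at step 1000.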

Training results

Framework versions

  • PEFT 0.8.2
  • Transformers 4.40.0.dev0
  • Pytorch 2.2.1+cu121
  • Datasets 2.18.0
  • Tokenizers 0.15.2