---
license: bigcode-openrail-m
library_name: peft
tags:
- trl
- sft
- generated_from_trainer
base_model: bigcode/starcoder2-3b
model-index:
- name: finetune_starcoder2
  results: []
datasets:
- bigcode/the-stack-smol
---

# finetune_starcoder2

This model is a fine-tuned version of [bigcode/starcoder2-3b](https://huggingface.co/bigcode/starcoder2-3b) on [bigcode/the-stack-smol](https://huggingface.co/datasets/bigcode/the-stack-smol).

## Model description

This model builds on the `bigcode/starcoder2-3b` base model, further specializing it for code completion by fine-tuning on the SQL subset (`data/sql`) of `bigcode/the-stack-smol`. The dataset consists of source-code snippets, so the model can suggest relevant completions and generate code from a prompt (see the usage sketch at the end of this card).

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

**1. Load Dataset and Model:**
- Load the `bigcode/the-stack-smol` dataset using the Hugging Face Datasets library.
- Filter for the specified subset (`data/sql`) and split (`train`).
- Load the `bigcode/starcoder2-3b` model from the Hugging Face Hub with 4-bit quantization.

**2. Preprocess Data:**
- Tokenize the code text using the tokenizer that matches the base model.
- Apply necessary cleaning or normalization (e.g., removing comments, handling indentation).
- Create input examples suitable for the model's causal language modeling objective (next-token prediction).

**3. Configure Training:**
- Initialize a trainer (the `trl`/`sft` tags suggest TRL's `SFTTrainer` built on Transformers).
- Set training arguments based on the provided `args`:
  - learning rate, optimizer, and scheduler
  - gradient accumulation steps
  - weight decay
  - loss function (cross-entropy)
  - evaluation metrics (e.g., accuracy, perplexity)
  - device placement (GPU/TPU)
  - number of processes for distributed training

**4. Train the Model:**
- Run the training loop for the specified `max_steps`.
- Iterate through batches of preprocessed code examples.
- Forward pass through the model to generate predictions.
- Calculate the loss between predictions and ground truth.
- Backpropagate gradients to update the model parameters.

**5. Evaluation (Optional):**
- Periodically evaluate model performance on a validation or test set.
- Calculate relevant metrics (accuracy, perplexity, code completion accuracy).
- Monitor training progress and adjust hyperparameters as needed.

**6. Save the Fine-tuned Model:**
- Save the model's weights and configuration to the `output_dir`.

**7. Push to Hugging Face Hub (Optional):**
- If `push_to_hub` is True, create a model card and push the model to the Hugging Face Hub for sharing and reuse.

An illustrative end-to-end script covering these steps is sketched at the end of this card.

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 1
- eval_batch_size: 8
- seed: 0
- gradient_accumulation_steps: 4
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
- mixed_precision_training: Native AMP

### Training results

### Framework versions

- PEFT 0.8.2
- Transformers 4.40.0.dev0
- Pytorch 2.2.1+cu121
- Datasets 2.18.0
- Tokenizers 0.15.2
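
### Example training sketch

The sketch below is a minimal, illustrative version of the training procedure described above, assuming the TRL/PEFT/bitsandbytes QLoRA stack implied by the tags and framework versions. Values follow the hyperparameter table where it is explicit; everything else (LoRA rank, alpha, and target modules, the sequence length, and the `content` text column) is an assumption for illustration, not a record of the exact script used.

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
)
from trl import SFTTrainer

# 1) Dataset: the SQL subset of the-stack-smol, train split.
dataset = load_dataset("bigcode/the-stack-smol", data_dir="data/sql", split="train")

# 2) Base model loaded in 4-bit (QLoRA-style) precision.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    "bigcode/starcoder2-3b", quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("bigcode/starcoder2-3b")

# 3) LoRA adapter; rank/alpha/target modules are illustrative, not recorded in this card.
peft_config = LoraConfig(
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# 4) Training arguments taken from the hyperparameter table above.
training_args = TrainingArguments(
    output_dir="finetune_starcoder2",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=0,
    fp16=True,  # "Native AMP" mixed precision
)

# 5) SFTTrainer handles tokenization, packing, and the causal-LM training loop.
trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    peft_config=peft_config,
    tokenizer=tokenizer,
    dataset_text_field="content",  # the-stack-smol stores source code in a "content" column
    max_seq_length=1024,           # illustrative; not recorded in this card
)

trainer.train()
trainer.save_model("finetune_starcoder2")  # saves the LoRA adapter weights and config
```

Because the model is trained as a PEFT adapter, `save_model` writes only the LoRA weights and configuration rather than a full copy of the 3B base model.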
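
### Example usage sketch

A minimal inference sketch, assuming the checkpoint is a LoRA adapter applied on top of the base model with PEFT. The adapter repo id and the SQL prompt are placeholders, not values taken from this card.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "bigcode/starcoder2-3b"
adapter_id = "user/finetune_starcoder2"  # placeholder: replace with the actual adapter repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16, device_map="auto")
model = PeftModel.from_pretrained(model, adapter_id)  # attach the fine-tuned LoRA weights

# Complete a SQL snippet from a short prompt.
prompt = "-- Return the ten most recent orders per customer\nSELECT"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```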