Uploaded model

Model description

This model is a refinement of a LoRA adapter originally trained on the unsloth/Llama-3.2-3B-Instruct model with the FineTome-100k dataset. The fine-tuned model is built on a smaller base (1B vs. 3B parameters), which speeds up training and makes it easier to adapt to specific tasks such as medical applications.

Key adjustments:

  1. Reduced Parameter Count: The 1B-parameter base was used instead of the 3B variant, improving training efficiency and making customization easier.
  2. Adjusted Learning Rate: A smaller learning rate was used to prevent overfitting and mitigate catastrophic forgetting, helping the model retain its general pretraining knowledge while learning the new task effectively.

The fine-tuning dataset, ruslanmv/ai-medical-chatbot, contains only 257k rows, which required careful hyperparameter tuning to avoid over-specialization.
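
For reference, below is a minimal sketch of how such a setup might look with Unsloth. Only the dataset and the overall approach come from this card; the base model name, LoRA rank, sequence length, and 4-bit loading are illustrative assumptions.

```python
from unsloth import FastLanguageModel
from datasets import load_dataset

# Assumption: the 1B Unsloth base; this card references both the 3B original and a
# 1B-parameter variant, so adjust the model name to match the actual base used.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Llama-3.2-1B-Instruct",
    max_seq_length = 2048,   # assumption: not stated in this card
    load_in_4bit = True,     # assumption: common Unsloth setup to save memory
)

# Attach a LoRA adapter; rank, alpha, and target modules are illustrative defaults.
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    lora_alpha = 16,
    lora_dropout = 0,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing = "unsloth",
)

# Fine-tuning dataset named in this card (257k rows).
dataset = load_dataset("ruslanmv/ai-medical-chatbot", split = "train")
```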


Hyperparameters and explanations

  • Learning rate: 2e-5
    A smaller learning rate reduces the risk of overfitting and catastrophic forgetting, particularly when working with models containing fewer parameters.

  • Warm-up steps: 5
    The learning rate is ramped up over the first 5 steps, letting the optimizer build up its gradient statistics before training at the full learning rate, which improves stability.

  • Per device train batch size: 2
    Each GPU processes 2 training samples per step. This setup is suitable for resource-constrained environments.

  • Gradient accumulation steps: 4
    Gradients are accumulated over 4 steps to simulate a larger batch size (effective batch size: 8) without exceeding memory limits.

  • Optimizer: AdamW with 8-bit quantization

    • AdamW: Adam with decoupled weight decay, which helps prevent overfitting.
    • 8-bit quantization: Optimizer states are stored in 8-bit precision, reducing memory usage during training.

  • Weight decay: 0.01
    Standard weight decay value effective across various training scenarios.

  • Learning rate scheduler type: Linear
    Gradually decreases the learning rate from its initial value to zero over the course of training. The full configuration is sketched below.
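
Taken together, these settings map onto a TRL SFTTrainer configuration roughly like the sketch below. This is not the exact training script: the epoch count, logging interval, output directory, and the assumption that the chat data has already been formatted into a "text" column are illustrative, and argument names can vary slightly between TRL versions.

```python
from trl import SFTTrainer, SFTConfig

trainer = SFTTrainer(
    model = model,            # LoRA-wrapped model from the earlier sketch
    tokenizer = tokenizer,
    train_dataset = dataset,  # assumed to be pre-formatted into a "text" column
    args = SFTConfig(
        dataset_text_field = "text",
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,   # effective batch size: 8
        warmup_steps = 5,
        learning_rate = 2e-5,
        optim = "adamw_8bit",              # AdamW with 8-bit optimizer states
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        num_train_epochs = 1,              # assumption: not stated in this card
        logging_steps = 10,
        output_dir = "outputs",
    ),
)
trainer.train()
```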


Quantization details

The model is saved in 16-bit (F16) GGUF format, which:

  • Preserves the fine-tuned weights with effectively no quantization loss.
  • Trades higher memory use and slower inference for better precision than lower-bit GGUF quantizations.
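
With Unsloth, this export can be done directly through its GGUF helper. A minimal sketch, assuming the trained model and tokenizer from the sketches above (the output directory name is illustrative):

```python
# Merge the LoRA adapter into the base weights and write a 16-bit (F16) GGUF file.
model.save_pretrained_gguf(
    "medical_model_gguf",         # illustrative output directory
    tokenizer,
    quantization_method = "f16",  # 16-bit export, i.e. no lossy quantization
)
```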

Training optimization

Training was accelerated by 2x using Unsloth in combination with Hugging Face's TRL library.


Model details

  • Repository: forestav/medical_model
  • Format: GGUF (16-bit)
  • Model size: 1.24B params
  • Architecture: llama
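
For local inference, the 16-bit GGUF can be loaded with llama-cpp-python; a hedged sketch is below. The repo id comes from this card, but the GGUF filename is a placeholder, so check the repository's file listing for the actual name.

```python
from huggingface_hub import hf_hub_download
from llama_cpp import Llama  # pip install llama-cpp-python

# Placeholder filename: replace with the actual 16-bit GGUF file in the repository.
gguf_path = hf_hub_download(
    repo_id = "forestav/medical_model",
    filename = "unsloth.F16.gguf",
)

llm = Llama(model_path = gguf_path, n_ctx = 2048)
response = llm.create_chat_completion(
    messages = [{"role": "user", "content": "What are common symptoms of iron deficiency?"}],
    max_tokens = 256,
)
print(response["choices"][0]["message"]["content"])
```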