# Model Card

## Model Overview
- Base Model: `meta-llama/Llama-2-7b-hf`
- Fine-tuned on: the `timdettmers/openassistant-guanaco` dataset using LoRA (Low-Rank Adaptation)
This model is a fine-tuned version of Llama-2-7B, adapted for conversational tasks with a focus on balancing training speed and performance. It was fine-tuned with LoRA for efficient parameter adaptation at reduced resource requirements.
## Model Details
- Training Hardware: The model was trained on Intel Gaudi infrastructure, which is optimized for high-throughput deep learning tasks.
- Precision: Mixed precision (`bf16`) was used to speed up training while maintaining accuracy.
- LoRA Configuration (see the sketch after this list):
  - LoRA Rank: 4
  - LoRA Alpha: 32
  - LoRA Dropout: 0.05
  - Target Modules: `q_proj`, `v_proj`
- Max Sequence Length: 256 tokens
- Learning Rate: 2e-4
- Warmup Ratio: 0.05
- Scheduler Type: Linear
- Batch Size: 32 per device
- Max Gradient Norm: 0.5
- Throughput Warmup Steps: 2
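Below is a minimal sketch of how the LoRA settings above map onto a `LoraConfig` from the Hugging Face `peft` library. It is illustrative only; the `bias` and `task_type` values are assumptions typical for causal language models, not taken from this card.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# LoRA hyperparameters as listed above; bias and task_type are assumed defaults.
lora_config = LoraConfig(
    r=4,                                  # LoRA rank
    lora_alpha=32,                        # LoRA alpha (scaling factor)
    lora_dropout=0.05,                    # LoRA dropout
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    bias="none",
    task_type="CAUSAL_LM",
)

base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the low-rank adapter weights are trainable
```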
## Model Performance
The model was trained to balance speed and accuracy, making it suitable for rapid experimentation and for deployment where computational efficiency is paramount. Evaluation and checkpoint saving were performed every 500 steps for consistent performance monitoring.
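As a hedged illustration, these hyperparameters and the 500-step evaluation/saving cadence could be expressed with `transformers.TrainingArguments` roughly as follows. The `output_dir` is a placeholder and argument names can vary across library versions; on Intel Gaudi the training actually runs through the `optimum-habana` stack, whose `GaudiTrainingArguments` adds Gaudi-specific options such as `throughput_warmup_steps`.

```python
from transformers import TrainingArguments

# Hyperparameters from the Model Details section; output_dir is a placeholder.
training_args = TrainingArguments(
    output_dir="./llama2-7b-guanaco-lora",
    per_device_train_batch_size=32,
    learning_rate=2e-4,
    warmup_ratio=0.05,
    lr_scheduler_type="linear",
    max_grad_norm=0.5,
    bf16=True,                    # mixed precision
    evaluation_strategy="steps",  # named eval_strategy in newer transformers releases
    eval_steps=500,               # evaluate every 500 steps
    save_steps=500,               # save a checkpoint every 500 steps
)
# On Intel Gaudi, optimum-habana's GaudiTrainingArguments additionally accepts
# throughput_warmup_steps (set to 2 for this run) and other Gaudi-specific flags.
```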
## Training Dataset
- Dataset Name: `timdettmers/openassistant-guanaco`
- Validation Split: 2%
The dataset used for fine-tuning is designed for conversational AI, allowing the model to generate human-like responses in dialogue settings.
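For reference, one way to reproduce a 2% validation split with the `datasets` library is sketched below; the exact split procedure and seed used during training are not documented here, so treat both as assumptions.

```python
from datasets import load_dataset

# Load the OpenAssistant Guanaco conversations.
dataset = load_dataset("timdettmers/openassistant-guanaco")

# Hold out 2% of the training data for validation (seed chosen arbitrarily).
splits = dataset["train"].train_test_split(test_size=0.02, seed=42)
train_ds, eval_ds = splits["train"], splits["test"]
print(f"train: {len(train_ds)}, validation: {len(eval_ds)}")
```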
## Intended Use
This model is intended for use in natural language understanding and generation tasks, such as:
- Conversational AI
- Text completion
- Dialogue systems
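A rough usage sketch for these tasks is shown below. It assumes the LoRA adapter weights are available alongside this card; `path/to/lora-adapter` is a placeholder for the actual adapter repository ID or local path, and the prompt follows the Guanaco conversation format used in the training data.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Llama-2-7b-hf"
adapter_path = "path/to/lora-adapter"  # placeholder: replace with this adapter's location

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_path)

# Guanaco-style prompt; the model was fine-tuned on sequences of up to 256 tokens.
prompt = "### Human: Explain what LoRA fine-tuning is in two sentences.### Assistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```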
## Limitations
- Context Length: Fine-tuning used a maximum sequence length of 256 tokens, which may limit performance on tasks requiring longer context comprehension.
- Learning Rate: The relatively high learning rate (2e-4) may lead to training instability in some scenarios.