# Model Card

## Model Overview
- Base Model: `meta-llama/Llama-2-7b-hf`
- Fine-tuned on: the `timdettmers/openassistant-guanaco` dataset using LoRA (Low-Rank Adaptation)
This model is a fine-tuned version of Llama-2-7B, adapted for conversational tasks with a focus on balancing training speed and performance. It was fine-tuned with LoRA for efficient parameter adaptation at reduced resource requirements.
## Model Details
- Training Hardware: The model was trained on Intel Gaudi infrastructure, which is optimized for high-throughput deep learning tasks.
- Precision: Mixed precision (`bf16`) was used to speed up training while maintaining accuracy.
- LoRA Configuration (see the sketch after this list):
  - LoRA Rank: 4
  - LoRA Alpha: 32
  - LoRA Dropout: 0.05
  - Target Modules: `q_proj`, `v_proj`
- Max Sequence Length: 256 tokens
- Learning Rate: 2e-4
- Warmup Ratio: 0.05
- Scheduler Type: Linear
- Batch Size: 32 per device
- Max Gradient Norm: 0.5
- Throughput Warmup Steps: 2
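Below is a minimal sketch of how the LoRA settings above map onto a `LoraConfig` from the Hugging Face `peft` library. It is illustrative only; the `bias` and `task_type` values are assumptions typical for causal language models, not taken from this card.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# LoRA hyperparameters as listed above; bias and task_type are assumed defaults.
lora_config = LoraConfig(
    r=4,                                  # LoRA rank
    lora_alpha=32,                        # LoRA alpha (scaling factor)
    lora_dropout=0.05,                    # LoRA dropout
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    bias="none",
    task_type="CAUSAL_LM",
)

base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the low-rank adapter weights are trainable
```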
## Model Performance
The model was trained to balance speed and accuracy, making it suitable for rapid experimentation and for deployment where computational efficiency is paramount. Evaluation and checkpoint saving were performed every 500 steps for consistent performance monitoring.
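As a hedged illustration, these hyperparameters and the 500-step evaluation/saving cadence could be expressed with `transformers.TrainingArguments` roughly as follows. The `output_dir` is a placeholder and argument names can vary across library versions; on Intel Gaudi the training actually runs through the `optimum-habana` stack, whose `GaudiTrainingArguments` adds Gaudi-specific options such as `throughput_warmup_steps`.

```python
from transformers import TrainingArguments

# Hyperparameters from the Model Details section; output_dir is a placeholder.
training_args = TrainingArguments(
    output_dir="./llama2-7b-guanaco-lora",
    per_device_train_batch_size=32,
    learning_rate=2e-4,
    warmup_ratio=0.05,
    lr_scheduler_type="linear",
    max_grad_norm=0.5,
    bf16=True,                    # mixed precision
    evaluation_strategy="steps",  # named eval_strategy in newer transformers releases
    eval_steps=500,               # evaluate every 500 steps
    save_steps=500,               # save a checkpoint every 500 steps
)
# On Intel Gaudi, optimum-habana's GaudiTrainingArguments additionally accepts
# throughput_warmup_steps (set to 2 for this run) and other Gaudi-specific flags.
```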
## Training Dataset
- Dataset Name: `timdettmers/openassistant-guanaco`
- Validation Split: 2%
The dataset used for fine-tuning is designed for conversational AI, allowing the model to generate human-like responses in dialogue settings.
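For reference, one way to reproduce a 2% validation split with the `datasets` library is sketched below; the exact split procedure and seed used during training are not documented here, so treat both as assumptions.

```python
from datasets import load_dataset

# Load the OpenAssistant Guanaco conversations.
dataset = load_dataset("timdettmers/openassistant-guanaco")

# Hold out 2% of the training data for validation (seed chosen arbitrarily).
splits = dataset["train"].train_test_split(test_size=0.02, seed=42)
train_ds, eval_ds = splits["train"], splits["test"]
print(f"train: {len(train_ds)}, validation: {len(eval_ds)}")
```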
## Intended Use
This model is intended for use in natural language understanding and generation tasks, such as:
- Conversational AI
- Text completion
- Dialogue systems
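A rough usage sketch for these tasks is shown below. It assumes the LoRA adapter weights are available alongside this card; `path/to/lora-adapter` is a placeholder for the actual adapter repository ID or local path, and the prompt follows the Guanaco conversation format used in the training data.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Llama-2-7b-hf"
adapter_path = "path/to/lora-adapter"  # placeholder: replace with this adapter's location

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_path)

# Guanaco-style prompt; the model was fine-tuned on sequences of up to 256 tokens.
prompt = "### Human: Explain what LoRA fine-tuning is in two sentences.### Assistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```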
## Limitations
- Context Length: Fine-tuning used a maximum sequence length of 256 tokens, which may limit performance on tasks requiring longer context comprehension.
- Learning Rate: The relatively high learning rate (2e-4) may lead to training instability in some scenarios.