LLaMA-2-Econ: Q&A Model

Model Description

A specialized Q&A model fine-tuned from the LLaMA-2-7B-chat model for answering questions related to economic research. It leverages the combined strength of QLoRA and PEFT for enhanced understanding and generation capabilities in the economic domain.

Intended Uses & Limitations

Designed to serve as a virtual research assistant, providing answers to complex questions within the economic research domain. The model's effectiveness is subject to the quality of the input question and the relevance of the training data.

Training and Evaluation Data

The model was trained on a synthetic question-and-answer dataset created from academic paper abstracts in economics, employing a GPT-3.5 Turbo model for generating contextual dialogues.

Training Hyperparameters

QLoRA Settings:
- lora_rank (lora_r): 64
- lora_dropout: 0.1
Precision & Quantization:
- Precision: 4-bit
- Computation dtype: float16
- Quantization type: "nf4", with nested quantization
Training Schedule:
- Epochs: 8, with early stopping patience of 2 epochs for efficiency
- bf16 training enabled
Optimizer & Learning Rate:
- Optimizer: paged AdamW with 32-bit precision
- Learning rate: 2e-4, using a cosine learning rate scheduler
- Warmup ratio: 0.03
Additional Settings:
- Gradient checkpointing and a maximum gradient norm of 0.3
- Sequences grouped by length for training efficiency
- PEFT adapters merged into the baseline models for enhanced performance

Evaluation Results for LLaMA-2-Econ: Q&A Model

The LLaMA-2-Econ Q&A model's performance was evaluated against a set of criteria designed to measure its effectiveness in generating accurate, relevant answers to economics-related questions. We used BERT-Score.

Key Metrics

Precision: Achieved an average precision value of 0.90, indicating a high level of accuracy in the tokens used within the model's answers when compared to the reference answers.
Recall: Recorded an average recall value of 0.89, reflecting the model's ability to capture all relevant information from the reference answers in its generated responses.
F1 Score: Reached an average F1 value of 0.90, demonstrating an excellent balance between precision and recall, thus indicating the model's overall effectiveness in producing comprehensive and accurate answers.

Evaluation Procedure

Reference Answers Generation: A subset of synthetically created questions was used to obtain reference answers from the base LLaMA-2-7B-chat model integrated with a Retrieval Augmented Generation (RAG) pipeline, employing semantic search and dense vector indexing.
Human Verification: The reference answers were subjected to human verification to ensure their relevance and accuracy.
Model Comparison: LLaMA-2-Econ's generated answers, produced without any RAG integration, were compared against these human-verified reference responses.

Citation

Keleş, O. & Bayraklı, Ö. T. (Fortcoming 2024, May). LLaMA-2-Econ: Enhancing Title Generation, Classification, and Academic Q&A in Economic Research. To be presented in LREC-COLING 2024, 4th Workshop on ECONLP: Turin, Italy.

onurkeles
/

llama-2-7b-econ-chat-qa