Meta-Llama-3.1-8B-Instruct-finki-edu-bnb-4bit Model Card
Overview
Meta-Llama-3.1-8B-Instruct-finki-edu-bnb-4bit is a fine-tuned variant of unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit. Developed by vlada22, this model is optimized for tasks in higher education—specifically for generating, refining, and evaluating exam questions. It is built upon a framework that leverages large language models (LLMs) to ensure exam content aligns with course objectives, maintains clarity, and is appropriately challenging.
Background & Motivation
This model was developed as part of the research detailed in the paper:
Advancing AI in Higher Education: A Comparative Study of Large Language Model-Based Agents for Exam Question Generation, Improvement, and Evaluation
Nikolovski, V., Trajanov, D., & Chorbev, I. (2025). Algorithms, 18(3), 144.
DOI: 10.3390/a18030144
The study proposed a systematic framework for utilizing LLMs in exam-centric tasks, including:
- Exam Question Generation: Crafting exam questions aligned with specific learning objectives.
- Question Improvement: Enhancing existing questions to boost clarity and adjust difficulty.
- Evaluation: Employing a meta-evaluator—supervised by human experts—to assess alignment accuracy and explanation quality.
The research spanned four university courses (two theory-focused and two application-focused), covering diverse cognitive levels as defined by Bloom’s taxonomy, and used robust analytical techniques, including mixed-effects modeling, to evaluate performance.
Training & Methodology
- Base Model: unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit
- Fine-Tuning Framework: The model was fine-tuned with Hugging Face's TRL library in conjunction with Unsloth, making training approximately 2x faster. A minimal setup sketch follows this list.
- Dataset & Techniques: Fine-tuning utilized a balanced dataset that represented various exam question categories and cognitive levels, ensuring robust evaluation and alignment with course objectives.
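The sketch below illustrates the kind of Unsloth + TRL setup described above. It is not the exact training script: the dataset file (`exam_questions.jsonl`), its `text` field, and all hyperparameters are illustrative assumptions, and the exact SFTTrainer keyword arguments vary by TRL version.

```python
# Illustrative sketch of an Unsloth + TRL fine-tuning setup.
# Dataset path/format and hyperparameters are assumptions, not the
# exact values used to train this model.
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

# Load the 4-bit base model through Unsloth for faster training.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters so only a small set of weights is trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Hypothetical dataset layout: one formatted training example per line
# in a "text" field.
dataset = load_dataset("json", data_files="exam_questions.jsonl", split="train")

# Note: newer TRL versions move dataset_text_field/max_seq_length into SFTConfig.
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        num_train_epochs=1,
        learning_rate=2e-4,
        output_dir="outputs",
    ),
)
trainer.train()
```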
Intended Use Cases
- Exam Question Generation: Automatically generate exam questions that align with specific course objectives (see the inference sketch after this list).
- Question Improvement: Enhance clarity and adjust the difficulty of existing exam questions.
- Evaluation Support: Assist educators with preliminary evaluations of exam questions by identifying potential misalignments and areas for improvement.
- Educational Assessment: Serve as a foundational tool in educational settings for aligning AI-generated content with defined learning outcomes.
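As a usage illustration, here is a minimal inference sketch using the Transformers chat template. Since the checkpoint is pre-quantized in 4-bit, loading assumes bitsandbytes is installed; the prompt wording is an assumption about how to ask for an exam question, not a prescribed format.

```python
# Minimal inference sketch; the prompt text is an illustrative assumption.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "vlada22/Meta-Llama-3.1-8B-Instruct-finki-edu-bnb-4bit"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Pre-quantized bnb-4bit checkpoint; requires bitsandbytes.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [
    {"role": "user", "content": (
        "Generate one multiple-choice exam question on graph traversal "
        "targeting the 'Apply' level of Bloom's taxonomy, with four options "
        "and the correct answer marked."
    )},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=300, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```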
Ethical Considerations & Limitations
- Bias & Fairness: Despite rigorous training, the model may still exhibit biases present in the training data. It is crucial to have human oversight, especially for high-stakes applications.
- Contextual Specificity: The model is tailored for exam question contexts and may require additional fine-tuning for broader applications.
- Complementary Role: Designed to support educators, this model should complement—rather than replace—human expertise in educational assessments.
How to Cite
If you use this model in your research or application, please consider citing the original research paper:
```bibtex
@Article{a18030144,
  AUTHOR = {Nikolovski, Vlatko and Trajanov, Dimitar and Chorbev, Ivan},
  TITLE = {Advancing AI in Higher Education: A Comparative Study of Large Language Model-Based Agents for Exam Question Generation, Improvement, and Evaluation},
  JOURNAL = {Algorithms},
  VOLUME = {18},
  YEAR = {2025},
  NUMBER = {3},
  ARTICLE-NUMBER = {144},
  URL = {https://www.mdpi.com/1999-4893/18/3/144},
  ISSN = {1999-4893},
  ABSTRACT = {The transformative capabilities of large language models (LLMs) are reshaping educational assessment and question design in higher education. This study proposes a systematic framework for leveraging LLMs to enhance question-centric tasks: aligning exam questions with course objectives, improving clarity and difficulty, and generating new items guided by learning goals. The research spans four university courses—two theory-focused and two application-focused—covering diverse cognitive levels according to Bloom’s taxonomy. A balanced dataset ensures representation of question categories and structures. Three LLM-based agents—VectorRAG, VectorGraphRAG, and a fine-tuned LLM—are developed and evaluated against a meta-evaluator, supervised by human experts, to assess alignment accuracy and explanation quality. Robust analytical methods, including mixed-effects modeling, yield actionable insights for integrating generative AI into university assessment processes. Beyond exam-specific applications, this methodology provides a foundational approach for the broader adoption of AI in post-secondary education, emphasizing fairness, contextual relevance, and collaboration. The findings offer a comprehensive framework for aligning AI-generated content with learning objectives, detailing effective integration strategies, and addressing challenges such as bias and contextual limitations. Overall, this work underscores the potential of generative AI to enhance educational assessment while identifying pathways for responsible implementation.},
  DOI = {10.3390/a18030144}
}
```
Additional Information
- Developed by: Vlatko Nikolovski
- Affiliation: Faculty of Computer Science and Engineering, "Ss. Cyril and Methodius" University in Skopje, Republic of North Macedonia
- License: Apache-2.0