metadata

license: mit
language:
  - de

G-SciEdBERT: A Contextualized LLM for Science Assessment Tasks in German

We developed G-SciEdBERT, a contextualized German Science Education BERT: a large language model tailored for scoring German-written responses to science tasks. Starting from G-BERT, we pre-trained G-SciEdBERT on a corpus of 50K German-written science responses (5M tokens) from the Programme for International Student Assessment (PISA) 2015. We then fine-tuned G-SciEdBERT on 59 assessment items, examined its scoring accuracy, and compared its performance with G-BERT. G-SciEdBERT scored substantially more accurately, with a 10% increase in quadratic weighted kappa over G-BERT (mean accuracy difference = 0.096, SD = 0.024). These results underline the value of specialized language models like G-SciEdBERT for improving the accuracy of automated scoring, a substantial contribution to the field of AI in education.
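For readers unfamiliar with the metric, quadratic weighted kappa (QWK) measures agreement between two sets of ordinal scores and penalizes a disagreement by the square of its distance. The sketch below follows the standard textbook definition, kappa = 1 - sum(w * O) / sum(w * E) with w[i][j] = (i - j)^2 / (N - 1)^2. It is not the authors' evaluation code, which is not shown here. The function name and the example scores are hypothetical.

```python
def quadratic_weighted_kappa(rater_a, rater_b, min_score, max_score):
    """Quadratic weighted kappa between two equally long lists of integer scores."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("score lists must be non-empty and of equal length")
    if max_score <= min_score:
        raise ValueError("need at least two score categories")

    n_cat = max_score - min_score + 1
    n = len(rater_a)

    # observed[i][j] counts responses that rater A scored i and rater B scored j.
    observed = [[0] * n_cat for _ in range(n_cat)]
    for a, b in zip(rater_a, rater_b):
        observed[a - min_score][b - min_score] += 1

    # Marginal score histograms of each rater.
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(n_cat)) for j in range(n_cat)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_cat):
        for j in range(n_cat):
            weight = (i - j) ** 2 / (n_cat - 1) ** 2
            # Count expected in cell (i, j) if the two raters were independent.
            expected = hist_a[i] * hist_b[j] / n
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected

    if weighted_expected == 0.0:
        # Both raters gave every response the same single score.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical scores for four responses on an item scored 0 to 3.
human_scores = [0, 1, 2, 3]
model_scores = [0, 1, 2, 2]
kappa = quadratic_weighted_kappa(human_scores, model_scores, 0, 3)
print(round(kappa, 3))  # 0.875
```

A value of 1 means perfect agreement, 0 means agreement no better than chance, and negative values mean systematic disagreement. If scikit-learn is available, `sklearn.metrics.cohen_kappa_score(y1, y2, weights="quadratic")` computes the same quantity.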

Dataset

G-SciEdBERT is a pre-trained German science education BERT for written German science assessments from the PISA test. PISA is an international test, led by the OECD (Organisation for Economic Co-operation and Development), that monitors education trends. PISA items are developed to assess scientific literacy, emphasizing real-world problem-solving skills and the needs of the future workforce. This study analyzed data collected for 59 constructed-response science assessment items in German at the middle school level. A total of 6,116 German students from 257 schools participated in PISA 2015. Given the geographical diversity of the participants, the PISA data reflect the science literacy of German students in general.

The selected PISA items require either short (around one sentence) or extended (up to five sentences) responses. The minimum score for all items is 0; the maximum is 3 or 4 for short responses and 4 or 5 for extended responses. Student responses average 20 words. Our pre-training dataset contains more than 50,000 student-written German responses, which amounts to approximately 1,000 human-scored student responses per item for contextual learning through fine-tuning. More than 10 human raters scored each response in the training dataset, which was organized by the OECD. Responses were graded irrespective of the student's ethnicity, race, or gender to ensure fairness.

Usage

With Transformers >= 2.3, G-SciEdBERT can be loaded like this:

    from transformers import AutoModel, AutoTokenizer

    # Download (or load from the local cache) the tokenizer and encoder weights.
    tokenizer = AutoTokenizer.from_pretrained("ai4stem-uga/G-SciEdBERT")
    model = AutoModel.from_pretrained("ai4stem-uga/G-SciEdBERT")
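`AutoModel` returns only the encoder. To score responses, a task head has to be added and fine-tuned on human-scored responses for each item. The sketch below shows one plausible setup, assuming each item is treated as a classification problem with one class per integer score from 0 to the item's maximum. The exact fine-tuning configuration is not documented here, so this is an illustration and not the authors' pipeline. The helper names (`num_labels_for`, `load_item_scorer`, `predict_score`) are hypothetical, and the code assumes a recent Transformers release (4.x) with PyTorch installed.

```python
MODEL_NAME = "ai4stem-uga/G-SciEdBERT"


def num_labels_for(max_score):
    """Scores run from 0 to max_score inclusive, giving max_score + 1 classes."""
    if max_score < 1:
        raise ValueError("max_score must be at least 1")
    return max_score + 1


def load_item_scorer(max_score, model_name=MODEL_NAME):
    """Load the tokenizer and the encoder with a fresh classification head.

    The head is randomly initialized, so predictions are meaningless until the
    model has been fine-tuned on human-scored responses for the item.
    """
    # Imported lazily so the pure-Python helper above works without transformers.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name, num_labels=num_labels_for(max_score)
    )
    return tokenizer, model


def predict_score(tokenizer, model, response_text):
    """Return the most likely integer score for one student response."""
    import torch

    inputs = tokenizer(response_text, return_tensors="pt", truncation=True)
    model.eval()
    with torch.no_grad():
        logits = model(**inputs).logits
    # Class index k corresponds to score k because scores start at 0.
    return int(logits.argmax(dim=-1).item())
```

For an extended-response item with a maximum score of 5, `load_item_scorer(5)` would build a six-class head. After fine-tuning, `predict_score(tokenizer, model, text)` returns an integer between 0 and 5.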

Acknowledgments

This project is supported by the Alexander von Humboldt Foundation (PI Xiaoming Zhai, xiaoming.zhai@uga.edu).

Citation

    @InProceedings{Latif_2024_G-SciEdBERT,
        author    = {Latif, Ehsan and Lee, Gyeong-Geon and Neuman, Knut and Kastorff, Tamara and Zhai, Xiaoming},
        title     = {G-SciEdBERT: A Contextualized LLM for Science Assessment Tasks in German},
        journal   = {arXiv preprint arXiv:2301.12031},
        year      = {2024},
        pages     = {1-9}
    }

*This model was trained and shared by Ehsan Latif, Ph.D. (ehsan.latif@uga.edu)