ai4stem-uga committed
Commit 2ceffd5 • 1 Parent(s): 9313ceb
Update README.md

README.md CHANGED
@@ -3,11 +3,33 @@ license: mit
 language:
 - de
 ---
-

 ## Usage

-With Transformers >= 2.3 our German BERT models can be loaded like:

 ```python
 from transformers import AutoModel, AutoTokenizer
@@ -17,4 +39,18 @@ model = AutoModel.from_pretrained("ai4stem-uga/G-SciEdBERT")
 ```

 # Acknowledgments
-This project is supported by the Alexander von Humboldt Foundation (PI Xiaoming Zhai, xiaoming.zhai@uga.edu).
@@ -3,11 +3,33 @@ license: mit
 language:
 - de
 ---
+## G-SciEdBERT: A Contextualized LLM for Science Assessment Tasks in German
+This work presents a contextualized German Science Education BERT (G-SciEdBERT),
+an innovative large language model tailored for scoring German-written responses to science tasks.
+Starting from G-BERT, we pre-trained G-SciEdBERT on a corpus of 50K German-written science responses (5M tokens) to the Programme for International Student Assessment (PISA) 2015.
+We fine-tuned G-SciEdBERT on 59 assessment items and examined the scoring accuracy. We then compared its performance with G-BERT.
+Our findings reveal a substantial improvement in scoring accuracy with G-SciEdBERT, demonstrating a 10% increase in quadratic weighted kappa compared to G-BERT
+(mean accuracy difference = 0.096, SD = 0.024). These insights underline the significance of specialized language models like G-SciEdBERT,
+which is trained to enhance the accuracy of automated scoring, offering a substantial contribution to the field of AI in education.
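
The quadratic weighted kappa reported above measures agreement between human and model scores on an ordinal scale, penalizing each disagreement by the squared distance between the two scores. The sketch below is a minimal pure-Python illustration of that metric, not the evaluation code used for G-SciEdBERT; the function name `quadratic_weighted_kappa` and the example scores are hypothetical.

```python
def quadratic_weighted_kappa(y_true, y_pred, num_classes):
    """Agreement between two integer score lists with labels 0..num_classes-1."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("score lists must be non-empty and of equal length")
    if num_classes < 2:
        raise ValueError("num_classes must be at least 2")
    if any(not 0 <= s < num_classes for s in list(y_true) + list(y_pred)):
        raise ValueError("scores must lie in 0..num_classes-1")

    # Observed confusion matrix: rows are human scores, columns are model scores.
    observed = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1

    # Marginal histograms, used for the agreement expected by chance.
    hist_true = [sum(row) for row in observed]
    hist_pred = [sum(observed[i][j] for i in range(num_classes))
                 for j in range(num_classes)]

    numerator = 0.0
    denominator = 0.0
    for i in range(num_classes):
        for j in range(num_classes):
            # Quadratic penalty grows with the squared score distance.
            weight = (i - j) ** 2 / (num_classes - 1) ** 2
            expected = hist_true[i] * hist_pred[j] / n
            numerator += weight * observed[i][j]
            denominator += weight * expected

    if denominator == 0:
        # Both raters used one and the same single score: treat as full agreement.
        return 1.0
    return 1.0 - numerator / denominator


# Hypothetical scores for five responses on a 0..3 scale.
human = [0, 1, 2, 3, 3]
model = [0, 1, 2, 3, 3]
print(quadratic_weighted_kappa(human, model, num_classes=4))  # identical scores give 1.0
```

A value of 1 means perfect agreement, 0 means agreement no better than chance, and negative values mean systematic disagreement.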
+
+## Dataset
+G-SciEdBERT is a pre-trained German science education BERT for written German science assessments from the PISA test.
+PISA is an international test, led by the OECD (Organisation for Economic Co-operation and Development), that monitors education trends.
+PISA items are developed to assess scientific literacy, highlighting real-world problem-solving skills and the needs of the future workforce.
+This study analyzed data collected for 59 constructed-response science assessment items in German at the middle school level.
+A total of 6,116 German students from 257 schools participated in PISA 2015.
+Given the geographical diversity of participants, the PISA data reflect the science literacy of German students in general.
+
+The PISA items selected require either short (around one sentence) or extended (up to five sentences) responses.
+The minimum score for all items is 0, with the maximum being 3 or 4 for short responses and 4 or 5 for extended responses.
+Student responses have 20 words on average. Our pre-training dataset contains more than 50,000 student-written German responses,
+which means approximately 1,000 human-scored student responses per item for contextual learning through fine-tuning.
+More than 10 human raters scored each response in the training dataset organized by the OECD.
+The responses were graded irrespective of the student's ethnicity, race, or gender to ensure fairness.

 ## Usage

+With Transformers >= 2.3 our German BERT models can be loaded like this:

 ```python
 from transformers import AutoModel, AutoTokenizer
@@ -17,4 +39,18 @@ model = AutoModel.from_pretrained("ai4stem-uga/G-SciEdBERT")
 ```
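
The snippet in this diff is cut off by the hunk boundary, so the following is a separate minimal sketch rather than the README's own code. It assumes PyTorch is installed, that the repository ships a tokenizer, and that the installed Transformers release allows calling the tokenizer object directly. The helper `embed_responses` and its mean-pooling step are illustrative assumptions; only the model ID `ai4stem-uga/G-SciEdBERT` comes from the source.

```python
MODEL_ID = "ai4stem-uga/G-SciEdBERT"  # model ID as it appears in the diff


def embed_responses(texts, model_id=MODEL_ID):
    """Return one mean-pooled embedding per response (hypothetical helper)."""
    # Heavy dependencies are imported lazily so the module loads without them.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    model.eval()

    # Pad to the longest response and truncate to the model's maximum length.
    encoded = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

    with torch.no_grad():
        hidden = model(**encoded).last_hidden_state  # (batch, tokens, hidden)

    # Average token vectors, ignoring padding positions via the attention mask.
    mask = encoded["attention_mask"].unsqueeze(-1).to(hidden.dtype)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)
```

Calling `embed_responses(["Die Pflanze braucht Licht für die Fotosynthese."])` would download the checkpoint on first use and return one vector per response. `AutoModel` loads the encoder without a classification head, so producing item scores would still require fine-tuning a scoring head on labeled responses.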

 # Acknowledgments
+This project is supported by the Alexander von Humboldt Foundation (PI Xiaoming Zhai, xiaoming.zhai@uga.edu).
+
+## Citation
+
+```bibtex
+@InProceedings{Latif_2024_G-SciEdBERT,
+author = {Latif, Ehsan and Lee, Gyeong-Geon and Neuman, Knut and Kastorff, Tamara and Zhai, Xiaoming},
+title = {G-SciEdBERT: A Contextualized LLM for Science Assessment Tasks in German},
+journal = {arXiv preprint arXiv:2301.12031},
+year = {2024},
+pages = {1-9}
+}
+```
+
+*This model is trained and shared by Ehsan Latif, Ph.D. (ehsan.latif@uga.edu)