ai4stem-uga committed on
Commit
2ceffd5
1 Parent(s): 9313ceb

Update README.md

  1. README.md +39 -3
README.md CHANGED
@@ -3,11 +3,33 @@ license: mit
3
  language:
4
  - de
5
  ---
6
- It is a pre-trained German science education BERT for written German science assessments.
7
 
8
  ## Usage
9
 
10
- With Transformers >= 2.3 our German BERT models can be loaded like:
11
 
12
  ```python
13
  from transformers import AutoModel, AutoTokenizer
@@ -17,4 +39,18 @@ model = AutoModel.from_pretrained("ai4stem-uga/G-SciEdBERT")
17
  ```
18
 
19
  # Acknowledgments
20
- This project is supported by the Alexender von Humboldt Foundation (PI Xiaoming Zhai, xiaoming.zhai@uga.edu).
3
  language:
4
  - de
5
  ---
6
+ ## G-SciEdBERT: A Contextualized LLM for Science Assessment Tasks in German
7
+ This work presents a contextualized German Science Education BERT (G-SciEdBERT),
8
+ an innovative large language model tailored for scoring German-written responses to science tasks.
9
+ Starting from G-BERT, we pre-trained G-SciEdBERT on a corpus of 50K German written science responses (5M tokens) to the Programme for International Student Assessment (PISA) 2015.
10
+ We fine-tuned G-SciEdBERT on 59 assessment items and examined the scoring accuracy. We then compared its performance with G-BERT.
11
+ Our findings reveal a substantial improvement in scoring accuracy with G-SciEdBERT, demonstrating a 10% increase in quadratic weighted kappa compared to G-BERT
12
+ (mean accuracy difference = 0.096, SD = 0.024). These insights underline the significance of specialized language models like G-SciEdBERT,
13
+ which is trained to enhance the accuracy of automated scoring, offering a substantial contribution to the field of AI in education.
14
+
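The quadratic weighted kappa mentioned above measures agreement between two sets of ordinal scores, penalizing disagreements by the squared distance between the scores. The following is a minimal sketch of the standard formula in plain Python, not the authors' evaluation code; the function name `quadratic_weighted_kappa` and the sample scores are hypothetical and chosen only for illustration.

```python
def quadratic_weighted_kappa(human, model, num_labels):
    """Quadratic weighted kappa between two lists of integer scores in [0, num_labels)."""
    n = len(human)
    # Observed confusion matrix: rows are human scores, columns are model scores.
    observed = [[0] * num_labels for _ in range(num_labels)]
    for h, m in zip(human, model):
        observed[h][m] += 1

    # Marginal histograms of each rater's scores.
    hist_human = [sum(row) for row in observed]
    hist_model = [sum(observed[i][j] for i in range(num_labels))
                  for j in range(num_labels)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(num_labels):
        for j in range(num_labels):
            # Quadratic penalty grows with the squared distance between scores.
            weight = (i - j) ** 2 / (num_labels - 1) ** 2
            expected = hist_human[i] * hist_model[j] / n
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected

    if weighted_expected == 0:
        # Both raters gave the same single score to every response.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical scores for one item with a 0-3 scoring range.
human_scores = [0, 1, 2, 3, 2, 1]
model_scores = [0, 1, 2, 3, 1, 1]
print(quadratic_weighted_kappa(human_scores, model_scores, num_labels=4))
```

A value of 1 indicates perfect agreement, 0 indicates agreement no better than chance, and negative values indicate systematic disagreement.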
15
+ ## Dataset
16
+ G-SciEdBERT is a pre-trained German science education BERT for written German science assessments from the PISA test.
17
+ PISA is an international test, led by the OECD (Organisation for Economic Co-operation and Development), that monitors education trends.
18
+ PISA items are developed to assess scientific literacy, highlighting real-world problem-solving skills and the needs of the future workforce.
19
+ This study analyzed data collected for 59 constructed-response science assessment items in German at the middle school level.
20
+ A total of 6,116 German students from 257 schools participated in PISA 2015.
21
+ Given the geographical diversity of participants, the PISA data reflect the science literacy of German students in general.
22
+
23
+ The PISA items selected require either short (around one sentence) or extended (up to five sentences) responses.
24
+ The minimum score for all items is 0, with the maximum being 3 or 4 for short responses and 4 or 5 for extended responses.
25
+ Student responses have 20 words on average. Our pre-training dataset contains more than 50,000 student-written German responses,
26
+ which means approximately 1,000 human-scored student responses per item for contextual learning through fine-tuning.
27
+ More than 10 human raters scored each response in the training dataset organized by OECD.
28
+ The responses were graded irrespective of the student's ethnicity, race, or gender to ensure fairness.
29
 
30
  ## Usage
31
 
32
+ With Transformers >= 2.3 our German BERT models can be loaded like this:
33
 
34
  ```python
35
  from transformers import AutoModel, AutoTokenizer
 
39
  ```
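As a follow-up to loading the model, the sketch below shows one possible way to turn German student responses into fixed-size embeddings by mean-pooling the encoder output. It assumes `torch` and `transformers` are installed and that the checkpoint can be downloaded; the helper `embed_responses` and the sample sentence are hypothetical, and mean pooling is an illustrative choice rather than the procedure used by the authors.

```python
MODEL_ID = "ai4stem-uga/G-SciEdBERT"


def embed_responses(texts, model_id=MODEL_ID):
    """Return one mean-pooled embedding per input text (hypothetical helper)."""
    # Imported lazily so the module can be loaded without the heavy dependencies.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    model.eval()

    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (batch, tokens, hidden)

    # Average token vectors, ignoring padding positions via the attention mask.
    mask = inputs["attention_mask"].unsqueeze(-1).to(hidden.dtype)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)


if __name__ == "__main__":
    # Made-up example response; downloads the checkpoint on first use.
    vectors = embed_responses(["Die Pflanze braucht Licht für die Photosynthese."])
    print(vectors.shape)
```

For scoring, the same checkpoint would typically be fine-tuned with a classification head per item rather than used through frozen embeddings.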
40
 
41
  # Acknowledgments
42
+ This project is supported by the Alexander von Humboldt Foundation (PI Xiaoming Zhai, xiaoming.zhai@uga.edu).
43
+
44
+ ## Citation
45
+
46
+ ```bibtex
47
+ @article{Latif_2024_G-SciEdBERT,
48
+ author = {Latif, Ehsan and Lee, Gyeong-Geon and Neumann, Knut and Kastorff, Tamara and Zhai, Xiaoming},
49
+ title = {G-SciEdBERT: A Contextualized LLM for Science Assessment Tasks in German},
50
+ journal = {arXiv preprint arXiv:2301.12031},
51
+ year = {2024},
52
+ pages = {1-9}
53
+ }
54
+ ```
55
+
56
+ *This model is trained and shared by Ehsan Latif, Ph.D. (ehsan.latif@uga.edu)