ai4stem-uga
committed on
Commit • 15483fc
1 Parent(s): 549fe9a
Results updated
README.md
CHANGED
@@ -31,6 +31,28 @@ The responses were graded irrespective of the student's ethnicity, race, or gender.
The model is pre-trained on [G-BERT](https://huggingface.co/dbmdz/bert-base-german-uncased?text=Ich+mag+dich.+Ich+liebe+%5BMASK%5D), and the pre-training procedure is illustrated below:

![architecture](https://huggingface.co/ai4stem-uga/G-SciEdBERT/resolve/main/G-SciEdBERT_architecture.png)
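As a rough illustration of this setup, the sketch below shows what continued pre-training from the G-BERT checkpoint could look like with the Hugging Face `transformers` Trainer, assuming a standard masked-language-modeling objective. The corpus file, block size, epoch count, and output directory are placeholders, not the actual G-SciEdBERT training configuration.

```python
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    LineByLineTextDataset,
    Trainer,
    TrainingArguments,
)

# Start from the G-BERT checkpoint referenced above.
tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-base-german-uncased")
model = AutoModelForMaskedLM.from_pretrained("dbmdz/bert-base-german-uncased")

# Placeholder corpus of German student responses, one response per line.
dataset = LineByLineTextDataset(
    tokenizer=tokenizer,
    file_path="german_science_responses.txt",  # hypothetical path
    block_size=128,
)

# Mask 15% of tokens at random for the MLM objective.
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="g-sciedbert", num_train_epochs=3),
    data_collator=collator,
    train_dataset=dataset,
)
trainer.train()
```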
## Evaluation Results
The table below compares G-BERT and G-SciEdBERT on five randomly selected PISA assessment items, along with the average accuracy (quadratic weighted kappa, QWK) reported for all datasets combined. G-SciEdBERT significantly outperformed G-BERT on the automatic scoring of student-written responses. Based on the QWK values, the accuracy gains range from 4.2% to 13.6%, with an average increase of 10.0% (from .7136 to .8137). The improvement is especially noteworthy for item S268Q02, which saw the largest gain at 13.6% (from .757 to .893). These findings demonstrate that G-SciEdBERT is more effective than G-BERT at comprehending and assessing complex written science responses.

The results of our analysis strongly support the adoption of G-SciEdBERT for the automatic scoring of German-written science responses in large-scale assessments such as PISA, given its superior accuracy over the general-purpose G-BERT model.

| Item    | Training Samples | Testing Samples | Labels        | G-BERT (QWK) | G-SciEdBERT (QWK) |
|---------|------------------|-----------------|---------------|--------------|-------------------|
| S131Q02 | 487              | 122             | 5             | 0.761        | **0.852**         |
| S131Q04 | 478              | 120             | 5             | 0.683        | **0.725**         |
| S268Q02 | 446              | 112             | 2             | 0.757        | **0.893**         |
| S269Q01 | 508              | 127             | 2             | 0.837        | **0.953**         |
| S269Q03 | 500              | 126             | 4             | 0.702        | **0.802**         |
| Average (all items) | 665.95 | 166.49 | 2-5 (min-max) | 0.7136 | **0.8137** |
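Here, QWK is the quadratic weighted kappa between human-assigned and model-predicted scores. As a reference point for the metric (not the evaluation code used above), it can be computed with scikit-learn; the labels below are made up:

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical human-assigned and model-predicted score labels.
human_scores = [0, 1, 2, 2, 3, 4, 4, 1]
model_scores = [0, 1, 2, 3, 3, 4, 3, 1]

# Quadratic weighting penalizes larger disagreements more heavily.
qwk = cohen_kappa_score(human_scores, model_scores, weights="quadratic")
print(f"QWK = {qwk:.4f}")
```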
## Usage
With Transformers >= 2.3, our German BERT models can be loaded like this:
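For example (a sketch using the standard auto classes; the G-SciEdBERT model ID is taken from the repository URL above):

```python
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ai4stem-uga/G-SciEdBERT")
model = AutoModel.from_pretrained("ai4stem-uga/G-SciEdBERT")
```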