phueb/BabyBERTa-1

phueb committed on Nov 6, 2021

Commit f3f2b02 • 1 Parent(s): df8908b

table

Files changed (1)

README.md +6 -5

@@ -10,10 +10,10 @@ The three provided models are randomly selected from 10 that were trained and re
 ## Loading the tokenizer
 BabyBERTa was trained with `add_prefix_space=True`, so it will not work properly with the tokenizer defaults.
-Make sure to load the tokenizer as follows:
+For instance, to load the tokenizer for BabyBERTa-1, load it as follows:
 ```python
-tokenizer = RobertaTokenizerFast.from_pretrained("phueb/BabyBERTa",
+tokenizer = RobertaTokenizerFast.from_pretrained("phueb/BabyBERTa-1",
                                                   add_prefix_space=True)
 ```
@@ -38,11 +38,13 @@ In contrast, because BabyBERTa is not case-sensitive, its performance is not inf
 2. The latest version of Zorro no longer contains ambiguous content words such as "Spanish" which can be both a noun and an adjective.
  this resulted in a small reduction in the performance of BabyBERTa.
+Overall Accuracy on Zorro:
 | Model Name                             | Accuracy (holistic scoring)  | Accuracy (MLM-scoring) |
 |----------------------------------------|------------------------------|------------|
 | [BabyBERTa-1][link-BabyBERTa-1]        | 80.3                         | 79.9       |
-| [BabyBERTa-2][link-BabyBERTa-2]        | 80.3                         | 79.9       |
-| [BabyBERTa-3][link-BabyBERTa-3]        | 80.3                         | 79.9       |
+| [BabyBERTa-2][link-BabyBERTa-2]        | 78.6                         | 78.2       |
+| [BabyBERTa-3][link-BabyBERTa-3]        | 74.5                         | 78.1       |
@@ -61,6 +63,5 @@ More info can be found [here](https://github.com/phueb/BabyBERTa).
 language:
 - en
 tags:
-- child-directed-language
 - acquisition
 ---
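
A note on the tokenizer change in this commit: the README requires `add_prefix_space=True`, and it is easy to drop that argument by accident. The sketch below is one possible way to load a checkpoint and sanity-check the setting. The helper names `load_babyberta_tokenizer` and `first_token_has_prefix_space` are hypothetical and not part of the repository. Loading assumes the `transformers` package is installed and the Hub is reachable. The check relies on the convention that byte-level BPE tokenizers such as RoBERTa's mark a leading space with the character `Ġ` (U+0120).

```python
from typing import Any, Dict

# The one non-default setting the README calls for.
TOKENIZER_KWARGS: Dict[str, Any] = {"add_prefix_space": True}


def load_babyberta_tokenizer(repo_id: str = "phueb/BabyBERTa-1"):
    """Load a BabyBERTa tokenizer with the prefix-space setting (hypothetical helper).

    Downloads tokenizer files from the Hugging Face Hub on first use.
    """
    # Imported lazily so the module can be imported without transformers installed.
    from transformers import RobertaTokenizerFast

    return RobertaTokenizerFast.from_pretrained(repo_id, **TOKENIZER_KWARGS)


def first_token_has_prefix_space(tokenizer, word: str = "hello") -> bool:
    """Return True if the first token of `word` carries the leading-space marker.

    Works with any object exposing a `tokenize(text) -> list[str]` method.
    """
    tokens = tokenizer.tokenize(word)
    return bool(tokens) and tokens[0].startswith("\u0120")  # "\u0120" is "Ġ"
```

Calling `first_token_has_prefix_space(load_babyberta_tokenizer())` should return `True` when the argument was passed through. A tokenizer loaded with the defaults would be expected to return `False` for a sentence-initial word.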