phueb committed on
Commit
5a30508
1 Parent(s): af50534

initial model version

Files changed (2)
  1. README.md +34 -5
  2. config.json +25 -0
README.md CHANGED
@@ -5,6 +5,8 @@
  BabyBERTa is a light-weight version of RoBERTa trained on 5M words of American-English child-directed input.
  It is intended for language acquisition research on a single desktop with a single GPU - no high-performance computing infrastructure needed.
 
+ The three provided models were randomly selected from the 10 that were trained and reported in the paper.
+
  ## Loading the tokenizer
 
  BabyBERTa was trained with `add_prefix_space=True`, so it will not work properly with the tokenizer defaults.
@@ -15,12 +17,19 @@ tokenizer = RobertaTokenizerFast.from_pretrained("phueb/BabyBERTa",
  add_prefix_space=True)
  ```
 
+ ### Hyper-Parameters
+
+ See the paper for details.
+ All provided models were trained for 400K steps with a batch size of 16.
+ Importantly, BabyBERTa never predicts unmasked tokens during training - `unmask_prob` is set to zero.
+
+
  ### Performance
 
- The provided model is the best-performing out of 10 that were evaluated on the [Zorro](https://github.com/phueb/Zorro) test suite.
- This model was trained for 400K steps, and achieves an overall accuracy of 80.3,
- comparable to RoBERTa-base, which achieves an overall accuracy of 82.6 on the latest version of Zorro (as of October, 2021).
-
+ BabyBERTa was developed for learning grammatical knowledge from child-directed input.
+ Its grammatical knowledge was evaluated using the [Zorro](https://github.com/phueb/Zorro) test suite.
+ The best model achieves an overall accuracy of 80.3,
+ comparable to RoBERTa-base, which achieves an overall accuracy of 82.6 on the latest version of Zorro (as of October 2021).
  Both values differ slightly from those reported in the paper (Huebner et al., 2020).
  There are two reasons for this:
  1. Performance of RoBERTa-base is slightly higher because the authors previously lower-cased all words in Zorro before evaluation.
@@ -29,9 +38,29 @@ In contrast, because BabyBERTa is not case-sensitive, its performance is not inf
  2. The latest version of Zorro no longer contains ambiguous content words such as "Spanish", which can be both a noun and an adjective.
  This resulted in a small reduction in the performance of BabyBERTa.
 
+ | Model Name                      | Accuracy (holistic scoring) | Accuracy (MLM-scoring) |
+ |---------------------------------|-----------------------------|------------------------|
+ | [BabyBERTa-1][link-BabyBERTa-1] | 80.3                        | 79.9                   |
+ | [BabyBERTa-2][link-BabyBERTa-2] | 80.3                        | 79.9                   |
+ | [BabyBERTa-3][link-BabyBERTa-3] | 80.3                        | 79.9                   |
+
 
  ### Additional Information
 
  This model was trained by [Philip Huebner](https://philhuebner.com), currently at the [UIUC Language and Learning Lab](http://www.learninglanguagelab.org).
 
- More info can be found [here](https://github.com/phueb/BabyBERTa).
+ More info can be found [here](https://github.com/phueb/BabyBERTa).
+
+ [link-BabyBERTa-1]: https://huggingface.co/phueb/BabyBERTa-1
+ [link-BabyBERTa-2]: https://huggingface.co/phueb/BabyBERTa-2
+ [link-BabyBERTa-3]: https://huggingface.co/phueb/BabyBERTa-3
+
+ ---
+ language:
+ - en
+ tags:
+ - child-directed-language
+ - acquisition
+ ---
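A minimal end-to-end sketch of using the tokenizer setting above for masked-word prediction is shown below. It assumes the `phueb/BabyBERTa-1` checkpoint from the links above, that the tokenizer defines the standard `<mask>` token, and a made-up example sentence; it is an illustration rather than an official usage example.

```python
import torch
from transformers import RobertaTokenizerFast, RobertaForMaskedLM

# Load the tokenizer with add_prefix_space=True, as the model card requires.
tokenizer = RobertaTokenizerFast.from_pretrained("phueb/BabyBERTa-1",
                                                 add_prefix_space=True)
model = RobertaForMaskedLM.from_pretrained("phueb/BabyBERTa-1")
model.eval()

# Predict a masked word in a simple child-directed-style sentence (illustrative).
text = f"the dog is {tokenizer.mask_token} the ball ."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Find the masked position and print the top-5 predicted tokens.
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
top_ids = logits[0, mask_pos].topk(5, dim=-1).indices[0]
print(tokenizer.convert_ids_to_tokens(top_ids.tolist()))
```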
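The "MLM-scoring" column in the table above refers to scoring sentences with masked-token probabilities. The sketch below shows one common variant of this idea, pseudo-log-likelihood scoring of a grammatical/ungrammatical minimal pair; the exact Zorro scoring procedure may differ, and the repo id, helper name, and sentence pair are illustrative assumptions.

```python
import torch
from transformers import RobertaTokenizerFast, RobertaForMaskedLM

# Repo id is taken from the model links above; the sentence pair is made up.
REPO = "phueb/BabyBERTa-1"
tokenizer = RobertaTokenizerFast.from_pretrained(REPO, add_prefix_space=True)
model = RobertaForMaskedLM.from_pretrained(REPO)
model.eval()


def pseudo_log_likelihood(sentence: str) -> float:
    """Mask each token in turn and sum its log-probability under the model."""
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    for i in range(1, len(ids) - 1):  # skip the <s> and </s> special tokens
        masked = ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        total += torch.log_softmax(logits, dim=-1)[ids[i]].item()
    return total


# Assigning the higher score to the grammatical member of the pair counts as correct.
good = "the dogs in the yard are hungry ."
bad = "the dogs in the yard is hungry ."
print(pseudo_log_likelihood(good) > pseudo_log_likelihood(bad))
```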
config.json ADDED
@@ -0,0 +1,25 @@
+ {
+   "architectures": [
+     "RobertaForMaskedLM"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "bos_token_id": 3,
+   "eos_token_id": 4,
+   "gradient_checkpointing": false,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 256,
+   "initializer_range": 0.02,
+   "intermediate_size": 1024,
+   "layer_norm_eps": 1e-05,
+   "max_position_embeddings": 130,
+   "model_type": "roberta",
+   "num_attention_heads": 8,
+   "num_hidden_layers": 8,
+   "pad_token_id": 1,
+   "position_embedding_type": "absolute",
+   "transformers_version": "4.3.3",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 8192
+ }
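Because config.json fully specifies the architecture, a randomly initialised copy of the network can be built locally to inspect its size. The sketch below re-creates the configuration from the values shown above; it does not download the trained weights, and the printed parameter count is simply what `transformers` reports for this configuration.

```python
from transformers import RobertaConfig, RobertaForMaskedLM

# Re-create the architecture described by the config.json above.
config = RobertaConfig(
    vocab_size=8192,
    hidden_size=256,
    num_hidden_layers=8,
    num_attention_heads=8,
    intermediate_size=1024,
    max_position_embeddings=130,
    bos_token_id=3,
    eos_token_id=4,
    pad_token_id=1,
    type_vocab_size=2,
    layer_norm_eps=1e-05,
)

# Randomly initialised weights, useful here only for inspecting the model size.
model = RobertaForMaskedLM(config)
n_params = sum(p.numel() for p in model.parameters())
print(f"hidden size: {config.hidden_size}, layers: {config.num_hidden_layers}, "
      f"parameters: {n_params / 1e6:.1f}M")
```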