hails committed on
Commit ce788ba
Parent: c77b412

Update README.md

Files changed (1)
  1. README.md +4 -8
README.md CHANGED
@@ -112,22 +112,18 @@ As with all language models, it is hard to predict in advance how FIM-1.3B will
 
 We evaluate our model on a number of standard NLP datasets to verify that our infilling model performs on par with a comparable autoregressive model.
 
-We use the [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) developed by EleutherAI.
+We use the [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) library developed by EleutherAI for all evaluations except HumanEval-infilling, for which we use the code at [https://github.com/openai/human-eval-infilling](https://github.com/openai/human-eval-infilling).
 
-Report:
-LogiQA, PIQA, SciQ, WSC, Winogrande, ARC_challenge, ARC_easy, lambada
-On FIM-1.3B, the comparable autoregressive model,
-
-| Model | HumanEval-Infilling | arc_easy | arc_challenge | lambada | piqa | sciq | wsc | winogrande |
+All three models here were trained with the same configuration, differing only in FIM hyperparameters and/or positional embeddings: "AR-1.3B" was trained without FIM and with rotary positional embeddings, "CarperAI/FIM-NeoX-1.3B" is this model (trained with a FIM rate of 0.9 in SPM mode, following Bavarian et al. 2022), and "FIM-1.3B-alibi" was trained with [ALiBi](https://arxiv.org/abs/2108.12409) positional embeddings but is otherwise identical to this model.
+
+| Model | HumanEval-infilling | arc\_easy | arc\_challenge | lambada (ppl) | piqa | sciq | wsc | winogrande |
 |-----------------|---------------------|----------|---------------|---------|--------|-------|--------|------------|
 | AR-1.3B | 0.0029 | 0.5816 | 0.2465 | 7.03 | 0.7116 | 0.85 | 0.3654 | 0.5651 |
-| FIM-1.3B-rotary | 0.0155 | 0.5829 | 0.2457 | 7.08 | 0.7029 | 0.861 | 0.3654 | 0.5390 |
+| CarperAI/FIM-NeoX-1.3B | 0.0155 | 0.5829 | 0.2457 | 7.08 | 0.7029 | 0.861 | 0.3654 | 0.5390 |
 | FIM-1.3B-alibi | 0.0029 | 0.5589 | 0.25 | 7.49 | 0.6926 | 0.856 | 0.3654 | 0.5406 |
 
-We also perform preliminary investigation on code generation and infilling capabilities by testing on HumanEval-Infilling [link to github] [Bavarian et al. 2022]
 
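
The "FIM rate of 0.9 in SPM mode" transform described in the added text (after Bavarian et al. 2022) can be sketched roughly as follows. This is an illustrative character-level reimplementation, not the model's actual training code; the sentinel strings `<PRE>`, `<SUF>`, `<MID>` and the function name are placeholders for the model's real special tokens.

```python
import random

# Placeholder sentinels; the real model uses dedicated special tokens
# from its tokenizer rather than these literal strings.
PRE, SUF, MID = "<PRE>", "<SUF>", "<MID>"

def apply_fim_spm(doc, fim_rate=0.9, rng=None):
    """With probability `fim_rate`, split `doc` into (prefix, middle, suffix)
    at two random points and emit it in suffix-prefix-middle (SPM) order, so
    the model learns to generate the middle conditioned on both sides."""
    rng = rng or random.Random()
    if rng.random() > fim_rate:
        return doc  # leave this document in ordinary left-to-right order
    i, j = sorted(rng.randrange(len(doc) + 1) for _ in range(2))
    prefix, middle, suffix = doc[:i], doc[i:j], doc[j:]
    # SPM variant of the PSM format: <PRE><SUF> suffix <MID> prefix middle
    return PRE + SUF + suffix + MID + prefix + middle
```

Under this formulation, infilling at inference time uses the same ordering: prompt the model with `<PRE><SUF>` + suffix + `<MID>` + prefix and sample the missing middle.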
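
For the HumanEval-infilling column: the linked openai/human-eval-infilling code scores generations with the unbiased pass@k estimator used across the HumanEval family of benchmarks (Chen et al. 2021). A minimal sketch of that estimator (function name mine):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of pass@k for one task: the probability that at
    least one of k completions, drawn without replacement from n generated
    completions of which c pass the tests, is correct."""
    if n - c < k:
        return 1.0  # every size-k draw must contain a passing completion
    return 1.0 - comb(n - c, k) / comb(n, k)
```

The benchmark score is this quantity averaged over tasks; with a single sample per task (n = k = 1) it reduces to the raw fraction of problems solved.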