jgauthier committed
Commit 092c6b1
Parent: 27bb1ab

partial readme draft

Files changed (1): README.md (+112, -3)

README.md:
---
title: SyntaxGym
emoji: 🏋️
colorFrom: pink
colorTo: yellow
sdk: gradio
sdk_version: 3.0.13
app_file: app.py
pinned: false
tags:
- evaluate
- metric
description: >-
  Evaluates Huggingface models on SyntaxGym datasets (targeted syntactic evaluations).
---

# Metric Card for SyntaxGym

## Metric Description

[SyntaxGym][syntaxgym] is a framework for targeted syntactic evaluation of language models. This metric can be combined with the [SyntaxGym dataset][syntaxgym-dataset] to evaluate the syntactic capabilities of any Huggingface causal language model.

## How to Use

The metric takes a SyntaxGym test suite as input, along with the identifier of the model to be evaluated:

```python
import datasets
import evaluate
import numpy as np

dataset = datasets.load_dataset("cpllab/syntaxgym", "subordination_src-src")
metric = evaluate.load("cpllab/syntaxgym")
result = metric.compute(suite=dataset["test"], model_id="gpt2")

# Compute suite accuracy: mean success over items, where an item counts as a
# "success" iff all of its boolean prediction results are True.
prediction_results = np.array(result["prediction_results"])
suite_accuracy = prediction_results.all(axis=1).mean(axis=0)
```

### Run the entire SyntaxGym dataset

Each test suite in the [SyntaxGym dataset][syntaxgym-dataset] is a separate dataset configuration, so evaluating the full dataset amounts to iterating over the configurations and aggregating per-suite accuracies, as sketched below.
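
A minimal sketch, assuming `datasets.get_dataset_config_names` lists the suite configurations and that each configuration is one test suite (the loop body just repeats the single-suite recipe above):

```python
import datasets
import evaluate
import numpy as np

metric = evaluate.load("cpllab/syntaxgym")

# Assumption: every configuration of cpllab/syntaxgym is one test suite.
suite_names = datasets.get_dataset_config_names("cpllab/syntaxgym")

accuracies = {}
for suite_name in suite_names:
    suite = datasets.load_dataset("cpllab/syntaxgym", suite_name)["test"]
    result = metric.compute(suite=suite, model_id="gpt2")
    prediction_results = np.array(result["prediction_results"])
    accuracies[suite_name] = prediction_results.all(axis=1).mean()

# One possible aggregate: the macro-average over suites.
print(np.mean(list(accuracies.values())))
```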

### Inputs

- **suite** (`Dataset`): SyntaxGym test suite, represented as a Huggingface dataset. See the [dataset reference][syntaxgym-dataset].
- **model_id** (`str`): Model used to calculate probabilities of each word. (This is only well defined for causal language models, which include models such as `gpt2`, causal variations of BERT, causal versions of T5, and more. The full list can be found in the [`AutoModelForCausalLM` documentation][causal].)
- **batch_size** (`int`): Maximum batch size for computations.
- **add_start_token** (`bool`): Whether to add the start token to each sentence. Defaults to `True`.
- **device** (`str`): Device to run on. Defaults to `cuda` when available. (All optional arguments are shown together in the example call below.)
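
For example, a `compute` call that sets every optional argument explicitly (the argument values here are illustrative only):

```python
result = metric.compute(
    suite=dataset["test"],
    model_id="gpt2",
    batch_size=16,         # illustrative value
    add_start_token=True,
    device="cuda",
)
```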

### Output Values

The metric returns a dict with two entries:

- **prediction_results** (`List[List[bool]]`): For each item in the test suite, a list of booleans indicating whether each corresponding prediction came out `True`. These are typically combined to yield an accuracy score (see the example usage above).
- **region_totals** (`List[Dict[Tuple[str, int], float]]`): For each item, a mapping from individual regions (keyed `(<condition_name>, <region_number>)`) to the float-valued total surprisal of the tokens in that region. This is useful for visualization, or if you'd like to use the aggregate surprisal data for other tasks (e.g. reading time prediction or neural activity prediction).

```python
>>> print(result["prediction_results"][0])
[True]
>>> print(result["region_totals"][0])
{('sub_no-matrix', 1): 14.905603408813477,
 ('sub_no-matrix', 2): 39.063140869140625,
 ('sub_no-matrix', 3): 26.862628936767578,
 ('sub_no-matrix', 4): 50.56561279296875,
 ('sub_no-matrix', 5): 7.470069408416748,
 ('no-sub_no-matrix', 1): 13.15120792388916,
 ('no-sub_no-matrix', 2): 38.50318908691406,
 ('no-sub_no-matrix', 3): 27.623855590820312,
 ('no-sub_no-matrix', 4): 48.8316535949707,
 ('no-sub_no-matrix', 5): 1.8095952272415161,
 ('sub_matrix', 1): 14.905603408813477,
 ('sub_matrix', 2): 39.063140869140625,
 ('sub_matrix', 3): 26.862628936767578,
 ('sub_matrix', 4): 50.56561279296875,
 ('sub_matrix', 5): 26.532146453857422,
 ('no-sub_matrix', 1): 13.15120792388916,
 ('no-sub_matrix', 2): 38.50318908691406,
 ('no-sub_matrix', 3): 27.623855590820312,
 ('no-sub_matrix', 4): 48.8316535949707,
 ('no-sub_matrix', 5): 38.085227966308594}
```
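
If you want to work with the per-region surprisals directly, they can be reshaped into a long-format table. A minimal sketch using pandas (the reshaping code is an illustration, not part of the metric API):

```python
import pandas as pd

# One row per (item, condition, region), using `result` from the example above.
rows = [
    {"item": item_idx, "condition": condition, "region": region, "surprisal": total}
    for item_idx, region_totals in enumerate(result["region_totals"])
    for (condition, region), total in region_totals.items()
]
df = pd.DataFrame(rows)

# Mean surprisal per condition and region, across items.
print(df.groupby(["condition", "region"])["surprisal"].mean())
```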

## Limitations and Bias

TODO

## Citation

If you use this metric in your research, please cite:

```bibtex
@inproceedings{gauthier-etal-2020-syntaxgym,
    title = "{S}yntax{G}ym: An Online Platform for Targeted Evaluation of Language Models",
    author = "Gauthier, Jon and Hu, Jennifer and Wilcox, Ethan and Qian, Peng and Levy, Roger",
    booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations",
    month = jul,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.acl-demos.10",
    pages = "70--76",
    abstract = "Targeted syntactic evaluations have yielded insights into the generalizations learned by neural network language models. However, this line of research requires an uncommon confluence of skills: both the theoretical knowledge needed to design controlled psycholinguistic experiments, and the technical proficiency needed to train and deploy large-scale language models. We present SyntaxGym, an online platform designed to make targeted evaluations accessible to both experts in NLP and linguistics, reproducible across computing environments, and standardized following the norms of psycholinguistic experimental design. This paper releases two tools of independent value for the computational linguistics community: 1. A website, syntaxgym.org, which centralizes the process of targeted syntactic evaluation and provides easy tools for analysis and visualization; 2. Two command-line tools, {`}syntaxgym{`} and {`}lm-zoo{`}, which allow any user to reproduce targeted syntactic evaluations and general language model inference on their own machine.",
}
```

If you use the [SyntaxGym dataset][syntaxgym-dataset] in your research, please cite:

```bibtex
@inproceedings{Hu:et-al:2020,
    author = {Hu, Jennifer and Gauthier, Jon and Qian, Peng and Wilcox, Ethan and Levy, Roger},
    title = {A systematic assessment of syntactic generalization in neural language models},
    booktitle = {Proceedings of the Association for Computational Linguistics},
    year = {2020}
}
```

[syntaxgym]: https://syntaxgym.org
[syntaxgym-dataset]: https://huggingface.co/datasets/cpllab/syntaxgym
[causal]: https://huggingface.co/docs/transformers/master/en/model_doc/auto#transformers.AutoModelForCausalLM