jgauthier committed on
Commit
ea234b8
1 Parent(s): e00b8f2

document accuracy property
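
The `accuracy` property documented by this commit is described as "the accuracy of the conjunction of all boolean predictions per item in the suite". A minimal sketch of that computation, starting from a `prediction_results`-style list, is given below. The helper name `suite_accuracy` and the sample data are hypothetical, introduced only for illustration; this is not the metric's actual implementation.

```python
from typing import List


def suite_accuracy(prediction_results: List[List[bool]]) -> float:
    """Hypothetical helper: fraction of items whose predictions are ALL True.

    Assumes `prediction_results` has the documented shape List[List[bool]],
    with one inner list of prediction outcomes per test-suite item.
    """
    if not prediction_results:
        raise ValueError("prediction_results must contain at least one item")
    # An item counts as correct only if every one of its predictions holds
    # (the conjunction of its booleans).
    item_correct = [all(item) for item in prediction_results]
    return sum(item_correct) / len(item_correct)


# Made-up data: four items, two of which pass all of their predictions.
example_results = [[True, True], [True, False], [True], [False, False]]
print(suite_accuracy(example_results))  # 0.5
```

Under this reading, an item with a single failed prediction counts as wrong, so the result can be lower than the mean over individual predictions.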

Files changed (1)
  1. README.md +3 -2
README.md CHANGED
@@ -105,9 +105,10 @@ overall_accuracy = np.mean(list(suite_accuracies.values()))
 
  ### Output Values
 
- The metric returns a dict of `SyntaxGymMetricSuiteResult` tuples, mapping test suite names to test suite performance. Each inner dict has two entries:
+ The metric returns a dict of `SyntaxGymMetricSuiteResult` objects, mapping test suite names to test suite performance. Each inner object has three properties:
 
- - **prediction_results** (`List[List[bool]]`): For each item in the test suite, a list of booleans indicating whether each corresponding prediction came out `True`. Typically these are combined to yield an accuracy score (see example usage above).
+ - **accuracy** (`float`): Model accuracy on this suite. This is the accuracy of the conjunction of all boolean predictions per item in the suite.
+ - **prediction_results** (`List[List[bool]]`): For each item in the test suite, a list of booleans indicating whether each corresponding prediction came out `True`. Typically these are combined to yield an accuracy score (but you can simply use the `accuracy` property).
  - **region_totals** (`List[Dict[Tuple[str, int], float]`): For each item, a mapping from individual region (keys `(<condition_name>, <region_number>)`) to the float-valued total surprisal for tokens in this region. This is useful for visualization, or if you'd like to use the aggregate surprisal data for other tasks (e.g. reading time prediction or neural activity prediction).
 
  ```python