lvwerra HF staff committed on
Commit
e42b19b
·
1 Parent(s): 11b5190

Update Space (evaluate main: 828c6327)

Files changed (4)
  1. README.md +97 -4
  2. app.py +6 -0
  3. indic_glue.py +173 -0
  4. requirements.txt +5 -0
README.md CHANGED
@@ -1,12 +1,105 @@
1
  ---
2
- title: Indic_glue
3
- emoji: 💩
4
  colorFrom: blue
5
- colorTo: indigo
6
  sdk: gradio
7
  sdk_version: 3.0.2
8
  app_file: app.py
9
  pinned: false
10
  ---
11
 
12
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces#reference
1
  ---
2
+ title: IndicGLUE
3
+ emoji: 🤗
4
  colorFrom: blue
5
+ colorTo: red
6
  sdk: gradio
7
  sdk_version: 3.0.2
8
  app_file: app.py
9
  pinned: false
10
+ tags:
11
+ - evaluate
12
+ - metric
13
  ---
14
 
15
+ # Metric Card for IndicGLUE
16
+
17
+ ## Metric description
18
+ This metric computes the evaluation scores for the [IndicGLUE dataset](https://huggingface.co/datasets/indic_glue).
19
+
20
+ IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide variety of tasks and covers 11 major Indian languages: Assamese (`as`), Bengali (`bn`), Gujarati (`gu`), Hindi (`hi`), Kannada (`kn`), Malayalam (`ml`), Marathi (`mr`), Oriya (`or`), Panjabi (`pa`), Tamil (`ta`) and Telugu (`te`).
21
+
22
+ ## How to use
23
+
24
+ There are two steps: (1) loading the IndicGLUE metric relevant to the subset of the dataset being used for evaluation; and (2) calculating the metric.
25
+
26
+ 1. **Loading the relevant IndicGLUE metric**: the subsets of IndicGLUE are the following: `wnli`, `copa`, `sna`, `csqa`, `wstp`, `inltkh`, `bbca`, `cvit-mkb-clsr`, `iitp-mr`, `iitp-pr`, `actsa-sc`, `md`, and `wiki-ner`.
27
+
28
+ More information about the different subsets of the IndicGLUE dataset can be found on the [IndicGLUE dataset page](https://indicnlp.ai4bharat.org/indic-glue/).
29
+
30
+ 2. **Calculating the metric**: the metric takes two inputs: one list with the predictions of the model to score and one list of references, with one reference per prediction. Predictions and references are integer labels for all subsets of the dataset except `cvit-mkb-clsr`, where each prediction and reference is a vector of floats.
31
+
32
+ ```python
33
+ indic_glue_metric = evaluate.load('indic_glue', 'wnli')
34
+ references = [0, 1]
35
+ predictions = [0, 1]
36
+ results = indic_glue_metric.compute(predictions=predictions, references=references)
37
+ ```
38
+
39
+ ## Output values
40
+
41
+ The output of the metric depends on the IndicGLUE subset chosen, consisting of a dictionary that contains one or several of the following metrics:
42
+
43
+ `accuracy`: the proportion of correct predictions among the total number of cases processed, with a range between 0 and 1 (see [accuracy](https://huggingface.co/metrics/accuracy) for more information).
44
+
45
+ `f1`: the harmonic mean of the precision and recall (see [F1 score](https://huggingface.co/metrics/f1) for more information). Its range is 0-1 -- its lowest possible value is 0, if either the precision or the recall is 0, and its highest possible value is 1.0, which means perfect precision and recall.
46
+
47
+ `precision@10`: the fraction of examples whose true match appears among the top 10 predicted candidates, with a range between 0 and 1 (see [precision](https://huggingface.co/metrics/precision) for more information).
48
+
49
+ The `cvit-mkb-clsr` subset returns `precision@10`, the `wiki-ner` subset returns `accuracy` and `f1`, and all other subsets of IndicGLUE return only `accuracy`.
50
+
51
+ ### Values from popular papers
52
+
53
+ The [original IndicGLUE paper](https://aclanthology.org/2020.findings-emnlp.445.pdf) reported an average accuracy of 0.766 on the dataset; the score varies depending on the subset selected.
54
+
55
+ ## Examples
56
+
57
+ Maximal values for the WNLI subset (which outputs `accuracy`):
58
+
59
+ ```python
60
+ >>> indic_glue_metric = evaluate.load('indic_glue', 'wnli')
61
+ >>> references = [0, 1]
62
+ >>> predictions = [0, 1]
63
+ >>> results = indic_glue_metric.compute(predictions=predictions, references=references)
64
+ >>> print(results)
65
+ {'accuracy': 1.0}
66
+ ```
67
+
68
+ Minimal values for the Wiki-NER subset (which outputs `accuracy` and `f1`):
69
+
70
+ ```python
71
+ >>> indic_glue_metric = evaluate.load('indic_glue', 'wiki-ner')
72
+ >>> references = [0, 1]
73
+ >>> predictions = [1, 0]
74
+ >>> results = indic_glue_metric.compute(predictions=predictions, references=references)
75
+ >>> print(results)
76
+ {'accuracy': 0.0, 'f1': 0.0}
77
+ ```
78
+
79
+ Maximal values for the CVIT-Mann Ki Baat subset (which outputs `precision@10`):
80
+
81
+ ```python
82
+ >>> indic_glue_metric = evaluate.load('indic_glue', 'cvit-mkb-clsr')
83
+ >>> references = [[0.5, 0.5, 0.5], [0.1, 0.2, 0.3]]
84
+ >>> predictions = [[0.5, 0.5, 0.5], [0.1, 0.2, 0.3]]
85
+ >>> results = indic_glue_metric.compute(predictions=predictions, references=references)
86
+ >>> print(results)
87
+ {'precision@10': 1.0}
88
+ ```
89
+
90
+ ## Limitations and bias
91
+ This metric works only with datasets that have the same format as the [IndicGLUE dataset](https://huggingface.co/datasets/indic_glue).
92
+
93
+ ## Citation
94
+
95
+ ```bibtex
96
+ @inproceedings{kakwani2020indicnlpsuite,
97
+ title={{IndicNLPSuite: Monolingual Corpora, Evaluation Benchmarks and Pre-trained Multilingual Language Models for Indian Languages}},
98
+ author={Divyanshu Kakwani and Anoop Kunchukuttan and Satish Golla and Gokul N.C. and Avik Bhattacharyya and Mitesh M. Khapra and Pratyush Kumar},
99
+ year={2020},
100
+ booktitle={Findings of EMNLP},
101
+ }
102
+ ```
103
+
104
+ ## Further References
105
+ - [IndicNLP website](https://indicnlp.ai4bharat.org/home/)
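The `accuracy` and `f1` values shown in the README examples can be checked by hand without loading `evaluate`. The following is a minimal sketch in plain Python, assuming binary labels with `1` as the positive class; the function names `accuracy` and `binary_f1` are hypothetical and are not part of the metric module, which relies on NumPy and scikit-learn instead.

```python
def accuracy(predictions, references):
    """Fraction of positions where the prediction equals the reference."""
    if not references or len(predictions) != len(references):
        raise ValueError("predictions and references must be non-empty and of equal length")
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


def binary_f1(predictions, references, positive=1):
    """Harmonic mean of precision and recall for one positive class.

    Assumption: when there are no true positives, false positives, or
    false negatives at all, the score is reported as 0.0.
    """
    pairs = list(zip(predictions, references))
    tp = sum(p == positive and r == positive for p, r in pairs)
    fp = sum(p == positive and r != positive for p, r in pairs)
    fn = sum(p != positive and r == positive for p, r in pairs)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Perfect predictions: both scores reach their maximum.
print(accuracy([0, 1], [0, 1]), binary_f1([0, 1], [0, 1]))  # 1.0 1.0
# Every prediction wrong: both scores drop to their minimum.
print(accuracy([1, 0], [0, 1]), binary_f1([1, 0], [0, 1]))  # 0.0 0.0
```

With predictions `[1, 0]` against references `[0, 1]`, no label matches and the positive class has no true positives, so both scores are 0.0.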
app.py ADDED
@@ -0,0 +1,6 @@
1
+ import evaluate
2
+ from evaluate.utils import launch_gradio_widget
3
+
4
+
5
+ module = evaluate.load("indic_glue", "wnli")
6
+ launch_gradio_widget(module)
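The metric module accepts only its thirteen subset names as the configuration, and the keys of the returned dictionary depend on which subset is chosen. A minimal sketch of that mapping follows, assuming the subset list and result keys stated in the metric card; `OUTPUT_KEYS` and `output_keys` are hypothetical names used for illustration and are not part of the `evaluate` API.

```python
# Subsets that report accuracy only, per the metric card.
ACCURACY_ONLY = (
    "wnli", "copa", "sna", "csqa", "wstp", "inltkh", "bbca",
    "iitp-mr", "iitp-pr", "actsa-sc", "md",
)

# Hypothetical lookup table: subset name -> keys of the result dictionary.
OUTPUT_KEYS = {name: ("accuracy",) for name in ACCURACY_ONLY}
OUTPUT_KEYS["wiki-ner"] = ("accuracy", "f1")
OUTPUT_KEYS["cvit-mkb-clsr"] = ("precision@10",)


def output_keys(config_name):
    """Return the result keys for a subset, or raise KeyError if it is unknown."""
    if config_name not in OUTPUT_KEYS:
        raise KeyError(
            f"Unknown IndicGLUE subset {config_name!r}; "
            f"expected one of {sorted(OUTPUT_KEYS)}"
        )
    return OUTPUT_KEYS[config_name]


print(output_keys("wiki-ner"))  # ('accuracy', 'f1')
```

Any name outside the table, including an omitted configuration, is rejected in this sketch, which mirrors the `KeyError` raised by the module's own check.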
indic_glue.py ADDED
@@ -0,0 +1,173 @@
1
+ # Copyright 2020 The HuggingFace Evaluate Authors.
2
+ #
3
+ # Licensed under the Apache License, Version 2.0 (the "License");
4
+ # you may not use this file except in compliance with the License.
5
+ # You may obtain a copy of the License at
6
+ #
7
+ # http://www.apache.org/licenses/LICENSE-2.0
8
+ #
9
+ # Unless required by applicable law or agreed to in writing, software
10
+ # distributed under the License is distributed on an "AS IS" BASIS,
11
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12
+ # See the License for the specific language governing permissions and
13
+ # limitations under the License.
14
+ """ IndicGLUE benchmark metric. """
15
+
16
+ import datasets
17
+ import numpy as np
18
+ from scipy.spatial.distance import cdist
19
+ from scipy.stats import pearsonr, spearmanr
20
+ from sklearn.metrics import f1_score
21
+
22
+ import evaluate
23
+
24
+
25
+ _CITATION = """\
26
+ @inproceedings{kakwani2020indicnlpsuite,
27
+ title={{IndicNLPSuite: Monolingual Corpora, Evaluation Benchmarks and Pre-trained Multilingual Language Models for Indian Languages}},
28
+ author={Divyanshu Kakwani and Anoop Kunchukuttan and Satish Golla and Gokul N.C. and Avik Bhattacharyya and Mitesh M. Khapra and Pratyush Kumar},
29
+ year={2020},
30
+ booktitle={Findings of EMNLP},
31
+ }
32
+ """
33
+
34
+ _DESCRIPTION = """\
35
+ IndicGLUE is a natural language understanding benchmark for Indian languages. It contains a wide
36
+ variety of tasks and covers 11 major Indian languages - as, bn, gu, hi, kn, ml, mr, or, pa, ta, te.
37
+ """
38
+
39
+ _KWARGS_DESCRIPTION = """
40
+ Compute the IndicGLUE evaluation metric associated with each IndicGLUE subset.
41
+ Args:
42
+     predictions: list of predictions to score (as int64),
43
+         except for 'cvit-mkb-clsr' where each prediction is a vector (of float32).
44
+     references: list of ground truth labels corresponding to the predictions (as int64),
45
+         except for 'cvit-mkb-clsr' where each reference is a vector (of float32).
46
+ Returns: depending on the IndicGLUE subset, one or several of:
47
+     "accuracy": Accuracy
48
+     "f1": F1 score
49
+     "precision@10": Precision@10
50
+ Examples:
51
+
52
+ >>> indic_glue_metric = evaluate.load('indic_glue', 'wnli') # 'wnli' or any of ["copa", "sna", "csqa", "wstp", "inltkh", "bbca", "iitp-mr", "iitp-pr", "actsa-sc", "md"]
53
+ >>> references = [0, 1]
54
+ >>> predictions = [0, 1]
55
+ >>> results = indic_glue_metric.compute(predictions=predictions, references=references)
56
+ >>> print(results)
57
+ {'accuracy': 1.0}
58
+
59
+ >>> indic_glue_metric = evaluate.load('indic_glue', 'wiki-ner')
60
+ >>> references = [0, 1]
61
+ >>> predictions = [0, 1]
62
+ >>> results = indic_glue_metric.compute(predictions=predictions, references=references)
63
+ >>> print(results)
64
+ {'accuracy': 1.0, 'f1': 1.0}
65
+
66
+ >>> indic_glue_metric = evaluate.load('indic_glue', 'cvit-mkb-clsr')
67
+ >>> references = [[0.5, 0.5, 0.5], [0.1, 0.2, 0.3]]
68
+ >>> predictions = [[0.5, 0.5, 0.5], [0.1, 0.2, 0.3]]
69
+ >>> results = indic_glue_metric.compute(predictions=predictions, references=references)
70
+ >>> print(results)
71
+ {'precision@10': 1.0}
72
+
73
+ """
74
+
75
+
76
+ def simple_accuracy(preds, labels):
77
+     return float((preds == labels).mean())
78
+
79
+
80
+ def acc_and_f1(preds, labels):
81
+     acc = simple_accuracy(preds, labels)
82
+     f1 = float(f1_score(y_true=labels, y_pred=preds))
83
+     return {
84
+         "accuracy": acc,
85
+         "f1": f1,
86
+     }
87
+
88
+
89
+ def precision_at_10(en_sentvecs, in_sentvecs):
90
+     en_sentvecs = np.array(en_sentvecs)
91
+     in_sentvecs = np.array(in_sentvecs)
92
+     n = en_sentvecs.shape[0]
93
+
94
+     # mean centering
95
+     en_sentvecs = en_sentvecs - np.mean(en_sentvecs, axis=0)
96
+     in_sentvecs = in_sentvecs - np.mean(in_sentvecs, axis=0)
97
+
98
+     sim = cdist(en_sentvecs, in_sentvecs, "cosine")
99
+     actual = np.array(range(n))
100
+     preds = sim.argsort(axis=1)[:, :10]
101
+     matches = np.any(preds == actual[:, None], axis=1)
102
+     return float(matches.mean())
103
+
104
+
105
+ @evaluate.utils.file_utils.add_start_docstrings(_DESCRIPTION, _KWARGS_DESCRIPTION)
106
+ class IndicGlue(evaluate.EvaluationModule):
107
+     def _info(self):
108
+         if self.config_name not in [
109
+             "wnli",
110
+             "copa",
111
+             "sna",
112
+             "csqa",
113
+             "wstp",
114
+             "inltkh",
115
+             "bbca",
116
+             "cvit-mkb-clsr",
117
+             "iitp-mr",
118
+             "iitp-pr",
119
+             "actsa-sc",
120
+             "md",
121
+             "wiki-ner",
122
+         ]:
123
+             raise KeyError(
124
+                 "You should supply a configuration name selected in "
125
+                 '["wnli", "copa", "sna", "csqa", "wstp", "inltkh", "bbca", '
126
+                 '"cvit-mkb-clsr", "iitp-mr", "iitp-pr", "actsa-sc", "md", '
127
+                 '"wiki-ner"]'
128
+             )
129
+         return evaluate.EvaluationModuleInfo(
130
+             description=_DESCRIPTION,
131
+             citation=_CITATION,
132
+             inputs_description=_KWARGS_DESCRIPTION,
133
+             features=datasets.Features(
134
+                 {
135
+                     "predictions": datasets.Value("int64")
136
+                     if self.config_name != "cvit-mkb-clsr"
137
+                     else datasets.Sequence(datasets.Value("float32")),
138
+                     "references": datasets.Value("int64")
139
+                     if self.config_name != "cvit-mkb-clsr"
140
+                     else datasets.Sequence(datasets.Value("float32")),
141
+                 }
142
+             ),
143
+             codebase_urls=[],
144
+             reference_urls=[],
145
+             format="numpy" if self.config_name != "cvit-mkb-clsr" else None,
146
+         )
147
+
148
+     def _compute(self, predictions, references):
149
+         if self.config_name == "cvit-mkb-clsr":
150
+             return {"precision@10": precision_at_10(predictions, references)}
151
+         elif self.config_name in ["wiki-ner"]:
152
+             return acc_and_f1(predictions, references)
153
+         elif self.config_name in [
154
+             "wnli",
155
+             "copa",
156
+             "sna",
157
+             "csqa",
158
+             "wstp",
159
+             "inltkh",
160
+             "bbca",
161
+             "iitp-mr",
162
+             "iitp-pr",
163
+             "actsa-sc",
164
+             "md",
165
+         ]:
166
+             return {"accuracy": simple_accuracy(predictions, references)}
167
+         else:
168
+             raise KeyError(
169
+                 "You should supply a configuration name selected in "
170
+                 '["wnli", "copa", "sna", "csqa", "wstp", "inltkh", "bbca", '
171
+                 '"cvit-mkb-clsr", "iitp-mr", "iitp-pr", "actsa-sc", "md", '
172
+                 '"wiki-ner"]'
173
+             )
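The retrieval score computed by `precision_at_10` can be traced step by step: mean-center both sets of vectors, rank candidates by cosine distance, and count the rows whose own index appears among the nearest candidates. The sketch below re-implements that idea in plain Python as an illustration only. The name `precision_at_k`, the adjustable `k` parameter, and the handling of zero-length vectors (distance treated as 1.0) are assumptions of this sketch; the module itself uses NumPy and `scipy.spatial.distance.cdist` with a fixed cutoff of 10.

```python
import math


def mean_center(vectors):
    """Subtract the per-dimension mean from every vector."""
    dim = len(vectors[0])
    means = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return [[v[i] - means[i] for i in range(dim)] for v in vectors]


def cosine_distance(u, v):
    """1 - cosine similarity; zero-length vectors get distance 1.0 (assumption)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 if norm == 0.0 else 1.0 - dot / norm


def precision_at_k(queries, candidates, k=10):
    """Fraction of queries whose same-index candidate is among the k nearest."""
    queries = mean_center(queries)
    candidates = mean_center(candidates)
    hits = 0
    for i, query in enumerate(queries):
        distances = [cosine_distance(query, c) for c in candidates]
        ranked = sorted(range(len(candidates)), key=distances.__getitem__)
        if i in ranked[:k]:
            hits += 1
    return hits / len(queries)


vecs = [[0.5, 0.5, 0.5], [0.1, 0.2, 0.3]]
print(precision_at_k(vecs, vecs))                # 1.0
print(precision_at_k(vecs, vecs[::-1], k=1))     # 0.0 (candidates swapped)
print(precision_at_k(vecs, vecs[::-1], k=10))    # 1.0 (only 2 candidates)
```

Whenever there are 10 or fewer candidates, every candidate falls inside the top 10, so the score is 1.0 regardless of the vectors. This is why a two-vector example cannot show a partial match at the default cutoff.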
requirements.txt ADDED
@@ -0,0 +1,5 @@
1
+ # TODO: fix github to release
2
+ git+https://github.com/huggingface/evaluate.git@b6e6ed7f3e6844b297bff1b43a1b4be0709b9671
3
+ datasets~=2.0
4
+ scipy
5
+ scikit-learn