Spaces: xu1998hz/sescore


xu1998hz committed on Nov 8, 2022

Commit 0a0a3b8

1 Parent(s): 2a708e1

adding information to the app (#2)


- updated with the nice app interface and information (eba40799285317e6931ca68926a30964e85b2f30)

Files changed (2)

app.py +69 -1
description.md +59 -0

app.py CHANGED

@@ -1,5 +1,73 @@
 import evaluate
-from evaluate.utils import launch_gradio_widget
+import sys
+from pathlib import Path
+from evaluate.utils import infer_gradio_input_types, json_to_string_type, parse_readme, parse_gradio_data, parse_test_cases
+def launch_gradio_widget(metric):
+    """Launches `metric` widget with Gradio."""
+    try:
+        import gradio as gr
+    except ImportError as error:
+        print("To create a metric widget with Gradio make sure gradio is installed.", file=sys.stderr)  # `logger` is never defined in this file
+        raise error
+    local_path = Path(sys.path[0])
+    # if there are several input types, use first as default.
+    if isinstance(metric.features, list):
+        (feature_names, feature_types) = zip(*metric.features[0].items())
+    else:
+        (feature_names, feature_types) = zip(*metric.features.items())
+    gradio_input_types = infer_gradio_input_types(feature_types)
+    def compute(data):
+        return metric.compute(**parse_gradio_data(data, gradio_input_types))
+    header_html = '''<div style="max-width:800px; margin:auto; float:center; margin-top:0; margin-bottom:0; padding:0;">
+            <img src="https://huggingface.co/spaces/xu1998hz/sescore/resolve/main/img/logo_sescore.png" style="margin:0; padding:0; margin-top:-10px; margin-bottom:-50px;">
+        </div>
+        <h2 style='margin-top: 5pt; padding-top:10pt;'>About <i>SEScore</i></h2>
+        <p><b>SEScore</b> is a reference-based text-generation evaluation metric that requires no pre-human-annotated error data,
+        described in our paper <a href="https://arxiv.org/abs/2210.05035"><b>"Not All Errors are Equal: Learning Text Generation Metrics using
+        Stratified Error Synthesis"</b></a> from EMNLP 2022.</p>
+        <p>Its effectiveness over prior methods like BLEU and COMET has been demonstrated on a diverse set of language generation tasks, including
+        translation, captioning, and web text generation. <a href="https://twitter.com/LChoshen/status/1580136005654700033">Readers have even described SEScore as "one unsupervised evaluation to rule them all"</a>
+        and we are very excited to share it with you!</p>
+        <h2 style='margin-top: 10pt; padding-top:0;'>Try it yourself!</h2>
+        <p>Provide sample (gold) reference text and (model output) predicted text below and see how SEScore rates them! It is most performant
+        in a relative ranking setting, so in general <b>it will rank better predictions higher than worse ones.</b> Providing useful
+        absolute numbers based on SEScore is an ongoing direction of investigation.</p>
+    '''.replace('\n',' ')
+    tail_markdown = parse_readme(local_path / "description.md")
+    iface = gr.Interface(
+        fn=compute,
+        inputs=gr.inputs.Dataframe(
+            headers=feature_names,
+            col_count=len(feature_names),
+            row_count=2,
+            datatype=json_to_string_type(gradio_input_types),
+        ),
+        outputs=gr.outputs.Textbox(label=metric.name),
+        description=header_html,
+        #title=f"SEScore Metric Usage Example",
+        article=tail_markdown,
+        # TODO: load test cases and use them to populate examples
+        # examples=[parse_test_cases(test_cases, feature_names, gradio_input_types)]
+    )
+    print(dir(iface))
+    iface.launch()
 module = evaluate.load("xu1998hz/sescore")
 launch_gradio_widget(module)
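
Note on the feature-unpacking step in `launch_gradio_widget`: the `zip(*...items())` idiom splits a feature mapping into parallel tuples of names and types, falling back to the first schema when several are offered. The sketch below reproduces that logic in isolation. It assumes plain dictionaries as a stand-in for `metric.features`, and `split_features` is a hypothetical helper name that does not appear in the commit.

```python
def split_features(features):
    """Return (names, types) from one feature mapping or a list of mappings."""
    # If there are several input schemas, use the first one as the default,
    # mirroring the comment in app.py.
    if isinstance(features, list):
        features = features[0]
    # zip(*mapping.items()) transposes (name, type) pairs into two tuples.
    names, types = zip(*features.items())
    return names, types


# Hypothetical stand-in for `metric.features`; the real object is provided
# by the `evaluate` library.
example_features = [
    {"predictions": "string", "references": "string"},
    {"predictions": "string", "references": ["string"]},
]

feature_names, feature_types = split_features(example_features)
print(feature_names)
print(feature_types)
```

Because the names and types stay in matching order, the same tuples can feed both the Dataframe `headers` and the inferred Gradio input types.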

description.md ADDED

@@ -0,0 +1,59 @@

+## Installation and usage
+```bash
+pip install -r requirements.txt
+```
+Minimal example (evaluating English text generation):
+```python
+import evaluate
+sescore = evaluate.load("xu1998hz/sescore")
+score = sescore.compute(
+    references=['sescore is a simple but effective next-generation text evaluation metric'],
+    predictions=['sescore is simple effective text evaluation metric for next generation']
+)
+```
+*SEScore* compares a list of references (gold translations or gold generated outputs) with an equal-length list of candidate generated samples. The output range is currently learned rather than fixed, so scores are most useful for relative ranking rather than absolute comparison. We are developing a series of rescaling options to make absolute SEScore values more meaningful.
+### Available pre-trained models
+Currently, the following language/model pairs are available:
+| Language | Pretraining data | Pretrained model link |
+|----------|------------------|-----------------------|
+| English  | MT               | [xu1998hz/sescore_english_mt](https://huggingface.co/xu1998hz/sescore_english_mt) |
+| German   | MT               | [xu1998hz/sescore_german_mt](https://huggingface.co/xu1998hz/sescore_german_mt) |
+| English  | WebNLG17         | [xu1998hz/sescore_english_webnlg17](https://huggingface.co/xu1998hz/sescore_english_webnlg17) |
+| English  | COCO captions    | [xu1998hz/sescore_english_coco](https://huggingface.co/xu1998hz/sescore_english_coco) |
+Please contact the repo maintainer, Wenda Xu, to add your models!
+## Limitations
+*SEScore* is trained on in-domain synthetic data.
+Although this data is generated to simulate user-relevant errors such as deletion and spurious insertion, it may be limited in its ability to simulate human-like errors.
+Model applicability is domain-specific (e.g., a model trained on COCO captions will perform better on captioning than an MT-trained model).
+We are in the process of producing and benchmarking general language-level *SEScore* variants.
+## Citation
+If you find our work useful, please cite the following:
+```bibtex
+@inproceedings{xu-etal-2022-not,
+  title={Not All Errors are Equal: Learning Text Generation Metrics using Stratified Error Synthesis},
+  author={Xu, Wenda and Tuan, Yi-lin and Lu, Yujie and Saxon, Michael and Li, Lei and Wang, William Yang},
+  booktitle={Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing},
+  month={dec},
+  year={2022},
+  url={https://arxiv.org/abs/2210.05035}
+}
+```
+## Acknowledgements
+The work of the [COMET](https://github.com/Unbabel/COMET) maintainers at [Unbabel](https://duckduckgo.com/?t=ffab&q=unbabel&ia=web) has been instrumental in producing SEScore.
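
Note on the relative-ranking usage described in `description.md`: since scores are meant to be compared against each other rather than read as absolute numbers, one way to use the metric is to score several candidates against the same reference and sort them. The sketch below illustrates that pattern under stated assumptions. `rank_candidates` and `toy_overlap_score` are hypothetical names; the toy scorer is a simple token-overlap stand-in and is not SEScore; and the sketch assumes that a higher score means a better prediction. A real scorer would be a thin wrapper around `sescore.compute(...)`, whose exact return format is not shown in this commit.

```python
from typing import Callable, List, Sequence, Tuple

# A scorer takes parallel lists of references and predictions and
# returns one float per pair (assumption: higher means better).
ScoreFn = Callable[[List[str], List[str]], List[float]]


def rank_candidates(
    reference: str, candidates: Sequence[str], score_fn: ScoreFn
) -> List[Tuple[str, float]]:
    """Score every candidate against one reference, best first."""
    # The metric expects equal-length lists, so repeat the reference.
    references = [reference] * len(candidates)
    scores = score_fn(references, list(candidates))
    if len(scores) != len(candidates):
        raise ValueError("score_fn must return one score per candidate")
    return sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)


def toy_overlap_score(references: List[str], predictions: List[str]) -> List[float]:
    """Stand-in scorer (NOT SEScore): Jaccard overlap of whitespace tokens."""
    scores = []
    for ref, pred in zip(references, predictions):
        ref_tokens, pred_tokens = set(ref.split()), set(pred.split())
        union = ref_tokens | pred_tokens
        scores.append(len(ref_tokens & pred_tokens) / len(union) if union else 0.0)
    return scores


reference = "sescore is a simple but effective next-generation text evaluation metric"
candidates = [
    "the cat sat on the mat",
    "sescore is simple effective text evaluation metric for next generation",
]
ranking = rank_candidates(reference, candidates, toy_overlap_score)
for text, score in ranking:
    print(f"{score:.3f}  {text}")
```

Keeping the scorer as a parameter means the ranking logic stays the same whichever pre-trained SEScore model is plugged in.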