Papers
arxiv:1905.00537

SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems

Published on May 2, 2019
Authors:
,
,
,
,
,
,

Abstract

In the last year, new models and methods for pretraining and transfer learning have driven striking performance improvements across a range of language understanding tasks. The GLUE benchmark, introduced a little over one year ago, offers a single-number metric that summarizes progress on a diverse set of such tasks, but performance on the benchmark has recently surpassed the level of non-expert humans, suggesting limited headroom for further research. In this paper we present SuperGLUE, a new benchmark styled after GLUE with a new set of more difficult language understanding tasks, a software toolkit, and a public leaderboard. SuperGLUE is available at super.gluebenchmark.com.

Community

Sign up or log in to comment

Models citing this paper 16

Browse 16 models citing this paper

Datasets citing this paper 1

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/1905.00537 in a Space README.md to link it from this page.

Collections including this paper 1