Processors
-----------------------------------------------------------------------------------------------------------------------

This library includes processors for several traditional tasks. These processors can be used to process a dataset into
examples that can be fed to a model.

Processors
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

All processors follow the same architecture, which is that of the :class:``. The processor returns a list of
:class:``. These :class:`` can be converted to :class:`` in order to be fed to the model.

.. autoclass::
    :members:

.. autoclass::
    :members:

.. autoclass::
    :members:

GLUE
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

`General Language Understanding Evaluation (GLUE) `__ is a benchmark that evaluates the performance of models across a
diverse set of existing NLU tasks. It was released together with the paper `GLUE: A multi-task benchmark and analysis
platform for natural language understanding `__.

This library hosts a total of 10 processors for the following tasks: MRPC, MNLI, MNLI (mismatched), CoLA, SST2, STSB,
QQP, QNLI, RTE and WNLI.

Those processors are:

- :class:``
- :class:``
- :class:``
- :class:``
- :class:``
- :class:``
- :class:``
- :class:``
- :class:``

Additionally, the following method can be used to load values from a data file and convert them to a list of
:class:``.

.. automethod::

Example usage
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

An example using these processors is given in the ` `__ script.

XNLI
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

`The Cross-Lingual NLI Corpus (XNLI) `__ is a benchmark that evaluates the quality of cross-lingual text
representations.
XNLI is a crowd-sourced dataset based on `MultiNLI `: pairs of text are labeled with textual entailment annotations for
15 different languages (including both high-resource languages such as English and low-resource languages such as
Swahili).

It was released together with the paper `XNLI: Evaluating Cross-lingual Sentence Representations `__.

This library hosts the processor to load the XNLI data:

- :class:``

Please note that since the gold labels are available on the test set, evaluation is performed on the test set.

An example using these processors is given in the ` `__ script.

SQuAD
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

`The Stanford Question Answering Dataset (SQuAD) `__ is a benchmark that evaluates the performance of models on
question answering. Two versions are available, v1.1 and v2.0. The first version (v1.1) was released together with the
paper `SQuAD: 100,000+ Questions for Machine Comprehension of Text `__. The second version (v2.0) was released
alongside the paper `Know What You Don't Know: Unanswerable Questions for SQuAD `__.

This library hosts a processor for each of the two versions:

Processors
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Those processors are:

- :class:``
- :class:``

They both inherit from the abstract class :class:``.

.. autoclass::
    :members:

Additionally, the following method can be used to convert SQuAD examples into :class:`` that can be used as model
inputs.

.. automethod::

These processors, as well as the aforementioned method, can be used with files containing the data as well as with the
`tensorflow_datasets` package. Examples are given below.

Example usage
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Here is an example using the processors as well as the conversion method using data files:

.. code-block::

    from transformers.data.processors.squad import (
        SquadV1Processor,
        SquadV2Processor,
        squad_convert_examples_to_features,
    )

    # The data directories, tokenizer, length settings and `evaluate` flag are assumed to be defined by the caller.

    # Loading a V2 processor
    processor = SquadV2Processor()
    examples = processor.get_dev_examples(squad_v2_data_dir)

    # Loading a V1 processor
    processor = SquadV1Processor()
    examples = processor.get_dev_examples(squad_v1_data_dir)

    features = squad_convert_examples_to_features(
        examples=examples,
        tokenizer=tokenizer,
        max_seq_length=max_seq_length,
        doc_stride=doc_stride,
        max_query_length=max_query_length,
        is_training=not evaluate,
    )

Using `tensorflow_datasets` is as easy as using a data file:

.. code-block::

    import tensorflow_datasets as tfds

    # tensorflow_datasets only handles SQuAD V1.
    tfds_examples = tfds.load("squad")
    examples = SquadV1Processor().get_examples_from_dataset(tfds_examples, evaluate=evaluate)

    features = squad_convert_examples_to_features(
        examples=examples,
        tokenizer=tokenizer,
        max_seq_length=max_seq_length,
        doc_stride=doc_stride,
        max_query_length=max_query_length,
        is_training=not evaluate,
    )

Another example using these processors is given in the ` `__ script.
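
The `max_seq_length` and `doc_stride` arguments passed to the conversion method above control how a context that is too
long for the model is split into overlapping windows. The following is a minimal, simplified sketch of that
sliding-window idea, not the library's implementation: `split_into_windows` and `max_window_len` are hypothetical names
introduced for illustration, and the sketch assumes `doc_stride` is the step between the starts of consecutive windows.
It also ignores the question and special tokens that the real conversion has to fit into each sequence.

```python
from typing import List, Tuple


def split_into_windows(
    tokens: List[str], max_window_len: int, doc_stride: int
) -> List[Tuple[int, List[str]]]:
    """Split `tokens` into overlapping windows (hypothetical helper).

    Returns a list of (start_offset, window_tokens) pairs. Assumption: each
    new window starts `doc_stride` tokens after the previous one.
    """
    if max_window_len <= 0 or doc_stride <= 0:
        raise ValueError("max_window_len and doc_stride must be positive")

    windows = []
    start = 0
    while start < len(tokens):
        # Take as many tokens as fit in one window.
        length = min(len(tokens) - start, max_window_len)
        windows.append((start, tokens[start:start + length]))
        # Stop once the current window reaches the end of the context.
        if start + length == len(tokens):
            break
        # Slide forward; a stride shorter than the window makes windows overlap.
        start += min(length, doc_stride)
    return windows


if __name__ == "__main__":
    context = "the quick brown fox jumps over the lazy dog today".split()
    for start, window in split_into_windows(context, max_window_len=4, doc_stride=2):
        print(start, window)
```

With a stride smaller than the window length, every token away from the context boundaries appears in more than one
window, so an answer span cut off at the edge of one window can still appear whole in a neighbouring one.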