HuggingFace Datasets
=======================================

Datasets and evaluation metrics for natural language processing

Compatible with NumPy, Pandas, PyTorch and TensorFlow

🤗Datasets is a lightweight and extensible library to easily share and access datasets and evaluation metrics for Natural Language Processing (NLP).

🤗Datasets has many interesting features (besides easy sharing of and access to datasets/metrics):

- Built-in interoperability with NumPy, Pandas, PyTorch and TensorFlow 2
- Lightweight and fast, with a transparent and pythonic API
- Thrives on large datasets: 🤗Datasets frees you from RAM limitations, as all datasets are memory-mapped on disk by default.
- Smart caching: never wait for your data to be processed several times

🤗Datasets currently provides access to ~100 NLP datasets and ~10 evaluation metrics, and is designed to let the community easily add and share new datasets and evaluation metrics.

You can browse the full set of datasets with the live 🤗Datasets viewer. A minimal usage sketch is given at the end of this page.

🤗Datasets originated from a fork of the awesome TensorFlow Datasets, and the HuggingFace team wants to deeply thank the TensorFlow Datasets team for building this amazing library. More details on the differences between 🤗Datasets and tfds can be found in the section Main differences between 🤗Datasets and tfds.

Contents
---------------------------------

The documentation is organized into five parts:

- **GET STARTED** contains a quick tour and the installation instructions.
- **USING DATASETS** contains general tutorials on how to use and contribute to the datasets in the library.
- **USING METRICS** contains general tutorials on how to use and contribute to the metrics in the library.
- **ADVANCED GUIDES** contains more advanced guides that are more specific to a part of the library.
- **PACKAGE REFERENCE** contains the documentation of each public class and function.

.. toctree::
    :maxdepth: 2
    :caption: Get started

    quicktour
    installation

.. toctree::
    :maxdepth: 2
    :caption: Using datasets

    loading_datasets
    exploring
    processing
    torch_tensorflow
    filesystems
    faiss_and_ea

.. toctree::
    :maxdepth: 2
    :caption: Using metrics

    loading_metrics
    using_metrics

.. toctree::
    :maxdepth: 2
    :caption: Adding new datasets/metrics

    add_dataset
    share_dataset
    add_metric

.. toctree::
    :maxdepth: 2
    :caption: Advanced guides

    features
    splits
    beam_dataset

.. toctree::
    :maxdepth: 2
    :caption: Package reference

    package_reference/loading_methods
    package_reference/main_classes
    package_reference/builder_classes
    package_reference/logging_methods
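
To make the feature list above concrete, here is a minimal sketch of the typical workflow: loading a dataset, inspecting a few memory-mapped examples, and loading an evaluation metric. The dataset and metric name used here (``squad``) is only an example; see the quick tour and the loading pages above for a complete walk-through.

.. code-block:: python

    from datasets import load_dataset, load_metric

    # Download the SQuAD training split once; afterwards it is memory-mapped
    # from the local cache on disk instead of being loaded into RAM.
    dataset = load_dataset('squad', split='train')

    # A dataset behaves like a (memory-mapped) list of python dicts.
    print(len(dataset))            # number of examples
    print(dataset[0]['question'])  # inspect a single example

    # Evaluation metrics are loaded in the same way.
    metric = load_metric('squad')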