datasets
This directory contains custom HuggingFace dataset loading scripts. They are provided to maintain backward compatibility with the ad-hoc data downloaders used in earlier versions of the lm-evaluation-harness, before HuggingFace datasets was adopted as the default download manager. For example, some datasets in the HuggingFace datasets repository process features (e.g. whitespace stripping, lower-casing) in ways that the lm-evaluation-harness did not.
NOTE: We are not accepting any additional loading scripts into the main branch! If you'd like to use a custom dataset, fork the repo and follow HuggingFace's guide to writing dataset loading scripts. You can then override your Task's DATASET_PATH attribute to point to the local path of your script.
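As a minimal sketch of the override described in the note above: the Task class below is a simplified stand-in for the harness's base class so the example runs on its own, and MyCustomTask and the my_dataset.py path are hypothetical names, not part of this repo. In a real fork you would subclass the harness's own Task instead.

```python
import os


class Task:
    """Simplified stand-in for the harness's Task base class (assumption)."""

    # Path or name of the dataset to load; subclasses override this.
    DATASET_PATH = None
    # Optional dataset configuration name.
    DATASET_NAME = None


class MyCustomTask(Task):
    # Hypothetical loading script living in your fork of the repo.
    DATASET_PATH = os.path.join("lm_eval", "datasets", "my_dataset", "my_dataset.py")
    DATASET_NAME = None


print(MyCustomTask.DATASET_PATH)
```

The only change a custom task needs here is the DATASET_PATH value; the rest of the task definition is unaffected by where the data comes from.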
WARNING: A handful of loading scripts are included in this collection because they have not yet been pushed to the HuggingFace Hub or a HuggingFace organization repo. We will remove these scripts once they have been pushed.