How to fix DatasetGenerationError?

#92
by speiqin - opened
from datasets import load_dataset
ds = load_dataset("bigcode/starcoderdata")

This raises DatasetGenerationError: An error occurred while generating the dataset
More details:

ValueError: Couldn't cast
max_stars_repo_path: string
max_stars_repo_name: string
max_stars_count: int64
id: string
content: string
-- schema metadata --
huggingface: '{"info": {"features": {"max_stars_repo_path": {"dtype": "st' + 241
to
{'id': Value(dtype='string', id=None), 'content': Value(dtype='string', id=None)}
because column names don't match

How can I fix it?

BigCode org

As stated in the dataset's README, you can't load the whole dataset at once, because some subsets have different columns. You need to load the programming languages separately from these subsets.
image.png

loubnabnl changed discussion status to closed

Sign up or log in to comment