ethanhs committed on Sep 27, 2023

Commit 8fe0e63 • 1 Parent(s): d1236f2

Fix bug in dataset loading (#284)

* Fix bug in dataset loading

This fixes a bug when loading datasets. `d.data_files` may be a list, in which case it cannot be passed directly to `hf_hub_download`, whose `filename` argument expects a single string.

* Check type of data_files, and load accordingly

Files changed (1)

src/axolotl/utils/data.py +20 -5

@@ -205,11 +205,26 @@ def load_tokenized_prepared_datasets(
                     use_auth_token=use_auth_token,
                 )
             else:
-                fp = hf_hub_download(
-                    repo_id=d.path,
-                    repo_type="dataset",
-                    filename=d.data_files,
-                )
+                if isinstance(d.data_files, str):
+                    fp = hf_hub_download(
+                        repo_id=d.path,
+                        repo_type="dataset",
+                        filename=d.data_files,
+                    )
+                elif isinstance(d.data_files, list):
+                    fp = []
+                    for file in d.data_files:
+                        fp.append(
+                            hf_hub_download(
+                                repo_id=d.path,
+                                repo_type="dataset",
+                                filename=file,
+                            )
+                        )
+                else:
+                    raise ValueError(
+                        "data_files must be either a string or list of strings"
+                    )
                 ds = load_dataset(
                     "json", name=d.name, data_files=fp, streaming=False, split=None
                 )
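
The branching added in this commit can be sketched in isolation as follows. This is a minimal illustration, not the code in `src/axolotl/utils/data.py`: the names `resolve_data_files`, `download`, and `fake_download` are hypothetical, and the downloader is injected so the example runs without network access.

```python
from typing import Callable, List, Union


def resolve_data_files(
    data_files: Union[str, List[str]],
    download: Callable[[str], str],
) -> Union[str, List[str]]:
    """Return local path(s) for data_files, mirroring the branches in the diff.

    `download` is a hypothetical stand-in for a call such as
    lambda name: hf_hub_download(
        repo_id=d.path, repo_type="dataset", filename=name
    )
    """
    if isinstance(data_files, str):
        # A single filename yields a single local path.
        return download(data_files)
    if isinstance(data_files, list):
        # A list of filenames yields a list of local paths, in the same order.
        return [download(name) for name in data_files]
    # Anything else (None, dict, ...) is rejected, as in the commit.
    raise ValueError("data_files must be either a string or list of strings")


def fake_download(name: str) -> str:
    # Illustrative downloader: builds a made-up cache path, no network access.
    return f"/tmp/cache/{name}"


single = resolve_data_files("train.jsonl", fake_download)
several = resolve_data_files(["a.jsonl", "b.jsonl"], fake_download)
```

Either result can then be passed as `data_files=` to `load_dataset("json", ...)`, which accepts both a single path and a list of paths.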