Instructions for downloading data from the sft-data repository.

First, log in to Hugging Face so that you can access the data:

from huggingface_hub import login
login()

Then, you can download a single zip file that contains all the SFT data folders:

from huggingface_hub import hf_hub_download
hf_hub_download(repo_id="LEVI-Project/sft-data", filename="sft-data.zip")
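
`hf_hub_download` returns the local path of the cached file, so the archive still has to be unpacked before use. The following is a minimal sketch of that step; the helper names `extract_archive` and `download_and_extract` and the `data` target directory are illustrative assumptions, not part of this repository.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, target_dir):
    """Unpack zip_path into target_dir and return target_dir as a Path."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return target


def download_and_extract(target_dir="data"):
    """Download sft-data.zip from the Hub and unpack it (needs network access)."""
    # Imported here so extract_archive can be used without huggingface_hub.
    from huggingface_hub import hf_hub_download

    zip_path = hf_hub_download(repo_id="LEVI-Project/sft-data", filename="sft-data.zip")
    return extract_archive(zip_path, target_dir)
```

Calling `download_and_extract()` should leave the archive contents under `data/`. If the repository turns out to be hosted as a dataset rather than a model repository, `hf_hub_download` may also need `repo_type="dataset"`.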

Note that the sft-data.zip file above has the following structure:

sft-data
├── README.md            # This README file.
├── alf                  # Folder for ALFWORLD.
│   ├── alfworld.json    # The JSON file for ALFWORLD.
│   └── alf_data_folder  # Folder for the ALFWORLD environment.
│       ├── alf_image_id_0  # Folder 0 for ALFWORLD image data.
│       ├── alf_image_id_1  # Folder 1 for ALFWORLD image data.
│       ├── alf_image_id_2  # Folder 2 for ALFWORLD image data.
│       ├── alf_image_id_3  # Folder 3 for ALFWORLD image data.
│       └── alf_image_id_4  # Folder 4 for ALFWORLD image data.
├── blackjack            # Folder for blackjack environment in the `gym_cards`.
│   ├── blackjack_data_folder  # Folder for blackjack image data.
│   └── blackjack.json         # The JSON file for blackjack.
├── ezpoints             # Folder for ezpoints environment in the `gym_cards`.
│   ├── ezpoints_data_folder  # Folder for ezpoints image data.
│   └── ezpoints.json         # The JSON file for ezpoints.
├── points24             # Folder for points24 environment in the `gym_cards`.
│   ├── points24_data_folder  # Folder for points24 image data.
│   └── points24.json         # The JSON file for points24.
└── numberline           # Folder for numberline environment in the `gym_cards`.
    ├── numberline_data_folder  # Folder for numberline image data.
    └── numberline.json         # The JSON file for numberline.
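
After unpacking, it can be useful to confirm that the extracted folder matches the layout above. The following is a small sketch of such a check; the `EXPECTED` table is transcribed from the tree, and the function name `missing_entries` is a hypothetical helper, not something shipped with the data.

```python
from pathlib import Path

# (JSON file, image data folder) per environment, transcribed from the tree above.
EXPECTED = {
    "alf": ("alfworld.json", "alf_data_folder"),
    "blackjack": ("blackjack.json", "blackjack_data_folder"),
    "ezpoints": ("ezpoints.json", "ezpoints_data_folder"),
    "points24": ("points24.json", "points24_data_folder"),
    "numberline": ("numberline.json", "numberline_data_folder"),
}


def missing_entries(root):
    """Return the relative paths under root that the tree lists but are absent."""
    root = Path(root)
    missing = []
    for env, (json_name, image_dir) in EXPECTED.items():
        if not (root / env / json_name).is_file():
            missing.append(f"{env}/{json_name}")
        if not (root / env / image_dir).is_dir():
            missing.append(f"{env}/{image_dir}")
    return missing
```

Here `root` is the extracted `sft-data` folder; an empty list means every JSON file and image folder from the tree was found.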

Alternatively, you can download the files for any one of the five environments. For example, use the following code to download the blackjack data:

from huggingface_hub import hf_hub_download
hf_hub_download(repo_id="LEVI-Project/sft-data", filename="blackjack.zip")  # zip archive of the image data folder
hf_hub_download(repo_id="LEVI-Project/sft-data", filename="blackjack.json")  # JSON file

For ALFWORLD, note that the zip file for the image data folder is named alf_data_folder.zip.
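
The per-environment naming can be wrapped in a small helper. This is only a sketch under assumptions: that ezpoints, points24, and numberline follow the same `<name>.zip` / `<name>.json` pattern shown for blackjack, and that the ALFWORLD JSON file is available as `alfworld.json` (the name it has in the tree above). The helper names `env_filenames` and `download_env` are hypothetical.

```python
ENVIRONMENTS = ("alf", "blackjack", "ezpoints", "points24", "numberline")


def env_filenames(env):
    """Return (zip_name, json_name) for one environment.

    Only the blackjack names and alf_data_folder.zip are stated in this README;
    the remaining names are assumed to follow the same pattern.
    """
    if env not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {env!r}")
    if env == "alf":
        # ALFWORLD is the documented exception for the zip file name.
        return "alf_data_folder.zip", "alfworld.json"
    return f"{env}.zip", f"{env}.json"


def download_env(env):
    """Download both files for env and return their local paths (needs network access)."""
    # Imported here so env_filenames can be used without huggingface_hub.
    from huggingface_hub import hf_hub_download

    return [
        hf_hub_download(repo_id="LEVI-Project/sft-data", filename=name)
        for name in env_filenames(env)
    ]
```

For example, `download_env("blackjack")` issues the same two `hf_hub_download` calls shown above.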