Spaces:
Runtime error
Runtime error
# Install dependency | |
```bash | |
pip install -r requirement.txt | |
``` | |
# Download the data set | |
```bash | |
export WORKDIR_ROOT=<a directory which will hold all working files> | |
``` | |
The downloaded data will be at $WORKDIR_ROOT/ML50 | |
# preprocess the data | |
Install SPM [here](https://github.com/google/sentencepiece) | |
```bash | |
export WORKDIR_ROOT=<a directory which will hold all working files> | |
export SPM_PATH=<a path pointing to sentencepice spm_encode.py> | |
``` | |
* $WORKDIR_ROOT/ML50/raw: extracted raw data | |
* $WORKDIR_ROOT/ML50/dedup: dedup data | |
* $WORKDIR_ROOT/ML50/clean: data with valid and test sentences removed from the dedup data | |