# pdf-to-page-images-dataset / dataset_card_template.py
# Commit 662b961 by davanstrien: "create card template"
DATASET_CARD_TEMPLATE = """
# Dataset Card for {hf_repo}
## Dataset Description
This dataset contains images converted from PDFs using the PDFs to Page Images Converter Space.
- **Number of images:** {num_images}
- **Number of PDFs processed:** {num_pdfs}
- **Sample size per PDF:** {sample_size}
- **Created on:** {creation_date}
## Dataset Creation
### Source Data
The images in this dataset were generated from user-uploaded PDF files.
### Processing Steps
1. PDF files were uploaded to the PDFs to Page Images Converter.
2. Selected pages of each PDF were converted to images.
3. The resulting images were saved and uploaded to this dataset.
## Dataset Structure
The dataset consists of JPEG images, each representing a single page from the source PDFs.
### Data Fields
- `images/`: A folder containing all the converted images.
### Data Splits
This dataset does not define separate train, validation, or test splits.
## Additional Information
- **Contributions:** Thanks to the PDFs to Page Images Converter for creating this dataset.
"""
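
# A minimal sketch of how a template like the one above might be filled in with
# str.format. The helper name render_card, the shortened CARD_EXCERPT string, and
# all example values (repo id, counts, date) are assumptions for illustration;
# they are not part of the original file. The excerpt only mirrors the five
# placeholders used in DATASET_CARD_TEMPLATE so the example runs on its own.

```python
from datetime import date

# Shortened stand-in for DATASET_CARD_TEMPLATE (assumption: same placeholders).
CARD_EXCERPT = """
# Dataset Card for {hf_repo}
- **Number of images:** {num_images}
- **Number of PDFs processed:** {num_pdfs}
- **Sample size per PDF:** {sample_size}
- **Created on:** {creation_date}
"""


def render_card(template, hf_repo, num_images, num_pdfs, sample_size, creation_date=None):
    """Fill the card placeholders (hypothetical helper, not in the original file)."""
    if creation_date is None:
        # Default to today's date in ISO format when no date is supplied.
        creation_date = date.today().isoformat()
    return template.format(
        hf_repo=hf_repo,
        num_images=num_images,
        num_pdfs=num_pdfs,
        sample_size=sample_size,
        creation_date=creation_date,
    )


# Illustrative values only.
card = render_card(
    CARD_EXCERPT,
    hf_repo="user/example-pages",
    num_images=120,
    num_pdfs=4,
    sample_size=30,
    creation_date="2024-01-01",
)
print(card)
```

# Passing every placeholder as a keyword argument means a missing value raises
# KeyError at format time instead of silently producing an incomplete card.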