---
title: camelot-pg
app_file: src/app/run.py
sdk: gradio
sdk_version: 4.32.2
---
# PDF Table Parser
This script extracts tables from PDF files and saves them as CSV files. It provides a command-line interface (CLI) for batch processing and an optional web UI for interactive use.
## Features
- Multi-page PDF support
- Progress display per row, per page, and per file
- CSV output encoded as UTF-8 with BOM
- Customizable edge and row tolerances for table detection
- Optional web UI for interactive processing using Gradio
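"UTF-8 with BOM" means the file begins with the bytes `EF BB BF`, which helps spreadsheet applications such as Excel detect the encoding. The following is a minimal sketch using only Python's standard library. It is not taken from the script, which may write its output differently (for example through polars), and `write_csv_with_bom` is a hypothetical helper name:

```python
import csv
import os
import tempfile


def write_csv_with_bom(path, rows, delimiter=","):
    """Write rows to a CSV file encoded as UTF-8 with a leading BOM."""
    # "utf-8-sig" makes Python emit the BOM (EF BB BF) before the data.
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        csv.writer(f, delimiter=delimiter).writerows(rows)


if __name__ == "__main__":
    out = os.path.join(tempfile.mkdtemp(), "output.csv")
    write_csv_with_bom(out, [["name", "value"], ["café", "1"]])
    with open(out, "rb") as f:
        print(f.read()[:3])  # the three BOM bytes
```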
## Installation
1. Clone the repository or download the script.
2. Install the required dependencies:
```bash
pip install rich camelot-py polars gradio gradio_pdf
```
## Usage
### Command-Line Interface (CLI)
To run the script via CLI, use the following command:
```bash
python src/app/parser.py input1.pdf input2.pdf output1.csv output2.csv
```
#### Arguments:
- `input_files`: List of input PDF files
- `output_files`: List of output CSV files (must match the number of input files)
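The requirement that every input has exactly one output can be checked before any parsing starts. The sketch below illustrates that rule only; `pair_files` is a hypothetical helper, and the script's actual argument handling may differ:

```python
def pair_files(input_files, output_files):
    """Pair each input PDF with its output CSV, in order.

    Raises ValueError when the two lists differ in length, so a mismatch
    is reported before any PDF is opened.
    """
    if len(input_files) != len(output_files):
        raise ValueError(
            f"got {len(input_files)} input file(s) but {len(output_files)} output file(s)"
        )
    return list(zip(input_files, output_files))


if __name__ == "__main__":
    pairs = pair_files(["input1.pdf", "input2.pdf"], ["output1.csv", "output2.csv"])
    for pdf_path, csv_path in pairs:
        print(f"{pdf_path} -> {csv_path}")
```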
#### Optional Arguments:
- `--delimiter`: Output file delimiter (default: `,`)
- `--edge_tol`: Tolerance parameter used to specify the distance between text and table edges (default: `50`)
- `--row_tol`: Tolerance parameter used to specify the distance between table rows (default: `10`)
- `--webui`: Launch the web UI
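`edge_tol` and `row_tol` are also the names of keyword arguments accepted by Camelot's `stream` flavor, so these options presumably map onto a `camelot.read_pdf` call. A rough sketch under that assumption follows; the helper names `build_stream_kwargs` and `extract_tables` are illustrative and not taken from the script:

```python
def build_stream_kwargs(edge_tol=50, row_tol=10):
    """Collect the tolerance options as keyword arguments for Camelot."""
    # edge_tol and row_tol only apply to Camelot's "stream" flavor.
    return {"flavor": "stream", "edge_tol": edge_tol, "row_tol": row_tol}


def extract_tables(pdf_path, pages="all", **tolerances):
    """Return one pandas DataFrame per table that Camelot detects."""
    # Imported lazily so build_stream_kwargs works without Camelot installed.
    import camelot

    tables = camelot.read_pdf(pdf_path, pages=pages, **build_stream_kwargs(**tolerances))
    return [table.df for table in tables]
```

With Camelot installed, `extract_tables("data/demo.pdf", edge_tol=60, row_tol=40)` would use the same tolerances as the CLI example in this README.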
### Web UI
To run the script with the web UI, use the following command:
```bash
python src/app/run.py
```
This will launch a Gradio-based web application where you can upload PDFs and view the extracted tables interactively.
## Example
### CLI Example
```bash
python src/app/parser.py data/demo.pdf data/output.csv --delimiter ";" --edge_tol 60 --row_tol 40
```
## License
This project is licensed under the MIT License.