---
title: camelot-pg
app_file: src/app/run.py
sdk: gradio
sdk_version: 4.32.2
---

# PDF Table Parser

This script extracts tables from PDF files and saves them as CSV files. It provides a command-line interface (CLI) for batch processing and an optional web UI for interactive processing.

## Features

- Multi-page PDF support
- Progress display per row, per page, and per file
- CSV output encoded as UTF-8 with BOM
- Customizable edge and row tolerances for table detection
- Optional web UI for interactive processing using Gradio
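
As an illustration of the UTF-8 with BOM output mentioned above, here is a minimal sketch using only the Python standard library. The helper name `write_csv_with_bom` is hypothetical and not part of this project; the script itself may write its files differently (for example through polars).

```python
import csv


def write_csv_with_bom(path, rows, delimiter=","):
    """Write rows to a CSV file encoded as UTF-8 with a leading BOM.

    Hypothetical helper for illustration; not taken from the project.
    """
    # The "utf-8-sig" codec emits the BOM bytes (EF BB BF) at the start of the file.
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", encoding="utf-8-sig", newline="") as handle:
        csv.writer(handle, delimiter=delimiter).writerows(rows)
```

The BOM mainly helps spreadsheet applications such as Excel detect UTF-8 when opening the file directly.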

## Installation

1. Clone the repository or download the script.
2. Install the required dependencies:
    ```bash
    pip install rich camelot-py polars gradio gradio_pdf
    ```

## Usage

### Command-Line Interface (CLI)

To run the script via CLI, use the following command:

```bash
python src/app/parser.py input1.pdf input2.pdf output1.csv output2.csv
```

#### Arguments:

- `input_files`: List of input PDF files
- `output_files`: List of output CSV files (must match the number of input files)

#### Optional Arguments:

- `--delimiter`: Output file delimiter (default: `,`)
- `--edge_tol`: Tolerance parameter used to specify the distance between text and table edges (default: `50`)
- `--row_tol`: Tolerance parameter used to specify the distance between table rows (default: `10`)
- `--webui`: Launch the web UI
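
The arguments above could be parsed roughly as follows. This is a sketch under assumptions, not the script's actual code: it assumes the files arrive as one positional list (inputs first, then outputs) that is split in half, and the function name `parse_args` and the integer types for the tolerances are illustrative choices.

```python
import argparse


def parse_args(argv=None):
    """Parse CLI arguments; hypothetical reconstruction of the documented interface."""
    parser = argparse.ArgumentParser(description="Extract tables from PDFs into CSV files.")
    # Assumption: a single positional list, input PDFs first and output CSVs second.
    parser.add_argument("files", nargs="+", help="input PDFs followed by output CSVs")
    parser.add_argument("--delimiter", default=",", help="output file delimiter")
    parser.add_argument("--edge_tol", type=int, default=50, help="edge tolerance")
    parser.add_argument("--row_tol", type=int, default=10, help="row tolerance")
    parser.add_argument("--webui", action="store_true", help="launch the web UI")
    args = parser.parse_args(argv)

    # The number of outputs must match the number of inputs.
    if len(args.files) % 2 != 0:
        parser.error("expected the same number of input and output files")
    half = len(args.files) // 2
    args.input_files = args.files[:half]
    args.output_files = args.files[half:]
    return args
```

For example, `parse_args(["a.pdf", "b.pdf", "a.csv", "b.csv"])` pairs `a.pdf` with `a.csv` and `b.pdf` with `b.csv`.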

### Web UI

To run the script with the web UI, use the following command:

```bash
python src/app/run.py
```

This will launch a Gradio-based web application where you can upload PDFs and view the extracted tables interactively.

## Example

### CLI Example

```bash
python src/app/parser.py data/demo.pdf data/output.csv --delimiter ";" --edge_tol 60 --row_tol 40
```
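
The tolerance flags in the command above share their names with Camelot's `edge_tol` and `row_tol` options, which apply to its `stream` flavor. A minimal sketch of how such values might be passed to `camelot.read_pdf` follows; the helper names `build_stream_kwargs` and `extract_tables`, the choice of the `stream` flavor, and `pages="all"` are assumptions, not confirmed details of this script.

```python
def build_stream_kwargs(edge_tol=50, row_tol=10, pages="all"):
    """Collect keyword arguments for camelot.read_pdf (hypothetical helper)."""
    # Assumption: the script uses the "stream" flavor, where these tolerances apply.
    return {
        "flavor": "stream",
        "pages": pages,
        "edge_tol": edge_tol,
        "row_tol": row_tol,
    }


def extract_tables(pdf_path, **overrides):
    """Return one DataFrame per table that Camelot detects in the PDF."""
    import camelot  # imported lazily so the helper above works without Camelot

    tables = camelot.read_pdf(pdf_path, **build_stream_kwargs(**overrides))
    # Each detected table exposes its contents as a pandas DataFrame via `.df`.
    return [table.df for table in tables]
```

With these helpers, the example command would correspond to `extract_tables("data/demo.pdf", edge_tol=60, row_tol=40)`.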

## License

This project is licensed under the MIT License.