## Linux
- Run `apt-get install ocrmypdf`
- Install a ghostscript version newer than 9.55 by following [these instructions](https://ghostscript.readthedocs.io/en/latest/Install.html) or running `scripts/install/ghostscript_install.sh`.
- Run `pip install ocrmypdf`
- Install any tesseract language packages that you want (for example, `apt-get install tesseract-ocr-eng`)
- Set the tesseract data folder path
  - Find the tesseract data folder `tessdata` with `find / -name tessdata`. Make sure to use the one corresponding to the latest tesseract version if you have multiple.
  - Create a `local.env` file in the root `marker` folder with `TESSDATA_PREFIX=/path/to/tessdata` inside it
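
The last step can be scripted. The following is a minimal sketch, not part of marker: `write_tessdata_env` is a hypothetical helper name, limiting the search to `/usr` and `/opt` is an assumption made to keep it fast, and the path in the commented usage line is only an example of what `find` might print.

```shell
# Hypothetical helper: write TESSDATA_PREFIX=<dir> into an env file.
# $1 = tessdata folder, $2 = env file to create (defaults to ./local.env).
write_tessdata_env() {
    tessdata_dir="$1"
    env_file="${2:-local.env}"
    if [ ! -d "$tessdata_dir" ]; then
        echo "not a directory: $tessdata_dir" >&2
        return 1
    fi
    printf 'TESSDATA_PREFIX=%s\n' "$tessdata_dir" > "$env_file"
}

# List candidate folders. Searching only /usr and /opt is an assumption;
# replace them with / to search the whole filesystem as described above.
find /usr /opt -type d -name tessdata 2>/dev/null || true

# Then, from the root `marker` folder, pass the folder you picked (example path):
# write_tessdata_env /usr/share/tesseract-ocr/5/tessdata
```

The helper refuses to write the file when the folder does not exist, so a mistyped path fails at this step rather than later when OCR runs.
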
## Mac
Only needed if using `ocrmypdf` as the OCR backend.
- Run `brew install ocrmypdf`
- Run `brew install tesseract-lang` to add language support
- Run `pip install ocrmypdf`
- Set the tesseract data folder path
  - Find the tesseract data folder `tessdata` with `brew list tesseract`
  - Create a `local.env` file in the root `marker` folder with `TESSDATA_PREFIX=/path/to/tessdata` inside it
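
`brew list tesseract` prints every installed file, so the `tessdata` folder has to be picked out of a longer listing. As a rough sketch, the filter below reduces that listing to the folder path. `tessdata_dirs` is a hypothetical helper name, and the filter assumes each relevant line contains the folder path followed by `/`.

```shell
# Hypothetical helper: read a file listing on stdin (one path per line)
# and print each distinct folder path that ends in /tessdata.
tessdata_dirs() {
    sed -n 's|\(.*/tessdata\)/.*|\1|p' | sort -u
}

# Only call brew when it is actually installed.
if command -v brew >/dev/null 2>&1; then
    brew list tesseract | tessdata_dirs
fi
```

The printed path is the value to use for `TESSDATA_PREFIX` in `local.env`.
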
## Windows
- Install `ocrmypdf` and ghostscript by following [these instructions](https://ocrmypdf.readthedocs.io/en/latest/installation.html#installing-on-windows)
- Run `pip install ocrmypdf`
- Install any tesseract language packages you want
- Set the tesseract data folder path
  - Find the tesseract data folder `tessdata`. It is normally inside the Tesseract installation directory, which you can locate with `where tesseract`.
  - Create a `local.env` file in the root `marker` folder with `TESSDATA_PREFIX=/path/to/tessdata` inside it