# Contributing

## Installation
Install pixi for pulling conda/pip packages:

```shell
curl -fsSL https://pixi.sh/install.sh | sh
```
Create the pixi environment and enter an activated shell:

```shell
pixi s
```
Create a virtualenv and install nemo-retriever-ocr into it via uv:

```shell
uv venv \
  && uv pip install -e ./nemo-retriever-ocr -v
```
Verify that the OCR inference libraries can now be imported successfully:

```shell
uv run python -c "import nemo_retriever_ocr; import nemo_retriever_ocr_cpp"
```
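The same check can be sketched programmatically; `check_imports` below is a hypothetical helper written for this guide, not part of nemo-retriever-ocr:

```python
import importlib


def check_imports(names):
    """Return the subset of module names that fail to import."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


# Module names taken from the command above; run inside the uv environment.
missing = check_imports(["nemo_retriever_ocr", "nemo_retriever_ocr_cpp"])
```

An empty `missing` list means the installation is usable from Python.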
## Usage
`nemo_retriever_ocr.inference.pipeline.NemoRetrieverOCR` is the main entry point for performing OCR inference; it can be used to iterate over predictions for a given input image:

```python
from nemo_retriever_ocr.inference.pipeline import NemoRetrieverOCR

ocr = NemoRetrieverOCR()
predictions = ocr("ocr-example-input-1.png")
for pred in predictions:
    print(
        f" - Text: '{pred['text']}', "
        f"Confidence: {pred['confidence']:.2f}, "
        f"Bbox: [left={pred['left']:.4f}, upper={pred['upper']:.4f}, right={pred['right']:.4f}, lower={pred['lower']:.4f}]"
    )
```
Or predictions can be superimposed on the input image for visualization:

```python
ocr(image_path, visualize=True)
```
The level of detection merging can be adjusted by modifying the `merge_level` argument (defaulting to `"paragraph"`):

```python
ocr(image_path, merge_level="word")      # leave detected words unmerged
ocr(image_path, merge_level="sentence")  # merge detected words into sentences
```
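Conceptually, a higher merge level combines adjacent word detections into one larger region. The toy sketch below shows what a bounding-box union of word predictions looks like; it is an illustration only, not the library's actual merging algorithm, which this guide does not describe:

```python
def merge_boxes(words):
    """Union several word predictions into one region (toy illustration).

    Combines text left-to-right in list order and takes the enclosing
    bounding box; uses the minimum confidence as a conservative choice.
    """
    return {
        "text": " ".join(w["text"] for w in words),
        "confidence": min(w["confidence"] for w in words),
        "left": min(w["left"] for w in words),
        "upper": min(w["upper"] for w in words),
        "right": max(w["right"] for w in words),
        "lower": max(w["lower"] for w in words),
    }


# Hypothetical word-level predictions on the same line.
words = [
    {"text": "merge", "confidence": 0.9,
     "left": 0.10, "upper": 0.10, "right": 0.20, "lower": 0.15},
    {"text": "me", "confidence": 0.8,
     "left": 0.21, "upper": 0.10, "right": 0.25, "lower": 0.15},
]
line = merge_boxes(words)
```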
An example script, `example.py`, is provided for convenience:

```shell
uv run python example.py ocr-example-input-1.png
```
Detection merging can be adjusted by modifying the `--merge-level` option:

```shell
uv run python example.py ocr-example-input-1.png --merge-level word
uv run python example.py ocr-example-input-1.png --merge-level sentence
```