---
title: Words2csv
emoji: 👁
colorFrom: purple
colorTo: gray
sdk: docker
pinned: false
---

# words2csv

Upload a PDF or image and convert the detected content into CSV using one of the supported backends:

- OpenAI (requires `OPENAI_API_KEY`)
- Gemini (requires `GEMINI_API_KEY`)
- olmOCR via Hugging Face Inference Endpoint (requires `HF_TOKEN` and a configured HF Endpoint URL)

The app runs as a Gradio UI (`app.py`).
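
A minimal sketch of the kind of Gradio wiring involved (illustrative only; the function name, backend labels, and layout below are placeholders, not the actual contents of `app.py`):

```python
# Illustrative sketch only -- the real UI and dispatch logic live in app.py.
import gradio as gr

def convert_to_csv(file_path, backend):
    # Placeholder: app.py dispatches to the OpenAI, Gemini, or olmOCR backend here
    # and returns the extracted content as CSV text.
    return f"backend={backend}, file={file_path}"

demo = gr.Interface(
    fn=convert_to_csv,
    inputs=[
        gr.File(label="PDF or image"),
        gr.Dropdown(["OpenAI", "Gemini", "olmOCR"], label="Backend"),  # labels are placeholders
    ],
    outputs=gr.Textbox(label="CSV"),
    title="words2csv",
)

if __name__ == "__main__":
    demo.launch(server_name="0.0.0.0", server_port=7860)
```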

## Research

The result of the research can be found here.

## Prerequisites

- Python 3.11+
- One or more of the following credentials, depending on the model you pick in the UI (the sketch after this list checks which ones are set):
  - `OPENAI_API_KEY`
  - `GEMINI_API_KEY`
  - `HF_TOKEN` (Hugging Face access token with permission to call your Inference Endpoint)
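
A quick way to check which of these are set in your current environment (a sketch; it assumes `python-dotenv` is available, a common companion to the `.env` workflow below):

```python
# Report which backends the current environment can use.
# Assumes python-dotenv is installed; drop load_dotenv() if you export vars directly.
import os
from dotenv import load_dotenv

load_dotenv()  # picks up a local .env file, if present

for key, backend in [
    ("OPENAI_API_KEY", "OpenAI"),
    ("GEMINI_API_KEY", "Gemini"),
    ("HF_TOKEN", "olmOCR (HF Inference Endpoint)"),
]:
    status = "set" if os.getenv(key) else "missing"
    print(f"{backend}: {key} is {status}")
```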

## Quickstart (local)

1. Create a virtualenv and install deps:

   ```bash
   python -m venv .venv
   . .venv/bin/activate
   pip install -r requirements.txt
   ```

2. Create a `.env` file (or export env vars in your shell):

   ```
   OPENAI_API_KEY=...
   GEMINI_API_KEY=...
   HF_TOKEN=...
   ```

3. Run the app:

   ```bash
   python app.py
   ```

Then open http://localhost:7860.

## Quickstart (Docker)

If you prefer Docker locally:

1. Create `.env` with the keys you need (same as above)
2. Run:

   ```bash
   docker compose up --build
   ```

Open http://localhost:7860.

## Run on your own Hugging Face account (Spaces)

This repo is configured for Hugging Face Spaces using the Docker SDK (see the YAML frontmatter at the top of this README and the `Dockerfile`).

### 1) Create the Space

1. Go to https://huggingface.co/spaces
2. Click **Create new Space**
3. Choose:
   - Space SDK: Docker
   - Visibility: your choice (private is recommended if you use paid API keys)
4. Create the Space

Tip: in the Create new Space flow, you can also use the Clone repository option and paste your repo URL to import this project directly.

### 2) Push this repository to the Space

Clone the Space repo and add this project's files (or push from your existing git remote). In general, a Space is just a Git repository.
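
If you prefer scripting the push over plain git, `huggingface_hub` can create the Space and upload the folder. This is optional, and the Space id below is a placeholder:

```python
# Optional alternative to git push: create the Space and upload this folder.
# "your-username/words2csv" is a placeholder -- use your own Space id.
from huggingface_hub import HfApi

SPACE_ID = "your-username/words2csv"

api = HfApi()  # authenticates via HF_TOKEN or a prior `huggingface-cli login`
api.create_repo(repo_id=SPACE_ID, repo_type="space", space_sdk="docker", exist_ok=True)
api.upload_folder(folder_path=".", repo_id=SPACE_ID, repo_type="space")
```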

### 3) Add secrets (required)

In your Space, go to:

**Settings -> Variables and secrets**

Add:

- `OPENAI_API_KEY`: required if you select an OpenAI model in the UI
- `GEMINI_API_KEY`: required if you select a Gemini model in the UI
- `HF_TOKEN`: required if you select the olmOCR-2-7B-1025-FP8 backend (Inference Endpoint)

Notes:

- Secrets are injected as environment variables at runtime.
- If you don't set a key, selecting that backend will raise a runtime error (sketched below).
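
The runtime error in that last note follows roughly this pattern (a sketch, not the repo's exact code):

```python
# Fail-fast pattern sketched from the note above (not the repo's exact code).
import os

def require_env(name: str) -> str:
    value = os.environ.get(name)
    if not value:
        # This is the kind of error you will see if the secret is missing.
        raise RuntimeError(f"{name} environment variable is not set.")
    return value

# Only the backend you actually select needs its key, e.g.:
# openai_key = require_env("OPENAI_API_KEY")
```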

### 4) Wait for the Space to build and start

Once the Space finishes building, open it and upload a PDF/image.

## Hugging Face Inference Endpoint (olmOCR backend)

The olmOCR backend does not call Google/OpenAI. It calls a Hugging Face Inference Endpoint using `huggingface_hub.InferenceClient`.
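
The call has roughly this shape (a sketch; the real prompt, message format, and response parsing are in `olm_ocr.py`, and the chat-style payload below assumes the endpoint serves an OpenAI-compatible vision chat model):

```python
# Rough shape of the olmOCR call -- see olm_ocr.py for the actual code.
import base64
import os

from huggingface_hub import InferenceClient

# Your own Inference Endpoint URL (see the next section).
HF_ENDPOINT_URL = "https://<id>.<region>.<provider>.endpoints.huggingface.cloud"

client = InferenceClient(model=HF_ENDPOINT_URL, token=os.environ["HF_TOKEN"])

with open("page.png", "rb") as f:  # placeholder image path
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat_completion(
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Extract the detected words as CSV."},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
    max_tokens=1024,
)
print(response.choices[0].message.content)
```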

Important details:

- The endpoint URL is currently hardcoded as `HF_ENDPOINT_URL` in `olm_ocr.py`.
- To run this in your own HF account, you typically need your own endpoint URL.

### Create your own endpoint

1. Go to https://huggingface.co/inference-endpoints
2. Create an endpoint for the model:
   - allenai/olmOCR-2-7B-1025-FP8
3. Wait until the endpoint status is Running
4. Copy the endpoint URL (it looks like `https://<id>.<region>.<provider>.endpoints.huggingface.cloud`); steps 3-4 can also be scripted, as shown below
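
If you prefer to script steps 3-4, `huggingface_hub` can wait for the endpoint and report its URL (the endpoint name below is a placeholder):

```python
# Wait for the endpoint to be running and print its URL (steps 3-4 above).
# "olmocr-endpoint" is a placeholder -- use the name you gave your endpoint.
from huggingface_hub import get_inference_endpoint

endpoint = get_inference_endpoint("olmocr-endpoint")
endpoint.wait()          # blocks until the endpoint is running
print(endpoint.status)   # e.g. "running"
print(endpoint.url)      # this is the value to use as HF_ENDPOINT_URL
```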

### Configure this repo to use your endpoint

Update `HF_ENDPOINT_URL` in `olm_ocr.py` to your endpoint URL.
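
If you would rather not edit the file for every deployment, one option (a suggestion, not how the repo currently works) is to let an environment variable override the hardcoded constant:

```python
# Possible tweak to olm_ocr.py: allow an env var to override the hardcoded URL.
# The fallback URL below is a placeholder.
import os

HF_ENDPOINT_URL = os.environ.get(
    "HF_ENDPOINT_URL",
    "https://<id>.<region>.<provider>.endpoints.huggingface.cloud",
)
```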

The call is authenticated via `HF_TOKEN`:

- Create a token at https://huggingface.co/settings/tokens
- Make sure it can access your endpoint (and the endpoint is in the same account/org)

## About GEMINI_API_KEY and Hugging Face

`GEMINI_API_KEY` is used by the Gemini backend (`gemini_backend.py`) via the `google-genai` client.
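
The Gemini call has roughly this shape (a sketch; the model name, prompt, and image handling below are illustrative, and the real code is in `gemini_backend.py`):

```python
# Rough shape of the Gemini call -- see gemini_backend.py for the actual code.
import os

from google import genai
from google.genai import types

client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

with open("page.png", "rb") as f:  # placeholder image path
    image_bytes = f.read()

response = client.models.generate_content(
    model="gemini-2.0-flash",  # illustrative model name
    contents=[
        types.Part.from_bytes(data=image_bytes, mime_type="image/png"),
        "Extract the detected words as CSV.",
    ],
)
print(response.text)
```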

If you are running this app on Hugging Face Spaces:

- You still provide `GEMINI_API_KEY` as a Space secret.
- You do not get a Gemini key from Hugging Face.

If you specifically want Gemini calls to go through a Hugging Face-hosted endpoint, that would require a different integration than the current code (e.g. calling an HF Inference Endpoint hosting a Gemini-compatible service). The current implementation calls Google directly using your `GEMINI_API_KEY`.

## Troubleshooting

- `OPENAI_API_KEY environment variable is not set.`
  - Add `OPENAI_API_KEY` as an env var (local `.env`) or Space secret.
- `GEMINI_API_KEY environment variable is not set.`
  - Add `GEMINI_API_KEY` as an env var (local `.env`) or Space secret.
- olmOCR endpoint errors (401/403)
  - Ensure `HF_TOKEN` is set and has permission to call the endpoint.
  - Ensure `HF_ENDPOINT_URL` points to an endpoint you own / can access.
- Space builds but doesn't start / crashes
  - Check the Space logs.
  - Make sure you're not missing required secrets for the backend you select.