metadata
title: Grobid OCR
emoji: 📄
colorFrom: blue
colorTo: green
sdk: docker
pinned: false

Grobid PDF Document Processor

This Space uses Grobid to extract structured information from PDF documents, particularly academic papers.

Features

  • Header Extraction: Fast extraction of the title, authors, and abstract
  • Full Text Processing: Processing of the entire document, including the introduction section
  • Academic Focus: Optimized for scholarly documents and research papers
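Grobid returns its results as TEI XML. The following is a minimal sketch of pulling the header fields out of that XML with Python's standard library. It assumes Grobid's usual TEI layout: the title under `titleStmt`, authors under `sourceDesc/biblStruct/analytic`, and the abstract under `profileDesc`. `parse_tei_header` is a hypothetical helper for illustration, not code taken from this Space.

```python
import xml.etree.ElementTree as ET

# Grobid's TEI output places every element in the TEI namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def _text(elem):
    """Join all text inside an element and collapse runs of whitespace."""
    if elem is None:
        return ""
    return " ".join("".join(elem.itertext()).split())


def parse_tei_header(tei_xml):
    """Hypothetical helper: return title, authors and abstract from a Grobid TEI document.

    Assumes the element paths Grobid normally emits; adjust if your output differs.
    """
    root = ET.fromstring(tei_xml)  # accepts str or bytes
    header = root.find("tei:teiHeader", TEI_NS)
    if header is None:
        return {"title": "", "authors": [], "abstract": ""}

    title = _text(header.find("tei:fileDesc/tei:titleStmt/tei:title", TEI_NS))

    authors = []
    author_path = "tei:fileDesc/tei:sourceDesc/tei:biblStruct/tei:analytic/tei:author/tei:persName"
    for pers in header.findall(author_path, TEI_NS):
        # A person may have several forenames (first, middle) and one surname.
        forenames = [_text(f) for f in pers.findall("tei:forename", TEI_NS)]
        surname = _text(pers.find("tei:surname", TEI_NS))
        name = " ".join(part for part in forenames + [surname] if part)
        if name:
            authors.append(name)

    abstract = _text(header.find("tei:profileDesc/tei:abstract", TEI_NS))
    return {"title": title, "authors": authors, "abstract": abstract}
```

Authors without a `persName` (for example, affiliation-only entries) are skipped rather than reported as empty names.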

Usage

  1. Upload a PDF document
  2. Choose extraction type:
    • Header Only: Quick extraction of header metadata
    • Full Text: Complete processing, including the introduction
  3. Click "Process PDF" to get structured results
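The two extraction types presumably map onto Grobid's `processHeaderDocument` and `processFulltextDocument` REST endpoints. Both take the PDF as a multipart form field named `input`. The sketch below uses only the standard library and assumes a Grobid server on its default port 8070. `GROBID_URL`, `build_grobid_request` and `process_pdf` are hypothetical names, not this Space's actual code.

```python
import urllib.request
import uuid

# Assumption: a Grobid server on its default port; the Space's real URL may differ.
GROBID_URL = "http://localhost:8070"

# Grobid REST endpoints assumed to back the two extraction types.
ENDPOINTS = {
    "header": "/api/processHeaderDocument",      # "Header Only"
    "fulltext": "/api/processFulltextDocument",  # "Full Text"
}


def build_grobid_request(pdf_bytes, mode="header", filename="paper.pdf", base_url=GROBID_URL):
    """Build (but do not send) a multipart/form-data POST with the PDF in the `input` field."""
    if mode not in ENDPOINTS:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(ENDPOINTS)}")
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="input"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/pdf\r\n\r\n",
        pdf_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        base_url + ENDPOINTS[mode],
        data=body,
        headers={
            "Content-Type": f"multipart/form-data; boundary={boundary}",
            "Accept": "application/xml",  # request TEI XML output
        },
        method="POST",
    )


def process_pdf(path, mode="header", timeout=120):
    """Send a PDF file to Grobid and return the TEI XML response as text."""
    with open(path, "rb") as fh:
        request = build_grobid_request(fh.read(), mode)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For example, `process_pdf("paper.pdf", mode="fulltext")` would return the full-text TEI for a local file, provided a Grobid server is reachable at `GROBID_URL`. Building the request separately from sending it keeps the endpoint selection testable without a running server.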

Technology

  • Grobid: Machine learning library that extracts structured TEI XML from PDFs
  • Gradio: Web interface framework
  • Docker: Containerized deployment
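Grobid can take a while to start, so in a containerized setup the web interface may need to wait until the Grobid service answers. Below is a minimal sketch that polls Grobid's `/api/isalive` endpoint. It assumes Grobid runs in the same container on its default port 8070. `wait_for_grobid` is a hypothetical helper, not part of this Space. The `opener`, `sleep` and `clock` parameters exist only so the loop can be tested without a live server.

```python
import time
import urllib.request

# Assumption: Grobid listens on its default port inside the same container.
GROBID_URL = "http://localhost:8070"


def wait_for_grobid(base_url=GROBID_URL, timeout=300.0, interval=5.0,
                    opener=urllib.request.urlopen, sleep=time.sleep, clock=time.monotonic):
    """Hypothetical helper: poll /api/isalive until Grobid answers "true" or `timeout` seconds pass.

    Returns True once the service is up, False if the deadline is reached first.
    """
    deadline = clock() + timeout
    while True:
        try:
            with opener(base_url + "/api/isalive", timeout=10) as response:
                if response.read().decode().strip().lower() == "true":
                    return True
        except OSError:
            pass  # Connection refused or timed out: the server is not ready yet.
        if clock() >= deadline:
            return False
        sleep(interval)
```

Calling `wait_for_grobid()` before launching the Gradio app would keep users from submitting PDFs while the Grobid service is still starting.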

Perfect for researchers who need to quickly extract key information from academic papers!