@@ -1,82 +1,82 @@
-        ---
-        license: apache-2.0
-        pipeline_tag: image-classification
-        tags:
-        - image-classification
-        - multi-label-classification
-        - onnx
-        - openvino
-        - pdf
-        - document-understanding
-        - rag
-        datasets:
-        - Wikit/PdfVisClassif
-        ---
-        # PDF Page Classifier
-        Multi-label classifier for PDF page images. Determines whether a PDF page
-        requires image embedding (vs. text-only) in RAG pipelines.
-        Backbone: EfficientNet-Lite0. Exported to ONNX and OpenVINO INT8 via
-        Quantization-Aware Training (QAT). **No PyTorch required at inference time.**
-        ## Classes
-        - `Complex Table`
-- `Simple Table`
-- `Visual - Essential`
-- `Visual - Supportive`
-        Pages matching any of the following classes should trigger image embedding:
-        - `Complex Table`
-- `Visual - Essential`
-        Default threshold: `0.5`
-        ## Usage
-        ### With [chunknorris](https://github.com/wikit-ai/chunknorris) (recommended)
-        ```bash
-        pip install "chunknorris[ml-onnx]"       # ONNX backend
-        pip install "chunknorris[ml-openvino]"   # OpenVINO INT8, fastest on CPU
-        ```
-        ```python
-        from chunknorris.ml import load_classifier
-        clf = load_classifier("Wikit/pdf-pages-classifier")   # auto-selects best available backend
-        result = clf.predict("page.png")
-        # {"needs_image_embedding": True, "predicted_classes": [...], "probabilities": {...}}
-        ```
-        ### Standalone (no chunknorris)
-        ```bash
-        git clone https://huggingface.co/Wikit/pdf-pages-classifier
-        cd pdf-pages-classifier
-        pip install onnxruntime Pillow numpy   # or: openvino Pillow numpy
-        ```
-        ```python
-        from classifiers import load_classifier
-        clf = load_classifier(".")            # auto-selects available backend
-        result = clf.predict("page.png")
-        ```
-        ## Files
-        | File | Format | Notes |
-        |------|--------|-------|
-        | `model.onnx` | ONNX FP32 | Cross-platform CPU/GPU inference |
-        | `openvino_model.xml/.bin` | OpenVINO INT8 | Fastest CPU inference (QAT) |
-        | `pytorch_model.bin` | PyTorch | Raw checkpoint; requires `torch` + `timm` |
-        | `config.json` | JSON | Preprocessing config and class names |
-        | `classifiers/` | Python | Standalone inference scripts (no chunknorris needed) |
-        ## Dataset
-        Trained on [Wikit/PdfVisClassif](https://huggingface.co/datasets/Wikit/PdfVisClassif).

+---
+license: apache-2.0
+pipeline_tag: image-classification
+tags:
+- image-classification
+- multi-label-classification
+- onnx
+- openvino
+- pdf
+- document-understanding
+- rag
+datasets:
+- Wikit/PdfVisClassif
+---
+# PDF Page Classifier
+Multi-label classifier for PDF page images. Determines whether a PDF page
+requires image embedding (vs. text-only) in RAG pipelines.
+Backbone: EfficientNet-Lite0. Exported to ONNX (FP32) and to OpenVINO INT8, the latter via
+Quantization-Aware Training (QAT). **No PyTorch required at inference time.**
+## Classes
+- `Complex Table`
+- `Simple Table`
+- `Visual - Essential`
+- `Visual - Supportive`
+Pages matching any of the following classes should trigger image embedding:
+- `Complex Table`
+- `Visual - Essential`
+Default threshold: `0.5`
+## Usage
+### With [chunknorris](https://github.com/wikit-ai/chunknorris) (recommended)
+```bash
+pip install "chunknorris[ml-onnx]"       # ONNX backend
+pip install "chunknorris[ml-openvino]"   # OpenVINO INT8, fastest on CPU
+```
+```python
+from chunknorris.ml import load_classifier
+clf = load_classifier("Wikit/pdf-pages-classifier")   # auto-selects best available backend
+result = clf.predict("page.png")
+# {"needs_image_embedding": True, "predicted_classes": [...], "probabilities": {...}}
+```
+### Standalone (no chunknorris)
+```bash
+git clone https://huggingface.co/Wikit/pdf-pages-classifier
+cd pdf-pages-classifier
+pip install onnxruntime Pillow numpy   # or: openvino Pillow numpy
+```
+```python
+from classifiers import load_classifier
+clf = load_classifier(".")            # auto-selects available backend
+result = clf.predict("page.png")
+```
+## Files
+| File | Format | Notes |
+|------|--------|-------|
+| `model.onnx` | ONNX FP32 | Cross-platform CPU/GPU inference |
+| `openvino_model.xml/.bin` | OpenVINO INT8 | Fastest CPU inference (QAT) |
+| `pytorch_model.bin` | PyTorch | Raw checkpoint; requires `torch` + `timm` |
+| `config.json` | JSON | Preprocessing config and class names |
+| `classifiers/` | Python | Standalone inference scripts (no chunknorris needed) |
+## Dataset
+Trained on [Wikit/PdfVisClassif](https://huggingface.co/datasets/Wikit/PdfVisClassif).
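
The Classes section of the card states a decision rule: each class is scored independently, and a page should trigger image embedding when `Complex Table` or `Visual - Essential` is predicted at the default threshold of `0.5`. A minimal sketch of that rule follows. The `decide` helper is hypothetical and is not part of chunknorris or the bundled `classifiers/` scripts; treating a score exactly at the threshold as a positive is also an assumption, since the card does not say whether the comparison is strict.

```python
from typing import Dict

# Class names, trigger classes and default threshold are taken from the model card.
CLASSES = ["Complex Table", "Simple Table", "Visual - Essential", "Visual - Supportive"]
TRIGGER_CLASSES = {"Complex Table", "Visual - Essential"}
DEFAULT_THRESHOLD = 0.5


def decide(probabilities: Dict[str, float], threshold: float = DEFAULT_THRESHOLD) -> dict:
    """Hypothetical helper: turn per-class probabilities into a result dict."""
    # Multi-label: every class is thresholded on its own, so a page can
    # match zero, one, or several classes.
    # Assumption: a score exactly at the threshold counts as predicted.
    predicted = [name for name in CLASSES if probabilities.get(name, 0.0) >= threshold]
    return {
        # Only the trigger classes decide whether the page needs an image embedding.
        "needs_image_embedding": any(name in TRIGGER_CLASSES for name in predicted),
        "predicted_classes": predicted,
        "probabilities": dict(probabilities),
    }


# Illustrative scores only, not real model output.
example = decide({
    "Complex Table": 0.12,
    "Simple Table": 0.81,
    "Visual - Essential": 0.07,
    "Visual - Supportive": 0.64,
})
print(example["predicted_classes"], example["needs_image_embedding"])
```

With these illustrative scores the page matches `Simple Table` and `Visual - Supportive`, neither of which is a trigger class, so it would stay text-only. The returned keys mirror the result shape shown in the chunknorris usage comment.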