---
license: cc-by-4.0
tags:
- nougat
- small
- ocr
---


# nougat-small onnx


This is [facebook/nougat-small](https://huggingface.co/facebook/nougat-small) exported to ONNX. The weights are **not quantized**.
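
For reference, a re-export can be reproduced from the upstream checkpoint with `optimum` (a minimal sketch, not necessarily the exact command used for this repo; the output directory name is only illustrative):

```python
from optimum.onnxruntime import ORTModelForVision2Seq

# export facebook/nougat-small to ONNX and save it locally
model = ORTModelForVision2Seq.from_pretrained("facebook/nougat-small", export=True)
model.save_pretrained("nougat-small-onnx")
```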


```python
from transformers import NougatProcessor
from optimum.onnxruntime import ORTModelForVision2Seq

model_name = "pszemraj/nougat-small-onnx"
processor = NougatProcessor.from_pretrained(model_name)
model = ORTModelForVision2Seq.from_pretrained(
    model_name,
    provider="CPUExecutionProvider",  # "CUDAExecutionProvider" for GPU
    use_merged=False,
    use_io_binding=True,
)
```

On a Colab CPU-only runtime (_at time of writing_) you may get `CuPy` errors; to fix them, uninstall it:

```sh
pip uninstall cupy-cuda11x -y
```

## how do da inference?

See [Niels Rogge's Nougat tutorial](https://github.com/NielsRogge/Transformers-Tutorials/blob/b46d3e89e631701ef205297435064ab780c4853a/Nougat/Inference_with_Nougat_to_read_scientific_PDFs.ipynb) or [this basic notebook](https://huggingface.co/pszemraj/nougat-small-onnx/blob/main/nougat-small-onnx-example.ipynb) I uploaded. ONNX seems to bring CPU inference times into 'feasible' territory: it took ~15 mins for _Attention is All You Meme_ on the Colab free CPU runtime.
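
If you just want a minimal end-to-end call outside the notebooks, the sketch below follows the standard Nougat preprocess, generate, and decode flow (assuming a rasterized page image `page.png`; the filename and generation settings are only illustrative):

```python
from PIL import Image
from transformers import NougatProcessor
from optimum.onnxruntime import ORTModelForVision2Seq

model_name = "pszemraj/nougat-small-onnx"
processor = NougatProcessor.from_pretrained(model_name)
model = ORTModelForVision2Seq.from_pretrained(
    model_name, provider="CPUExecutionProvider", use_merged=False, use_io_binding=True
)

# one rasterized PDF page as an RGB image
image = Image.open("page.png").convert("RGB")
pixel_values = processor(images=image, return_tensors="pt").pixel_values

# autoregressively generate the markdown transcription of the page
outputs = model.generate(
    pixel_values,
    min_length=1,
    max_new_tokens=1024,
    bad_words_ids=[[processor.tokenizer.unk_token_id]],
)
text = processor.batch_decode(outputs, skip_special_tokens=True)[0]
text = processor.post_process_generation(text, fix_markdown=False)
print(text)
```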