philschmid committed
Commit 941ee89
Parent: 294e921

Update README.md

Files changed (1):
  1. README.md (+8, -36)
README.md CHANGED
@@ -6,46 +6,18 @@ tags:
 - vision
 ---
 
+# Fork of [naver-clova-ix/donut-base-finetuned-cord-v2](https://huggingface.co/naver-clova-ix/donut-base-finetuned-cord-v2)
+
+> This is a fork of [naver-clova-ix/donut-base-finetuned-cord-v2](https://huggingface.co/naver-clova-ix/donut-base-finetuned-cord-v2) implementing a custom `handler.py` as an example of how to use Donut models with [Inference Endpoints](https://hf.co/inference-endpoints).
+
+---
+
 # Donut (base-sized model, fine-tuned on CORD)
 
 Donut model fine-tuned on CORD. It was introduced in the paper [OCR-free Document Understanding Transformer](https://arxiv.org/abs/2111.15664) by Geewook et al. and first released in [this repository](https://github.com/clovaai/donut).
 
-Disclaimer: The team releasing Donut did not write a model card for this model, so this model card has been written by the Hugging Face team.
-
-## Model description
-
-Donut consists of a vision encoder (Swin Transformer) and a text decoder (BART). Given an image, the encoder first encodes it into a tensor of embeddings of shape `(batch_size, seq_len, hidden_size)`, after which the decoder autoregressively generates text, conditioned on the encoder's output.
+Donut consists of a vision encoder (Swin Transformer) and a text decoder (BART). Given an image, the encoder first encodes it into a tensor of embeddings of shape `(batch_size, seq_len, hidden_size)`, after which the decoder autoregressively generates text, conditioned on the encoder's output.
 
-![model image](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/donut_architecture.jpg)
-
-## Intended uses & limitations
-
-This model is fine-tuned on CORD, a document parsing dataset.
-
-We refer to the [documentation](https://huggingface.co/docs/transformers/main/en/model_doc/donut), which includes code examples.
-
-### BibTeX entry and citation info
-
-```bibtex
-@article{DBLP:journals/corr/abs-2111-15664,
-  author     = {Geewook Kim and
-                Teakgyu Hong and
-                Moonbin Yim and
-                Jinyoung Park and
-                Jinyeong Yim and
-                Wonseok Hwang and
-                Sangdoo Yun and
-                Dongyoon Han and
-                Seunghyun Park},
-  title      = {Donut: Document Understanding Transformer without {OCR}},
-  journal    = {CoRR},
-  volume     = {abs/2111.15664},
-  year       = {2021},
-  url        = {https://arxiv.org/abs/2111.15664},
-  eprinttype = {arXiv},
-  eprint     = {2111.15664},
-  timestamp  = {Thu, 02 Dec 2021 10:50:44 +0100},
-  biburl     = {https://dblp.org/rec/journals/corr/abs-2111-15664.bib},
-  bibsource  = {dblp computer science bibliography, https://dblp.org}
-}
-```
+# Use with Inference Endpoints
+
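
The encoder/decoder split described in the card maps directly onto the `transformers` API. Below is a minimal inference sketch following the documented Donut usage; the upstream checkpoint name and the `receipt.png` input are placeholders for illustration, not part of this commit:

```python
import re

import torch
from PIL import Image
from transformers import DonutProcessor, VisionEncoderDecoderModel

# Load the processor (image preprocessing + tokenizer) and the
# encoder-decoder model described in the card above.
processor = DonutProcessor.from_pretrained("naver-clova-ix/donut-base-finetuned-cord-v2")
model = VisionEncoderDecoderModel.from_pretrained("naver-clova-ix/donut-base-finetuned-cord-v2")

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# The Swin encoder turns the image into a (batch_size, seq_len, hidden_size)
# tensor of embeddings; "receipt.png" stands in for any receipt scan.
image = Image.open("receipt.png").convert("RGB")
pixel_values = processor(image, return_tensors="pt").pixel_values

# CORD checkpoints are steered with the <s_cord-v2> task prompt.
task_prompt = "<s_cord-v2>"
decoder_input_ids = processor.tokenizer(
    task_prompt, add_special_tokens=False, return_tensors="pt"
).input_ids

# The BART decoder generates the parse autoregressively,
# conditioned on the encoder output.
outputs = model.generate(
    pixel_values.to(device),
    decoder_input_ids=decoder_input_ids.to(device),
    max_length=model.decoder.config.max_position_embeddings,
    pad_token_id=processor.tokenizer.pad_token_id,
    eos_token_id=processor.tokenizer.eos_token_id,
    use_cache=True,
    bad_words_ids=[[processor.tokenizer.unk_token_id]],
)

# Strip special tokens and the task prompt, then convert to JSON.
sequence = processor.batch_decode(outputs)[0]
sequence = sequence.replace(processor.tokenizer.eos_token, "").replace(
    processor.tokenizer.pad_token, ""
)
sequence = re.sub(r"<.*?>", "", sequence, count=1).strip()
print(processor.token2json(sequence))
```

`processor.token2json` converts the generated token sequence into the nested CORD annotation format.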
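For Inference Endpoints, a custom `handler.py` at the repository root must expose an `EndpointHandler` class with an `__init__(self, path)` that loads the model from the repository snapshot and a `__call__(self, data)` that runs inference. The following is a minimal sketch of such a handler, not necessarily the file shipped in this fork, and it assumes the serving toolkit delivers the decoded image under `data["inputs"]`:

```python
import re
from typing import Any, Dict

import torch
from transformers import DonutProcessor, VisionEncoderDecoderModel


class EndpointHandler:
    def __init__(self, path: str = ""):
        # `path` is the local snapshot of this repository on the endpoint.
        self.processor = DonutProcessor.from_pretrained(path)
        self.model = VisionEncoderDecoderModel.from_pretrained(path)
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        self.model.to(self.device)
        # CORD checkpoints are steered with the <s_cord-v2> task prompt.
        self.decoder_input_ids = self.processor.tokenizer(
            "<s_cord-v2>", add_special_tokens=False, return_tensors="pt"
        ).input_ids

    def __call__(self, data: Dict[str, Any]) -> Dict[str, Any]:
        # Assumption: the toolkit has already decoded the request body
        # into a PIL image and stored it under "inputs".
        image = data["inputs"]
        pixel_values = self.processor(image, return_tensors="pt").pixel_values
        outputs = self.model.generate(
            pixel_values.to(self.device),
            decoder_input_ids=self.decoder_input_ids.to(self.device),
            max_length=self.model.decoder.config.max_position_embeddings,
            pad_token_id=self.processor.tokenizer.pad_token_id,
            eos_token_id=self.processor.tokenizer.eos_token_id,
        )
        # Strip special tokens and the task prompt, then return JSON.
        sequence = self.processor.batch_decode(outputs)[0]
        sequence = sequence.replace(self.processor.tokenizer.eos_token, "").replace(
            self.processor.tokenizer.pad_token, ""
        )
        sequence = re.sub(r"<.*?>", "", sequence, count=1).strip()
        return {"predictions": self.processor.token2json(sequence)}
```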
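Once the repository is deployed as an endpoint, it can be called over HTTP with the raw image bytes as the request body. The URL and token below are placeholders to substitute after deployment:

```python
import requests

# Hypothetical endpoint URL and access token; replace with your own.
API_URL = "https://<your-endpoint>.endpoints.huggingface.cloud"
HEADERS = {
    "Authorization": "Bearer <hf_token>",
    "Content-Type": "image/png",
}

# Send the receipt scan as the raw request body and print the parsed JSON.
with open("receipt.png", "rb") as f:
    response = requests.post(API_URL, headers=HEADERS, data=f.read())

print(response.json())
```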