---
license: apache-2.0
language:
- en
library_name: transformers
pipeline_tag: image-to-text
---
# Optical Music Recognition Transformer
An image-to-text model for optical music recognition (OMR).
The model is trained to predict simple notes in the [LilyPond](https://en.wikipedia.org/wiki/LilyPond) format from an input image.
The training data consists of artificial, handwritten, and whiteboard images.
The model is based on [Donut](https://huggingface.co/docs/transformers/model_doc/donut).
## Demo
![White Board Sample](sample1.png)
Prediction: `c'2 a''8 c''8 r4 c'1 e'8 c'8 c'8 a''8 f'4 a'8 c'8`
![White Board Sample](sample2.png)
Prediction: `d'8 g'8 c''8 a'8 d'2 c'8 f''8 d'4 c''4 e'8 r8 g'8 b'8 e'8 g'8 d'2`
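The predictions are whitespace-separated LilyPond tokens, so they are easy to post-process. The following is a minimal sketch of a parser, assuming the token shape seen in the two samples above: a pitch letter (or `r` for a rest), optional octave marks, and a duration number. Accidentals, dotted durations, and other LilyPond syntax are not handled, and the names `Note`, `parse_prediction`, and `length_in_whole_notes` are hypothetical helpers, not part of the model or the repo.

```python
import re
from fractions import Fraction
from typing import NamedTuple

# Token shape inferred from the sample predictions only (an assumption):
# pitch letter a-g or "r" for a rest, optional octave marks, then a duration.
TOKEN_RE = re.compile(r"^([a-gr])([',]*)(\d+)$")


class Note(NamedTuple):
    pitch: str     # "a".."g", or "r" for a rest
    octave: int    # count of "'" marks minus count of "," marks
    duration: int  # LilyPond duration: 1 = whole, 2 = half, 4 = quarter, 8 = eighth


def parse_prediction(prediction: str) -> list[Note]:
    """Split a prediction string into Note tuples, failing loudly on unknown tokens."""
    notes = []
    for token in prediction.split():
        match = TOKEN_RE.match(token)
        if match is None:
            raise ValueError(f"unrecognized token: {token!r}")
        pitch, marks, duration = match.groups()
        octave = marks.count("'") - marks.count(",")
        notes.append(Note(pitch, octave, int(duration)))
    return notes


def length_in_whole_notes(notes: list[Note]) -> Fraction:
    """Total length of the sequence, measured in whole notes."""
    return sum((Fraction(1, note.duration) for note in notes), Fraction(0))


if __name__ == "__main__":
    sample = "c'2 a''8 c''8 r4 c'1 e'8 c'8 c'8 a''8 f'4 a'8 c'8"
    parsed = parse_prediction(sample)
    print(parsed[0], length_in_whole_notes(parsed))
```

Summing the durations is a cheap sanity check on a prediction: a total that does not fill whole bars may point to a missed or misread note.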
Repo: https://github.com/UHHRobotics22-23/robot_project/tree/main/marimbabot_vision
Note: For historical reasons, input images need to be rotated 90 degrees counterclockwise (to the left) for best performance. This also applies to the "Hosted inference API".
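The rotation step can be done with Pillow before the image is passed to the model. The sketch below is not taken from the linked repo. `MODEL_ID` is a hypothetical placeholder that must be replaced with this model's actual Hub ID or a local path, `max_length=256` is an arbitrary choice, and the code assumes the checkpoint loads with the standard Donut classes in `transformers` and that its config supplies the decoder start token.

```python
from PIL import Image

# Hypothetical placeholder: replace with this model's Hub ID or a local path.
MODEL_ID = "path/or/hub-id-of-this-model"


def rotate_left(image: Image.Image) -> Image.Image:
    """Rotate 90 degrees counterclockwise; expand=True avoids cropping."""
    return image.rotate(90, expand=True)


def predict(image_path: str, model_id: str = MODEL_ID) -> str:
    """Sketch of inference: rotate the image, then decode a LilyPond string."""
    # Imported lazily so rotate_left works without transformers installed.
    from transformers import DonutProcessor, VisionEncoderDecoderModel

    processor = DonutProcessor.from_pretrained(model_id)
    model = VisionEncoderDecoderModel.from_pretrained(model_id)

    image = rotate_left(Image.open(image_path).convert("RGB"))
    pixel_values = processor(image, return_tensors="pt").pixel_values

    # Assumption: no explicit task prompt is needed. Some Donut checkpoints
    # require one, passed to generate() as decoder_input_ids.
    output_ids = model.generate(pixel_values, max_length=256)
    return processor.batch_decode(output_ids, skip_special_tokens=True)[0].strip()
```

A call such as `predict("sample1.png")` would then return the predicted note string, provided the placeholder has been replaced with a real model location.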