Flova/omr_transformer

Tags: vision-encoder-decoder, image-text-to-text

Optical Music Recognition Transformer

An image-to-text model for optical music recognition (OMR). Given an image of notated music, the model predicts a sequence of simple notes in the LilyPond format. The training data consists of artificially generated, handwritten, and whiteboard images. The model is based on Donut.
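Since the model is tagged as a vision-encoder-decoder and is based on Donut, inference could look roughly like the sketch below. This is not taken from the model's repository. It assumes that the checkpoint ships processor files that `DonutProcessor` can load, and that generation can start from the decoder start token stored in the model config without an extra task prompt. The function name `predict_lilypond` and the `max_length` value are hypothetical choices for illustration.

```python
def predict_lilypond(image_path: str, model_id: str = "Flova/omr_transformer") -> str:
    """Return the decoded text the model predicts for one image (sketch)."""
    # Imported here so the module loads even without the heavy dependencies.
    import torch
    from PIL import Image
    from transformers import DonutProcessor, VisionEncoderDecoderModel

    # Assumption: the checkpoint contains a Donut-compatible processor.
    processor = DonutProcessor.from_pretrained(model_id)
    model = VisionEncoderDecoderModel.from_pretrained(model_id)
    model.eval()

    # Donut-style processors expect RGB input.
    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(image, return_tensors="pt").pixel_values

    with torch.no_grad():
        # Assumption: no task prompt is needed; max_length is an arbitrary cap.
        output_ids = model.generate(pixel_values, max_length=256)

    # Strip special tokens and surrounding whitespace from the decoded text.
    return processor.batch_decode(output_ids, skip_special_tokens=True)[0].strip()
```

If the checkpoint expects a task prompt, it would need to be tokenized and passed to `generate` as `decoder_input_ids`; the linked repository is the place to confirm this.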

Demo

Prediction: c'2 a''8 c''8 r4 c'1 e'8 c'8 c'8 a''8 f'4 a'8 c'8

Prediction: d'8 g'8 c''8 a'8 d'2 c'8 f''8 d'4 c''4 e'8 r8 g'8 b'8 e'8 g'8 d'2

Prediction: g'4 c'4 r8 f''8 e'8 d'8 r8 c'4 c'2 a'2 b'4 r4 a'8 r8 r4
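The predictions above are space-separated LilyPond tokens: a pitch letter (or `r` for a rest), octave marks (`'` raises and `,` lowers by one octave), and a duration denominator (`8` is an eighth note). A minimal sketch of how such a string could be parsed is shown below. It only covers the subset visible in the demo plus optional augmentation dots, not accidentals or other LilyPond syntax, and the names `Note`, `parse_token`, `parse_prediction`, and `total_length` are hypothetical helpers, not part of the model or its repository.

```python
import re
from fractions import Fraction
from typing import List, NamedTuple

# Assumed token shape: pitch letter or rest, octave marks, duration, dots.
TOKEN_RE = re.compile(
    r"^(?P<name>[a-g]|r)(?P<octave>[',]*)(?P<duration>\d+)(?P<dots>\.*)$"
)


class Note(NamedTuple):
    name: str         # "a".."g", or "r" for a rest
    octave: int       # number of ' marks minus number of , marks
    length: Fraction  # fraction of a whole note


def parse_token(token: str) -> Note:
    """Parse one token such as "c'2" or "r4"."""
    match = TOKEN_RE.match(token)
    if match is None:
        raise ValueError(f"unsupported token: {token!r}")
    marks = match["octave"]
    octave = marks.count("'") - marks.count(",")
    base = Fraction(1, int(match["duration"]))
    # Each dot adds half of the previously added value.
    dots = len(match["dots"])
    length = base * (2 - Fraction(1, 2 ** dots))
    return Note(match["name"], octave, length)


def parse_prediction(text: str) -> List[Note]:
    """Parse a whole prediction string into a list of notes and rests."""
    return [parse_token(token) for token in text.split()]


def total_length(notes: List[Note]) -> Fraction:
    """Sum of all durations, measured in whole notes."""
    return sum((note.length for note in notes), Fraction(0))


notes = parse_prediction("c'2 a''8 c''8 r4 c'1 e'8 c'8 c'8 a''8 f'4 a'8 c'8")
print(notes[0])             # first note of the first demo prediction
print(total_length(notes))  # combined duration in whole notes
```

Using `Fraction` keeps durations exact, which makes it easy to compare a prediction's total length against an expected number of bars.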

Repo: https://github.com/UHHRobotics22-23/robot_project/tree/main/marimbabot_vision
