Optical Music Recognition Transformer

An image-to-text model for optical music recognition (OMR). It is trained to predict sequences of simple notes in the LilyPond format from an input image. The training data consists of artificially generated, handwritten, and white board images. The model is based on Donut.

Demo

White Board Sample

Prediction: c'2 a''8 c''8 r4 c'1 e'8 c'8 c'8 a''8 f'4 a'8 c'8

White Board Sample

Prediction: d'8 g'8 c''8 a'8 d'2 c'8 f''8 d'4 c''4 e'8 r8 g'8 b'8 e'8 g'8 d'2

Handwritten White Board Sample

Prediction: g'4 c'4 r8 f''8 e'8 d'8 r8 c'4 c'2 a'2 b'4 r4 a'8 r8 r4
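The predictions above are plain LilyPond note tokens: a pitch letter (or r for a rest), octave marks (each ' raises the pitch by one octave, so c' is middle C), and a duration number (1 = whole note, 2 = half, 4 = quarter, 8 = eighth). The following is a minimal sketch of how such an output string could be parsed for downstream use. It is not taken from the linked repository; the names Note and parse_prediction are hypothetical, and the parser assumes only the token shapes shown in the demo (no accidentals, dots, or chords).

```python
import re
from fractions import Fraction
from typing import List, NamedTuple, Optional

# Matches tokens such as c'2, a''8 or r4: a pitch letter (or r for a rest),
# optional octave marks, then a duration number.
# Assumption: only the token shapes visible in the demo predictions occur.
TOKEN_RE = re.compile(r"^([a-gr])([',]*)(64|32|16|8|4|2|1)$")


class Note(NamedTuple):
    pitch: Optional[str]  # pitch letter, or None for a rest
    octave: int           # number of ' marks minus number of , marks
    duration: Fraction    # length as a fraction of a whole note


def parse_prediction(prediction: str) -> List[Note]:
    """Split a predicted LilyPond string into a list of Note tuples."""
    notes = []
    for token in prediction.split():
        match = TOKEN_RE.match(token)
        if match is None:
            raise ValueError(f"unsupported token: {token!r}")
        name, marks, length = match.groups()
        octave = marks.count("'") - marks.count(",")
        pitch = None if name == "r" else name
        notes.append(Note(pitch, octave, Fraction(1, int(length))))
    return notes


# First white board prediction from the demo above.
prediction = "c'2 a''8 c''8 r4 c'1 e'8 c'8 c'8 a''8 f'4 a'8 c'8"
notes = parse_prediction(prediction)
total = sum((n.duration for n in notes), Fraction(0))
print(len(notes), total)  # 12 tokens, 3 whole notes in total
```

Keeping durations as fractions of a whole note avoids floating-point rounding when summing them, for example to check bar lengths.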

Repo: https://github.com/UHHRobotics22-23/robot_project/tree/main/marimbabot_vision

