Guitar Tab OMR

Guitar Tab OMR is a PyTorch optical music recognition model for converting rendered guitar tablature system images into Tabbox guitar OMR tokens.

Try the interactive demo on Hugging Face Spaces:

https://huggingface.co/spaces/kk9293/guitar-tab-omr

Guitar Tab OMR demo

Training scripts and metrics:

https://github.com/LIU9293/guitar-tab-omr

Intended Use

The model is designed for cropped single-system guitar tablature images, especially digital score exports from Guitar Pro, alphaTab, MuseScore-style renderers, and similar notation tools. It predicts a token sequence describing bars, beats, durations, notes, rests, tuplets, tuning/capo metadata, and common guitar techniques.

It is not intended as a general OCR model, a full score editor, or a reliable parser for arbitrary scanned sheet music.

Files

  • best.pt: best validation checkpoint.
  • config.json: training and model configuration.
  • vocab.json: OMR token vocabulary.
  • summary.json: best checkpoint summary.
  • metrics.jsonl: training history.
  • eval-greedy.json/metrics.json: greedy decode evaluation metrics.
  • eval-constrained-beam.json/metrics.json: constrained beam evaluation metrics.

Model

  • Architecture: ConvNeXt tiny encoder + branch transformer decoder.
  • Input size: 1200 x 224, grayscale.
  • Max decode length: 384.
  • Hidden size: 256.
  • Decoder layers: 3.
  • Attention heads: 4.
  • Feed-forward size: 1024.

Evaluation

Validation was run on the renderer-augmented Tabbox guitar OMR dataset.

Decode Samples Exact Match Parse Success Core SER Token Edit Similarity
Greedy 200 76.5% 99.5% 0.0236 97.64%
Constrained beam 200 76.5% 99.5% 0.0240 97.60%

Best checkpoint:

  • Best metric: val_core_ser
  • Best value: 0.00852
  • Best epoch: 24

Limitations

  • Works best on clearly cropped guitar tab systems, not full PDF pages.
  • Training data is still biased toward clean digital exports.
  • Some uncommon notation layouts, handwritten scans, dense multi-voice notation, and unusual renderer styles may degrade token quality.
  • Predicted tokens may require downstream validation before converting to a playable score.

Usage

This model uses custom model and decoding code rather than a stock Transformers pipeline. For a simple interactive trial, use the Space linked above. For programmatic use, load best.pt, config.json, and vocab.json with the inference code in the Tabbox repository.

Downloads last month
53
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Space using kk9293/guitar-tab-omr 1