Guitar Tab OMR
Guitar Tab OMR is a PyTorch optical music recognition model for converting rendered guitar tablature system images into Tabbox guitar OMR tokens.
Try the interactive demo on Hugging Face Spaces:
https://huggingface.co/spaces/kk9293/guitar-tab-omr
Training scripts and metrics:
https://github.com/LIU9293/guitar-tab-omr
Intended Use
The model is designed for cropped single-system guitar tablature images, especially digital score exports from Guitar Pro, alphaTab, MuseScore-style renderers, and similar notation tools. It predicts a token sequence describing bars, beats, durations, notes, rests, tuplets, tuning/capo metadata, and common guitar techniques.
It is not intended as a general OCR model, a full score editor, or a reliable parser for arbitrary scanned sheet music.
Files
best.pt: best validation checkpoint.config.json: training and model configuration.vocab.json: OMR token vocabulary.summary.json: best checkpoint summary.metrics.jsonl: training history.eval-greedy.json/metrics.json: greedy decode evaluation metrics.eval-constrained-beam.json/metrics.json: constrained beam evaluation metrics.
Model
- Architecture: ConvNeXt tiny encoder + branch transformer decoder.
- Input size:
1200 x 224, grayscale. - Max decode length:
384. - Hidden size:
256. - Decoder layers:
3. - Attention heads:
4. - Feed-forward size:
1024.
Evaluation
Validation was run on the renderer-augmented Tabbox guitar OMR dataset.
| Decode | Samples | Exact Match | Parse Success | Core SER | Token Edit Similarity |
|---|---|---|---|---|---|
| Greedy | 200 | 76.5% | 99.5% | 0.0236 | 97.64% |
| Constrained beam | 200 | 76.5% | 99.5% | 0.0240 | 97.60% |
Best checkpoint:
- Best metric:
val_core_ser - Best value:
0.00852 - Best epoch:
24
Limitations
- Works best on clearly cropped guitar tab systems, not full PDF pages.
- Training data is still biased toward clean digital exports.
- Some uncommon notation layouts, handwritten scans, dense multi-voice notation, and unusual renderer styles may degrade token quality.
- Predicted tokens may require downstream validation before converting to a playable score.
Usage
This model uses custom model and decoding code rather than a stock Transformers pipeline. For a simple interactive trial, use the Space linked above. For programmatic use, load best.pt, config.json, and vocab.json with the inference code in the Tabbox repository.
- Downloads last month
- 53
