LocateAnything-3B CoreML

CoreML packages and a lightweight Python runner for image localization on Apple hardware.

Author: devin-lai <markauto75@gmail.com>

CoreML Inference Performance

This CoreML build is tuned for local macOS inference, where fast repeat runs matter more than model startup time. On an M5 Mac with 32GB memory, the optimized CoreML path beats PyTorch MPS bf16 on every post-load metric measured below, with the largest gain in prefill.

Benchmark setup:

  • Device: macOS M5, 32GB memory
  • Model: LocateAnything-3B
  • Input image: 1536x1024
  • Categories: person, car
  • Comparison focus: inference time after model loading

Metric                     CoreML Optimized   PyTorch MPS bf16   Improvement
Post-load inference time   11.7s              12.7s              ~1.1x faster
Generation time            7.64s              12.56s             ~1.6x faster
Prefill time               1.72s              7.97s              ~4.6x faster
Tokens per second          17.55 TPS          10.35 TPS          ~1.7x higher throughput
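
The Improvement column can be reproduced from the raw figures. The sketch below is only an arithmetic check on the numbers reported in the table; the dictionary keys and the `speedup` helper are names made up for this illustration and are not part of the runner.

```python
# Benchmark figures copied from the table (seconds, except tokens_per_s).
coreml = {"post_load_s": 11.7, "generation_s": 7.64, "prefill_s": 1.72, "tokens_per_s": 17.55}
pytorch_mps = {"post_load_s": 12.7, "generation_s": 12.56, "prefill_s": 7.97, "tokens_per_s": 10.35}


def speedup(metric: str) -> float:
    """Return how many times better the CoreML figure is for one metric."""
    if metric == "tokens_per_s":
        # Throughput: higher is better, so divide CoreML by PyTorch.
        return coreml[metric] / pytorch_mps[metric]
    # Timings: lower is better, so divide PyTorch by CoreML.
    return pytorch_mps[metric] / coreml[metric]


for name in coreml:
    print(f"{name}: {speedup(name):.2f}x")
```

Rounded to one decimal place, the ratios match the ~1.1x, ~1.6x, ~4.6x, and ~1.7x values in the table.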

For anyone running vision-language localization directly on a Mac, the practical win is lower wait time after the packages are loaded. CoreML reduces post-load inference from 12.7s to 11.7s, while the generation path improves from 12.56s to 7.64s.

The standout improvement is prefill: 7.97s drops to 1.72s, a roughly 4.6x speedup. Throughput also rises from 10.35 TPS to 17.55 TPS, making local decoding noticeably smoother for repeated image queries.

In short, this CoreML version delivers slightly faster end-to-end inference after loading, much faster prefill, and higher decoding throughput for macOS users who want to run the model locally.

Contents

  • LocateAnything-vision.mlpackage - image encoder package
  • LocateAnything-embed.mlpackage - token embedding package
  • LocateAnything-decoder.mlpackage - decoder package
  • LocateAnything-assets/ - tokenizer and runtime configuration
  • run_locateanything_image_coreml.py - still-image runner
  • test.png - sample input
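
Before running anything, it can help to confirm that the download is complete. The following is a minimal sketch that checks for the items listed above using only the standard library. The file names come from the Contents list; `missing_artifacts` is a hypothetical helper, not something shipped in this repository.

```python
from pathlib import Path

# Names taken from the Contents list; .mlpackage bundles are directories on disk.
EXPECTED = [
    "LocateAnything-vision.mlpackage",
    "LocateAnything-embed.mlpackage",
    "LocateAnything-decoder.mlpackage",
    "LocateAnything-assets",
    "run_locateanything_image_coreml.py",
]


def missing_artifacts(root: str = ".") -> list[str]:
    """Return the expected names that do not exist under root, in list order."""
    base = Path(root)
    return [name for name in EXPECTED if not (base / name).exists()]


if __name__ == "__main__":
    missing = missing_artifacts()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All expected files are present.")
```

The check only looks at whether each path exists. It does not validate the package contents or the tokenizer files inside the assets directory.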

Setup

pip install -r requirements.txt

Example

python run_locateanything_image_coreml.py \
  --input test.png \
  --categories "person,car"

By default, the script writes:

  • test.coreml.annotated.png
  • test.coreml.detections.json
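
To process several images, the same command can be driven from Python. This is a sketch under stated assumptions: it uses only the `--input` and `--categories` flags shown in the example, and it assumes the output names follow the pattern seen for `test.png` (input stem plus `.coreml.annotated.png` and `.coreml.detections.json`, written next to the input). The helpers `build_command`, `expected_outputs`, and `run_batch` are hypothetical names for this illustration.

```python
import subprocess
import sys
from pathlib import Path

RUNNER = "run_locateanything_image_coreml.py"


def build_command(image: Path, categories: list[str]) -> list[str]:
    """Build the runner invocation shown in the Example section."""
    return [sys.executable, RUNNER, "--input", str(image),
            "--categories", ",".join(categories)]


def expected_outputs(image: Path) -> tuple[Path, Path]:
    """Guess the default output paths (assumed: same folder, input stem reused)."""
    return (image.with_name(image.stem + ".coreml.annotated.png"),
            image.with_name(image.stem + ".coreml.detections.json"))


def run_batch(images: list[Path], categories: list[str]) -> None:
    """Run the script once per image and report which outputs appeared."""
    for image in images:
        subprocess.run(build_command(image, categories), check=True)
        for out in expected_outputs(image):
            print(out, "exists" if out.exists() else "missing")
```

A call such as `run_batch([Path("test.png")], ["person", "car"])` would repeat the documented example. Each call starts a new process, so the model loading cost is paid once per image; the post-load figures in the benchmark describe work done after that loading step.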

Notes

The packages are configured for the image grid stored in the vision package metadata. Use the bundled assets directory with these packages so token ids and runtime limits stay aligned.

The license follows the upstream NVIDIA LocateAnything-3B terms linked in the metadata above.
