LatentJam Models

ONNX model weights for LatentJam, a privacy-first Android music player that recommends what to play next entirely on-device. These models are hosted here because clap_audio.onnx is 116 MB, which exceeds GitHub's 100 MB per-file cap.

The Android app downloads these files at build time via scripts/download-models.sh and bundles them into app/src/main/assets/ml/. Inference at runtime uses ONNX Runtime with the Qualcomm QNN execution provider for Hexagon NPU offload on Snapdragon devices, falling back to CPU on everything else.
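
The NPU-first, CPU-fallback order can be written as a provider preference list. The following is a minimal Python sketch, not the app's own code: it assumes ONNX Runtime's standard provider names `QNNExecutionProvider` and `CPUExecutionProvider`, and the helper `pick_providers` is hypothetical.

```python
# Hypothetical helper: order execution providers so the NPU is tried first
# and the CPU is always present as a fallback.
PREFERRED = ["QNNExecutionProvider", "CPUExecutionProvider"]


def pick_providers(available):
    """Return the preferred providers that this runtime build actually offers."""
    chosen = [name for name in PREFERRED if name in available]
    # CPU is the last-resort fallback, so make sure it is always in the list.
    if "CPUExecutionProvider" not in chosen:
        chosen.append("CPUExecutionProvider")
    return chosen


# A Snapdragon device with a QNN-enabled runtime build:
print(pick_providers(["QNNExecutionProvider", "CPUExecutionProvider"]))
# Any other device:
print(pick_providers(["CPUExecutionProvider"]))
```

With the Python `onnxruntime` package, `onnxruntime.get_available_providers()` would supply the `available` argument, and the returned list would be passed as `providers=` when constructing an `onnxruntime.InferenceSession`.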

Files

| File | Size | Role |
|---|---|---|
| clap_audio.onnx | 116 MB | Audio encoder derived from CLAP. Consumes a 15 s mono PCM chunk at 48 kHz, produces a 512-d L2-normalized embedding per track. Runs once per track during library indexing, then the embedding is cached in the app's Room database. |
| predictor_state.onnx | 32 MB | Transformer-style state encoder. Reads a sequence of recent listening events (skip / listen-through / replay, weighted by recency) and produces a user-state vector. |
| predictor_scorer_n100.onnx | 5 MB | Top-100 candidate scorer. Given the predictor state and 100 candidate embeddings (chosen by approximate-nearest-neighbor retrieval against the user state), scores each candidate. The highest score becomes the next track in smart-shuffle mode. |
| embedding_version.txt | 69 B | Bumps when the encoder changes. The app re-extracts all embeddings on mismatch. |
| predictor_version.txt | 20 B | Bumps when the predictor changes. The app drops the predictor cache on mismatch. |
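
The two version files drive cache invalidation. The following Python sketch shows one way the mismatch check could look; the function names and the action labels are hypothetical, and the actual format of the version strings is not specified here, so they are compared as opaque text.

```python
from pathlib import Path


def read_version(path):
    """Read a version file as stripped text, or return None if it is missing."""
    p = Path(path)
    return p.read_text(encoding="utf-8").strip() if p.exists() else None


def plan_cache_actions(bundled_embedding, cached_embedding,
                       bundled_predictor, cached_predictor):
    """Compare bundled versions against cached ones and list what is stale."""
    actions = []
    if bundled_embedding != cached_embedding:
        # The encoder changed: every cached embedding must be recomputed.
        actions.append("reextract_embeddings")
    if bundled_predictor != cached_predictor:
        # The predictor changed: only its cache is dropped.
        actions.append("drop_predictor_cache")
    return actions


# Illustrative version strings (made up for this example).
print(plan_cache_actions("enc-2", "enc-1", "pred-1", "pred-1"))
```

Keeping the two versions separate means a predictor update does not force a full re-index of the library, which is the expensive step.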

Intended use

  • Powering the smart-shuffle feature in the LatentJam Android app: cycling the shuffle button to SMART picks the next track using these models.
  • Experimenting with on-device music recommendation on mobile. The encoder + predictor are deliberately small: the entire pipeline (audio decode → encoder → state encoder → scorer) runs end-to-end in under a second on a Snapdragon 8 Gen 3 with the Hexagon NPU enabled.

These models are not intended for:

  • Server-side recommendation (use a bigger CLAP variant and a proper retrieval index)
  • Music classification or tagging
  • Generating audio

Pipeline overview

                                Library indexing (one-time, in background)
                                ┌──────────────────────────────────────┐
mp3 / flac / opus / m4a / ogg ─── native C++ decoder (in LatentJam)    │
                                │   ↓                                  │
                                │   15 s mono PCM at 48 kHz            │
                                │   ↓                                  │
                                │ clap_audio.onnx (this repo)          │
                                │   ↓                                  │
                                │   512-d embedding, L2-normalized     │
                                │   ↓                                  │
                                │ Room (on-device cache)               │
                                └──────────────────────────────────────┘

                                Smart-shuffle inference (on demand)
                                ┌──────────────────────────────────────┐
listening history (Room)      ─── predictor_state.onnx (this repo)     │
                                │   ↓                                  │
                                │   user-state vector                  │
                                │   ↓                                  │
                                │ ANN retrieval over cached embeddings │
                                │   ↓                                  │
                                │   100 candidate tracks               │
                                │   ↓                                  │
                                │ predictor_scorer_n100.onnx           │
                                │   ↓                                  │
                                │   next track                         │
                                └──────────────────────────────────────┘
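
The data flow in both boxes can be sketched in plain Python with the models stubbed out. This is an illustration under stated assumptions, not the app's implementation: all function names are hypothetical, zero-padding short tracks is an assumption, the user-state vector is assumed to live in the same space as the track embeddings, exact dot-product search stands in for the ANN index, and a plain dot product stands in for predictor_scorer_n100.onnx.

```python
import math

SAMPLE_RATE = 48_000              # Hz, as stated for clap_audio.onnx
CHUNK_SAMPLES = 15 * SAMPLE_RATE  # 15 s of mono PCM = 720,000 samples


def fit_chunk(samples):
    """Crop or zero-pad mono PCM to one encoder chunk (padding is an assumption)."""
    clipped = list(samples[:CHUNK_SAMPLES])
    return clipped + [0.0] * (CHUNK_SAMPLES - len(clipped))


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return list(vec) if norm == 0.0 else [x / norm for x in vec]


def dot(a, b):
    """Dot product; equals cosine similarity for unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def retrieve(user_state, library, k=100):
    """Exact top-k by dot product, a brute-force stand-in for the ANN step."""
    ranked = sorted(library, key=lambda tid: dot(user_state, library[tid]),
                    reverse=True)
    return ranked[:k]


def pick_next(user_state, library, scorer, k=100):
    """Retrieve k candidates, score each one, and return the best track id."""
    candidates = retrieve(user_state, library, k)
    return max(candidates, key=lambda tid: scorer(user_state, library[tid]))


# Toy 2-d embeddings instead of 512-d ones; the scorer here is just `dot`.
library = {
    "track_a": l2_normalize([1.0, 0.0]),
    "track_b": l2_normalize([1.0, 1.0]),
    "track_c": l2_normalize([0.0, 1.0]),
}
state = l2_normalize([0.2, 1.0])
print(pick_next(state, library, scorer=dot, k=2))
```

Splitting retrieval from scoring keeps the scorer's input at a fixed size of 100 candidates regardless of how large the library is.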

Privacy

  • All inference is on-device. No audio, no embeddings, no listening history is ever transmitted anywhere.
  • The LatentJam Android app does not request the INTERNET permission for the recommender. The only network access is the build-time download from this repo onto the developer's machine.

Limitations

  • Smart mode requires that an embedding has been computed for every track. The first time you index a large library this takes a while: the encoder runs in the background only when the device is charging and idle (via WorkManager) to avoid thermal throttling and battery drain.
  • The encoder is CLAP-derived but distilled to fit on-device. Genre/mood discrimination is good for popular Western genres and weaker for genres CLAP's training data underrepresented.
  • The predictor was trained on a closed user-history dataset and may not generalize perfectly to your taste right away. On-device fine-tuning is planned but not yet shipped (see ml/retrain/RetrainWorker.kt in the app repo, currently a stub).

License

GPL-3.0-or-later, matching the LatentJam Android app.

The CLAP audio encoder is derived from LAION's CLAP (CC0/MIT) and quantized + exported to ONNX for on-device use. The state encoder and scorer were trained from scratch for this project.

Links
