LatentJam Models
ONNX model weights for LatentJam, a privacy-first Android music player that recommends what to play next entirely on-device. These models live here because `clap_audio.onnx` is 116 MB and exceeds GitHub's 100 MB per-file cap.
The Android app downloads these files at build time via `scripts/download-models.sh` and bundles them into `app/src/main/assets/ml/`. At runtime, inference uses ONNX Runtime with the Qualcomm QNN execution provider for Hexagon NPU offload on Snapdragon devices, falling back to CPU on everything else.
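The CPU fallback described above comes down to an ordered list of execution providers. A minimal Python sketch, assuming the `onnxruntime` package on a desktop machine; `pick_providers` is a hypothetical helper, not LatentJam code, and the app itself does this from its Android code rather than Python.

```python
# Hypothetical helper (not LatentJam code): order ONNX Runtime execution
# providers so QNN (Hexagon NPU offload) is tried first and CPU is the fallback.
PREFERRED = ["QNNExecutionProvider", "CPUExecutionProvider"]


def pick_providers(available):
    """Return preferred providers present in `available`, in priority order.

    CPUExecutionProvider is always appended, since every ONNX Runtime
    build includes it; this guarantees a non-empty provider list.
    """
    chosen = [p for p in PREFERRED if p in available]
    if "CPUExecutionProvider" not in chosen:
        chosen.append("CPUExecutionProvider")
    return chosen
```

With `onnxruntime` installed, this could be used as `onnxruntime.InferenceSession("clap_audio.onnx", providers=pick_providers(onnxruntime.get_available_providers()))`. ONNX Runtime partitions the graph across the listed providers in order, so operators the QNN provider cannot run land on CPU instead of failing the whole session.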
Files
| File | Size | Role |
|---|---|---|
| `clap_audio.onnx` | 116 MB | Audio encoder derived from CLAP. Consumes a 15 s mono PCM chunk at 48 kHz and produces a 512-d L2-normalized embedding per track. Runs once per track during library indexing; the embedding is then cached in the app's Room database. |
| `predictor_state.onnx` | 32 MB | Transformer-style state encoder. Reads a sequence of recent listening events (skip / listen-through / replay, weighted by recency) and produces a user-state vector. |
| `predictor_scorer_n100.onnx` | 5 MB | Top-100 candidate scorer. Given the predictor state and 100 candidate embeddings (chosen by approximate-nearest-neighbor retrieval against the user state), scores each candidate. The highest-scoring candidate becomes the next track in smart-shuffle mode. |
| `embedding_version.txt` | 69 B | Bumped when the encoder changes. On a mismatch, the app re-extracts all embeddings. |
| `predictor_version.txt` | 20 B | Bumped when the predictor changes. On a mismatch, the app drops the predictor cache. |
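The two version files drive cache invalidation. A minimal sketch of the comparison, assuming each file holds a version string and that the app persists the last-seen values somewhere; the `stored` dict is a hypothetical stand-in for that storage, and `check_versions` / `CacheActions` are illustrative names, not LatentJam's code.

```python
from dataclasses import dataclass


@dataclass
class CacheActions:
    reextract_embeddings: bool  # encoder changed: recompute every track embedding
    drop_predictor_cache: bool  # predictor changed: discard cached predictor state


def check_versions(bundled, stored):
    """Compare bundled version strings with the values recorded at the last run.

    `bundled` and `stored` map "embedding" / "predictor" to the contents of
    embedding_version.txt / predictor_version.txt. Assumption: a missing
    stored value (e.g. first launch) counts as a mismatch.
    """
    def changed(key):
        old = stored.get(key)
        return old is None or old.strip() != bundled[key].strip()

    return CacheActions(
        reextract_embeddings=changed("embedding"),
        drop_predictor_cache=changed("predictor"),
    )
```

Keeping the two checks independent means a predictor-only update never triggers the expensive full re-indexing of the library.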
Intended use
- Powering the smart-shuffle feature in the LatentJam Android app: cycling the shuffle button to `SMART` picks the next track using these models.
- Experimenting with on-device music recommendation on mobile. The encoder and predictor are deliberately small: the entire pipeline (audio decode → encoder → state encoder → scorer) runs end-to-end in under a second on a Snapdragon 8 Gen 3 with the Hexagon NPU enabled.
These models are not intended for:
- Server-side recommendation (use a bigger CLAP variant and a proper retrieval index)
- Music classification or tagging
- Generating audio
Pipeline overview
Library indexing (one-time, in background)
mp3 / flac / opus / m4a / ogg → native C++ decoder (in LatentJam) → 15 s mono PCM at 48 kHz → `clap_audio.onnx` (this repo) → 512-d embedding, L2-normalized → Room (on-device cache)
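The indexing step's input contract is concrete enough to check in code: 15 s at 48 kHz is 720,000 mono samples, and the encoder output should be a unit-length 512-d vector. The stdlib-only Python sketch below prepares such a chunk and validates an embedding before caching. It assumes the decoded PCM is already float samples at 48 kHz, downmixes by averaging channels, and takes a window from the middle of the track with zero-padding for shorter tracks. LatentJam's actual downmix and window choice are not documented here, so those choices and all function names are hypothetical; the ONNX call itself is omitted.

```python
import math

SAMPLE_RATE = 48_000                         # Hz, encoder input rate
CHUNK_SECONDS = 15
CHUNK_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS  # 720,000 samples
EMBED_DIM = 512


def downmix(interleaved, channels):
    """Average interleaved multi-channel samples into one mono channel."""
    frames = len(interleaved) // channels
    return [
        sum(interleaved[f * channels + c] for c in range(channels)) / channels
        for f in range(frames)
    ]


def center_chunk(mono, length=CHUNK_SAMPLES):
    """Take `length` samples from the middle of the track, zero-padding short tracks.

    Assumption: the centred window is illustrative, not LatentJam's choice.
    """
    if len(mono) >= length:
        start = (len(mono) - length) // 2
        return mono[start:start + length]
    return list(mono) + [0.0] * (length - len(mono))


def is_unit_norm(embedding, tol=1e-4):
    """Sanity-check an encoder output (512-d, L2 norm of 1) before caching it."""
    norm = math.sqrt(sum(x * x for x in embedding))
    return len(embedding) == EMBED_DIM and abs(norm - 1.0) < tol
```

Feeding the chunk to `clap_audio.onnx` would also require the model's input name and tensor layout, which this README does not specify.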
Smart-shuffle inference (on demand)
listening history (Room) → `predictor_state.onnx` (this repo) → user-state vector → ANN retrieval over cached embeddings → 100 candidate tracks → `predictor_scorer_n100.onnx` → next track
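Because the cached embeddings are L2-normalized, cosine similarity against them reduces to a dot product. The stdlib Python sketch below shows the shape of the smart-shuffle selection step under stated assumptions: it uses exact brute-force top-k as a stand-in for the app's approximate-nearest-neighbor index, assumes the user-state vector lives in the same space as the track embeddings (which retrieval "against the user state" implies), and takes a hypothetical `score_fn` in place of running `predictor_scorer_n100.onnx`.

```python
import heapq

N_CANDIDATES = 100  # matches the n100 scorer's candidate count


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def retrieve(user_state, embeddings, k=N_CANDIDATES):
    """Exact top-k by dot product (a stand-in for ANN retrieval).

    `embeddings` maps track id -> L2-normalized vector, so the dot product
    equals cosine similarity.
    """
    return heapq.nlargest(k, embeddings, key=lambda tid: dot(user_state, embeddings[tid]))


def next_track(user_state, embeddings, score_fn):
    """Pick the highest-scoring retrieved candidate.

    Assumption: `score_fn(user_state, candidate_vectors) -> list[float]` is a
    hypothetical stand-in for predictor_scorer_n100.onnx.
    """
    candidates = retrieve(user_state, embeddings)
    scores = score_fn(user_state, [embeddings[tid] for tid in candidates])
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best]
```

Splitting retrieval from scoring keeps the scorer's input fixed at 100 candidates regardless of library size. A fixed-size scorer would, however, need padding or masking for libraries with fewer than 100 tracks; how the app handles that case is not documented here.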
Privacy
- All inference is on-device. No audio, no embeddings, no listening history is ever transmitted anywhere.
- The LatentJam Android app does not request the `INTERNET` permission for the recommender. The only network access is the build-time download from this repo onto the developer's machine.
Limitations
- Smart mode requires an embedding for every track. Indexing a large library for the first time takes a while: the encoder runs in the background (via WorkManager) only when the device is charging and idle, to avoid thermal throttling and battery drain.
- The encoder is CLAP-derived but distilled to fit on-device. Genre/mood discrimination is good for popular Western genres and weaker for genres underrepresented in CLAP's training data.
- The predictor was trained on a closed user-history dataset and may not generalize perfectly to your taste right away. On-device fine-tuning is planned but not yet shipped (see `ml/retrain/RetrainWorker.kt` in the app repo, currently a stub).
License
GPL-3.0-or-later, matching the LatentJam Android app.
The CLAP audio encoder is derived from LAION's CLAP (CC0/MIT), then quantized and exported to ONNX for on-device use. The state encoder and scorer were trained from scratch for this project.
Links
- Android app: https://github.com/Nikita-sud/latentjam
- Architecture notes: https://github.com/Nikita-sud/latentjam/blob/main/ARCHITECTURE_NOTES.md
- Fork notice & attribution: https://github.com/Nikita-sud/latentjam/blob/main/FORK_NOTICE.md