LatentJam Models

ONNX model weights for LatentJam — a privacy-first Android music player that recommends what to play next entirely on-device. These models live here because clap_audio.onnx is 116 MB and exceeds GitHub's 100 MB per-file cap.

The Android app downloads these files at build time via scripts/download-models.sh and bundles them into app/src/main/assets/ml/. Inference at runtime uses ONNX Runtime with the Qualcomm QNN execution provider for Hexagon NPU offload on Snapdragon devices, falling back to CPU on everything else.

Files

File	Size	Role
`clap_audio.onnx`	116 MB	Audio encoder derived from CLAP. Consumes a 15 s mono PCM chunk at 48 kHz, produces a 512-d L2-normalized embedding per track. Runs once per track during library indexing, then the embedding is cached in the app's Room database.
`predictor_state.onnx`	32 MB	Transformer-style state encoder. Reads a sequence of recent listening events (skip / listen-through / replay, weighted by recency) and produces a user-state vector.
`predictor_scorer_n100.onnx`	5 MB	Top-100 candidate scorer. Given the user-state vector and 100 candidate embeddings (chosen by approximate-nearest-neighbor retrieval against that vector), scores each candidate. The highest-scoring candidate becomes the next track in smart-shuffle mode.
`embedding_version.txt`	69 B	Bumps when the encoder changes. The app re-extracts all embeddings on mismatch.
`predictor_version.txt`	20 B	Bumps when the predictor changes. The app drops the predictor cache on mismatch.
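The two version files drive cache invalidation, which can be sketched roughly as follows. This is a minimal illustration, not the app's actual code: the format of the version files is not documented here, so the sketch treats their contents as opaque strings, and the "stored" values (whatever the app recorded when it last built its caches) and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CacheActions:
    reextract_embeddings: bool  # encoder changed: cached embeddings are stale
    drop_predictor_cache: bool  # predictor changed: predictor cache is stale


def read_version(text: str) -> str:
    """Normalize a version file's contents by stripping surrounding whitespace."""
    return text.strip()


def plan_cache_actions(bundled_embedding: str, stored_embedding: str,
                       bundled_predictor: str, stored_predictor: str) -> CacheActions:
    """Compare bundled version strings with the ones recorded alongside the caches.

    Versions are compared for equality only (assumption: no ordering is implied).
    """
    return CacheActions(
        reextract_embeddings=(
            read_version(bundled_embedding) != read_version(stored_embedding)
        ),
        drop_predictor_cache=(
            read_version(bundled_predictor) != read_version(stored_predictor)
        ),
    )
```

Comparing for equality rather than ordering means a downgrade also invalidates the cache, which is the conservative choice when embeddings from different encoder versions are not comparable.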

Intended use

Powering the smart-shuffle feature in the LatentJam Android app: cycling the shuffle button to SMART picks the next track using these models.
Experimenting with on-device music recommendation on mobile. The encoder and predictor are deliberately small: the entire pipeline (audio decode → encoder → state encoder → scorer) runs end-to-end in under a second on a Snapdragon 8 Gen 3 with the Hexagon NPU enabled.

These models are not intended for:

Server-side recommendation (use a bigger CLAP variant and a proper retrieval index)
Music classification or tagging
Generating audio

Pipeline overview

                                Library indexing (one-time, in background)
                                ┌──────────────────────────────────────┐
mp3 / flac / opus / m4a / ogg ──┤ native C++ decoder (in LatentJam)    │
                                │   ↓                                  │
                                │   15 s mono PCM at 48 kHz            │
                                │   ↓                                  │
                                │ clap_audio.onnx (this repo)          │
                                │   ↓                                  │
                                │   512-d embedding, L2-normalized     │
                                │   ↓                                  │
                                │ Room (on-device cache)               │
                                └──────────────────────────────────────┘
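The indexing diagram implies a fixed-size encoder input: 15 s at 48 kHz is 720,000 mono samples. The sketch below shows one way to prepare such a chunk and what "L2-normalized" means for the output. It is an illustration under assumptions: the real decoder is native C++, resampling to 48 kHz is assumed to have happened already, the choice of which 15 s to keep (center crop here, zero padding for short tracks) is not stated in this README, and every name below is hypothetical.

```python
import math

SAMPLE_RATE_HZ = 48_000
CHUNK_SECONDS = 15
CHUNK_SAMPLES = SAMPLE_RATE_HZ * CHUNK_SECONDS  # 720,000 samples


def downmix_to_mono(frames):
    """Average the channels of each frame, e.g. [(L, R), ...] -> [m, ...]."""
    return [sum(frame) / len(frame) for frame in frames]


def fixed_length_chunk(mono, length=CHUNK_SAMPLES):
    """Center-crop long input and zero-pad short input.

    The windowing policy is an assumption made for this sketch.
    """
    if len(mono) >= length:
        start = (len(mono) - length) // 2
        return mono[start:start + length]
    return mono + [0.0] * (length - len(mono))


def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit Euclidean length (eps guards against zero vectors)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / max(norm, eps) for x in vec]
```

According to the Files table the encoder already emits unit-length embeddings, so `l2_normalize` is shown only to make that property concrete: for unit vectors, a dot product equals cosine similarity.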

                                Smart-shuffle inference (on demand)
                                ┌──────────────────────────────────────┐
listening history (Room)      ──┤ predictor_state.onnx (this repo)     │
                                │   ↓                                  │
                                │   user-state vector                  │
                                │   ↓                                  │
                                │ ANN retrieval over cached embeddings │
                                │   ↓                                  │
                                │   100 candidate tracks               │
                                │   ↓                                  │
                                │ predictor_scorer_n100.onnx           │
                                │   ↓                                  │
                                │   next track                         │
                                └──────────────────────────────────────┘
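The smart-shuffle flow in the diagram can be sketched end to end as below. Several things here are assumptions rather than facts about the app: an exact brute-force top-k by dot product stands in for the ANN index, the user-state vector is assumed to share a space with the track embeddings, and `scorer` is a hypothetical callable standing in for a call into predictor_scorer_n100.onnx.

```python
import heapq


def dot(a, b):
    """Dot product; equals cosine similarity when both vectors are unit length."""
    return sum(x * y for x, y in zip(a, b))


def retrieve_candidates(user_state, embeddings, k=100):
    """Exact top-k track ids by dot product (brute-force stand-in for ANN)."""
    return heapq.nlargest(
        k, embeddings, key=lambda track_id: dot(user_state, embeddings[track_id])
    )


def pick_next_track(user_state, embeddings, scorer, k=100):
    """Retrieve k candidates, score them, and return the best track id.

    `scorer(user_state, candidate_embeddings)` must return one score per
    candidate. Returns None when the library has no embeddings.
    """
    candidates = retrieve_candidates(user_state, embeddings, k)
    if not candidates:
        return None
    scores = scorer(user_state, [embeddings[tid] for tid in candidates])
    best = max(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best]
```

The sketch does not address libraries with fewer than 100 tracks; how the fixed-size scorer input is filled in that case is not described in this README.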

Privacy

All inference is on-device. No audio, embeddings, or listening history is ever transmitted off the device.
The LatentJam Android app does not request the INTERNET permission for the recommender. The only network access is the build-time download from this repo onto the developer's machine.

Limitations

Smart mode requires an embedding for every track. Indexing a large library for the first time takes a while, because the encoder runs in the background only while the device is charging and idle (scheduled via WorkManager) to avoid thermal throttling and battery drain.
The encoder is CLAP-derived but distilled to fit on-device. Genre and mood discrimination is good for popular Western genres and weaker for genres that are underrepresented in CLAP's training data.
The predictor was trained on a closed user-history dataset and may not match your taste right away. On-device fine-tuning is planned but not yet shipped (ml/retrain/RetrainWorker.kt in the app repo is currently a stub).

License

GPL-3.0-or-later, matching the LatentJam Android app.

The CLAP audio encoder is derived from LAION's CLAP (CC0/MIT), then quantized and exported to ONNX for on-device use. The state encoder and scorer were trained from scratch for this project.
