code example

#1
by Shadow0482 - opened

Guys, is there any inference code to run the model? Like Python code or something like that?

Idle Intelligence org

Hey Shadow,

I'm not sure you've seen the disclaimer on the model card:
"Work in progress. This model exists primarily to test layer pruning + QLoRA recovery for browser deployment. Model behavior may differ from the original 32L model. No guarantees — use at your own discretion."

That being said, I built this to play with speech-to-speech in the browser. You can find some inference code (in Rust, sorry) here, along with a demo link: https://github.com/idle-intelligence/sts-web

I guess if you're interested, I do have a version of NVIDIA's Python inference code somewhere that will run this model. Let me know!

Hey ilnmtlbnm, thanks for sharing, this looks really interesting.

I don’t think I’m seeing the model card on my end; not sure if I missed it.
Also, I’d definitely be interested in the Python inference version (the NVIDIA one you mentioned), if you don’t mind sharing it. That would be super helpful for me to experiment with.

Appreciate it

Idle Intelligence org

Hey Shadow, I appreciate your interest. I meant the model card here: https://huggingface.co/idle-intelligence/personaplex-24L-q4_k-webgpu

For the Python inference... I didn't think this through when answering initially (I did this a few weeks back already 😅).
This GGUF is custom (layer-pruned + LoRA-recovered + Q4_K with Moshi-style tensor names), so it can't be loaded with NVIDIA's PersonaPlex Python inference as-is.
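If you want to check for yourself how this file differs from a stock checkpoint, the fixed GGUF header is easy to read. Here is a minimal std-only Rust sketch, assuming the standard GGUF v2/v3 layout (4-byte `GGUF` magic, then a `u32` version and two `u64` counts, all little-endian). It does not parse the metadata or the Moshi-style tensor names themselves, and the program and its command-line argument are hypothetical, not part of sts-web.

```rust
use std::fs::File;
use std::io::{self, Read};

/// Fixed-size fields at the start of a GGUF file (v2/v3 layout, little-endian).
#[derive(Debug, PartialEq)]
struct GgufHeader {
    version: u32,
    tensor_count: u64,
    metadata_kv_count: u64,
}

fn read_u32<R: Read>(r: &mut R) -> io::Result<u32> {
    let mut buf = [0u8; 4];
    r.read_exact(&mut buf)?;
    Ok(u32::from_le_bytes(buf))
}

fn read_u64<R: Read>(r: &mut R) -> io::Result<u64> {
    let mut buf = [0u8; 8];
    r.read_exact(&mut buf)?;
    Ok(u64::from_le_bytes(buf))
}

/// Reads the magic, version and the tensor / metadata counts.
fn read_gguf_header<R: Read>(r: &mut R) -> io::Result<GgufHeader> {
    let mut magic = [0u8; 4];
    r.read_exact(&mut magic)?;
    if &magic != b"GGUF" {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "not a GGUF file"));
    }
    let version = read_u32(r)?;
    if version < 2 {
        // GGUF v1 used 32-bit counts; this sketch does not handle it.
        return Err(io::Error::new(io::ErrorKind::InvalidData, "unsupported GGUF version"));
    }
    // Struct fields are evaluated in the order written, matching the file layout.
    Ok(GgufHeader {
        version,
        tensor_count: read_u64(r)?,
        metadata_kv_count: read_u64(r)?,
    })
}

fn main() -> io::Result<()> {
    // Hypothetical usage: pass the path of the .gguf file as the only argument.
    let path = std::env::args().nth(1).expect("usage: gguf_header <file.gguf>");
    let mut file = File::open(path)?;
    let header = read_gguf_header(&mut file)?;
    println!("{:?}", header);
    Ok(())
}
```

Run it with the path to the `.gguf` file from the downloaded directory (the exact file name is not given in this thread). A layer-pruned checkpoint should report fewer tensors than the original 32-layer one.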

I added a native Rust CLI that runs the model end-to-end (Vulkan on Linux, Metal on macOS, no Python required):
https://github.com/idle-intelligence/sts-web#native-cli-sts

huggingface-cli download idle-intelligence/personaplex-24L-q4_k-webgpu \
--local-dir personaplex-24L-q4_k-webgpu

cargo run --release --features "wgpu,cli" --bin sts -- \
--model-dir ./personaplex-24L-q4_k-webgpu \
--input question.wav \
--output response.wav \
--voice NATF2

The first build takes ~5 min (Burn + CubeCL); after that it runs at ~64 ms/frame on an RTX 3080. Voices: NATF0..3, NATM0..3, VARF0..4, VARM0..4.
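If you'd rather drive the CLI from Rust (for example to batch several recordings), here is a minimal sketch that builds the same `cargo run ... --bin sts` call with `std::process::Command` and checks the voice name first. Assumptions: the `NATF0..3`-style ranges are read as inclusive on both ends (18 voices in total), the helper names `voices` and `sts_command` and the file names are hypothetical, and it must be run from inside an sts-web checkout. By default it only prints the command.

```rust
use std::path::Path;
use std::process::Command;

/// Expands the voice ranges listed in the thread ("NATF0..3", "VARM0..4", ...)
/// into individual names. Assumption: each range is inclusive on both ends.
fn voices() -> Vec<String> {
    let ranges = [("NATF", 3), ("NATM", 3), ("VARF", 4), ("VARM", 4)];
    ranges
        .iter()
        .flat_map(|&(prefix, last)| (0..=last).map(move |i| format!("{prefix}{i}")))
        .collect()
}

/// Builds the `cargo run ... --bin sts` invocation for one input file,
/// rejecting voice names that are not in the list above.
fn sts_command(model_dir: &Path, input: &Path, output: &Path, voice: &str) -> Result<Command, String> {
    if !voices().iter().any(|v| v == voice) {
        return Err(format!("unknown voice: {voice}"));
    }
    let mut cmd = Command::new("cargo");
    cmd.args(["run", "--release", "--features", "wgpu,cli", "--bin", "sts", "--"])
        .arg("--model-dir")
        .arg(model_dir)
        .arg("--input")
        .arg(input)
        .arg("--output")
        .arg(output)
        .arg("--voice")
        .arg(voice);
    Ok(cmd)
}

fn main() {
    // Placeholder file names: replace with your own recordings.
    let model_dir = Path::new("./personaplex-24L-q4_k-webgpu");
    let jobs = [("question.wav", "response.wav")];
    for (input, output) in jobs {
        match sts_command(model_dir, Path::new(input), Path::new(output), "NATF2") {
            // Dry run: prints the command. Call `.status()` on it to actually execute it.
            Ok(cmd) => println!("{cmd:?}"),
            Err(e) => eprintln!("{e}"),
        }
    }
}
```

Shelling out to `cargo run` keeps the sketch independent of the crate's internal API; once the binary is built you could point `Command::new` at `target/release/sts` instead to skip Cargo's checks on every call.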
