StreamLip Audio Reconstruction Checkpoints
This repository contains the project-specific checkpoints and pinned runtime weights for the StreamLip audio reconstruction demo.
Contents
- `recon/streamlip_recon_timbrefix_step_002000.pt`: default audio reconstruction checkpoint.
- `recon/streamlip_recon_residual_base_step_005000.pt`: residual base checkpoint used by the recon configuration.
- `v5/streamlip_v5_olmo_step_002000_infer.pt`: inference-only StreamLip V5 checkpoint. It keeps `step` and `model` and removes optimizer state.
- `streamlip-v5-lm/`: StreamLip V5 language model directory.
- `norm/latent_norm_stats.npz`: latent normalization statistics.
- `auto-avsr/vsr_trlrs2lrs3vox2avsp_base.pth`: pinned Auto-AVSR weights required by the default local pipeline.
- `speaker/resnet50-11ad3fa6.pth`: pinned speaker/timbre frontend weights.
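After downloading this repository, it can help to confirm that every entry listed above is present before running the pipeline. The following is a minimal sketch, not part of the project: the function name `check_streamlip_files` and the idea of passing a local download directory as its argument are assumptions made for illustration. Only the relative paths come from the list above.

```bash
# Hypothetical helper (not shipped with the project): print each expected
# entry that is missing under ROOT and return non-zero if any is missing.
check_streamlip_files() {
    root="${1:-.}"   # assumed local download directory; defaults to the current one
    missing=0
    for rel in \
        recon/streamlip_recon_timbrefix_step_002000.pt \
        recon/streamlip_recon_residual_base_step_005000.pt \
        v5/streamlip_v5_olmo_step_002000_infer.pt \
        streamlip-v5-lm \
        norm/latent_norm_stats.npz \
        auto-avsr/vsr_trlrs2lrs3vox2avsp_base.pth \
        speaker/resnet50-11ad3fa6.pth
    do
        # -e accepts both regular files and the language model directory
        if [ ! -e "$root/$rel" ]; then
            echo "missing: $rel"
            missing=1
        fi
    done
    return "$missing"
}
```

Calling `check_streamlip_files path/to/download` prints one `missing: ...` line per absent entry and exits with status 0 only when everything is in place. The check covers existence only; it does not validate file contents or sizes.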
The public pretrained Mimi and SmolLM2 dependencies are intentionally not duplicated here. Download them separately:
```bash
export HF_ENDPOINT=https://hf-mirror.com
hf download kyutai/mimi --local-dir ckpt/mimi
hf download HuggingFaceTB/SmolLM2-360M --local-dir ckpt/smollm2-360m
```
See the project README for full environment setup and raw-video inference commands.