BoxOfColors Claude Sonnet 4.6 committed on
Commit 8b87263 · 1 Parent(s): 6ef8b2e

Normalize all seg wavs to stereo (2,T) at save time

- _to_stereo(): new helper that squeezes (1,T)→(T,) then duplicates
mono (T,)→(2,T). Handles all three model outputs uniformly.
- _save_seg_wavs(): applies _to_stereo() before np.save so every .npy
on disk is always (2,T). Mono output from TARO and HunyuanFoley is
duplicated to fake stereo; MMAudio's genuine stereo is preserved as-is.
- Eliminates the root cause of all channel-shape mismatches in _cf_join:
on-disk format is now uniformly stereo regardless of source model.
_load_seg_wavs squeeze and _normalize_channel_layout remain as
defensive fallbacks.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Files changed (1)
  1. app.py +13 -3
app.py CHANGED
@@ -316,13 +316,23 @@ def _register_tmp_dir(tmp_dir: str) -> str:
     return tmp_dir
 
 
+def _to_stereo(w: np.ndarray) -> np.ndarray:
+    """Ensure *w* is stereo (2, T). Squeezes (1,T) then duplicates mono."""
+    if w.ndim == 2 and w.shape[0] == 1:
+        w = w.squeeze(0)  # (1, T) → (T,)
+    if w.ndim == 1:
+        w = np.stack([w, w], axis=0)  # (T,) → (2, T)
+    return w
+
+
 def _save_seg_wavs(wavs: list[np.ndarray], tmp_dir: str, prefix: str) -> list[str]:
-    """Save a list of numpy wav arrays to .npy files, return list of paths.
-    This avoids serialising large float arrays into JSON/HTML data-state."""
+    """Save a list of numpy wav arrays to .npy files as stereo (2, T).
+    Mono arrays are duplicated to stereo so the on-disk format is always
+    uniform — this avoids shape mismatches during cross-model regens."""
     paths = []
     for i, w in enumerate(wavs):
         p = os.path.join(tmp_dir, f"{prefix}_seg{i}.npy")
-        np.save(p, w)
+        np.save(p, _to_stereo(w))
         paths.append(p)
     return paths
 
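
To check the on-disk guarantee end to end, the following sketch saves a mixed list of mono and stereo segments and reloads them. It reproduces `_to_stereo` and `_save_seg_wavs` from the diff. The temporary directory, the `demo` prefix, and the sample arrays are hypothetical stand-ins. The plain `np.concatenate` at the end is only a stand-in for `_cf_join`, whose actual implementation is not shown in this commit.

```python
import os
import tempfile

import numpy as np


def _to_stereo(w: np.ndarray) -> np.ndarray:
    """Copied from the diff: return *w* as stereo (2, T)."""
    if w.ndim == 2 and w.shape[0] == 1:
        w = w.squeeze(0)
    if w.ndim == 1:
        w = np.stack([w, w], axis=0)
    return w


def _save_seg_wavs(wavs: list[np.ndarray], tmp_dir: str, prefix: str) -> list[str]:
    """Copied from the diff: save each segment as a stereo .npy file."""
    paths = []
    for i, w in enumerate(wavs):
        p = os.path.join(tmp_dir, f"{prefix}_seg{i}.npy")
        np.save(p, _to_stereo(w))
        paths.append(p)
    return paths


# Hypothetical segments with mixed layouts: (T,), (1, T) and (2, T).
segs = [
    np.zeros(4, dtype=np.float32),
    np.ones((1, 6), dtype=np.float32),
    np.ones((2, 5), dtype=np.float32),
]

with tempfile.TemporaryDirectory() as tmp_dir:
    paths = _save_seg_wavs(segs, tmp_dir, "demo")
    loaded = [np.load(p) for p in paths]  # fully read before the dir is removed

loaded_shapes = [a.shape for a in loaded]
joined = np.concatenate(loaded, axis=1)  # joins along time; needs equal channel counts
print(loaded_shapes, joined.shape)  # [(2, 4), (2, 6), (2, 5)] (2, 15)
```

Because every reloaded array has two channels, the time-axis join succeeds without any per-segment reshaping, which is the mismatch this commit aims to remove at the source.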