Spaces: JackIsNotInTheBox/Generate_Audio_for_Video
BoxOfColors Claude Sonnet 4.6 committed 23 days ago

Commit 8b87263

1 Parent(s): 6ef8b2e

Normalize all seg wavs to stereo (2,T) at save time

- _to_stereo(): new helper that squeezes (1,T)→(T,) then duplicates
mono (T,)→(2,T). Handles all three model outputs uniformly.
- _save_seg_wavs(): applies _to_stereo() before np.save so every .npy
on disk is always (2,T). TARO and HunyuanFoley mono gets duplicated
to fake stereo; MMAudio's genuine stereo is preserved as-is.
- Eliminates the root cause of all channel-shape mismatches in _cf_join:
on-disk format is now uniformly stereo regardless of source model.
_load_seg_wavs squeeze and _normalize_channel_layout remain as
defensive fallbacks.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Files changed (1)

app.py +13 -3

@@ -316,13 +316,23 @@ def _register_tmp_dir(tmp_dir: str) -> str:
     return tmp_dir
+def _to_stereo(w: np.ndarray) -> np.ndarray:
+    """Ensure *w* is stereo (2, T).  Squeezes (1,T) then duplicates mono."""
+    if w.ndim == 2 and w.shape[0] == 1:
+        w = w.squeeze(0)          # (1, T) → (T,)
+    if w.ndim == 1:
+        w = np.stack([w, w], axis=0)   # (T,) → (2, T)
+    return w
 def _save_seg_wavs(wavs: list[np.ndarray], tmp_dir: str, prefix: str) -> list[str]:
-    """Save a list of numpy wav arrays to .npy files, return list of paths.
-    This avoids serialising large float arrays into JSON/HTML data-state."""
+    """Save a list of numpy wav arrays to .npy files as stereo (2, T).
+    Mono arrays are duplicated to stereo so the on-disk format is always
+    uniform — this avoids shape mismatches during cross-model regens."""
     paths = []
     for i, w in enumerate(wavs):
         p = os.path.join(tmp_dir, f"{prefix}_seg{i}.npy")
-        np.save(p, w)
+        np.save(p, _to_stereo(w))
         paths.append(p)
     return paths
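
A round trip through the changed function shows why a uniform on-disk shape helps downstream joins. This is a minimal sketch under stated assumptions: the two functions are copied from the diff, the segment data and the `"demo"` prefix are invented, and `np.concatenate` along the time axis is only a stand-in for `_cf_join`, whose implementation is not part of this diff.

```python
import os
import tempfile

import numpy as np


def _to_stereo(w: np.ndarray) -> np.ndarray:
    """Copy of the helper added in this commit."""
    if w.ndim == 2 and w.shape[0] == 1:
        w = w.squeeze(0)
    if w.ndim == 1:
        w = np.stack([w, w], axis=0)
    return w


def _save_seg_wavs(wavs: list[np.ndarray], tmp_dir: str, prefix: str) -> list[str]:
    """Copy of the post-commit function: every saved array goes through _to_stereo."""
    paths = []
    for i, w in enumerate(wavs):
        p = os.path.join(tmp_dir, f"{prefix}_seg{i}.npy")
        np.save(p, _to_stereo(w))
        paths.append(p)
    return paths


# Hypothetical segments with mixed layouts and lengths.
segs = [
    np.ones(4, dtype=np.float32),        # (T,)   mono
    np.ones((1, 6), dtype=np.float32),   # (1, T) mono with a channel axis
    np.ones((2, 5), dtype=np.float32),   # (2, T) stereo
]

with tempfile.TemporaryDirectory() as tmp_dir:
    paths = _save_seg_wavs(segs, tmp_dir, "demo")
    loaded = [np.load(p) for p in paths]  # fully read into memory before cleanup

loaded_shapes = [a.shape for a in loaded]

# Stand-in for the real join: with uniform (2, T) arrays, concatenation
# along the time axis needs no per-segment shape handling.
joined = np.concatenate(loaded, axis=1)
print(loaded_shapes, joined.shape)
```

Without the `_to_stereo` call, the same `np.concatenate` would fail on the first segment, because a 1-D array cannot be concatenated with 2-D arrays along axis 1.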