Possible fix for CedModel's forward method with long audio

#1
by daisukelab - opened

Hi, thanks for sharing the implementation on Hugging Face.

I ran into issues with 30-s audio clips and made a local fix. The problems are around these lines:
https://huggingface.co/mispeech/ced-base/blob/main/ced_model/modeling_ced.py#L456-L459
That code calls self.forward_head and self.ced, neither of which exists in the class.
I think we should instead simply call forward_features, just as in the short-audio case.

In addition, the reshape right after that call also seems to be wrong.

Here is my local fix:

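        # Encode all chunks in one call, the same path as for short audio
        x = self.forward_features(x)
        # x has shape (n_splits * B, T, D)
        SPLB, T, D = x.shape
        # Split the merged leading dimension back into (n_splits, B, T, D)
        x = torch.reshape(x, (n_splits, SPLB // n_splits, T, D))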
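For anyone who wants to check the shape bookkeeping outside the model, here is a minimal standalone sketch (not the actual modeling_ced.py code). It assumes the long input is cut along time into full chunks of target_len frames, the chunks are concatenated split-major into the batch, and one encoder call processes them all. ToyEncoder and encode_long are hypothetical stand-ins for forward_features and the long-audio branch of forward.

```python
import torch
import torch.nn as nn


class ToyEncoder(nn.Module):
    """Hypothetical stand-in for forward_features: (N, F, L) -> (N, T, D)."""

    def __init__(self, n_mels: int = 8, patch: int = 4, dim: int = 16):
        super().__init__()
        self.patch = patch
        self.proj = nn.Linear(n_mels * patch, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, f, length = x.shape
        # Cut the time axis into non-overlapping patches -> (N, T, F * patch)
        x = x.reshape(n, f, length // self.patch, self.patch)
        x = x.permute(0, 2, 1, 3).reshape(n, length // self.patch, f * self.patch)
        return self.proj(x)


def encode_long(encoder: nn.Module, x: torch.Tensor, target_len: int) -> torch.Tensor:
    """Encode (B, F, L) input in fixed-length chunks; returns (n_splits, B, T, D)."""
    # Keep only full-length chunks for simplicity (assumption: no padding of the last one)
    chunks = [c for c in x.split(target_len, dim=-1) if c.shape[-1] == target_len]
    n_splits = len(chunks)
    # Merge split-major into the batch: all clips of chunk 0, then chunk 1, ...
    x = torch.cat(chunks, dim=0)  # (n_splits * B, F, target_len)
    x = encoder(x)
    splb, t, d = x.shape
    # Undo the split-major merge -> (n_splits, B, T, D)
    return torch.reshape(x, (n_splits, splb // n_splits, t, d))


torch.manual_seed(0)
enc = ToyEncoder().eval()
audio = torch.randn(2, 8, 40)  # B=2 clips, F=8 mel bins, L=40 frames
out = encode_long(enc, audio, target_len=16)  # 40 frames -> 2 full chunks of 16
print(out.shape)  # torch.Size([2, 2, 4, 16])

# Chunk 0 of every clip, encoded on its own, should match out[0]
assert torch.allclose(out[0], enc(audio[..., :16]))
```

Because the chunks were merged split-major, reshaping to (n_splits, SPLB // n_splits, T, D) puts chunk i of every clip at index i, which the assert checks against encoding that chunk alone. Pooling over the first axis (for example out.mean(0)) would then give one result per clip, although how the real model aggregates the chunks is not shown in this sketch.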

I hope it helps.

Xiaomi Audio & Speech Group org

Fixed. Thanks a lot!

jimbozhang changed discussion status to closed
