facebook/seamless-m4t-v2-large · minimal HW requirements

Nov 1, 2024

edited Nov 1, 2024

Hello everyone. I am having trouble running the model in SageMaker. Every attempt gets "Killed" by an out-of-memory error, even on an ml.m5.4xlarge instance. I am using the code from the example:

from transformers import AutoProcessor, SeamlessM4Tv2Model
import torchaudio

processor = AutoProcessor.from_pretrained("facebook/seamless-m4t-v2-large")
model = SeamlessM4Tv2Model.from_pretrained("facebook/seamless-m4t-v2-large")

# from audio
audio, orig_freq = torchaudio.load("my_audio.wav")
audio = torchaudio.functional.resample(audio, orig_freq=orig_freq, new_freq=16_000)  # must be a 16 kHz waveform array
audio_inputs = processor(audios=audio, return_tensors="pt")
audio_array_from_audio = model.generate(**audio_inputs, tgt_lang="eng")[0].cpu().numpy().squeeze()

My audio file is 4 minutes long and 25 MB in size.

What are the minimal requirements to run the model?

goodeejay

Nov 17, 2024

Having the same problem. Any updates?

RafatK

Jan 9

I am able to run inference with SeamlessM4T on a 3070 with 8 GB of VRAM by chunking the audio into segments of up to 20 seconds. You can also check this issue: https://github.com/facebookresearch/seamless_communication/issues/82, where they mention the maximum audio length to prevent OOM.
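
A minimal sketch of that chunking idea, assuming fixed-length cuts at 16 kHz. The reply does not say how the audio was actually split, so the helper names `chunk_bounds` and `translate_in_chunks` are hypothetical, and the `torch.no_grad()` wrapper is an addition that is not in the original example. The `processor(...)` and `model.generate(...)` calls mirror the ones in the first post.

```python
def chunk_bounds(num_samples, sample_rate=16_000, max_seconds=20.0):
    """Return (start, end) sample indices that cover the waveform
    in consecutive chunks of at most max_seconds each."""
    chunk_len = int(sample_rate * max_seconds)
    if num_samples < 0 or chunk_len < 1:
        raise ValueError("num_samples must be >= 0 and the chunk length >= 1 sample")
    return [
        (start, min(start + chunk_len, num_samples))
        for start in range(0, num_samples, chunk_len)
    ]


def translate_in_chunks(model, processor, audio, tgt_lang="eng",
                        sample_rate=16_000, max_seconds=20.0):
    """Run generate() on one chunk at a time (hypothetical helper).

    audio: a (channels, samples) tensor already resampled to sample_rate,
    as produced by the torchaudio code in the first post.
    """
    import torch  # imported here so chunk_bounds works without torch installed

    outputs = []
    for start, end in chunk_bounds(audio.shape[-1], sample_rate, max_seconds):
        chunk = audio[..., start:end]  # slice along the sample axis
        inputs = processor(audios=chunk, return_tensors="pt")
        with torch.no_grad():  # assumption: gradients are not needed for inference
            out = model.generate(**inputs, tgt_lang=tgt_lang)[0]
        outputs.append(out.cpu().numpy().squeeze())
    return outputs  # one output waveform per input chunk
```

For the 4-minute file in the question, `chunk_bounds(240 * 16_000)` yields 12 chunks of 320,000 samples each. Hard cuts like this can split a word in the middle, so cutting at silences may give better results. Whether chunking alone is enough on a CPU-only instance is not confirmed in this thread.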