Max audio model input length

#6
by arubittu - opened

What is the maximum audio input length I can classify, assuming my sampling rate is 16 kHz? I have tried inference with inputs up to 100 seconds long (an array of 100 * 16k samples) and it gives an output. What input size is this model trained to accept? Will it have the same performance at larger sizes?

audEERING GmbH org

There is no official max length; it's limited by your RAM. But we trained with segmented audio, about 2-6 seconds long.
We found that performance doesn't drop until 3 seconds.

I want to do classification on longer audio clips, around 1 min. The performance should get better, right, since I am providing the model with more data to classify?

audEERING GmbH org
edited May 21

I guess the best performance would come from segmenting them and then pooling the predictions per speaker, but you could try both and compare.
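A minimal sketch of the segment-then-pool idea, in case it helps with the comparison. Everything here is an assumption for illustration: the 4-second segment length (picked from inside the 2-6 second training range mentioned above), mean pooling as the pooling method, and `predict_segment`, which is a hypothetical stand-in that returns dummy scores so the sketch runs without the real model.

```python
import numpy as np

SAMPLING_RATE = 16_000  # Hz, as stated in the question


def segment_audio(signal, segment_seconds=4.0, sampling_rate=SAMPLING_RATE):
    """Split a 1-D signal into consecutive, non-overlapping segments.

    The last segment is kept even if it is shorter than the others.
    """
    segment_length = int(segment_seconds * sampling_rate)
    return [
        signal[start:start + segment_length]
        for start in range(0, len(signal), segment_length)
    ]


def pool_predictions(segment_predictions):
    """Average per-segment predictions into one prediction per clip."""
    return np.mean(np.stack(segment_predictions), axis=0)


def predict_segment(segment):
    """Hypothetical placeholder for the real model call (dummy scores)."""
    return np.array([segment.mean(), segment.std(), np.abs(segment).max()])


# One minute of dummy audio at 16 kHz
rng = np.random.default_rng(0)
clip = rng.standard_normal(60 * SAMPLING_RATE).astype(np.float32)

segments = segment_audio(clip)
pooled = pool_predictions([predict_segment(s) for s in segments])
```

To compare both approaches, run the real model once on the full clip and once per segment, then check the pooled result against the full-clip result on labelled data. A very short trailing segment may be worth dropping, since it falls outside the segment lengths used in training.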

There is no official max length; it's limited by your RAM. But we trained with segmented audio, about 2-6 seconds long.
We found that performance doesn't drop until 3 seconds.

Did you use dynamic padding for batches? Is that why the segments are 2 to 6 s?
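For readers unfamiliar with the term, dynamic padding means padding each batch only up to the length of its longest item rather than to a fixed global length. The sketch below shows the general technique with NumPy; it is not a description of how this model was trained, which the thread does not state. The function name `pad_batch` and the zero padding value are assumptions.

```python
import numpy as np


def pad_batch(signals, padding_value=0.0):
    """Pad variable-length 1-D signals to the longest one in this batch.

    Returns the padded batch and a boolean mask that is True for real
    samples and False for padding.
    """
    max_length = max(len(s) for s in signals)
    batch = np.full((len(signals), max_length), padding_value, dtype=np.float32)
    mask = np.zeros((len(signals), max_length), dtype=bool)
    for i, s in enumerate(signals):
        batch[i, :len(s)] = s
        mask[i, :len(s)] = True
    return batch, mask


# A 2-second and a 6-second segment at 16 kHz
short = np.ones(2 * 16_000, dtype=np.float32)
long = np.ones(6 * 16_000, dtype=np.float32)
batch, mask = pad_batch([short, long])
```

Here the batch is padded to 6 seconds because that is the longest item in it; a batch holding only 2-3 second segments would be padded to 3 seconds instead.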
