I would like to see the intermediate text outputs leading to this

#2
by deleted - opened

Interesting, I plan to display the intermediate text generation from LP Music Caps and the Llama vision translation ;)

deleted

Thanks. It would be great.

In the meantime I discovered that the LP Music Caps demo returns repeatable (or maybe only nearly identical?) outputs. It appears that when the captioning model fails, it fails really creatively.

So it turns out that Cheesy_toy_melody_old_mc_donald.ogg is actually an Ethiopian traditional music piece with female vocal, harp and electric guitar 🤣

[0:00-10:00]
This is a live recording of an Ethiopian traditional music piece. There is a female vocalist singing in a melancholic manner. The melody is being played by the harp and the electric guitar to create a dissonant sound. The atmosphere is solemn. This piece could be used in the soundtrack of a drama movie, especially during the scenes of a historical drama movie.

[10:00-20:00]
This is a live recording of a classical music piece. There is a female vocalist singing melodically. The melody is being played by the harp while the rhythmic background is composed of the theremin. The atmosphere is dramatic. This piece could be used in the soundtrack of a historical drama movie during a documentary.

[20:00-30:00]
The low quality recording features a woodblock one shot. The recording is noisy and in mono, as it was probably recorded with a phone.

Yes, LP Music Caps could benefit from some more training, for sure 😉

deleted changed discussion status to closed
