What if we segment the audio first and then transcribe each segment? It's some extra compute to throw in, but IMO it would give better results!
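Rough sketch of what I mean, assuming a plain energy threshold is good enough to find the silent gaps (a real setup would probably use a proper VAD). `segment_by_silence`, `transcribe_segments`, and the `transcribe` callback are all made-up names for illustration, not any library's API; plug in whatever model you actually use.

```python
from typing import Callable, List, Sequence, Tuple


def segment_by_silence(
    samples: Sequence[float],
    frame_len: int = 400,      # assumed frame size in samples (hypothetical default)
    threshold: float = 1e-3,   # assumed mean-energy cutoff for "silence"
) -> List[Tuple[int, int]]:
    """Return (start, end) sample ranges whose frames are above the energy cutoff."""
    segments: List[Tuple[int, int]] = []
    start = None
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        energy = sum(x * x for x in frame) / len(frame)  # mean squared amplitude
        voiced = energy >= threshold
        if voiced and start is None:
            start = i                      # a non-silent region begins here
        elif not voiced and start is not None:
            segments.append((start, i))    # region ended at the first silent frame
            start = None
    if start is not None:
        segments.append((start, len(samples)))  # audio ended mid-region
    return segments


def transcribe_segments(
    samples: Sequence[float],
    transcribe: Callable[[Sequence[float]], str],  # placeholder for the real model call
    **segment_kwargs,
) -> List[str]:
    """Segment first, then run the transcriber once per non-silent segment."""
    return [
        transcribe(samples[start:end])
        for start, end in segment_by_silence(samples, **segment_kwargs)
    ]
```

The extra compute is the pass over the samples to find the gaps, plus one transcriber call per segment instead of a single call on the whole clip.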