This is a first prototype of verbalens, built with Whisper and nem.
It transcribes audio to text with speaker diarization.
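
To illustrate how a transcript and a diarization result might be combined, here is a minimal sketch in plain Python. It is not the prototype's actual code: the `Segment` and `Turn` types, the `assign_speakers` helper, and the `SPEAKER_00`-style labels are all hypothetical names chosen for the example. It assumes the transcription step yields timestamped text segments and the diarization step yields timestamped speaker turns, and it labels each segment with the speaker whose turn overlaps it the most.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A timestamped piece of transcribed text (hypothetical shape)."""
    start: float
    end: float
    text: str


@dataclass
class Turn:
    """A timestamped speaker turn from diarization (hypothetical shape)."""
    start: float
    end: float
    speaker: str


def overlap(a_start, a_end, b_start, b_end):
    """Return the length in seconds of the intersection of two intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments, turns, unknown="UNKNOWN"):
    """Label each segment with the speaker whose turn overlaps it the most.

    Segments that overlap no turn at all get the `unknown` label.
    """
    labeled = []
    for seg in segments:
        best_speaker, best_overlap = unknown, 0.0
        for turn in turns:
            ov = overlap(seg.start, seg.end, turn.start, turn.end)
            if ov > best_overlap:
                best_speaker, best_overlap = turn.speaker, ov
        labeled.append((best_speaker, seg.text))
    return labeled


if __name__ == "__main__":
    # Made-up example data, not real model output.
    segments = [
        Segment(0.0, 2.0, "Hello there."),
        Segment(2.0, 4.5, "Hi, how are you?"),
    ]
    turns = [
        Turn(0.0, 2.2, "SPEAKER_00"),
        Turn(2.2, 5.0, "SPEAKER_01"),
    ]
    for speaker, text in assign_speakers(segments, turns):
        print(f"{speaker}: {text}")
```

Maximum overlap is only one possible rule. Splitting a segment at turn boundaries, or aligning at word level, would handle segments that span a speaker change more accurately.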