---
license: mit
tags:
- pyannote
- pyannote-audio
- pyannote-audio-pipeline
- audio
- voice
- speech
- speaker
- speaker-diarization
- speaker-change-detection
- endpoints-template
library_name: generic
---

# 🎹 Speaker diarization with Pyannote and Inference Endpoints

This repository implements a custom `handler` for `speaker-diarization` for 🤗 Inference Endpoints using Pyannote. The code for the customized pipeline is in [handler.py](https://huggingface.co/philschmid/pyannote-speaker-diarization-endpoint/blob/main/handler.py). There is also a [notebook](https://huggingface.co/philschmid/pyannote-speaker-diarization-endpoint/blob/main/create_handler.ipynb) showing how to create the `handler.py`. A minimal sketch of such a handler is also included at the end of this README.

### Request

The endpoint expects a binary audio file. Below are a cURL example and a Python example using the `requests` library.

**curl**

```bash
# download a sample audio file
wget https://cdn-media.huggingface.co/speech_samples/sample1.flac

# send the request
curl --request POST \
  --url https://{ENDPOINT}/ \
  --header 'Content-Type: audio/flac' \
  --header 'Authorization: Bearer {HF_TOKEN}' \
  --data-binary '@sample1.flac'
```

**Python**

```python
import mimetypes

import requests as r

ENDPOINT_URL = ""
HF_TOKEN = ""


def predict(path_to_audio: str):
    # read the audio file as binary data
    with open(path_to_audio, "rb") as f:
        audio = f.read()
    # guess the mime type from the file extension
    content_type = mimetypes.guess_type(path_to_audio)[0]
    headers = {
        "Authorization": f"Bearer {HF_TOKEN}",
        "Content-Type": content_type,
    }
    response = r.post(ENDPOINT_URL, headers=headers, data=audio)
    return response.json()


prediction = predict(path_to_audio="sample1.flac")
print(prediction)
```

expected output

```json
{"diarization": [
    {"label": "SPEAKER_01", "start": "0.4978125", "stop": "1.3921875"},
    {"label": "SPEAKER_01", "start": "1.8984375", "stop": "2.7590624999999998"},
    {"label": "SPEAKER_02", "start": "2.9953125", "stop": "3.5015625000000004"},
    {"label": "SPEAKER_01", "start": "3.5690625000000002", "stop": "4.311562500000001"},
    ...
]}
```
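
The `start` and `stop` timestamps are returned as strings. The helper below is a small sketch (the `speaking_time` function is hypothetical and not part of this repository) that converts them to floats and sums the speaking time per speaker:

```python
from collections import defaultdict


def speaking_time(result: dict) -> dict:
    """Sum the total speaking time in seconds for each speaker label."""
    totals = defaultdict(float)
    for segment in result["diarization"]:
        totals[segment["label"]] += float(segment["stop"]) - float(segment["start"])
    return dict(totals)


# e.g. with the `prediction` returned by `predict` above:
# speaking_time(prediction)  # {"SPEAKER_01": <seconds>, "SPEAKER_02": <seconds>, ...}
```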
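
### Handler sketch

Custom handlers for Inference Endpoints follow the `EndpointHandler` convention: a class in `handler.py` with an `__init__(self, path)` that loads the model and a `__call__(self, data)` that runs inference. The sketch below shows the general shape only; it assumes the raw audio bytes arrive under `data["inputs"]` and uses the `pyannote/speaker-diarization` pipeline as an example, so it is not the exact code in this repository's [handler.py](https://huggingface.co/philschmid/pyannote-speaker-diarization-endpoint/blob/main/handler.py).

```python
import io
from typing import Any, Dict, List

import torchaudio
from pyannote.audio import Pipeline


class EndpointHandler:
    def __init__(self, path: str = ""):
        # load the pretrained diarization pipeline once at startup
        # (a gated pyannote model may additionally require an auth token)
        self.pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization")

    def __call__(self, data: Dict[str, Any]) -> Dict[str, List[Dict[str, str]]]:
        # assumption: the binary request body is passed through as raw bytes under "inputs"
        audio_bytes = data["inputs"]
        # decode the audio into a (channel, time) waveform tensor
        waveform, sample_rate = torchaudio.load(io.BytesIO(audio_bytes))
        # run speaker diarization on the in-memory waveform
        diarization = self.pipeline({"waveform": waveform, "sample_rate": sample_rate})
        # flatten the annotation into the JSON structure shown above
        return {
            "diarization": [
                {"label": str(label), "start": str(segment.start), "stop": str(segment.end)}
                for segment, _, label in diarization.itertracks(yield_label=True)
            ]
        }
```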