---
license: mit
tags:
- pyannote
- pyannote-audio
- pyannote-audio-pipeline
- audio
- voice
- speech
- speaker
- speaker-diarization
- speaker-change-detection
- endpoints-template
library_name: generic
---

# 🎹 Speaker diarization with Pyannote and Inference Endpoints

This repository implements a custom `handler` for the `speaker-diarization` task on 🤗 Inference Endpoints using Pyannote. The code for the customized pipeline is in [handler.py](https://huggingface.co/philschmid/pyannote-speaker-diarization-endpoint/blob/main/handler.py).

The included [notebook](https://huggingface.co/philschmid/pyannote-speaker-diarization-endpoint/blob/main/create_handler.ipynb) shows how to create `handler.py`.

### Request

The endpoint expects a binary audio file as the request body. Below are a cURL example and a Python example that uses the `requests` library.

**curl**

```bash
# download a sample audio file
wget https://cdn-media.huggingface.co/speech_samples/sample1.flac

# send the file as the raw request body
curl --request POST \
  --url https://{ENDPOINT}/ \
  --header 'Content-Type: audio/x-flac' \
  --header 'Authorization: Bearer {HF_TOKEN}' \
  --data-binary '@sample1.flac'
```

**Python**

```python
import mimetypes

import requests as r

ENDPOINT_URL = ""
HF_TOKEN = ""


def predict(path_to_audio: str):
    # read the audio file as raw bytes
    with open(path_to_audio, "rb") as i:
        b = i.read()
    # guess the MIME type from the file extension
    content_type = mimetypes.guess_type(path_to_audio)[0]

    headers = {
        "Authorization": f"Bearer {HF_TOKEN}",
        "Content-Type": content_type,
    }
    # send the raw bytes as the request body
    response = r.post(ENDPOINT_URL, headers=headers, data=b)
    return response.json()


# sample1.flac is the file downloaded with wget in the cURL example
prediction = predict(path_to_audio="sample1.flac")

prediction
```

Expected output:

```json
{"diarization": [
    {"label": "SPEAKER_01", "start": "0.4978125", "stop": "1.3921875"},
    {"label": "SPEAKER_01", "start": "1.8984375", "stop": "2.7590624999999998"},
    {"label": "SPEAKER_02", "start": "2.9953125", "stop": "3.5015625000000004"},
    {"label": "SPEAKER_01", "start": "3.5690625000000002", "stop": "4.311562500000001"}
    ...
```
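
Note that `start` and `stop` are returned as strings, so they need to be converted before doing any arithmetic. The following is a minimal sketch of one way to post-process the response, assuming it has exactly the shape shown above. The helper name `speaking_time` is hypothetical and not part of this repository, and the hard-coded `prediction` only repeats the four segments from the expected output so the snippet runs without an endpoint.

```python
from collections import defaultdict

# Stand-in for the endpoint response: the four segments shown in the
# expected output above (a real response may contain more segments).
prediction = {
    "diarization": [
        {"label": "SPEAKER_01", "start": "0.4978125", "stop": "1.3921875"},
        {"label": "SPEAKER_01", "start": "1.8984375", "stop": "2.7590624999999998"},
        {"label": "SPEAKER_02", "start": "2.9953125", "stop": "3.5015625000000004"},
        {"label": "SPEAKER_01", "start": "3.5690625000000002", "stop": "4.311562500000001"},
    ]
}


def speaking_time(prediction):
    """Hypothetical helper: total segment duration in seconds per speaker label."""
    totals = defaultdict(float)
    for segment in prediction["diarization"]:
        # start/stop arrive as strings, so convert them to floats first
        duration = float(segment["stop"]) - float(segment["start"])
        totals[segment["label"]] += duration
    return dict(totals)


totals = speaking_time(prediction)
for label, seconds in sorted(totals.items()):
    print(f"{label}: {seconds:.2f} s")
```

With a live endpoint, the return value of `predict(...)` from the Python example can be passed to `speaking_time` in place of the hard-coded dictionary.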