---
license: mit
tags:
- pyannote
- pyannote-audio
- pyannote-audio-pipeline
- audio
- voice
- speech
- speaker
- speaker-diarization
- speaker-change-detection
- endpoints-template
library_name: generic
---
# 🎹 Speaker diarization with Pyannote and Inference Endpoints

This repository implements a custom `handler` for the `speaker-diarization` task on 🤗 Inference Endpoints, using Pyannote. The code for the customized pipeline is in [handler.py](https://huggingface.co/philschmid/pyannote-speaker-diarization-endpoint/blob/main/handler.py).

The repository also includes a [notebook](https://huggingface.co/philschmid/pyannote-speaker-diarization-endpoint/blob/main/create_handler.ipynb) that shows how to create `handler.py`.
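For orientation only (the real code is in `handler.py`), here is a minimal sketch of the shape such a handler can take. Inference Endpoints loads a class named `EndpointHandler` from `handler.py`, constructs it once with the repository path, and then calls it once per request. The sketch assumes the pyannote.audio 2.x `Pipeline` API, that the repository contains a pyannote `config.yaml`, that the request bytes arrive under `data["inputs"]`, and that audio is decoded with `ffmpeg_read` from `transformers` to 16 kHz mono. `annotation_to_segments` and `SAMPLING_RATE` are hypothetical names.

```python
import os
from typing import Any, Dict, List

# assumption: audio is decoded to 16 kHz mono before diarization
SAMPLING_RATE = 16_000


def annotation_to_segments(annotation) -> List[Dict[str, str]]:
    """Flatten a pyannote Annotation into JSON-serializable speaker turns (hypothetical helper)."""
    return [
        {"label": str(label), "start": str(segment.start), "stop": str(segment.end)}
        for segment, _, label in annotation.itertracks(yield_label=True)
    ]


class EndpointHandler:
    def __init__(self, path: str = ""):
        # heavy dependencies are imported lazily, so the helper above works without them
        from pyannote.audio import Pipeline

        # assumption: the repository ships a pyannote pipeline config.yaml
        self.pipeline = Pipeline.from_pretrained(os.path.join(path, "config.yaml"))

    def __call__(self, data: Dict[str, Any]) -> Dict[str, Any]:
        import torch
        from transformers.pipelines.audio_utils import ffmpeg_read

        # assumption: the raw request body arrives under the "inputs" key
        audio_bytes = data.pop("inputs", data)
        # decode any ffmpeg-supported format into a mono float32 array
        waveform = ffmpeg_read(audio_bytes, SAMPLING_RATE)
        audio = {
            "waveform": torch.from_numpy(waveform).unsqueeze(0),  # shape (channel, time)
            "sample_rate": SAMPLING_RATE,
        }
        diarization = self.pipeline(audio)
        return {"diarization": annotation_to_segments(diarization)}
```

Keeping the conversion in a separate function means the JSON shape can be checked without loading the model.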

### Request

The endpoint expects the raw bytes of an audio file as the request body, with a matching audio `Content-Type` header. Below are a cURL example and a Python example using the `requests` library.

cURL

```bash
# download a sample audio file
wget https://cdn-media.huggingface.co/speech_samples/sample1.flac

# send the file to the endpoint
curl --request POST \
  --url https://{ENDPOINT}/ \
  --header 'Content-Type: audio/flac' \
  --header 'Authorization: Bearer {HF_TOKEN}' \
  --data-binary '@sample1.flac'
```

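Python

```python
import mimetypes

import requests

ENDPOINT_URL = ""  # URL of your Inference Endpoint
HF_TOKEN = ""  # your Hugging Face access token

# make sure .flac files get a MIME type even if the local mimetypes table lacks one
mimetypes.add_type("audio/flac", ".flac")


def predict(path_to_audio: str) -> dict:
    # read the audio file as raw bytes
    with open(path_to_audio, "rb") as f:
        data = f.read()
    # derive the Content-Type header from the file extension
    content_type, _ = mimetypes.guess_type(path_to_audio)

    headers = {
        "Authorization": f"Bearer {HF_TOKEN}",
        "Content-Type": content_type,
    }
    response = requests.post(ENDPOINT_URL, headers=headers, data=data)
    response.raise_for_status()  # surface HTTP errors instead of parsing an error body
    return response.json()


prediction = predict(path_to_audio="sample1.flac")
print(prediction)
```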
Expected output (truncated):

```json
{"diarization": [
  {"label": "SPEAKER_01", "start": "0.4978125", "stop": "1.3921875"},
  {"label": "SPEAKER_01", "start": "1.8984375", "stop": "2.7590624999999998"},
  {"label": "SPEAKER_02", "start": "2.9953125", "stop": "3.5015625000000004"},
  {"label": "SPEAKER_01", "start": "3.5690625000000002", "stop": "4.311562500000001"},
  ...
]}
```