---
title: VoID Demo from AISF Hackathon
emoji: 😤
colorFrom: blue
colorTo: yellow
sdk: gradio
python_version: 3.9
app_file: gradio_app.py
pinned: false
---

# VoID: Voice Identifier

A classifier that recognizes the identity of a speaker from their voice.

This project was built as part of the AISF hackathon (www.aisf.co), where VoID was a finalist :D

## Details

### Dataset

We used a custom dataset: a set of 5-7 pure, unaugmented voices from labeled speakers saying the same phrase (similar to a passcode). In our initial version, three voices said the same phrase ("The quick brown fox jumps over the lazy dog"). In our demo, three voices each said their own name.

For training and test data, we then created hundreds of augmented copies of each pure voice using augmentation layers (see the augmentation notebook under the `/notebooks/` folder). Distortions such as clipping, noise, and reverb were applied at random and were also compounded at random.
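As a rough illustration of randomly compounded augmentations, here is a minimal NumPy sketch. It is not the code from the augmentation notebook: the function names, the parameter values (`snr_db`, `fraction`, `delay`, `decay`, `p`), and the single-echo "reverb" are all assumptions made for this example.

```python
import numpy as np

def add_noise(wave, rng, snr_db=20.0):
    """Add white Gaussian noise at roughly the given signal-to-noise ratio."""
    signal_power = np.mean(wave ** 2) + 1e-12
    noise_power = signal_power / (10 ** (snr_db / 10))
    return wave + rng.normal(0.0, np.sqrt(noise_power), size=wave.shape)

def clip(wave, rng, fraction=0.5):
    """Hard-clip samples at a fraction of the peak amplitude."""
    limit = fraction * np.max(np.abs(wave))
    return np.clip(wave, -limit, limit)

def reverb(wave, rng, delay=800, decay=0.4):
    """Crude stand-in for reverb: add one delayed, attenuated echo."""
    out = wave.copy()
    out[delay:] += decay * wave[:-delay]
    return out

# Hypothetical augmentation pool; the real project may use different ones.
AUGMENTATIONS = [add_noise, clip, reverb]

def augment(wave, rng, p=0.5):
    """Apply each augmentation with probability p, in a random order."""
    out = wave.astype(np.float64)
    for idx in rng.permutation(len(AUGMENTATIONS)):
        if rng.random() < p:
            out = AUGMENTATIONS[idx](out, rng)
    return out

def make_copies(wave, n_copies, seed=0):
    """Create n_copies independently augmented versions of one recording."""
    rng = np.random.default_rng(seed)
    return [augment(wave, rng) for _ in range(n_copies)]
```

For example, `make_copies(wave, 200)` turns one pure recording (a 1-D array of samples) into 200 distorted variants of the same length, some with a single distortion and some with several stacked.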

If you would like access to the dataset used in this project, feel free to contact me.

### Model Structure

```
CNNetwork(
  (conv1): Sequential(
    (0): Conv2d(1, 16, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2))
    (1): ReLU()
    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (conv2): Sequential(
    (0): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2))
    (1): ReLU()
    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (conv3): Sequential(
    (0): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2))
    (1): ReLU()
    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (conv4): Sequential(
    (0): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2))
    (1): ReLU()
    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear): Linear(in_features=21888, out_features=3, bias=True)
  (softmax): Softmax(dim=1)
)
```
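The `in_features=21888` of the linear layer follows from the feature-map size after the four conv/pool stages. The sketch below works through that arithmetic with the layer parameters printed above. The input size of 128 x 280 (for example, mel bands x time frames) is an assumption: it is one size that is consistent with 21888, not a value stated in this README.

```python
def conv_out(size, kernel=3, stride=1, padding=2):
    """Output length of a Conv2d along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, kernel=2, stride=2):
    """Output length of a MaxPool2d (ceil_mode=False) along one dimension."""
    return (size - kernel) // stride + 1

def flattened_features(height, width, channels=(16, 32, 64, 128)):
    """Number of features after the conv/pool stages and the flatten layer."""
    for _ in channels:
        height = pool_out(conv_out(height))
        width = pool_out(conv_out(width))
    return channels[-1] * height * width

# Assumed input of 1 x 128 x 280: the height shrinks 128 -> 9 and the
# width 280 -> 19, so the flatten layer sees 128 * 9 * 19 features.
print(flattened_features(128, 280))
```

Each stage adds 2 to a dimension (padding of 2 with a 3x3 kernel) and then halves it, rounding down, which is why a different input length changes the size the linear layer expects.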

## Demo

Watch the demo of our model [here](https://www.loom.com/share/a8cb126af7b64ddaaa67c6f00e23f4e9)!

## Running locally

Make sure to create a conda environment from the dependencies declared in `void.yml` (for example with `conda env create -f void.yml`) and activate it before running the app.