Update README.md
'Finger snapping'
```

## Inference (ONNX)

```python
>>> import torch
>>> import torchaudio
>>> from optimum.onnxruntime import ORTModelForAudioClassification

>>> model_name = "mispeech/ced-tiny"
>>> model = ORTModelForAudioClassification.from_pretrained(model_name, trust_remote_code=True)

>>> audio, sampling_rate = torchaudio.load("/path-to/JeD5V5aaaoI_931_932.wav")
>>> assert sampling_rate == 16000

>>> input_name = model.session.get_inputs()[0].name
>>> output = model(**{input_name: audio})
>>> logits = output.logits.squeeze()
>>> for idx in logits.argsort(descending=True)[:2]:
...     print(f"{model.config.id2label[idx.item()]}: {logits[idx]:.4f}")
Finger snapping: 0.9155
Slap: 0.0567
```

## Fine-tuning

[`example_finetune_esc50.ipynb`](https://github.com/jimbozhang/hf_transformers_custom_model_ced/blob/main/example_finetune_esc50.ipynb) demonstrates how to train a linear head on the ESC-50 dataset with the CED encoder frozen.
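
The notebook does the actual training; the frozen-encoder/linear-head setup can be sketched roughly as below. This is a minimal illustration only: `DummyEncoder` is a stand-in for the real CED encoder loaded in the notebook, and its 768-dimensional embedding is an assumption for the sake of the example.

```python
import torch
import torch.nn as nn

# Sketch only: DummyEncoder stands in for the CED encoder; the 768-dim
# embedding size is an assumption for illustration.
class DummyEncoder(nn.Module):
    def __init__(self, embed_dim=768):
        super().__init__()
        self.proj = nn.Linear(16000, embed_dim)

    def forward(self, x):
        return self.proj(x)

class LinearProbe(nn.Module):
    """Trainable linear head on top of a frozen encoder (ESC-50 has 50 classes)."""
    def __init__(self, encoder, embed_dim=768, num_classes=50):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():
            p.requires_grad = False  # keep the encoder frozen
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):
        with torch.no_grad():
            feats = self.encoder(x)  # no gradients flow into the encoder
        return self.head(feats)

probe = LinearProbe(DummyEncoder())
logits = probe(torch.randn(2, 16000))  # batch of two 1-second clips at 16 kHz
print(logits.shape)  # torch.Size([2, 50])
```

Only the head's parameters receive gradients, so an optimizer built from `probe.head.parameters()` (or from parameters with `requires_grad=True`) trains just the linear layer.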