Update README.md
'Finger snapping'
```

## Inference (ONNX)

```python
>>> import torch
>>> import torchaudio
>>> from optimum.onnxruntime import ORTModelForAudioClassification

>>> model_name = "mispeech/ced-tiny"
>>> model = ORTModelForAudioClassification.from_pretrained(model_name, trust_remote_code=True)

>>> audio, sampling_rate = torchaudio.load("/path-to/JeD5V5aaaoI_931_932.wav")
>>> assert sampling_rate == 16000

>>> input_name = model.session.get_inputs()[0].name
>>> output = model(**{input_name: audio})
>>> logits = output.logits.squeeze()
>>> for idx in logits.argsort(descending=True)[:2]:
...     print(f"{model.config.id2label[idx.item()]}: {logits[idx]:.4f}")
Finger snapping: 0.9155
Slap: 0.0567
```

## Fine-tuning

[`example_finetune_esc50.ipynb`](https://github.com/jimbozhang/hf_transformers_custom_model_ced/blob/main/example_finetune_esc50.ipynb) demonstrates how to train a linear head on the ESC-50 dataset with the CED encoder frozen.
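
The notebook does the actual training; the frozen-encoder/linear-head setup can be sketched roughly as below. This is a minimal illustration only: `DummyEncoder` is a stand-in for the real CED encoder loaded in the notebook, and its 768-dimensional embedding is an assumption for the sake of the example.

```python
import torch
import torch.nn as nn

# Sketch only: DummyEncoder stands in for the CED encoder; the 768-dim
# embedding size is an assumption for illustration.
class DummyEncoder(nn.Module):
    def __init__(self, embed_dim=768):
        super().__init__()
        self.proj = nn.Linear(16000, embed_dim)

    def forward(self, x):
        return self.proj(x)

class LinearProbe(nn.Module):
    """Trainable linear head on top of a frozen encoder (ESC-50 has 50 classes)."""
    def __init__(self, encoder, embed_dim=768, num_classes=50):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():
            p.requires_grad = False  # keep the encoder frozen
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):
        with torch.no_grad():
            feats = self.encoder(x)  # no gradients flow into the encoder
        return self.head(feats)

probe = LinearProbe(DummyEncoder())
logits = probe(torch.randn(2, 16000))  # batch of two 1-second clips at 16 kHz
print(logits.shape)  # torch.Size([2, 50])
```

Only the head's parameters receive gradients, so an optimizer built from `probe.head.parameters()` (or from parameters with `requires_grad=True`) trains just the linear layer.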