---
language:
- sv
tags:
- multi-task
---

The best multi-task wav2vec 2.0 model for Swedish from [__Getman, Y., Al-Ghezi, R., Grósz, T., Kurimo, M. (2023) Multi-task wav2vec2 Serving as a Pronunciation Training System for Children__](https://www.isca-speech.org/archive/slate_2023/getman23_slate.html). The model performs automatic speech recognition (ASR) and speech pronunciation rating simultaneously.
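
To make "simultaneously" concrete, here is a minimal PyTorch sketch of one shared feature sequence feeding two output heads: a per-frame head for CTC-based ASR and an utterance-level head for rating. This is an illustration only, not the authors' implementation: the class name `ToyMultiTaskHead`, the mean pooling, and all sizes are assumptions. The output order (rating first, CTC second) mirrors how `logits[0]` and `logits[1]` are used in the usage example in this card.

```python
import torch
from torch import nn


class ToyMultiTaskHead(nn.Module):
    """Two output heads on top of shared encoder features (illustrative only)."""

    def __init__(self, hidden_size=32, vocab_size=40, num_ratings=4):
        super().__init__()
        # All sizes here are hypothetical, chosen only to keep the example small.
        self.rating_head = nn.Linear(hidden_size, num_ratings)
        self.ctc_head = nn.Linear(hidden_size, vocab_size)

    def forward(self, hidden_states):
        # hidden_states: (batch, frames, hidden_size) from a shared encoder.
        pooled = hidden_states.mean(dim=1)  # assumed pooling: average over frames
        rating_logits = self.rating_head(pooled)   # (batch, num_ratings)
        ctc_logits = self.ctc_head(hidden_states)  # (batch, frames, vocab_size)
        return rating_logits, ctc_logits


features = torch.randn(2, 50, 32)  # random stand-in for encoder output
rating_logits, ctc_logits = ToyMultiTaskHead()(features)
print(rating_logits.shape, ctc_logits.shape)
```

A single forward pass thus yields one rating distribution per utterance and one token distribution per audio frame.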

## Usage

To use this model, first install [aalto-speech/multitask-wav2vec2](https://github.com/aalto-speech/multitask-wav2vec2). The model can then be used as follows:

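```python
import torch
import librosa
import datasets
from transformers import Wav2Vec2ForMultiTask, Wav2Vec2Processor

# This model's Hub ID; a local checkpoint directory also works.
MODEL_PATH = "GetmanY1/wav2vec2-large-multitask-swedish-ssd"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

def map_to_array(batch):
    # Load the audio as 16 kHz mono, the sampling rate passed to the processor below.
    speech, _ = librosa.load(batch["file"], sr=16000, mono=True)
    batch["speech"] = speech
    return batch

def map_to_pred_multitask(batch):
    input_values = processor(batch["speech"], sampling_rate=16000, return_tensors="pt", padding="longest").input_values
    with torch.no_grad():
        logits = model(input_values.to(device)).logits
    # logits[1] holds the CTC (ASR) output: take the best token per frame and decode.
    predicted_ids_ctc = torch.argmax(logits[1], dim=-1)
    batch["transcription"] = processor.batch_decode(predicted_ids_ctc)
    # logits[0] holds the pronunciation rating output: take the best class.
    predicted_ids = torch.argmax(logits[0], dim=-1)
    # Store a plain list so the result does not depend on the device.
    batch["predictions"] = predicted_ids.cpu().tolist()
    return batch

processor = Wav2Vec2Processor.from_pretrained(MODEL_PATH)
# Move the model to the same device as the inputs and disable dropout.
model = Wav2Vec2ForMultiTask.from_pretrained(MODEL_PATH).to(device)
model.eval()

# test_dataset is assumed to be a datasets.Dataset with a "file" column of audio paths.
test_dataset = test_dataset.map(map_to_array)
result = test_dataset.map(map_to_pred_multitask)
```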
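
For readers wondering what happens between `torch.argmax` and `processor.batch_decode` on the ASR output, the following is a minimal pure-Python sketch of greedy CTC decoding: consecutive repeated IDs are collapsed, then blank tokens are dropped. The toy vocabulary, the blank ID of 0, and the helper name `greedy_ctc_decode` are assumptions for illustration; the real processor uses the tokenizer vocabulary shipped with the model.

```python
from itertools import groupby

# Hypothetical toy vocabulary; the real one comes from the model's tokenizer.
TOY_VOCAB = {0: "<pad>", 1: "h", 2: "e", 3: "j", 4: "|"}
BLANK_ID = 0  # assumed CTC blank ID for this toy example


def greedy_ctc_decode(frame_ids, vocab=TOY_VOCAB, blank_id=BLANK_ID):
    """Collapse repeated frame-level IDs, drop blanks, and join the characters."""
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    chars = [vocab[token_id] for token_id in collapsed if token_id != blank_id]
    # "|" acts as the word delimiter in this toy vocabulary.
    return "".join(chars).replace("|", " ").strip()


# Per-frame argmax IDs for one utterance, e.g. one row of predicted_ids_ctc.
frame_ids = [0, 1, 1, 0, 2, 2, 2, 0, 3, 3, 0, 4, 0, 1, 2, 0, 3]
print(greedy_ctc_decode(frame_ids))  # prints "hej hej"
```

Note that a blank between two identical IDs keeps them as separate characters, which is how CTC represents doubled letters.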

## Citation

If you use our models or training scripts, please cite our article as:

```bibtex
@inproceedings{getman23_slate,
  author={Yaroslav Getman and Ragheb Al-Ghezi and Tamas Grosz and Mikko Kurimo},
  title={{Multi-task wav2vec2 Serving as a Pronunciation Training System for Children}},
  year=2023,
  booktitle={Proc. 9th Workshop on Speech and Language Technology in Education (SLaTE)},
  pages={36--40},
  doi={10.21437/SLaTE.2023-8}
}
```