---
language: ru
datasets:
- SberDevices/Golos
- common_voice
metrics:
- wer
- cer
tags:
- audio
- automatic-speech-recognition
- speech
- common_voice
- SberDevices/Golos
license: apache-2.0
widget:
- example_title: test Russian speech "нейросети это хорошо" (in English, "neural networks are good")
  src: https://huggingface.co/bond005/wav2vec2-large-ru-golos-with-lm/resolve/main/test_sound_ru.flac
model-index:
- name: XLSR Wav2Vec2 Russian with Language Model by Ivan Bondarenko
  results:
  - task:
      name: Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Sberdevices Golos (crowd)
      type: SberDevices/Golos
      args: ru
    metrics:
    - name: Test WER
      type: wer
      value: 4.272
    - name: Test CER
      type: cer
      value: 0.983
  - task:
      name: Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Sberdevices Golos (farfield)
      type: SberDevices/Golos
      args: ru
    metrics:
    - name: Test WER
      type: wer
      value: 11.405
    - name: Test CER
      type: cer
      value: 3.628
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice ru
      type: common_voice
      args: ru
    metrics:
    - name: Test WER
      type: wer
      value: 19.053
    - name: Test CER
      type: cer
      value: 4.876
---
# Wav2Vec2-Large-Ru-Golos-With-LM

The Wav2Vec2 model is based on [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) and was fine-tuned on Russian speech from [Sberdevices Golos](https://huggingface.co/datasets/SberDevices/Golos) using audio augmentations such as pitch shifting, speeding up or slowing down the sound, and reverberation, among others.
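The augmentation code used for training is not part of this card. As a rough illustration only, the sketch below implements two of the listed augmentations with NumPy: a speed change (`change_speed`, a hypothetical helper using naive linear-interpolation resampling, so pitch changes together with tempo) and reverberation (`add_reverb`, a hypothetical helper that convolves the signal with a synthetic, exponentially decaying noise impulse response). Pitch shifting without a tempo change needs a phase vocoder or similar; `librosa.effects.pitch_shift` is one ready-made option.

```python
import numpy as np


def change_speed(waveform: np.ndarray, factor: float) -> np.ndarray:
    """Speed up (factor > 1) or slow down (factor < 1) a mono waveform.

    Naive linear-interpolation resampling: duration and pitch change together.
    """
    n_out = int(round(len(waveform) / factor))
    old_positions = np.arange(len(waveform))
    new_positions = np.linspace(0, len(waveform) - 1, num=n_out)
    return np.interp(new_positions, old_positions, waveform).astype(np.float32)


def add_reverb(waveform: np.ndarray, sample_rate: int = 16_000,
               rt60: float = 0.3, seed: int = 0) -> np.ndarray:
    """Convolve with a synthetic noise impulse response (hypothetical room model)."""
    rng = np.random.default_rng(seed)
    ir_length = int(rt60 * sample_rate)
    t = np.arange(ir_length) / sample_rate
    # Amplitude envelope falls by 60 dB (a factor of 1000) over rt60 seconds
    envelope = np.exp(-np.log(1000.0) * t / rt60)
    impulse_response = rng.standard_normal(ir_length) * envelope
    impulse_response[0] = 1.0  # direct (dry) path
    wet = np.convolve(waveform, impulse_response)[: len(waveform)]
    peak = np.max(np.abs(wet))
    if peak > 0:
        # Match the peak level of the input so the augmentation does not clip
        wet = wet * (np.max(np.abs(waveform)) / peak)
    return wet.astype(np.float32)


# Hypothetical input: one second of a 440 Hz tone at 16 kHz
sr = 16_000
tone = np.sin(2 * np.pi * 440 * np.arange(sr) / sr).astype(np.float32)
faster = change_speed(tone, 1.25)  # 12,800 samples, i.e. 0.8 s
reverberant = add_reverb(tone, sample_rate=sr)
```

In a training pipeline, such transforms would typically be applied on the fly with random parameters (speed factor, `rt60`, seed) so that each epoch sees different variants of the same utterance.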

The 2-gram language model is built on a Russian text corpus drawn from three open sources:

- random 10% subset of [Taiga](https://tatianashavrina.github.io/taiga_site)
- [Russian Wikipedia](https://ru.wikipedia.org)
- [Russian Wikinews](https://ru.wikinews.org).
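For intuition, a 2-gram (bigram) model scores each word given only the previous word, `P(w_i | w_{i-1})`. The toy sketch below estimates these probabilities by maximum likelihood from a tiny hypothetical corpus. It is not how this model's LM was built: `Wav2Vec2ProcessorWithLM` decodes with pyctcdecode, which typically loads a KenLM model with smoothed estimates, and the toolkit and smoothing used here are not stated in this card. The fixed floor for unseen bigrams in `sentence_log_prob` is a crude stand-in for real smoothing or back-off.

```python
import math
from collections import Counter


def train_bigram_lm(sentences: list[str]) -> dict[tuple[str, str], float]:
    """Maximum-likelihood estimates of P(word | previous word)."""
    history_counts: Counter = Counter()
    bigram_counts: Counter = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        history_counts.update(tokens[:-1])  # every token that precedes another
        bigram_counts.update(zip(tokens[:-1], tokens[1:]))
    return {
        (prev, word): count / history_counts[prev]
        for (prev, word), count in bigram_counts.items()
    }


def sentence_log_prob(lm: dict[tuple[str, str], float], sentence: str,
                      unseen_prob: float = 1e-6) -> float:
    """Sum of log bigram probabilities; unseen bigrams get a fixed floor."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    return sum(math.log(lm.get(pair, unseen_prob))
               for pair in zip(tokens[:-1], tokens[1:]))


# Hypothetical toy corpus, for illustration only
corpus = ["нейросети это хорошо", "это хорошо", "это плохо"]
lm = train_bigram_lm(corpus)
print(lm[("это", "хорошо")])  # 2 of the 3 occurrences of "это" are followed by "хорошо"
print(sentence_log_prob(lm, "это хорошо") > sentence_log_prob(lm, "хорошо это"))  # True
```

During beam search, a decoder can add such LM log-probabilities to the acoustic (CTC) scores, which favors word sequences that are frequent in the text corpus.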

## Usage

When using this model, make sure that your speech input is sampled at 16 kHz.
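If your recordings use another sampling rate or have several channels, convert them first. The sketch below (`to_16k_mono` is a hypothetical helper) down-mixes to mono and resamples with SciPy's polyphase filter `scipy.signal.resample_poly`, which applies anti-aliasing filtering. It assumes multichannel input is shaped `(num_samples, num_channels)`, as returned by `soundfile.read`; `librosa.resample` or `torchaudio.functional.resample` are common alternatives.

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly

TARGET_SAMPLE_RATE = 16_000


def to_16k_mono(waveform: np.ndarray, sample_rate: int) -> np.ndarray:
    """Down-mix to mono and resample to 16 kHz with a polyphase (anti-aliased) filter.

    Assumes multichannel input is shaped (num_samples, num_channels).
    """
    waveform = np.asarray(waveform, dtype=np.float32)
    if waveform.ndim == 2:
        waveform = waveform.mean(axis=1)
    if sample_rate == TARGET_SAMPLE_RATE:
        return waveform
    # Reduce the rate ratio, e.g. 44100 -> 16000 becomes up=160, down=441
    common = gcd(sample_rate, TARGET_SAMPLE_RATE)
    up, down = TARGET_SAMPLE_RATE // common, sample_rate // common
    return resample_poly(waveform, up, down).astype(np.float32)


# One second of hypothetical 44.1 kHz stereo audio becomes 16,000 mono samples
stereo_44k = np.zeros((44_100, 2), dtype=np.float32)
print(to_16k_mono(stereo_44k, 44_100).shape)  # (16000,)
```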

You can use this model by writing your own inference script:

```python
import os
import warnings

import numpy as np
import torch
from datasets import load_dataset
from transformers import Wav2Vec2ForCTC, Wav2Vec2ProcessorWithLM

MODEL_ID = "bond005/wav2vec2-large-ru-golos-with-lm"
DATASET_ID = "bond005/sberdevices_golos_10h_crowd"
SAMPLES = 20

# os.cpu_count() can return None, so fall back to a single process
num_processes = max(1, os.cpu_count() or 1)

test_dataset = load_dataset(DATASET_ID, split=f"test[:{SAMPLES}]")
processor = Wav2Vec2ProcessorWithLM.from_pretrained(MODEL_ID)
model = Wav2Vec2ForCTC.from_pretrained(MODEL_ID)

# Preprocessing the datasets.
# We need to read the audio files as arrays
def speech_file_to_array_fn(batch):
    speech_array = batch["audio"]["array"]
    batch["speech"] = np.asarray(speech_array, dtype=np.float32)
    return batch

# Keep only the reference transcription and the new "speech" column
removed_columns = set(test_dataset.column_names)
removed_columns -= {"transcription", "speech"}
removed_columns = sorted(removed_columns)
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    test_dataset = test_dataset.map(
        speech_file_to_array_fn,
        num_proc=num_processes,
        remove_columns=removed_columns,
    )

# Pad the batch to a common length and run the acoustic model
inputs = processor(test_dataset["speech"], sampling_rate=16_000,
                   return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(inputs.input_values,
                   attention_mask=inputs.attention_mask).logits

# Beam-search decoding of the CTC logits with the 2-gram language model
predicted_sentences = processor.batch_decode(
    logits=logits.numpy(),
    num_processes=num_processes,
).text

with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    for i, predicted_sentence in enumerate(predicted_sentences):
        print("-" * 100)
        print("Reference:", test_dataset[i]["transcription"])
        print("Prediction:", predicted_sentence)
```

```text
----------------------------------------------------------------------------------------------------
Reference: шестьдесят тысяч тенге сколько будет стоить
Prediction: шестьдесят тысяч тенге сколько будет стоить
----------------------------------------------------------------------------------------------------
Reference: покажи мне на смотрешке телеканал синергия тв
Prediction: покажи мне на смотрешке телеканал синергия тв
----------------------------------------------------------------------------------------------------
Reference: заказать яблоки зеленые
Prediction: заказать яблоки зеленые
----------------------------------------------------------------------------------------------------
Reference: алиса закажи килограммовый торт графские развалины
Prediction: алиса закажи килограммовый торт графские развалины
----------------------------------------------------------------------------------------------------
Reference: ищи телеканал про бизнес на тиви
Prediction: ищи телеканал про бизнес на тви
----------------------------------------------------------------------------------------------------
Reference: михаила мурадяна
Prediction: михаила мурадяна
----------------------------------------------------------------------------------------------------
Reference: любовницы две тысячи тринадцать пятнадцатый сезон
Prediction: любовница две тысячи тринадцать пятнадцатый сезон
----------------------------------------------------------------------------------------------------
Reference: найди боевики
Prediction: найди боевики
----------------------------------------------------------------------------------------------------
Reference: гетто сезон три
Prediction: гетта сезон три
----------------------------------------------------------------------------------------------------
Reference: хочу посмотреть ростов папа на телевизоре
Prediction: хочу посмотреть ростов папа на телевизоре
----------------------------------------------------------------------------------------------------
Reference: сбер какое твое самое ненавистное занятие
Prediction: сбер какое твое самое ненавистное занятие
----------------------------------------------------------------------------------------------------
Reference: афина чем платят у китайцев
Prediction: афина чем платят у китайцев
----------------------------------------------------------------------------------------------------
Reference: джой как работает досрочное погашение кредита
Prediction: джой как работает досрочное погашение кредита
----------------------------------------------------------------------------------------------------
Reference: у тебя найдется люк кейдж
Prediction: у тебя найдется люк кейдж
----------------------------------------------------------------------------------------------------
Reference: у тебя будет лучшая часть пинк
Prediction: у тебя будет лучшая часть пинк
----------------------------------------------------------------------------------------------------
Reference: пожалуйста пополните мне счет
Prediction: пожалуйста пополните мне счет
----------------------------------------------------------------------------------------------------
Reference: анне павловне шабуровой
Prediction: анне павловне шабуровой
----------------------------------------------------------------------------------------------------
Reference: врубай на смотрешке муз тв
Prediction: врубай на смотрешке муз тиви
----------------------------------------------------------------------------------------------------
Reference: найди на смотрешке лдпр тв
Prediction: найди на смотрешке лдпр тв
----------------------------------------------------------------------------------------------------
Reference: сбер мне нужен педикюр забей мне место
Prediction: сбер мне нужен педикюр забелье место
```
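`Wav2Vec2ProcessorWithLM` turns the CTC logits into text with a pyctcdecode beam search that also consults the 2-gram language model. For comparison, the simplest decoding strategy is greedy (best-path) CTC decoding: take the most likely token per frame, merge consecutive repeats, and drop blanks. The sketch below (`ctc_greedy_decode` is a hypothetical helper) assumes the blank token has id `0` and that `|` marks word boundaries, as in many Wav2Vec2 CTC vocabularies; check the model's own `vocab.json` before relying on either assumption.

```python
import numpy as np


def ctc_greedy_decode(logits: np.ndarray, vocab: list[str], blank_id: int = 0,
                      word_delimiter: str = "|") -> str:
    """Best-path CTC decoding: argmax per frame, merge repeats, then drop blanks."""
    best_path = logits.argmax(axis=-1).tolist()
    chars = []
    previous = None
    for token_id in best_path:
        # A repeated id is emitted again only if a blank (or another id) separates it
        if token_id != previous and token_id != blank_id:
            chars.append(vocab[token_id])
        previous = token_id
    return "".join(chars).replace(word_delimiter, " ").strip()


# Hypothetical 4-token vocabulary; logits are one-hot frames for illustration
vocab = ["<pad>", "|", "д", "а"]
frame_ids = [2, 2, 0, 3, 1, 2, 0, 0, 3]
logits = np.eye(len(vocab))[frame_ids]
print(ctc_greedy_decode(logits, vocab))  # да да
```

Greedy decoding looks at one frame at a time and ignores word context; beam search with a language model instead keeps several candidate transcriptions and can prefer spellings that are more plausible as Russian text.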


A Google Colab version of [this script](https://colab.research.google.com/drive/1SnQmrt6HmMNV-zK-UCPajuwl1JvoCqbX?usp=sharing) is also available.

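## Evaluation

This model was evaluated on the test subsets of [SberDevices Golos](https://huggingface.co/datasets/SberDevices/Golos) and [Common Voice 6.0](https://huggingface.co/datasets/common_voice) (Russian part), although it was trained only on the training subset of SberDevices Golos. The test scores listed in the model-index metadata are:

| Test set | WER | CER |
|---|---|---|
| SberDevices Golos (crowd) | 4.272 | 0.983 |
| SberDevices Golos (farfield) | 11.405 | 3.628 |
| Common Voice 6.0 (ru) | 19.053 | 4.876 |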
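The evaluation script itself is not included in this card. As a hedged sketch, WER and CER are usually computed as the Levenshtein (edit) distance between reference and hypothesis, summed over the test set and divided by the total number of reference words or characters. The helpers below (`edit_distance` and `error_rate`, both hypothetical) assume whitespace tokenization for words, count spaces as characters for CER, apply no extra text normalization, and return percentages; the author's normalization and tooling may differ. The example pairs are taken from the sample output in the Usage section.

```python
def edit_distance(reference: list, hypothesis: list) -> int:
    """Levenshtein distance: substitutions, deletions and insertions all cost 1."""
    previous_row = list(range(len(hypothesis) + 1))
    for i, ref_item in enumerate(reference, start=1):
        current_row = [i]
        for j, hyp_item in enumerate(hypothesis, start=1):
            current_row.append(min(
                previous_row[j] + 1,                           # deletion
                current_row[j - 1] + 1,                        # insertion
                previous_row[j - 1] + (ref_item != hyp_item),  # substitution or match
            ))
        previous_row = current_row
    return previous_row[-1]


def error_rate(references: list[str], hypotheses: list[str], unit: str = "word") -> float:
    """Corpus-level WER (unit="word") or CER (unit="char"), in percent."""
    def tokenize(text: str) -> list:
        return text.split() if unit == "word" else list(text)

    errors = sum(edit_distance(tokenize(r), tokenize(h))
                 for r, h in zip(references, hypotheses))
    total = sum(len(tokenize(r)) for r in references)
    return 100.0 * errors / total


refs = ["ищи телеканал про бизнес на тиви", "найди боевики"]
hyps = ["ищи телеканал про бизнес на тви", "найди боевики"]
print(error_rate(refs, hyps, "word"))            # 12.5
print(round(error_rate(refs, hyps, "char"), 3))  # 2.222
```

Summing errors over the whole test set before dividing (rather than averaging per-utterance rates) keeps short utterances from dominating the score.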

## Citation

If you want to cite this model, you can use the following BibTeX entry:

```bibtex
@misc{bondarenko2022wav2vec2-large-ru-golos,
  title={XLSR Wav2Vec2 Russian with 2-gram Language Model by Ivan Bondarenko},
  author={Bondarenko, Ivan},
  publisher={Hugging Face},
  journal={Hugging Face Hub},
  howpublished={\url{https://huggingface.co/bond005/wav2vec2-large-ru-golos-with-lm}},
  year={2022}
}
```