yongjian/wav2vec2-large-a

	---
	language: en
	datasets:
	- LIUM/tedlium
	tags:
	- speech
	- audio
	- automatic-speech-recognition
	---
	Fine-tuned from [facebook/wav2vec2-large-960h-lv60-self](https://huggingface.co/facebook/wav2vec2-large-960h-lv60-self).

	# Installation
	1. Install PyTorch: https://pytorch.org/
	2. Install transformers: https://huggingface.co/docs/transformers/installation

	For example, to install with conda:
	```
	>> conda create -n wav2vec2 python=3.8
	>> conda activate wav2vec2
	>> conda install pytorch cudatoolkit=11.3 -c pytorch
	>> conda install -c conda-forge transformers
	```
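
After installing, it can help to confirm that the packages are visible from the environment you plan to run in. The following is a minimal sketch using only the Python standard library; the helper name `check_packages` is hypothetical and not part of this repository, and it only reports what it finds without importing the packages.

```python
from importlib import metadata, util


def check_packages(names=("torch", "transformers", "numpy")):
    """Return {package: version or None} without importing the packages."""
    report = {}
    for name in names:
        # find_spec returns None when the module cannot be located
        if util.find_spec(name) is None:
            report[name] = None
            continue
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Importable, but no distribution metadata under this name
            report[name] = "unknown"
    return report


if __name__ == "__main__":
    for name, version in check_packages().items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

If any package is reported as not installed, check that the `wav2vec2` environment is active before running the usage example.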

	# Usage
	```python
	# Load the model and processor
	from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC
	import numpy as np
	import torch

	model = Wav2Vec2ForCTC.from_pretrained('yongjian/wav2vec2-large-a')  # Note: PyTorch model
	processor = Wav2Vec2Processor.from_pretrained('yongjian/wav2vec2-large-a')

	# Load input: 16000 samples of random noise as a placeholder; replace with your own waveform
	np_wav = np.random.normal(size=16000).clip(-1, 1)

	# Inference
	sample_rate = processor.feature_extractor.sampling_rate
	with torch.no_grad():
	    model_inputs = processor(np_wav, sampling_rate=sample_rate, return_tensors="pt", padding=True)
	    # For GPU acceleration, move the model and the input tensors to the GPU with .cuda()
	    logits = model(model_inputs.input_values, attention_mask=model_inputs.attention_mask).logits
	    pred_ids = torch.argmax(logits, dim=-1).cpu()
	pred_text = processor.batch_decode(pred_ids)
	print('Transcription:', pred_text)
	```
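
The example above feeds random noise to the model. One way to replace it with a real recording is sketched below, assuming the recording is a 16-bit PCM WAV file. The helper `read_wav_mono16` is hypothetical (not part of this repository) and uses only the Python standard library; it averages the channels down to mono and scales the samples to floats.

```python
import array
import sys
import wave


def read_wav_mono16(path):
    """Read a 16-bit PCM WAV file; return (mono samples in [-1, 1), sample_rate)."""
    with wave.open(path, "rb") as wav_file:
        if wav_file.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM audio")
        n_channels = wav_file.getnchannels()
        sample_rate = wav_file.getframerate()
        raw = wav_file.readframes(wav_file.getnframes())

    pcm = array.array("h", raw)  # signed 16-bit integers
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian

    # Average the interleaved channels of each frame and scale int16 to float
    samples = [
        sum(pcm[i:i + n_channels]) / (n_channels * 32768.0)
        for i in range(0, len(pcm), n_channels)
    ]
    return samples, sample_rate
```

The result can then be passed on as `np_wav = np.asarray(samples, dtype=np.float32)`. The returned `sample_rate` should match `processor.feature_extractor.sampling_rate`; if it does not, resample the audio first, since this sketch does no resampling.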

	# Fine-tuning Code
	GitHub repo: https://github.com/CassiniHuy/wav2vec2_finetune