Full-text search
+ 1,000 results
facebook / wav2vec2-base
README.md
model
11 matches
tags:
transformers, pytorch, wav2vec2, pretraining, speech, en, dataset:librispeech_asr, arxiv:2006.11477, license:apache-2.0, endpoints_compatible, region:us
# Wav2Vec2-Base
[Facebook's Wav2Vec2](https://ai.facebook.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/)
The base model pretrained on 16kHz sampled speech audio. When using the model, make sure that your speech input is also sampled at 16kHz.
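Since every checkpoint in this family expects 16 kHz input, audio recorded at another rate must be resampled before feature extraction. A minimal sketch of the idea, using a naive linear interpolator for illustration (the function name is hypothetical; real pipelines should use a proper resampler such as `torchaudio.functional.resample` or `librosa.resample`):

```python
import numpy as np

def resample_to_16k(audio: np.ndarray, orig_sr: int, target_sr: int = 16_000) -> np.ndarray:
    """Naive linear-interpolation resampler (illustration only)."""
    if orig_sr == target_sr:
        return audio
    duration = len(audio) / orig_sr          # clip length in seconds
    n_target = int(round(duration * target_sr))
    old_t = np.linspace(0.0, duration, num=len(audio), endpoint=False)
    new_t = np.linspace(0.0, duration, num=n_target, endpoint=False)
    return np.interp(new_t, old_t, audio)    # interpolate onto the 16 kHz grid

# One second of 44.1 kHz audio becomes 16,000 samples.
one_second = np.zeros(44_100)
print(len(resample_to_16k(one_second, 44_100)))  # 16000
```

The resampled array can then be passed to the model's feature extractor, which expects a `sampling_rate` of 16,000.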
facebook / wav2vec2-base-960h
README.md
model
29 matches
tags:
transformers, pytorch, tf, safetensors, wav2vec2, automatic-speech-recognition, audio, hf-asr-leaderboard, en, dataset:librispeech_asr, arxiv:2006.11477, license:apache-2.0, model-index, endpoints_compatible, region:us
# Wav2Vec2-Base-960h
[Facebook's Wav2Vec2](https://ai.facebook.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/)
The base model pretrained and fine-tuned on 960 hours of Librispeech on 16kHz sampled speech audio. When using the model, make sure that your speech input is also sampled at 16kHz.
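Fine-tuned checkpoints like this one are CTC models: they emit one character prediction per audio frame, and transcription collapses repeated predictions and drops the blank token. A minimal sketch of that greedy decoding step (the tiny vocabulary here is hypothetical, for illustration):

```python
def ctc_greedy_decode(frame_ids, id_to_char, blank_id=0):
    """Collapse repeated frame predictions, then drop CTC blanks."""
    out = []
    prev = None
    for i in frame_ids:
        if i != prev and i != blank_id:
            out.append(id_to_char[i])
        prev = i
    return "".join(out)

vocab = {0: "<pad>", 1: "H", 2: "E", 3: "L", 4: "O"}
# Frames: H H E <pad> L L <pad> L O  ->  "HELLO"
print(ctc_greedy_decode([1, 1, 2, 0, 3, 3, 0, 3, 4], vocab, blank_id=0))  # HELLO
```

In practice the per-frame ids come from `argmax` over the model's logits, and the collapse/strip step is what the processor's `batch_decode` performs.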
facebook / wav2vec2-base-100k-voxpopuli
README.md
model
10 matches
tags:
transformers, pytorch, wav2vec2, pretraining, audio, automatic-speech-recognition, voxpopuli, multilingual, arxiv:2101.00390, license:cc-by-nc-4.0, endpoints_compatible, region:us
# Wav2Vec2-Base-VoxPopuli
[Facebook's Wav2Vec2](https://ai.facebook.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/) base model pretrained on the 100k unlabeled subset of [VoxPopuli corpus](https://arxiv.org/abs/2101.00390).
**Note**: This model does not have a tokenizer as it was pretrained on audio alone. In order to use this model for **speech recognition**, a tokenizer should be created and the model should be fine-tuned on labeled text data. Check out [this blog](https://huggingface.co/blog/fine-tune-wav2vec2-english) for a more detailed explanation of how to fine-tune the model.
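Creating that tokenizer amounts to collecting the character set of the labeled corpus into an id mapping. A minimal sketch of the vocabulary-building step (function name and special tokens are illustrative; the linked blog builds the same kind of mapping and feeds it to `Wav2Vec2CTCTokenizer`, with spaces conventionally mapped to `|`):

```python
def build_char_vocab(transcripts):
    """Map every character in a labeled corpus to an id,
    reserving ids for the CTC pad/blank and unknown tokens."""
    chars = sorted(set("".join(transcripts)))
    vocab = {"<pad>": 0, "<unk>": 1}
    for ch in chars:
        # word boundaries are usually encoded as "|" rather than " "
        vocab.setdefault(ch if ch != " " else "|", len(vocab))
    return vocab

vocab = build_char_vocab(["hello world", "hold on"])
print(vocab["<pad>"], "|" in vocab)  # 0 True
```

The resulting mapping fixes the size of the CTC output head added on top of the pretrained encoder before fine-tuning.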
facebook / wav2vec2-base-100h
README.md
model
29 matches
tags:
transformers, pytorch, wav2vec2, automatic-speech-recognition, audio, en, dataset:librispeech_asr, arxiv:2006.11477, license:apache-2.0, endpoints_compatible, region:us
# Wav2Vec2-Base-100h
[Facebook's Wav2Vec2](https://ai.facebook.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/)
The base model pretrained and fine-tuned on 100 hours of Librispeech on 16kHz sampled speech audio. When using the model, make sure that your speech input is also sampled at 16kHz.
facebook / wav2vec2-base-10k-voxpopuli-ft-cs
README.md
model
16 matches
tags:
transformers, pytorch, wav2vec2, automatic-speech-recognition, audio, voxpopuli, cs, arxiv:2101.00390, license:cc-by-nc-4.0, endpoints_compatible, region:us
# Wav2Vec2-Base-VoxPopuli-Finetuned
[Facebook's Wav2Vec2](https://ai.facebook.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/) base model pretrained on the 10K unlabeled subset of [VoxPopuli corpus](https://arxiv.org/abs/2101.00390) and fine-tuned on the transcribed data in cs (refer to Table 1 of the paper for more information).
**Paper**: *[VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation](https://arxiv.org/abs/2101.00390)*
facebook / wav2vec2-base-10k-voxpopuli-ft-de
README.md
model
16 matches
tags:
transformers, pytorch, wav2vec2, automatic-speech-recognition, audio, voxpopuli, de, arxiv:2101.00390, license:cc-by-nc-4.0, endpoints_compatible, region:us
# Wav2Vec2-Base-VoxPopuli-Finetuned
[Facebook's Wav2Vec2](https://ai.facebook.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/) base model pretrained on the 10K unlabeled subset of [VoxPopuli corpus](https://arxiv.org/abs/2101.00390) and fine-tuned on the transcribed data in de (refer to Table 1 of the paper for more information).
**Paper**: *[VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation](https://arxiv.org/abs/2101.00390)*
facebook / wav2vec2-base-10k-voxpopuli-ft-en
README.md
model
16 matches
tags:
transformers, pytorch, wav2vec2, automatic-speech-recognition, audio, voxpopuli, en, arxiv:2101.00390, license:cc-by-nc-4.0, endpoints_compatible, region:us
# Wav2Vec2-Base-VoxPopuli-Finetuned
[Facebook's Wav2Vec2](https://ai.facebook.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/) base model pretrained on the 10K unlabeled subset of [VoxPopuli corpus](https://arxiv.org/abs/2101.00390) and fine-tuned on the transcribed data in en (refer to Table 1 of the paper for more information).
**Paper**: *[VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation](https://arxiv.org/abs/2101.00390)*
facebook / wav2vec2-base-10k-voxpopuli-ft-es
README.md
model
16 matches
tags:
transformers, pytorch, wav2vec2, automatic-speech-recognition, audio, voxpopuli, es, arxiv:2101.00390, license:cc-by-nc-4.0, endpoints_compatible, region:us
# Wav2Vec2-Base-VoxPopuli-Finetuned
[Facebook's Wav2Vec2](https://ai.facebook.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/) base model pretrained on the 10K unlabeled subset of [VoxPopuli corpus](https://arxiv.org/abs/2101.00390) and fine-tuned on the transcribed data in es (refer to Table 1 of the paper for more information).
**Paper**: *[VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation](https://arxiv.org/abs/2101.00390)*
facebook / wav2vec2-base-10k-voxpopuli-ft-fi
README.md
model
16 matches
tags:
transformers, pytorch, wav2vec2, automatic-speech-recognition, audio, voxpopuli, fi, arxiv:2101.00390, license:cc-by-nc-4.0, endpoints_compatible, region:us
# Wav2Vec2-Base-VoxPopuli-Finetuned
[Facebook's Wav2Vec2](https://ai.facebook.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/) base model pretrained on the 10K unlabeled subset of [VoxPopuli corpus](https://arxiv.org/abs/2101.00390) and fine-tuned on the transcribed data in fi (refer to Table 1 of the paper for more information).
**Paper**: *[VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation](https://arxiv.org/abs/2101.00390)*
facebook / wav2vec2-base-10k-voxpopuli-ft-fr
README.md
model
16 matches
tags:
transformers, pytorch, wav2vec2, automatic-speech-recognition, audio, voxpopuli, fr, arxiv:2101.00390, license:cc-by-nc-4.0, endpoints_compatible, region:us
# Wav2Vec2-Base-VoxPopuli-Finetuned
[Facebook's Wav2Vec2](https://ai.facebook.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/) base model pretrained on the 10K unlabeled subset of [VoxPopuli corpus](https://arxiv.org/abs/2101.00390) and fine-tuned on the transcribed data in fr (refer to Table 1 of the paper for more information).
**Paper**: *[VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation](https://arxiv.org/abs/2101.00390)*
facebook / wav2vec2-base-10k-voxpopuli-ft-hr
README.md
model
16 matches
tags:
transformers, pytorch, wav2vec2, automatic-speech-recognition, audio, voxpopuli, hr, arxiv:2101.00390, license:cc-by-nc-4.0, endpoints_compatible, region:us
# Wav2Vec2-Base-VoxPopuli-Finetuned
[Facebook's Wav2Vec2](https://ai.facebook.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/) base model pretrained on the 10K unlabeled subset of [VoxPopuli corpus](https://arxiv.org/abs/2101.00390) and fine-tuned on the transcribed data in hr (refer to Table 1 of the paper for more information).
**Paper**: *[VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation](https://arxiv.org/abs/2101.00390)*
facebook / wav2vec2-base-10k-voxpopuli-ft-hu
README.md
model
16 matches
tags:
transformers, pytorch, wav2vec2, automatic-speech-recognition, audio, voxpopuli, hu, arxiv:2101.00390, license:cc-by-nc-4.0, endpoints_compatible, region:us
# Wav2Vec2-Base-VoxPopuli-Finetuned
[Facebook's Wav2Vec2](https://ai.facebook.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/) base model pretrained on the 10K unlabeled subset of [VoxPopuli corpus](https://arxiv.org/abs/2101.00390) and fine-tuned on the transcribed data in hu (refer to Table 1 of the paper for more information).
**Paper**: *[VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation](https://arxiv.org/abs/2101.00390)*
facebook / wav2vec2-base-10k-voxpopuli-ft-it
README.md
model
16 matches
tags:
transformers, pytorch, wav2vec2, automatic-speech-recognition, audio, voxpopuli, it, arxiv:2101.00390, license:cc-by-nc-4.0, endpoints_compatible, region:us
# Wav2Vec2-Base-VoxPopuli-Finetuned
[Facebook's Wav2Vec2](https://ai.facebook.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/) base model pretrained on the 10K unlabeled subset of [VoxPopuli corpus](https://arxiv.org/abs/2101.00390) and fine-tuned on the transcribed data in it (refer to Table 1 of the paper for more information).
**Paper**: *[VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation](https://arxiv.org/abs/2101.00390)*
facebook / wav2vec2-base-10k-voxpopuli-ft-nl
README.md
model
16 matches
tags:
transformers, pytorch, wav2vec2, automatic-speech-recognition, audio, voxpopuli, nl, arxiv:2101.00390, license:cc-by-nc-4.0, endpoints_compatible, region:us
# Wav2Vec2-Base-VoxPopuli-Finetuned
[Facebook's Wav2Vec2](https://ai.facebook.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/) base model pretrained on the 10K unlabeled subset of [VoxPopuli corpus](https://arxiv.org/abs/2101.00390) and fine-tuned on the transcribed data in nl (refer to Table 1 of the paper for more information).
**Paper**: *[VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation](https://arxiv.org/abs/2101.00390)*
facebook / wav2vec2-base-10k-voxpopuli-ft-pl
README.md
model
16 matches
tags:
transformers, pytorch, wav2vec2, automatic-speech-recognition, audio, voxpopuli, pl, arxiv:2101.00390, license:cc-by-nc-4.0, endpoints_compatible, region:us
# Wav2Vec2-Base-VoxPopuli-Finetuned
[Facebook's Wav2Vec2](https://ai.facebook.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/) base model pretrained on the 10K unlabeled subset of [VoxPopuli corpus](https://arxiv.org/abs/2101.00390) and fine-tuned on the transcribed data in pl (refer to Table 1 of the paper for more information).
**Paper**: *[VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation](https://arxiv.org/abs/2101.00390)*
facebook / wav2vec2-base-10k-voxpopuli-ft-ro
README.md
model
16 matches
tags:
transformers, pytorch, wav2vec2, automatic-speech-recognition, audio, voxpopuli, ro, arxiv:2101.00390, license:cc-by-nc-4.0, endpoints_compatible, region:us
# Wav2Vec2-Base-VoxPopuli-Finetuned
[Facebook's Wav2Vec2](https://ai.facebook.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/) base model pretrained on the 10K unlabeled subset of [VoxPopuli corpus](https://arxiv.org/abs/2101.00390) and fine-tuned on the transcribed data in ro (refer to Table 1 of the paper for more information).
**Paper**: *[VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation](https://arxiv.org/abs/2101.00390)*
facebook / wav2vec2-base-10k-voxpopuli-ft-sk
README.md
model
16 matches
tags:
transformers, pytorch, wav2vec2, automatic-speech-recognition, audio, voxpopuli, sk, arxiv:2101.00390, license:cc-by-nc-4.0, endpoints_compatible, region:us
# Wav2Vec2-Base-VoxPopuli-Finetuned
[Facebook's Wav2Vec2](https://ai.facebook.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/) base model pretrained on the 10K unlabeled subset of [VoxPopuli corpus](https://arxiv.org/abs/2101.00390) and fine-tuned on the transcribed data in sk (refer to Table 1 of the paper for more information).
**Paper**: *[VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation](https://arxiv.org/abs/2101.00390)*
facebook / wav2vec2-base-10k-voxpopuli-ft-sl
README.md
model
16 matches
tags:
transformers, pytorch, wav2vec2, automatic-speech-recognition, audio, voxpopuli, sl, arxiv:2101.00390, license:cc-by-nc-4.0, endpoints_compatible, region:us
# Wav2Vec2-Base-VoxPopuli-Finetuned
[Facebook's Wav2Vec2](https://ai.facebook.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/) base model pretrained on the 10K unlabeled subset of [VoxPopuli corpus](https://arxiv.org/abs/2101.00390) and fine-tuned on the transcribed data in sl (refer to Table 1 of the paper for more information).
**Paper**: *[VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation](https://arxiv.org/abs/2101.00390)*
facebook / wav2vec2-base-10k-voxpopuli
README.md
model
10 matches
tags:
transformers, pytorch, wav2vec2, pretraining, audio, automatic-speech-recognition, voxpopuli, multilingual, arxiv:2101.00390, license:cc-by-nc-4.0, endpoints_compatible, region:us
# Wav2Vec2-Base-VoxPopuli
[Facebook's Wav2Vec2](https://ai.facebook.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/) base model pretrained on the 10k unlabeled subset of [VoxPopuli corpus](https://arxiv.org/abs/2101.00390).
**Paper**: *[VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation](https://arxiv.org/abs/2101.00390)*
facebook / wav2vec2-base-bg-voxpopuli-v2
README.md
model
8 matches
tags:
transformers, pytorch, wav2vec2, pretraining, audio, automatic-speech-recognition, voxpopuli-v2, bg, dataset:voxpopuli, arxiv:2101.00390, license:cc-by-nc-4.0, region:us
# Wav2Vec2-base-VoxPopuli-V2
[Facebook's Wav2Vec2](https://ai.facebook.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/) base model pretrained only on **bg** using **17.6k** hours of unlabeled data from the [VoxPopuli corpus](https://arxiv.org/abs/2101.00390).
The model is pretrained on 16kHz sampled speech audio. When using the model, make sure that your speech input is also sampled at 16kHz.