Full-text search
+ 1,000 results
facebook / wav2vec2-base
README.md
model
11 matches
tags:
transformers, pytorch, wav2vec2, pretraining, speech, en, dataset:librispeech_asr, arxiv:2006.11477, license:apache-2.0, endpoints_compatible, region:us
# Wav2Vec2-Base
[Facebook's Wav2Vec2](https://ai.facebook.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/)
The base model pretrained on 16kHz sampled speech audio. When using the model, make sure that your speech input is also sampled at 16kHz.
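Since every checkpoint in this family expects 16 kHz input, audio recorded at another rate must be resampled before feature extraction. A minimal sketch of the idea, using a naive linear interpolator for illustration (the function name is hypothetical; real pipelines should use a proper resampler such as `torchaudio.functional.resample` or `librosa.resample`):

```python
import numpy as np

def resample_to_16k(audio: np.ndarray, orig_sr: int, target_sr: int = 16_000) -> np.ndarray:
    """Naive linear-interpolation resampler (illustration only)."""
    if orig_sr == target_sr:
        return audio
    duration = len(audio) / orig_sr          # clip length in seconds
    n_target = int(round(duration * target_sr))
    old_t = np.linspace(0.0, duration, num=len(audio), endpoint=False)
    new_t = np.linspace(0.0, duration, num=n_target, endpoint=False)
    return np.interp(new_t, old_t, audio)    # interpolate onto the 16 kHz grid

# One second of 44.1 kHz audio becomes 16,000 samples.
one_second = np.zeros(44_100)
print(len(resample_to_16k(one_second, 44_100)))  # 16000
```

The resampled array can then be passed to the model's feature extractor, which expects a `sampling_rate` of 16,000.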
facebook / wav2vec2-base-960h
README.md
model
29 matches
tags:
transformers, pytorch, tf, safetensors, wav2vec2, automatic-speech-recognition, audio, hf-asr-leaderboard, en, dataset:librispeech_asr, arxiv:2006.11477, license:apache-2.0, model-index, endpoints_compatible, region:us
# Wav2Vec2-Base-960h
[Facebook's Wav2Vec2](https://ai.facebook.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/)
The base model pretrained and fine-tuned on 960 hours of Librispeech on 16kHz sampled speech audio. When using the model, make sure that your speech input is also sampled at 16kHz.
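Fine-tuned checkpoints like this one are CTC models: they emit one character prediction per audio frame, and transcription collapses repeated predictions and drops the blank token. A minimal sketch of that greedy decoding step (the tiny vocabulary here is hypothetical, for illustration):

```python
def ctc_greedy_decode(frame_ids, id_to_char, blank_id=0):
    """Collapse repeated frame predictions, then drop CTC blanks."""
    out = []
    prev = None
    for i in frame_ids:
        if i != prev and i != blank_id:
            out.append(id_to_char[i])
        prev = i
    return "".join(out)

vocab = {0: "<pad>", 1: "H", 2: "E", 3: "L", 4: "O"}
# Frames: H H E <pad> L L <pad> L O  ->  "HELLO"
print(ctc_greedy_decode([1, 1, 2, 0, 3, 3, 0, 3, 4], vocab, blank_id=0))  # HELLO
```

In practice the per-frame ids come from `argmax` over the model's logits, and the collapse/strip step is what the processor's `batch_decode` performs.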
facebook / wav2vec2-base-100k-voxpopuli
README.md
model
10 matches
tags:
transformers, pytorch, wav2vec2, pretraining, audio, automatic-speech-recognition, voxpopuli, multilingual, arxiv:2101.00390, license:cc-by-nc-4.0, endpoints_compatible, region:us
# Wav2Vec2-Base-VoxPopuli
[Facebook's Wav2Vec2](https://ai.facebook.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/) base model pretrained on the 100k unlabeled subset of [VoxPopuli corpus](https://arxiv.org/abs/2101.00390).
**Note**: This model does not have a tokenizer as it was pretrained on audio alone. In order to use this model for **speech recognition**, a tokenizer should be created and the model should be fine-tuned on labeled text data. Check out [this blog](https://huggingface.co/blog/fine-tune-wav2vec2-english) for a more detailed explanation of how to fine-tune the model.
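Creating that tokenizer amounts to collecting the character set of the labeled corpus into an id mapping. A minimal sketch of the vocabulary-building step (function name and special tokens are illustrative; the linked blog builds the same kind of mapping and feeds it to `Wav2Vec2CTCTokenizer`, with spaces conventionally mapped to `|`):

```python
def build_char_vocab(transcripts):
    """Map every character in a labeled corpus to an id,
    reserving ids for the CTC pad/blank and unknown tokens."""
    chars = sorted(set("".join(transcripts)))
    vocab = {"<pad>": 0, "<unk>": 1}
    for ch in chars:
        # word boundaries are usually encoded as "|" rather than " "
        vocab.setdefault(ch if ch != " " else "|", len(vocab))
    return vocab

vocab = build_char_vocab(["hello world", "hold on"])
print(vocab["<pad>"], "|" in vocab)  # 0 True
```

The resulting mapping fixes the size of the CTC output head added on top of the pretrained encoder before fine-tuning.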
facebook / wav2vec2-base-100h
README.md
model
29 matches
tags:
transformers, pytorch, wav2vec2, automatic-speech-recognition, audio, en, dataset:librispeech_asr, arxiv:2006.11477, license:apache-2.0, endpoints_compatible, region:us
# Wav2Vec2-Base-100h
[Facebook's Wav2Vec2](https://ai.facebook.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/)
The base model pretrained and fine-tuned on 100 hours of Librispeech on 16kHz sampled speech audio. When using the model, make sure that your speech input is also sampled at 16kHz.
facebook / wav2vec2-base-10k-voxpopuli-ft-cs
README.md
model
16 matches
tags:
transformers, pytorch, wav2vec2, automatic-speech-recognition, audio, voxpopuli, cs, arxiv:2101.00390, license:cc-by-nc-4.0, endpoints_compatible, region:us
# Wav2Vec2-Base-VoxPopuli-Finetuned
[Facebook's Wav2Vec2](https://ai.facebook.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/) base model pretrained on the 10K unlabeled subset of [VoxPopuli corpus](https://arxiv.org/abs/2101.00390) and fine-tuned on the transcribed data in cs (refer to Table 1 of the paper for more information).
**Paper**: *[VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation](https://arxiv.org/abs/2101.00390)*
facebook / wav2vec2-base-10k-voxpopuli-ft-de
README.md
model
16 matches
tags:
transformers, pytorch, wav2vec2, automatic-speech-recognition, audio, voxpopuli, de, arxiv:2101.00390, license:cc-by-nc-4.0, endpoints_compatible, region:us
# Wav2Vec2-Base-VoxPopuli-Finetuned
[Facebook's Wav2Vec2](https://ai.facebook.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/) base model pretrained on the 10K unlabeled subset of [VoxPopuli corpus](https://arxiv.org/abs/2101.00390) and fine-tuned on the transcribed data in de (refer to Table 1 of the paper for more information).
**Paper**: *[VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation](https://arxiv.org/abs/2101.00390)*
facebook / wav2vec2-base-10k-voxpopuli-ft-en
README.md
model
16 matches
tags:
transformers, pytorch, wav2vec2, automatic-speech-recognition, audio, voxpopuli, en, arxiv:2101.00390, license:cc-by-nc-4.0, endpoints_compatible, region:us
# Wav2Vec2-Base-VoxPopuli-Finetuned
[Facebook's Wav2Vec2](https://ai.facebook.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/) base model pretrained on the 10K unlabeled subset of [VoxPopuli corpus](https://arxiv.org/abs/2101.00390) and fine-tuned on the transcribed data in en (refer to Table 1 of the paper for more information).
**Paper**: *[VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation](https://arxiv.org/abs/2101.00390)*
facebook / wav2vec2-base-10k-voxpopuli-ft-es
README.md
model
16 matches
tags:
transformers, pytorch, wav2vec2, automatic-speech-recognition, audio, voxpopuli, es, arxiv:2101.00390, license:cc-by-nc-4.0, endpoints_compatible, region:us
# Wav2Vec2-Base-VoxPopuli-Finetuned
[Facebook's Wav2Vec2](https://ai.facebook.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/) base model pretrained on the 10K unlabeled subset of [VoxPopuli corpus](https://arxiv.org/abs/2101.00390) and fine-tuned on the transcribed data in es (refer to Table 1 of the paper for more information).
**Paper**: *[VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation](https://arxiv.org/abs/2101.00390)*
facebook / wav2vec2-base-10k-voxpopuli-ft-fi
README.md
model
16 matches
tags:
transformers, pytorch, wav2vec2, automatic-speech-recognition, audio, voxpopuli, fi, arxiv:2101.00390, license:cc-by-nc-4.0, endpoints_compatible, region:us
# Wav2Vec2-Base-VoxPopuli-Finetuned
[Facebook's Wav2Vec2](https://ai.facebook.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/) base model pretrained on the 10K unlabeled subset of [VoxPopuli corpus](https://arxiv.org/abs/2101.00390) and fine-tuned on the transcribed data in fi (refer to Table 1 of the paper for more information).
**Paper**: *[VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation](https://arxiv.org/abs/2101.00390)*
facebook / wav2vec2-base-10k-voxpopuli-ft-fr
README.md
model
16 matches
tags:
transformers, pytorch, wav2vec2, automatic-speech-recognition, audio, voxpopuli, fr, arxiv:2101.00390, license:cc-by-nc-4.0, endpoints_compatible, region:us
# Wav2Vec2-Base-VoxPopuli-Finetuned
[Facebook's Wav2Vec2](https://ai.facebook.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/) base model pretrained on the 10K unlabeled subset of [VoxPopuli corpus](https://arxiv.org/abs/2101.00390) and fine-tuned on the transcribed data in fr (refer to Table 1 of the paper for more information).
**Paper**: *[VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation](https://arxiv.org/abs/2101.00390)*
facebook / wav2vec2-base-10k-voxpopuli-ft-hr
README.md
model
16 matches
tags:
transformers, pytorch, wav2vec2, automatic-speech-recognition, audio, voxpopuli, hr, arxiv:2101.00390, license:cc-by-nc-4.0, endpoints_compatible, region:us
# Wav2Vec2-Base-VoxPopuli-Finetuned
[Facebook's Wav2Vec2](https://ai.facebook.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/) base model pretrained on the 10K unlabeled subset of [VoxPopuli corpus](https://arxiv.org/abs/2101.00390) and fine-tuned on the transcribed data in hr (refer to Table 1 of the paper for more information).
**Paper**: *[VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation](https://arxiv.org/abs/2101.00390)*
facebook / wav2vec2-base-10k-voxpopuli-ft-hu
README.md
model
16 matches
tags:
transformers, pytorch, wav2vec2, automatic-speech-recognition, audio, voxpopuli, hu, arxiv:2101.00390, license:cc-by-nc-4.0, endpoints_compatible, region:us
# Wav2Vec2-Base-VoxPopuli-Finetuned
[Facebook's Wav2Vec2](https://ai.facebook.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/) base model pretrained on the 10K unlabeled subset of [VoxPopuli corpus](https://arxiv.org/abs/2101.00390) and fine-tuned on the transcribed data in hu (refer to Table 1 of the paper for more information).
**Paper**: *[VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation](https://arxiv.org/abs/2101.00390)*
facebook / wav2vec2-base-10k-voxpopuli-ft-it
README.md
model
16 matches
tags:
transformers, pytorch, wav2vec2, automatic-speech-recognition, audio, voxpopuli, it, arxiv:2101.00390, license:cc-by-nc-4.0, endpoints_compatible, region:us
# Wav2Vec2-Base-VoxPopuli-Finetuned
[Facebook's Wav2Vec2](https://ai.facebook.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/) base model pretrained on the 10K unlabeled subset of [VoxPopuli corpus](https://arxiv.org/abs/2101.00390) and fine-tuned on the transcribed data in it (refer to Table 1 of the paper for more information).
**Paper**: *[VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation](https://arxiv.org/abs/2101.00390)*
facebook / wav2vec2-base-10k-voxpopuli-ft-nl
README.md
model
16 matches
tags:
transformers, pytorch, wav2vec2, automatic-speech-recognition, audio, voxpopuli, nl, arxiv:2101.00390, license:cc-by-nc-4.0, endpoints_compatible, region:us
# Wav2Vec2-Base-VoxPopuli-Finetuned
[Facebook's Wav2Vec2](https://ai.facebook.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/) base model pretrained on the 10K unlabeled subset of [VoxPopuli corpus](https://arxiv.org/abs/2101.00390) and fine-tuned on the transcribed data in nl (refer to Table 1 of the paper for more information).
**Paper**: *[VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation](https://arxiv.org/abs/2101.00390)*
facebook / wav2vec2-base-10k-voxpopuli-ft-pl
README.md
model
16 matches
tags:
transformers, pytorch, wav2vec2, automatic-speech-recognition, audio, voxpopuli, pl, arxiv:2101.00390, license:cc-by-nc-4.0, endpoints_compatible, region:us
# Wav2Vec2-Base-VoxPopuli-Finetuned
[Facebook's Wav2Vec2](https://ai.facebook.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/) base model pretrained on the 10K unlabeled subset of [VoxPopuli corpus](https://arxiv.org/abs/2101.00390) and fine-tuned on the transcribed data in pl (refer to Table 1 of the paper for more information).
**Paper**: *[VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation](https://arxiv.org/abs/2101.00390)*
facebook / wav2vec2-base-10k-voxpopuli-ft-ro
README.md
model
16 matches
tags:
transformers, pytorch, wav2vec2, automatic-speech-recognition, audio, voxpopuli, ro, arxiv:2101.00390, license:cc-by-nc-4.0, endpoints_compatible, region:us
# Wav2Vec2-Base-VoxPopuli-Finetuned
[Facebook's Wav2Vec2](https://ai.facebook.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/) base model pretrained on the 10K unlabeled subset of [VoxPopuli corpus](https://arxiv.org/abs/2101.00390) and fine-tuned on the transcribed data in ro (refer to Table 1 of the paper for more information).
**Paper**: *[VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation](https://arxiv.org/abs/2101.00390)*
facebook / wav2vec2-base-10k-voxpopuli-ft-sk
README.md
model
16 matches
tags:
transformers, pytorch, wav2vec2, automatic-speech-recognition, audio, voxpopuli, sk, arxiv:2101.00390, license:cc-by-nc-4.0, endpoints_compatible, region:us
# Wav2Vec2-Base-VoxPopuli-Finetuned
[Facebook's Wav2Vec2](https://ai.facebook.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/) base model pretrained on the 10K unlabeled subset of [VoxPopuli corpus](https://arxiv.org/abs/2101.00390) and fine-tuned on the transcribed data in sk (refer to Table 1 of the paper for more information).
**Paper**: *[VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation](https://arxiv.org/abs/2101.00390)*
facebook / wav2vec2-base-10k-voxpopuli-ft-sl
README.md
model
16 matches
tags:
transformers, pytorch, wav2vec2, automatic-speech-recognition, audio, voxpopuli, sl, arxiv:2101.00390, license:cc-by-nc-4.0, endpoints_compatible, region:us
# Wav2Vec2-Base-VoxPopuli-Finetuned
[Facebook's Wav2Vec2](https://ai.facebook.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/) base model pretrained on the 10K unlabeled subset of [VoxPopuli corpus](https://arxiv.org/abs/2101.00390) and fine-tuned on the transcribed data in sl (refer to Table 1 of the paper for more information).
**Paper**: *[VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation](https://arxiv.org/abs/2101.00390)*
facebook / wav2vec2-base-10k-voxpopuli
README.md
model
10 matches
tags:
transformers, pytorch, wav2vec2, pretraining, audio, automatic-speech-recognition, voxpopuli, multilingual, arxiv:2101.00390, license:cc-by-nc-4.0, endpoints_compatible, region:us
# Wav2Vec2-Base-VoxPopuli
[Facebook's Wav2Vec2](https://ai.facebook.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/) base model pretrained on the 10k unlabeled subset of [VoxPopuli corpus](https://arxiv.org/abs/2101.00390).
**Paper**: *[VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation](https://arxiv.org/abs/2101.00390)*
facebook / wav2vec2-base-bg-voxpopuli-v2
README.md
model
8 matches
tags:
transformers, pytorch, wav2vec2, pretraining, audio, automatic-speech-recognition, voxpopuli-v2, bg, dataset:voxpopuli, arxiv:2101.00390, license:cc-by-nc-4.0, region:us
# Wav2Vec2-base-VoxPopuli-V2
[Facebook's Wav2Vec2](https://ai.facebook.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/) base model pretrained only on **bg** using **17.6k** hours of unlabeled data from the [VoxPopuli corpus](https://arxiv.org/abs/2101.00390).
The model is pretrained on 16kHz sampled speech audio. When using the model, make sure that your speech input is also sampled at 16kHz.