Nano-Speech Dataset

A compact, general-purpose speech dataset designed for lightweight ASR and speech processing tasks.

Dataset Description

  • Domain: General speech
  • Format: Compressed ZIP archive
  • Size: ~25 GB
  • Access: Public (no authentication required)

Contents

The dataset is provided as a single archive:

  • nano_speech.zip: contains speech audio files, ready for use in training and evaluation pipelines

Usage

from huggingface_hub import snapshot_download
snapshot_download("edwixx/nano-speech")

Or load directly with 🤗 Datasets (if a config is defined):

from datasets import load_dataset
dataset = load_dataset("edwixx/nano-speech")
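
Because the data ships as a single archive, a downloaded snapshot still has to be unpacked before the audio can be used. The following is a minimal sketch of that step, not an official loader. The helper names extract_audio and download_and_extract are hypothetical, and the list of audio extensions is an assumption, since this card does not state which formats nano_speech.zip contains or how it is laid out internally.

```python
import zipfile
from pathlib import Path

# Assumed audio extensions; the card does not specify the formats in the archive.
AUDIO_EXTENSIONS = {".wav", ".flac", ".mp3", ".ogg", ".m4a"}


def extract_audio(zip_path, out_dir):
    """Unpack zip_path into out_dir and return the sorted audio file paths found."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(out_dir)
    # Walk the extracted tree, since the internal folder layout is unknown.
    return sorted(
        p for p in out_dir.rglob("*")
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    )


def download_and_extract(out_dir="nano_speech"):
    """Fetch the repository snapshot, then unpack nano_speech.zip from it."""
    # Imported lazily so extract_audio can be used without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    snapshot_dir = snapshot_download("edwixx/nano-speech")
    return extract_audio(Path(snapshot_dir) / "nano_speech.zip", out_dir)
```

Calling download_and_extract() requires network access and enough free disk space for both the ~25 GB archive and its extracted contents. extract_audio can also be pointed at a copy of the archive obtained by other means.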

Intended Uses

  • Automatic Speech Recognition (ASR) model training
  • Speech embedding and representation learning
  • Evaluating compact speech models on diverse audio
  • Lightweight / on-device speech processing research

License

This dataset is released under the MIT License.
