Nano-Speech Dataset
A compact, general-purpose speech dataset designed for lightweight ASR and speech processing tasks.
Dataset Description
- Domain: General speech
- Format: Compressed ZIP archive
- Size: ~25 GB
- Access: Public (no authentication required)
Contents
The dataset is provided as a single archive:
nano_speech.zip: contains speech audio files, ready for use in training and evaluation pipelines
Usage
from huggingface_hub import snapshot_download
snapshot_download("edwixx/nano-speech")
Or load directly with 🤗 Datasets (if a config is defined):
from datasets import load_dataset
dataset = load_dataset("edwixx/nano-speech")
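If the repository is fetched with snapshot_download, the audio still has to be unpacked from nano_speech.zip. The following is a minimal sketch of that step, not part of the dataset's own tooling. The helper names (list_audio_members, extract_audio) and the extension list are assumptions made for illustration, since the card does not state the audio format.

```python
import zipfile
from pathlib import Path

# Assumed audio extensions; the dataset card does not state the file format.
AUDIO_EXTENSIONS = (".wav", ".flac", ".mp3", ".ogg")


def list_audio_members(zip_path, extensions=AUDIO_EXTENSIONS):
    """Return the sorted names of archive members that look like audio files."""
    with zipfile.ZipFile(zip_path) as archive:
        return sorted(
            name
            for name in archive.namelist()
            if name.lower().endswith(tuple(extensions))
        )


def extract_audio(zip_path, out_dir, extensions=AUDIO_EXTENSIONS):
    """Extract only the audio members into out_dir and return their paths."""
    out_dir = Path(out_dir)
    members = list_audio_members(zip_path, extensions)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(out_dir, members=members)
    return [out_dir / name for name in members]
```

snapshot_download returns the local directory of the snapshot, so the archive path would typically be that directory joined with nano_speech.zip. Listing members first is cheap and avoids unpacking the full ~25 GB archive just to inspect its layout.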
Intended Uses
- Automatic Speech Recognition (ASR) model training
- Speech embedding and representation learning
- Evaluating compact speech models on diverse audio
- Lightweight / on-device speech processing research
License
This dataset is released under the MIT License.