# Generative Spoken Language Modeling

* [Paper](https://arxiv.org/abs/2102.01192)
* [Demo](https://speechbot.github.io/gslm/index.html)

We build and evaluate generative speech2speech systems using [Log Mel Filterbank](https://pytorch.org/audio/stable/compliance.kaldi.html#fbank), [Modified CPC](https://github.com/facebookresearch/CPC_audio), [HuBERT Base](https://github.com/pytorch/fairseq/tree/main/examples/hubert) and [Wav2Vec 2.0 Large](https://github.com/pytorch/fairseq/tree/main/examples/wav2vec). Our system is composed of three components: speech2unit, ulm and unit2speech. The models and usage of each component are described in its own sub-directory; see the links below.

## Speech to Unit Model (speech2unit)
The speech to unit model quantizes raw speech into learned discrete speech units. [More details](speech2unit)
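
As a rough illustration of what quantizing into discrete units can look like, the sketch below maps each frame of a feature sequence to the index of its nearest centroid. It assumes the units are nearest-centroid assignments over frame-level features (for example, centroids learned by k-means); the function name `quantize` and the toy data are hypothetical and are not taken from the speech2unit code.

```python
from typing import List, Sequence


def quantize(features: Sequence[Sequence[float]],
             centroids: Sequence[Sequence[float]]) -> List[int]:
    """Map each feature frame to the index of its nearest centroid.

    Assumption: units are nearest-centroid indices (squared Euclidean distance).
    """
    units = []
    for frame in features:
        # Squared Euclidean distance from this frame to every centroid
        dists = [
            sum((f - c) ** 2 for f, c in zip(frame, centroid))
            for centroid in centroids
        ]
        units.append(dists.index(min(dists)))
    return units


# Toy data (hypothetical): 4 frames of 2-dimensional features, 2 centroids
centroids = [[0.0, 0.0], [1.0, 1.0]]
features = [[0.1, -0.1], [0.9, 1.2], [1.1, 0.8], [0.2, 0.1]]
units = quantize(features, centroids)
print(units)  # [0, 1, 1, 0]
```

In a real system the frames would come from one of the feature extractors listed above, and the centroids would be learned from data rather than written by hand.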

## Unit Language Model (ulm)
The unit language model is a generative language model trained on discrete speech units. [More details](ulm)
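
To make the idea of a language model over units concrete, here is a toy sketch that trains on unit sequences and samples a continuation of a prompt. It deliberately swaps in a count-based bigram model, which is not the model used in this project; the names `train_bigram` and `continue_units` and the toy corpus are hypothetical.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List, Sequence


def train_bigram(sequences: Sequence[Sequence[int]]) -> Dict[int, Counter]:
    """Count how often each unit follows each other unit."""
    counts: Dict[int, Counter] = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def continue_units(counts: Dict[int, Counter], prompt: Sequence[int],
                   n_new: int, seed: int = 0) -> List[int]:
    """Sample up to n_new units that continue a non-empty prompt."""
    if not prompt:
        raise ValueError("prompt must contain at least one unit")
    rng = random.Random(seed)
    out = list(prompt)
    for _ in range(n_new):
        followers = counts.get(out[-1])
        if not followers:
            break  # the last unit was never seen with a successor
        candidates, weights = zip(*followers.items())
        out.append(rng.choices(candidates, weights=weights)[0])
    return out


# Toy corpus of unit sequences (hypothetical)
corpus = [[3, 7, 7, 2], [3, 7, 2, 2], [7, 2, 3, 7]]
model = train_bigram(corpus)
generated = continue_units(model, prompt=[3], n_new=5)
print(generated)
```

The sampled sequence starts with the prompt, and every transition in it was observed in the toy corpus.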

## Unit to Speech Model (unit2speech)
The unit to speech model synthesizes speech from discrete speech units. [More details](unit2speech)

## Metrics
We show how to compute ASR-based metrics, as well as the zero-shot metrics proposed in our paper, [here](metrics).
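
One quantity commonly used in ASR-based evaluation is word error rate (WER): transcribe the generated audio with an ASR system, then compare the transcript with a reference. This section does not say which ASR-based metrics the linked directory computes, so the sketch below is a generic WER calculation under that assumption; `word_error_rate` is a hypothetical helper, not part of the metrics code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


# One word of the six-word reference is missing from the hypothesis
wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
print(round(wer, 3))  # 0.167
```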

## Tools
We share two tools: one resynthesizes a given spoken utterance, and the other generates novel spoken language from a spoken prompt. [More details](tools)
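
The sketch below shows one plausible way the three components compose into these two tools. It assumes resynthesis is speech2unit followed by unit2speech, and generation additionally passes the prompt units through the unit language model. All function names and the stand-in components are hypothetical; the actual tools and their interfaces are in the linked directory.

```python
from typing import Callable, List

Waveform = List[float]
Units = List[int]


def resynthesize(wave: Waveform,
                 speech2unit: Callable[[Waveform], Units],
                 unit2speech: Callable[[Units], Waveform]) -> Waveform:
    """Encode speech into units, then synthesize speech from those units."""
    return unit2speech(speech2unit(wave))


def generate(prompt_wave: Waveform,
             speech2unit: Callable[[Waveform], Units],
             ulm: Callable[[Units], Units],
             unit2speech: Callable[[Units], Waveform]) -> Waveform:
    """Encode a spoken prompt, continue it in unit space, then synthesize."""
    prompt_units = speech2unit(prompt_wave)
    continued_units = ulm(prompt_units)
    return unit2speech(continued_units)


# Stand-in components (hypothetical), only to make the sketch executable
def toy_speech2unit(wave: Waveform) -> Units:
    return [0 if sample < 0 else 1 for sample in wave]


def toy_ulm(units: Units) -> Units:
    return units + units[-1:]  # "continue" by repeating the last unit


def toy_unit2speech(units: Units) -> Waveform:
    return [-1.0 if unit == 0 else 1.0 for unit in units]


resynth = resynthesize([0.3, -0.2, 0.5], toy_speech2unit, toy_unit2speech)
novel = generate([0.3, -0.2], toy_speech2unit, toy_ulm, toy_unit2speech)
print(resynth)  # [1.0, -1.0, 1.0]
print(novel)    # [1.0, -1.0, -1.0]
```

Passing the components in as callables keeps the sketch independent of any particular feature extractor, language model, or synthesizer.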