birgermoell / lm-swedish / build_n_gram.py


Latest commit c498527: "WIP updated lm" (over 2 years ago)


304 Bytes

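from datasets import load_dataset

target_lang = "sv"  # change to your target language
username = "hf-test"  # change to your username

# Load the preprocessed parliament corpus for the target language from the Hub
dataset = load_dataset(f"{username}/{target_lang}_corpora_parliament_processed", split="train")

# Write every sentence to text.txt as one space-separated string; this is the
# text corpus the n-gram language model is built from.
# UTF-8 is set explicitly so Swedish characters (å, ä, ö) are written correctly on every platform.
with open("text.txt", "w", encoding="utf-8") as file:
    file.write(" ".join(dataset["text"]))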
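The export step can be tried offline without downloading the dataset. The sketch below is a minimal illustration under stated assumptions: `write_corpus` and `count_ngrams` are hypothetical helpers (not part of this repository), the toy sentences stand in for `dataset["text"]`, and the n-gram count is a plain whitespace-token sanity check, not the language-model toolkit used to build the actual model.

```python
from collections import Counter
from pathlib import Path
import tempfile


def write_corpus(texts: list[str], path: Path) -> None:
    """Hypothetical helper mirroring the script: join all sentences with spaces, write as UTF-8."""
    path.write_text(" ".join(texts), encoding="utf-8")


def count_ngrams(path: Path, n: int = 2) -> Counter:
    """Hypothetical sanity check: count whitespace-token n-grams in the exported file."""
    tokens = path.read_text(encoding="utf-8").split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


if __name__ == "__main__":
    # Toy sentences standing in for dataset["text"] (made-up data, not from the real corpus)
    texts = ["herr talman jag yrkar bifall", "jag yrkar avslag"]
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "text.txt"
        write_corpus(texts, out)
        print(out.read_text(encoding="utf-8"))  # herr talman jag yrkar bifall jag yrkar avslag
        print(count_ngrams(out, 2).most_common(1))  # [(('jag', 'yrkar'), 2)]
```

Because the script joins sentences with spaces, the whole corpus ends up on a single line, so n-grams such as `('bifall', 'jag')` span sentence boundaries. If the downstream tool treats each line as one sentence, joining with `"\n"` instead would keep sentences separate; whether that matters depends on the tool used to build the model.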