typo
docs/index.md +1 -1
docs/index.md CHANGED
@@ -2,7 +2,7 @@
 
 A new fascinating dataset just dropped on 🤗. [French public domain newspapers](https://huggingface.co/datasets/PleIAs/French-PD-Newspapers) references about 3 million newspapers and periodicals with their full text OCR'ed and some meta-data.
 
-The data is stored in 320 chunks weighting about 700MB each, each
+The data is stored in 320 chunks weighing about 700MB each, each containing about 7,500 texts.
 
 The data loader for this Observable project uses DuckDB to read these 320 parquet files (altogether about 200GB) and combines a minimal subset of their metadata — title and year of publication, most importantly without the text contents — into a single parquet file. It takes only about 1 minute to run in a Hugging Face Space.
 
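
As a rough illustration of the DuckDB step described above, here is a minimal sketch in Python. It is not the project's actual Observable data loader; the glob path and the `title`/`date` column names are assumptions about the dataset's schema and should be adjusted to the real files.

```python
# Minimal sketch (assumed paths and column names, not the project's loader):
# read the parquet chunks, keep only lightweight metadata columns (no OCR text),
# and write everything out as a single small parquet file.
import duckdb

con = duckdb.connect()

con.sql("""
    COPY (
        SELECT title, date            -- assumed metadata columns; the full text is deliberately excluded
        FROM read_parquet('French-PD-Newspapers/*.parquet')
    ) TO 'metadata.parquet' (FORMAT parquet)
""")
```

Because only the metadata columns are selected, the output stays small even though the source chunks total roughly 200GB, which is what makes the combined file quick to produce and cheap to serve.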