Spaces: observablehq/fpdn
fil committed on Feb 21, 2024

Commit c193f10 (unverified) · 1 Parent(s): 62b0467

revert "typo"

Files changed (1)

docs/index.md CHANGED

@@ -25,7 +25,7 @@
 <p class=signature>by <a href="https://observablehq.com/@fil">Fil</a>
-This new and fascinating dataset just dropped on Hugging Face&nbsp;: [French public domain newspapers](https://huggingface.co/datasets/PleIAs/French-PD-Newspapers) 🤗 references about **3&nbsp;million newspapers and periodicals** with their full text OCR’ed and some meta-data.
+This new fascinating dataset just dropped on Hugging Face&nbsp;: [French public domain newspapers](https://huggingface.co/datasets/PleIAs/French-PD-Newspapers) 🤗 references about **3&nbsp;million newspapers and periodicals** with their full text OCR’ed and some meta-data.
 The data is stored in 320 large parquet files. The data loader for this [Observable framework](https://observablehq.com/framework) project uses [DuckDB](https://duckdb.org/) to read these files (altogether about 200GB) and combines a minimal subset of their metadata — title and year of publication, most importantly without the text contents&nbsp;—, into a single highly optimized parquet file. This takes only about 1 minute to run in a hugging-face Space.