revert "typo"
docs/index.md +1 -1
docs/index.md
CHANGED
@@ -25,7 +25,7 @@
 
 <p class=signature>by <a href="https://observablehq.com/@fil">Fil</a>
 
-This new
+This new fascinating dataset just dropped on Hugging Face : [French public domain newspapers](https://huggingface.co/datasets/PleIAs/French-PD-Newspapers) 🤗 references about **3 million newspapers and periodicals** with their full text OCR’ed and some meta-data.
 
 The data is stored in 320 large parquet files. The data loader for this [Observable framework](https://observablehq.com/framework) project uses [DuckDB](https://duckdb.org/) to read these files (altogether about 200GB) and combines a minimal subset of their metadata — title and year of publication, most importantly without the text contents —, into a single highly optimized parquet file. This takes only about 1 minute to run in a hugging-face Space.
 
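
For readers unfamiliar with this kind of data loader, here is a minimal sketch of the sort of DuckDB statement it could issue. It is not the project's actual loader: the column names (`title`, `year`), the input glob, and the output path are all assumptions chosen for illustration. The idea is that selecting only a few small columns lets DuckDB skip the large text column in each parquet file.

```python
def build_loader_sql(source_glob: str, output_path: str, columns: list[str]) -> str:
    """Build a DuckDB COPY statement that keeps only the given columns.

    All arguments are hypothetical placeholders, not the project's real names.
    """
    # Quote each column name so DuckDB treats it as an identifier.
    select_list = ", ".join(f'"{c}"' for c in columns)
    return (
        f"COPY (SELECT {select_list} FROM read_parquet('{source_glob}')) "
        f"TO '{output_path}' (FORMAT PARQUET, COMPRESSION ZSTD)"
    )


# Assumed column names and paths, for illustration only.
sql = build_loader_sql("data/*.parquet", "newspapers.parquet", ["title", "year"])
print(sql)

# To actually run it, install the `duckdb` package and execute the statement:
#   import duckdb
#   duckdb.execute(sql)
```

Because parquet is a columnar format, a query like this only needs to read the selected columns, which may explain how a pass over roughly 200GB of files can finish quickly.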