simpler
docs/index.md +1 -3
docs/index.md
CHANGED
@@ -2,9 +2,7 @@
 
 A new fascinating dataset just dropped on 🤗. [French public domain newspapers](https://huggingface.co/datasets/PleIAs/French-PD-Newspapers) references about 3 million newspapers and periodicals with their full text OCR’ed and some meta-data.
 
-The data is stored in 320 chunks weighting about 700MB each
-
-The data loader for this Observable project uses DuckDB to read these 320 parquet files (altogether about 200GB) and combines a minimal subset of their metadata — title and year of publication, most importantly without the text contents —, into a single parquet file. It takes only about 1 minute to run in a hugging-face Space.
+The data is stored in 320 chunks weighting about 700MB each. The data loader for this Observable project uses DuckDB to read these 320 parquet files (altogether about 200GB) and combines a minimal subset of their metadata — title and year of publication, most importantly without the text contents —, into a single parquet file. It takes only about 1 minute to run in a hugging-face Space.
 
 The resulting file is small enough (about 8MB) that we can load it in the browser and create “live” charts with Observable Plot.
 
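
For readers who want to see what the loader described in this paragraph might look like, here is a minimal sketch in Python. It is not the project's actual loader. The file-name template (`chunk_{index}.parquet`) and the column names (`title`, `date`) are hypothetical placeholders; check the dataset page for the real ones. The sketch builds one DuckDB `COPY` statement that reads every chunk through `read_parquet` and writes only the selected columns to a single Parquet file.

```python
"""Sketch: keep a few metadata columns from many remote Parquet chunks."""

# ASSUMPTIONS (not taken from the source): the file-name template and the
# column names below are hypothetical placeholders.
BASE_URL = "https://huggingface.co/datasets/PleIAs/French-PD-Newspapers/resolve/main/"
FILE_TEMPLATE = "chunk_{index}.parquet"  # hypothetical naming scheme
COLUMNS = ("title", "date")  # hypothetical column names


def chunk_urls(count=320, start=1):
    """Return the URLs of `count` consecutive chunks, numbered from `start`."""
    return [
        BASE_URL + FILE_TEMPLATE.format(index=i)
        for i in range(start, start + count)
    ]


def build_query(urls, columns=COLUMNS, out_path="presse.parquet"):
    """Build a COPY statement that projects `columns` from all `urls`."""
    # Quote string literals and identifiers the way SQL expects.
    url_list = ", ".join("'" + u.replace("'", "''") + "'" for u in urls)
    column_list = ", ".join('"' + c.replace('"', '""') + '"' for c in columns)
    target = out_path.replace("'", "''")
    return (
        f"COPY (SELECT {column_list} FROM read_parquet([{url_list}])) "
        f"TO '{target}' (FORMAT PARQUET)"
    )


def main():
    # duckdb is a third-party package (pip install duckdb); it is imported
    # here so the helpers above work without it.
    import duckdb

    con = duckdb.connect()
    con.execute("INSTALL httpfs")  # extension for reading files over HTTP(S)
    con.execute("LOAD httpfs")
    con.execute(build_query(chunk_urls()))


if __name__ == "__main__":
    main()
```

Because Parquet is a columnar format, DuckDB can fetch only the requested columns from each remote file instead of downloading the full text, which is one plausible reason the loader finishes in about a minute despite the roughly 200GB total size.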