fil committed on
Commit 8ea911d · 1 parent: 77379da

incredibly so

Files changed (1)
  1. docs/index.md +1 -1
docs/index.md CHANGED
@@ -29,7 +29,7 @@ This new fascinating dataset just dropped on Hugging Face : [French public
 
 The data is stored in 320 large parquet files. The data loader for this [Observable framework](https://observablehq.com/framework) project uses [DuckDB](https://duckdb.org/) to read these files (altogether about 200GB) and combines a minimal subset of their metadata — title and year of publication, most importantly without the text contents —, into a single highly optimized parquet file. This takes only about 1 minute to run in a hugging-face Space.
 
-The resulting file is small enough (and almost incredibly small: about 2.5MB, _less than 1 byte per row!_), that we can load it in the browser and create “live” charts with [Observable Plot](https://observablehq.com/plot).
+The resulting file is small enough (and incredibly so: the file weighs about 560kB, _only 1.5 bits per row!_), that we can load it in the browser and create “live” charts with [Observable Plot](https://observablehq.com/plot).
 
 In this project, I’m exploring two aspects of the dataset:
 
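As a rough sanity check of the two size claims in this change, the short Python sketch below derives the row count implied by the new figures and then applies it to the old figure. The diff does not state the row count, so it is inferred here; the sketch also assumes decimal units (1 kB = 1,000 bytes, 1 MB = 1,000,000 bytes) and that the row count is the same in both versions. The constant and helper names are illustrative only.

```python
# Sanity check of the size figures quoted in this commit.
# Assumptions (not stated in the diff): decimal units, and the same
# number of rows before and after the change.

NEW_SIZE_BYTES = 560_000      # "about 560kB"
NEW_BITS_PER_ROW = 1.5        # "only 1.5 bits per row"
OLD_SIZE_BYTES = 2_500_000    # "about 2.5MB"


def implied_rows(size_bytes: float, bits_per_row: float) -> float:
    """Number of rows implied by a file size and an average bits-per-row figure."""
    return size_bytes * 8 / bits_per_row


def bits_per_row(size_bytes: float, rows: float) -> float:
    """Average storage cost per row, in bits."""
    return size_bytes * 8 / rows


rows = implied_rows(NEW_SIZE_BYTES, NEW_BITS_PER_ROW)
old_bytes_per_row = bits_per_row(OLD_SIZE_BYTES, rows) / 8

print(f"implied rows: {rows:,.0f}")                       # implied rows: 2,986,667
print(f"old file: {old_bytes_per_row:.2f} bytes per row")  # old file: 0.84 bytes per row
```

Under these assumptions both versions describe roughly 3 million rows, and the old 2.5MB figure works out to about 0.84 bytes per row, which agrees with its "less than 1 byte per row" wording.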