# Gazette

```js echo
import { DuckDBClient } from "npm:@observablehq/duckdb";
const db = DuckDBClient.of({ presse: FileAttachment("data/presse.parquet") });
```

This page lets you explore the corpus of 3 million newspapers by title. I called it “Gazette” because I was surprised to find that, in the earlier years, most titles in the corpus contain this word.

Type in words such as “jeune”, “révolution”, “république”, “soir”, “fille”, “femme”, “paysan”, “ouvrier”, “social”, etc., to see different historical trends.

```js
const search = view(
  Inputs.text({ type: "search", value: "gazette", submit: true })
);
```

```js echo
display(
  Plot.plot({
    x: { nice: true },
    y: {
      label: `Share of titles matching ${search}`,
      tickFormat: "%",
    },
    marks: [
      Plot.areaY(gazette, {
        x: "year",
        y: (d) => d.matches / d.total,
        fillOpacity: 0.2,
      }),
      Plot.lineY(gazette, {
        x: "year",
        y: (d) => d.matches / d.total,
      }),
    ],
  })
);
```

The query uses DuckDB’s [REGEXP_MATCHES](https://duckdb.org/docs/archive/0.9.2/sql/functions/patternmatching) function to count, for each year, the titles that match the search pattern. Because the search term is treated as a regular expression, you can enter, for example, “socialis[tm]e” to match both “socialiste” and “socialisme”. The 'i' option makes the match case-insensitive.

```js echo
const gazette = db.query(
  `SELECT year
     , SUM(CASE WHEN REGEXP_MATCHES(title, ?, 'i') THEN 1 ELSE 0 END)::int matches
     , COUNT(*) total
  FROM presse
  WHERE year > '1000'
  GROUP BY year
  ORDER BY year
  `,
  [search]
);
```
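
As a rough illustration of how the pattern behaves, the sketch below applies the same kind of case-insensitive, partial match to a few made-up titles using JavaScript's built-in `RegExp`. It is only an approximation: DuckDB evaluates patterns with the RE2 library, whose syntax differs from JavaScript's in places, and `sampleTitles` and `countMatches` are hypothetical names that are not part of the page's actual code.

```js
// Approximation only: DuckDB uses RE2, not JavaScript's RegExp engine.
// These titles are invented for illustration; they are not rows from the corpus.
const sampleTitles = [
  "Le Socialiste",
  "Revue du socialisme",
  "La Gazette de France",
  "SOCIALISME ET LIBERTÉ",
];

// Count the titles that contain a match for the pattern, ignoring case
// (mirroring the 'i' option passed to REGEXP_MATCHES in the SQL query).
function countMatches(titles, pattern) {
  const re = new RegExp(pattern, "i");
  return titles.filter((title) => re.test(title)).length;
}

// "socialis[tm]e" matches both "socialiste" and "socialisme", in any case.
const socialisMatches = countMatches(sampleTitles, "socialis[tm]e");

// The chart plots a share: matching titles divided by all titles.
const gazetteShare = countMatches(sampleTitles, "gazette") / sampleTitles.length;
```

With these four sample titles, `socialisMatches` is 3 and `gazetteShare` is 0.25, which is the same matches-over-total ratio the chart computes per year.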