metadata
license: other

Whereabouts: Reference databases

This is a space containing reference databases to be used by whereabouts.

Whereabouts is a Python geocoding package that implements record linkage algorithms in SQL using DuckDB. The package itself is available as whereabouts and can be installed via

pip install whereabouts

Installation of reference databases

Once the package is installed, you will need to install a geocoding database built from a country's or region's address data. This repo contains a collection of these databases for different countries and regions. Currently it has files for:

Australia:

  • Whole of country
  • Victoria, Australia
  • New South Wales, Australia

United States:

  • Florida, United States
  • California, United States
  • Massachusetts, United States

More are being added as I get around to cleaning the data and creating the corresponding databases. Files are named <country_abbreviation>_<states>_<size>, where <size> is either sm or lg: sm databases use an inverted index built from pairs of consecutive tokens, while lg databases use one built from trigrams. The large databases can handle lower-quality address data at the expense of speed.
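The difference between the sm and lg indexes can be illustrated with a toy inverted index. This is a minimal sketch, not the whereabouts implementation (which runs in SQL on DuckDB). It assumes whitespace tokenization, assumes "trigrams" means character trigrams taken within each token, and ranks candidates simply by the number of shared keys. All function names and the two reference addresses are hypothetical toy inputs.

```python
from collections import Counter, defaultdict


def token_pairs(address):
    """'sm'-style keys: pairs of consecutive whitespace-separated tokens."""
    tokens = address.lower().split()
    return [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]


def char_trigrams(address):
    """'lg'-style keys, ASSUMING character trigrams taken within each token."""
    keys = []
    for token in address.lower().split():
        keys.extend(token[i:i + 3] for i in range(len(token) - 2))
    return keys


def build_index(addresses, keys_fn):
    """Inverted index mapping each key to the ids of reference addresses containing it."""
    index = defaultdict(set)
    for addr_id, address in enumerate(addresses):
        for key in keys_fn(address):
            index[key].add(addr_id)
    return index


def score(query, index, keys_fn):
    """Count, for each reference address id, how many distinct query keys it shares."""
    hits = Counter()
    for key in set(keys_fn(query)):
        for addr_id in index.get(key, ()):
            hits[addr_id] += 1
    return hits


def best_match(query, addresses, index, keys_fn):
    """Return the reference address with the most shared keys, or None if nothing matches."""
    hits = score(query, index, keys_fn)
    if not hits:
        return None
    addr_id, _ = hits.most_common(1)[0]
    return addresses[addr_id]


# Hypothetical toy reference data and a query with 'brunswick' misspelled
reference = ["12 station st fairfield", "12 station st brunswick"]
query = "12 station st brsunwick"

sm_index = build_index(reference, token_pairs)
lg_index = build_index(reference, char_trigrams)
sm_scores = score(query, sm_index, token_pairs)
lg_scores = score(query, lg_index, char_trigrams)
```

With the misspelled query, the token-pair index cannot separate the two candidates: each shares two keys, "12 station" and "station st". The trigram index still matches "wic" and "ick" in "brunswick" and ranks it first, with 7 shared keys versus 5. The trigram index also stores many more keys per address, which fits the remark above that the large databases are slower.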

Example (install the small geocoding database covering the whole of Australia):

python -m whereabouts download au_all_sm
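The name passed to download follows the <country_abbreviation>_<states>_<size> convention. As an illustration only, the hypothetical helpers parse_db_name and download_command below (they are not part of whereabouts) validate a name and build the documented command line. They assume the country abbreviation contains no underscores. They also treat everything between the first and last underscore as the states part, because the naming convention does not specify how several states are joined.

```python
import sys
from typing import NamedTuple


class DatabaseName(NamedTuple):
    country: str  # e.g. "au"
    states: str   # e.g. "all" for the whole country
    size: str     # "sm" (token-pair index) or "lg" (trigram index)


def parse_db_name(name):
    """Split '<country_abbreviation>_<states>_<size>' into its parts (hypothetical helper)."""
    country, _, rest = name.partition("_")   # text before the first underscore
    states, _, size = rest.rpartition("_")   # text after the last underscore
    if not country or not states or size not in ("sm", "lg"):
        raise ValueError(f"expected <country>_<states>_<sm|lg>, got {name!r}")
    return DatabaseName(country, states, size)


def download_command(name):
    """Build the argv for `python -m whereabouts download <name>` after validating the name."""
    parse_db_name(name)  # raises ValueError on a malformed name
    return [sys.executable, "-m", "whereabouts", "download", name]


# To actually run the download from Python:
#   import subprocess
#   subprocess.run(download_command("au_all_sm"), check=True)
```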

Start geocoding

Once you have installed the package and a database, you can start geocoding your data.

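from whereabouts.Matcher import Matcher

# Free-form address strings (note the misspelled 'brsunwick')
addresslist = ['122 station st fairfield vic', '643-645 sydney road brsunwick', '504 sydney rd brunswick']

# Use the database installed earlier with `python -m whereabouts download au_all_sm`
matcher = Matcher(db_name='au_all_sm')
results = matcher.geocode(addresslist, how='standard')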

License Disclaimer for Third-Party Data

Note that while the code in this package is licensed under the MIT license, the pre-built databases use data from third-party providers whose licenses may restrict particular use cases:

Users of this software must comply with the terms and conditions of the respective data licenses, which may impose additional restrictions or requirements. By using this software, you agree to comply with the relevant licenses for any third-party data.