Dataset Sources

Complete list of datasets available in GeoQuery with source attributions.


Administrative Boundaries

Panama Admin Levels (HDX)

Source: Humanitarian Data Exchange
Provider: INEC (National Institute of Statistics and Census)
Year: 2021
URL: https://data.humdata.org/dataset/panama-administrative-boundaries

Files:

  • hdx/pan_admin1_2021.geojson - 10 provinces + comarcas
  • hdx/pan_admin2_2021.geojson - 81 districts
  • hdx/pan_admin3_2021.geojson - 679 corregimientos

License: Creative Commons Attribution


Infrastructure

Roads (OpenStreetMap via Geofabrik)

Source: OpenStreetMap
Provider: Geofabrik
URL: https://download.geofabrik.de/central-america/panama.html

Files:

  • osm/roads.geojson - Highway network (motorways, primary, secondary roads)

License: ODbL (Open Database License)

Healthcare (Healthsites.io)

Source: Healthsites.io / OpenStreetMap
URL: https://healthsites.io/

Files:

  • osm/healthsites.geojson - 986 healthcare facilities

License: ODbL

Education (OpenStreetMap)

Source: OpenStreetMap

Files:

  • osm/universities.geojson - 67 universities
  • osm/schools.geojson - Schools and educational facilities

License: ODbL

Other POI (OpenStreetMap)

Source: OpenStreetMap

Files:

  • osm/traffic.geojson - Traffic signals and intersections
  • osm/amenities.geojson - Various amenities
  • osm/buildings.geojson - Building footprints

License: ODbL

Socioeconomic

World Bank Development Indicators

Source: World Bank Open Data
URL: https://data.worldbank.org/

Files:

  • worldbank/indicators.geojson - Country-level indicators joined with geometry

Indicators Available:

  • GDP per capita
  • Life expectancy
  • Access to electricity
  • Internet users (% of population)
  • And more...

License: Creative Commons Attribution 4.0

Multidimensional Poverty Index (MPI)

Source: UNDP / Government of Panama

Files:

  • socioeconomic/mpi_panama.geojson - Poverty index by district

License: Open Data

Province Socioeconomic Data

Source: INEC Census 2023 (processed)

Files:

  • socioeconomic/province_socioeconomic.geojson - Province-level statistics

Metrics:

  • Population estimates
  • Area
  • Demographics

Population

Kontur Population Dataset

Source: Kontur
Provider: Meta/Facebook population estimates
URL: https://data.humdata.org/organization/kontur

Files:

  • kontur/kontur_population_PA_20220630.geojson - 33,000+ H3 hexagons

Description: High-resolution population density grid using H3 spatial index

License: Creative Commons Attribution International
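
Because each feature in this grid is a single H3 cell carrying its own population estimate, area totals reduce to a sum over features. The following is a minimal sketch, assuming every feature stores its estimate in a `population` property; that property name is an assumption and should be checked against the file. `total_population` is a hypothetical helper, not part of GeoQuery.

```python
import json


def total_population(geojson_path, prop="population"):
    """Sum a numeric property over all features of a GeoJSON FeatureCollection.

    The default property name "population" is an assumption about the file.
    """
    with open(geojson_path, encoding="utf-8") as fh:
        collection = json.load(fh)
    # Missing or null values are counted as zero.
    return sum(
        feature["properties"].get(prop) or 0
        for feature in collection["features"]
    )
```

To restrict the total to a region, filter the features first (for example by their H3 cell id) and sum the same property over the remaining cells.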


Environmental

STRI GIS Portal

Source: Smithsonian Tropical Research Institute
URL: https://stridata-si.opendata.arcgis.com/

Files:

  • stri/protected_areas_2025.geojson - Protected areas
  • stri/forest_cover_2021.geojson - Forest cover classification

License: CC BY 4.0


Global Datasets

Natural Earth

Source: Natural Earth Data
URL: https://www.naturalearthdata.com/

Files:

  • global/countries_110m.geojson - Country boundaries (low resolution)

License: Public Domain


Dataset Statistics

| Category | Datasets | Total Features |
|----------|----------|----------------|
| Administrative | 3 | ~770 |
| Infrastructure | 8 | ~50,000 |
| Socioeconomic | 3 | ~100 |
| Population | 1 | 33,000 |
| Environmental | 2 | ~500 |
| Global | 1 | 177 |

Total: 18 datasets, ~85,000 features
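
Counts like these can be recomputed from the files themselves. The following is a minimal sketch, assuming every dataset is a GeoJSON FeatureCollection stored under a common data root and grouping by first-level subdirectory (`hdx`, `osm`, and so on), which only approximates the categories used in the table. `count_features` is a hypothetical helper, not part of GeoQuery.

```python
import json
from collections import Counter
from pathlib import Path


def count_features(data_root):
    """Return (datasets, features) Counters keyed by first-level subdirectory."""
    root = Path(data_root)
    datasets, features = Counter(), Counter()
    for path in sorted(root.rglob("*.geojson")):
        # Group by the directory directly under the data root, e.g. "osm".
        group = path.relative_to(root).parts[0]
        with path.open(encoding="utf-8") as fh:
            collection = json.load(fh)
        datasets[group] += 1
        features[group] += len(collection.get("features", []))
    return datasets, features
```

Calling `count_features("backend/data")` returns one counter with the number of files per directory and one with the number of features per directory.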


Data Update Schedule

| Dataset | Update Frequency | Last Updated |
|---------|------------------|--------------|
| OSM Data | Monthly | 2026-01 |
| Admin Boundaries | Yearly | 2021 |
| Kontur Population | Quarterly | 2022-06 |
| STRI Environmental | As released | 2025 |
| World Bank | Annually | 2023 |

Adding New Datasets

See ../backend/SCRIPTS.md for data ingestion procedures.

Quick Steps

  1. Download GeoJSON file
  2. Place in appropriate backend/data/ subdirectory
  3. Add entry to backend/data/catalog.json:
    "my_dataset": {
      "path": "category/my_dataset.geojson",
      "description": "Short description",
      "semantic_description": "Detailed description for AI",
      "categories": ["category"],
      "tags": ["tag1", "tag2"]
    }
    
  4. Regenerate embeddings:
    rm backend/data/embeddings.npy
    python -c "from backend.core.semantic_search import get_semantic_search; get_semantic_search()"
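
Before regenerating embeddings, it can help to check that the new catalog entry is well formed. The following is a minimal sketch, assuming `catalog.json` is a single JSON object mapping dataset ids to entries shaped like the one in step 3, and that each `path` is relative to `backend/data/`. `validate_catalog` is a hypothetical helper, not an existing GeoQuery function, and treating all five fields as required is an assumption.

```python
import json
from pathlib import Path

# Field names come from the example entry in step 3; requiring all of them is an assumption.
REQUIRED_FIELDS = ("path", "description", "semantic_description", "categories", "tags")


def validate_catalog(data_dir):
    """Return a list of human-readable problems found in <data_dir>/catalog.json."""
    data_dir = Path(data_dir)
    with (data_dir / "catalog.json").open(encoding="utf-8") as fh:
        catalog = json.load(fh)

    problems = []
    for dataset_id, entry in catalog.items():
        missing = [field for field in REQUIRED_FIELDS if field not in entry]
        if missing:
            problems.append(f"{dataset_id}: missing fields {missing}")
        path = entry.get("path")
        # Assumes "path" is relative to the data directory.
        if path and not (data_dir / path).is_file():
            problems.append(f"{dataset_id}: file not found: {path}")
    return problems
```

An empty list from `validate_catalog("backend/data")` means every entry has the expected fields and points at an existing file.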
    

Data Licenses Summary

  • OpenStreetMap: ODbL (share-alike, attribution required)
  • HDX/Government: CC BY (attribution required)
  • World Bank: CC BY 4.0
  • Natural Earth: Public Domain
  • STRI: CC BY 4.0
  • Kontur: CC BY International

All datasets permit commercial use with proper attribution; OpenStreetMap-derived data is additionally subject to the ODbL share-alike terms.


Attribution in App

GeoQuery automatically generates citations for query results:

{
  "data_citations": [
    "Administrative boundary data from HDX/INEC, 2021",
    "Healthcare facilities from OpenStreetMap via Healthsites.io"
  ]
}

These appear in the chat response for user transparency.
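
One way such a citation list could be assembled is a lookup from dataset id to citation text, deduplicated in the order the datasets were used. This is a sketch only: the `CITATIONS` table, its dataset ids, and `build_citations` are hypothetical and do not describe GeoQuery's actual implementation. The two citation strings are the ones from the example above.

```python
# Hypothetical lookup table: the ids are illustrative, the strings come from the example above.
CITATIONS = {
    "pan_admin1_2021": "Administrative boundary data from HDX/INEC, 2021",
    "pan_admin2_2021": "Administrative boundary data from HDX/INEC, 2021",
    "healthsites": "Healthcare facilities from OpenStreetMap via Healthsites.io",
}


def build_citations(dataset_ids):
    """Return a response fragment with one citation per distinct source, in first-use order."""
    seen = []
    for dataset_id in dataset_ids:
        citation = CITATIONS.get(dataset_id)
        # Skip unknown datasets and citations that were already added.
        if citation is not None and citation not in seen:
            seen.append(citation)
    return {"data_citations": seen}
```

Keying on the citation text rather than the dataset id means a query that touches several files from the same source still yields a single citation line.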


Next Steps