GeoPlatform Data Pipeline

The GeoPlatform provides a suite of managed, highly available, and trusted geospatial data, services, and applications for federal agencies and their State, local, Tribal, and regional partners to use in meeting their mission needs and the broader needs of the nation. The GeoPlatform is being implemented to help agencies meet those needs, including communicating with the public and publishing data and maps for public use. The GeoPlatform focuses on web applications that facilitate participatory information sharing, interoperability, user-centered design, and collaboration on the World Wide Web.

National Geospatial Data Asset (NGDA)

For NGDAs and other selected datasets, GeoPlatform generates data files in open standard formats (GeoJSON, GeoPackage, Shapefile), map tiles (Map Vector, XYZ Raster), and OGC Web Services (WMS, WFS).
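Of these formats, GeoPackage is easy to inspect without extra tooling: under the OGC GeoPackage standard, a GeoPackage is a single SQLite database file, and its `gpkg_contents` table registers every layer it contains. The sketch below assumes only Python's standard library and a downloaded `.gpkg` file; the function name `list_gpkg_layers` is hypothetical.

```python
import sqlite3
from contextlib import closing
from typing import List, Optional, Tuple


def list_gpkg_layers(path: str) -> List[Tuple[str, str, Optional[int]]]:
    """Return (table_name, data_type, srs_id) for each layer in a GeoPackage.

    data_type is a value such as 'features' (vector) or 'tiles' (raster tiles).
    """
    # closing() guarantees the connection is closed; sqlite3's own context
    # manager only commits or rolls back.
    with closing(sqlite3.connect(path)) as conn:
        rows = conn.execute(
            "SELECT table_name, data_type, srs_id FROM gpkg_contents "
            "ORDER BY table_name"
        ).fetchall()
    return [(name, dtype, srs) for name, dtype, srs in rows]
```

For example, `list_gpkg_layers("tracts.gpkg")` (a hypothetical file name) returns one tuple per layer, which is a quick way to confirm a download before opening it in a GIS application.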

GeoPlatform Artifact Catalog

A catalog of GeoPlatform-generated data files. These files are automatically converted from the original upstream data sources to GeoPackage, zipped Shapefile, and GeoJSON formats, and can be downloaded and used directly in many common spatial applications and analysis packages.
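A conversion like the one described above can be approximated with GDAL's `ogr2ogr` command-line tool. The sketch below only builds the argument lists; it assumes `ogr2ogr` is on the `PATH` when they are run, and the helper names, output naming scheme, and reprojection of GeoJSON to EPSG:4326 are illustrative assumptions, not GeoPlatform's documented settings.

```python
import zipfile
from pathlib import Path
from typing import List

# GDAL/OGR vector driver short names for the three artifact formats.
GPKG, GEOJSON, SHAPEFILE = "GPKG", "GeoJSON", "ESRI Shapefile"


def ogr2ogr_commands(src: str, out_dir: str, stem: str) -> List[List[str]]:
    """Build (but do not run) one ogr2ogr argument list per artifact format."""
    out = Path(out_dir)
    return [
        ["ogr2ogr", "-f", GPKG, str(out / f"{stem}.gpkg"), src],
        # RFC 7946 GeoJSON expects WGS 84 lon/lat, so reproject explicitly
        # (an illustrative choice, not a documented GeoPlatform setting).
        ["ogr2ogr", "-f", GEOJSON, "-t_srs", "EPSG:4326",
         str(out / f"{stem}.geojson"), src],
        # An extension-less destination makes the driver write the
        # .shp/.shx/.dbf/.prj set into its own directory, ready for zipping.
        ["ogr2ogr", "-f", SHAPEFILE, str(out / f"{stem}_shp"), src],
    ]


def zip_shapefile_dir(shp_dir: str, zip_path: str) -> None:
    """Bundle every file in a Shapefile directory into one zip archive."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for f in sorted(Path(shp_dir).iterdir()):
            if f.is_file():
                zf.write(f, arcname=f.name)
```

Each list can be passed to `subprocess.run(cmd, check=True)`. Keeping the commands as argument lists rather than shell strings avoids quoting problems with driver names such as `ESRI Shapefile`. After the Shapefile command finishes, `zip_shapefile_dir` produces the single-archive form offered for download.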

QGIS example using GeoPlatform’s Vector tile service

GeoPlatform Tile Service Catalog

A catalog of GeoPlatform-generated map tiles. These tiles are automatically converted from the original upstream data sources and can be used directly in many common spatial applications and analysis packages. Clicking a link opens a preview of that tile service's data.

Tile Service screenshot: TIGER/Line Shapefile, 2021, Nation, U.S., Tribal Census Tracts

https://tileservice.geoplatform.gov/?config=2b12bfb8_cf9f_4c9d_aaeb_72d7a0030f8b&tileType=raster#3/39/-105
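The `#3/39/-105` fragment in the preview URL above appears to follow the common `#zoom/latitude/longitude` map-hash convention (an assumption; the viewer's format is not documented here). The sketch below parses that fragment and converts the view center to standard Web Mercator XYZ tile indices, the addressing scheme used by XYZ raster tiles. The function names are hypothetical.

```python
import math
from typing import Tuple
from urllib.parse import urlparse


def parse_map_hash(url: str) -> Tuple[float, float, float]:
    """Split an assumed '#zoom/lat/lon' URL fragment into floats."""
    zoom, lat, lon = (float(part) for part in urlparse(url).fragment.split("/"))
    return zoom, lat, lon


def lonlat_to_tile(lon: float, lat: float, zoom: int) -> Tuple[int, int]:
    """Web Mercator XYZ tile (x, y) containing a point; y counts down from the north."""
    n = 2 ** zoom  # tiles per axis at this zoom level
    x = math.floor((lon + 180.0) / 360.0 * n)
    # asinh(tan(lat)) equals ln(tan(lat) + sec(lat)), the Mercator-projected latitude.
    y = math.floor((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so lon = 180 or a latitude near the Mercator limit stays in range.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)
```

For the preview URL above this gives zoom 3, column 1, row 3. A client would then substitute those values into an XYZ tile URL template of the form `{z}/{x}/{y}`.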

Map Services Catalog

A catalog of GeoPlatform WMS/WFS services. These services are automatically generated from the original upstream data sources and can be used directly in many common spatial applications and analysis packages; the WFS services also allow the underlying features to be downloaded. The catalog lists every layer configured in GeoPlatform and provides a preview of each.
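Because WMS and WFS are OGC standards, their layers can be requested with plain HTTP query parameters. The sketch below builds a WMS 1.3.0 `GetMap` URL and a WFS 2.0.0 `GetFeature` URL. The endpoint `https://maps.example.gov/geoserver/ows` and the layer name used in the usage note are hypothetical placeholders, and requesting `application/json` output assumes a GeoServer-style backend that offers it.

```python
from typing import Tuple
from urllib.parse import urlencode

# Hypothetical OGC endpoint; substitute a URL listed in the Map Services Catalog.
BASE_URL = "https://maps.example.gov/geoserver/ows"


def wms_getmap_url(layer: str, bbox: Tuple[float, float, float, float],
                   width: int = 512, height: int = 512, base: str = BASE_URL) -> str:
    """WMS 1.3.0 GetMap URL for a PNG image of one layer.

    With CRS=EPSG:4326, WMS 1.3.0 uses latitude-first axis order, so bbox is
    (min_lat, min_lon, max_lat, max_lon).
    """
    params = {
        "SERVICE": "WMS", "VERSION": "1.3.0", "REQUEST": "GetMap",
        "LAYERS": layer, "STYLES": "", "CRS": "EPSG:4326",
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width, "HEIGHT": height, "FORMAT": "image/png",
    }
    return f"{base}?{urlencode(params)}"


def wfs_getfeature_url(type_name: str, count: int = 100, base: str = BASE_URL) -> str:
    """WFS 2.0.0 GetFeature URL requesting up to `count` features as GeoJSON."""
    params = {
        "SERVICE": "WFS", "VERSION": "2.0.0", "REQUEST": "GetFeature",
        "TYPENAMES": type_name, "COUNT": count,
        "OUTPUTFORMAT": "application/json",
    }
    return f"{base}?{urlencode(params)}"
```

For instance, `wms_getmap_url("ngda:tribal_census_tracts", (24.0, -125.0, 50.0, -66.0))` would request a rendered map of the conterminous U.S., while the WFS URL returns the features themselves, which is the practical difference between viewing and downloading a layer.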

The GeoPlatform Data Pipeline

The GeoPlatform Data Pipeline consists of a sequence of AWS services (Lambda functions, SQS queues, and scheduled tasks) that keep the data in GeoPlatform up to date with the resources published on Data.gov. The pipeline monitors GeoPlatform’s metadata cache and data services to confirm they remain current with what is published on Data.gov, and it generates emails and reports at key steps detailing the state of GeoPlatform’s services.

The Data Pipeline is divided into two major tasks: caching metadata, and managing the spatial data and services. Caching both the metadata and the source data keeps the metadata and services in GeoPlatform in sync with the resources published to Data.gov. For geospatial data processing and services, GeoPlatform uses the OSGeo GDAL library and GeoServer.
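The sync check can be sketched as a comparison of modification timestamps. This minimal version assumes the upstream records look like CKAN package dictionaries (CKAN is the software behind the Data.gov catalog) with `id` and `metadata_modified` fields; the cache layout and the `plan_sync`/`SyncPlan` names are hypothetical. The idea is to classify datasets as new, updated, or removed before any GDAL or GeoServer work is queued.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, Iterable, List


@dataclass
class SyncPlan:
    new: List[str] = field(default_factory=list)      # upstream only
    updated: List[str] = field(default_factory=list)  # upstream is newer than cache
    removed: List[str] = field(default_factory=list)  # cache only


def plan_sync(cache: Dict[str, str], upstream: Iterable[dict]) -> SyncPlan:
    """Compare cached modification times with upstream catalog records.

    cache maps a (hypothetical) dataset id to the ISO-8601 timestamp of the
    copy GeoPlatform holds; upstream yields dicts with 'id' and
    'metadata_modified'.
    """
    plan = SyncPlan()
    seen = set()
    for rec in upstream:
        ds_id = rec["id"]
        seen.add(ds_id)
        remote = datetime.fromisoformat(rec["metadata_modified"])
        if ds_id not in cache:
            plan.new.append(ds_id)
        elif remote > datetime.fromisoformat(cache[ds_id]):
            plan.updated.append(ds_id)
    plan.removed = sorted(set(cache) - seen)
    return plan
```

In a design like this, only `new` and `updated` datasets would be re-fetched and reprocessed, while `removed` ones would have their services retired; the exact downstream actions here are illustrative.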

Each pipeline task is split into separate Lambda functions, each performing a specific step. Timers trigger the Lambdas, and SQS messages connect them to orchestrate the operations of the pipeline task.
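The Lambda-to-Lambda handoff described above can be sketched as an SQS-triggered handler. It relies on two real AWS conventions: SQS events arrive as `event["Records"]`, each record carrying a `body` and a `messageId`, and returning `batchItemFailures` tells Lambda which messages to retry when partial batch responses are enabled. The message fields, the `stage` value, and the injected `send_next` callback are hypothetical; in a deployment, `send_next` might wrap boto3's SQS `send_message` call.

```python
import json
from typing import Any, Callable, Dict, List

# Hypothetical hook for forwarding to the next queue. In a deployed Lambda it
# might wrap boto3: sqs.send_message(QueueUrl=NEXT_QUEUE_URL, MessageBody=body)
Sender = Callable[[str], None]
Response = Dict[str, List[Dict[str, str]]]


def make_handler(send_next: Sender) -> Callable[[Dict[str, Any], Any], Response]:
    """Build an SQS-triggered handler for one (hypothetical) pipeline stage."""

    def handler(event: Dict[str, Any], context: Any) -> Response:
        failures: List[Dict[str, str]] = []
        for record in event.get("Records", []):
            try:
                msg = json.loads(record["body"])
                # Stage-specific work (for example, refreshing one dataset's
                # cached metadata) would run here before forwarding.
                msg["stage"] = "metadata_cached"
                send_next(json.dumps(msg))
            except (KeyError, TypeError, ValueError):
                # Report only this message as failed so SQS can retry it
                # (requires ReportBatchItemFailures on the event source mapping).
                failures.append({"itemIdentifier": record.get("messageId", "")})
        return {"batchItemFailures": failures}

    return handler
```

Injecting the sender keeps each stage testable locally without AWS credentials, and reporting per-message failures means one malformed message does not force the whole batch to be retried.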

For more information on GeoPlatform’s Harvest and Data Pipeline, see the GeoPlatform Harvest Pipeline documentation on the GeoPlatform Demo Site.

GDAL Processing within the Data Pipeline