Geospatial Data and Analysis by Bill Day, Jon Bruner, Aurelia Moser

Chapter 4. Mapping Data, More Tools, and Analysis

“Often people transition straight from low-scale, antiquated, commercial desktop GIS tools to Big Data tools, overlooking a system that supports ad hoc agility while being able to do subsecond polygon intersection tests against hundreds of millions of polygons if properly indexed,” said Boris Shimanovsky of Factual. Meanwhile, the user experience of “medium-scale” tools continues to improve, which, aided by open source releases, is enabling the creation and sharing of standards that make web cartography translatable between tools and companies. “When the data is not big, such as in the several TB range, [it’s often appropriate] to prototype in PostgreSQL and PostGIS,” notes Shimanovsky. “It’s incredibly versatile and much faster and more scalable than most people realize.”

With the variety of map types and publishing formats currently accessible to mapmakers, choosing the appropriate tools and data for modest to moderate-sized projects can seem daunting. In this chapter, we’ll continue our focus on some medium-scale options for dealing with geospatial data using standard commercial and open source tools. At times, projects extend beyond the small-scale work we’ve seen in previous chapters but don’t quite justify a Big Data solution, so here we’ll be focusing on the platforms and data sources that best support medium-scale products.

This chapter will also include a shallow dive into some toolmakers and cartographers making maps in contemporary corporate and open source communities. We’ll go over some of the approaches to managing the data and software components of mapping projects, and include helpful tips for design and visualization.

Mapping Tools

In 2014, the Washington Post’s Wonkblog republished a map showing the hypothetical view from the coasts of the Americas to their partner nations across the ocean, entitled “If you’re on the beach, this map shows you what’s across the ocean.” With a simple static map, the featured illustration provides data not accessible through unaided human perception, as no one can feasibly see across the ocean to confirm this information.

Echoing this project, Max Galka published a longer tutorial in 2016 explaining how even our reflexive assumptions—those we might form by tracing straight lines across a flat map—might not accurately identify these “ocean partners” because the Earth is curved rather than flat (though most maps suggest otherwise). He proved this by experimenting with other projections (the azimuthal projection, principally) and visualizing the appropriate partners across a great circle arc, developing his 2D proofs with QGIS and open data from Natural Earth (see Figure 4-1).

Figure 4-1. What’s across the ocean: “Though the lines in the map above appear curved, all of them are actually straight lines (great circles) on the 3D globe” (Source: Max Galka)
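To make the great-circle point concrete, here is a minimal Python sketch (not Galka's QGIS workflow) that computes the great-circle distance and the initial compass bearing between two points. It assumes a spherical Earth with a mean radius of 6371.0088 km; the function names are our own.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # assumed mean radius; the Earth is treated as a sphere


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance between two lon/lat points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # min() guards against tiny floating-point overshoot above 1.0
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def initial_bearing_deg(lon1, lat1, lon2, lat2):
    """Compass bearing (0-360 degrees) at the start of the great-circle path."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    y = math.sin(dlmb) * math.cos(phi2)
    x = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlmb))
    return math.degrees(math.atan2(y, x)) % 360
```

Two points on the same parallel illustrate the effect: travelling from 74°W to 0° along latitude 40°N, the great-circle route starts out heading north of due east, even though a flat map in a cylindrical projection suggests a bearing of exactly 90°.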

As we saw with the choropleths and cartograms mentioned in Chapters 2 and 3 of this report, geospatial mapmaking implies a series of assumptions and representational limitations that displace viewers from their singular perspective on the terrestrial plane, and provide them a bird’s-eye (or astronaut’s-eye) view on a geospatial plane. Tools for designing maps allow the mapmaker to control how skewed that worldview is, and how much of the skew can be communicated to the viewers. The bedrock of this process involves the datasets that the mapmaker chooses to use, and the source material for the bedrock is ubiquitous and often free.

In the following section, we’ll provide some suggestions for sourcing data and some discussion of the partner software that will help you build your geospatial infrastructure for small- to medium-scale mapping.

Open Data for Medium-Scale Projects

At the root of every visualization project, geospatial or otherwise, is some data. Up to this point in our report, we’ve focused on the density or scale of data as the dominant challenge to geospatial visualization, but the data itself, including its inherent provenance issues and reference inaccuracies, also contributes to the complexity of every project. Factual’s Boris Shimanovsky notes that “simply processing this data is a challenge, but there is much more to it. There’s a great deal of uncertainty in places and mobile data. 2D coordinates alone can’t pinpoint that one is at a Starbucks—there is substantial and unquantified GPS error, fraud, altitude, and many unknowns.” Further, as Shimanovsky notes, “our world is probabilistic.” Our perception of it is susceptible to so much mutation in the translation, geocoding, and rectification process that we often lose the integrity of the data we began with.

Ensuring open and reliable geodata often seems like an appropriate responsibility for local governments, coupled with meeting the essential maintenance and accessibility needs of their broad audience. It’s become increasingly common for government offices to be the arbiters of publicly funded data, though the space is still fraught with buggy download utilities and gated access issues. Stefan Avesand of Spotify cites this as a continuing challenge to geospatial work: “There are still some municipalities that try to monetize their publicly funded data or require the completion of forms and human review to obtain it. A national mandate to open data and deposit it centrally would certainly lead to more innovation than town-by-town, county-by-county one-offs.”

For many projects, cleaning and appropriately preparing the data is the most challenging part, and government initiatives often benefit from commercial or industry partners to prod them toward smart formats. Tyler Bell notes that “governments increasingly see the advantages of open geodata, but not uniformly.” The US government recently removed “selective availability” from GPS coordinates in a bout of productive advancement; meanwhile, China maintains a proprietary coordinate system that applies a semirandom offset to every coordinate and thereby befuddles WGS84 system standards. With the help of industry partners, even more progressive enhancements have been pursued in recent years. In 2015, Mapbox partnered with the Humanitarian OpenStreetMap Team (HOT) to demonstrate some estimations of disaster potential in their “Mapping for Hurricane Response in Mexico” project, where before there was only disaster response. In a similar way, CARTO partnered with satellite providers in 2015 in an effort to democratize satellite data for the masses. Mikel Maron’s coverage of the US government’s commitment to “open mapping” and the dual commitment of industry partners speaks to the important and reciprocal influence of industry and governmental entities. Institutionally sponsored and grant-funded initiatives like the Open Geoportal, a web application for search and retrieval of open geodata, are examples of progressive efforts to provide free geospatial information. Open mapping can survive without political endorsement, but where it exists, map projects flourish.

Likewise, the number of industry partners willing to parse and process government data continues to grow. Exversion, mentioned in Chapter 3, maintains a data blog called Happy Endpoints, which recently featured an exploration of Medicare data and payment processes by state. Exversion processed and hosted the Open Payments data for 2013 (almost a gigabyte of content and well worth munging with the support of someone else’s hosting), and then walked through a tutorial of how to leverage that data. Enigma also provides a window to open data that needs a little munging, and perhaps comparatively provides more significant insights when including its library of correlated datasets. An added benefit of working with data from hosting firms is that they often clean and structure the data for you, and in best-case scenarios provide an API through which you can filter and segment that data without exhausting your own machine’s processing power. In other words, these datasets are perfect complements to medium-scale geodata projects, and might be a great way to get your feet wet.

For foundational datasets to support any map project, some of the most useful sources are available for open and free consumption. Natural Earth provides public domain map data at a variety of scales, with physical, terrestrial, and natural features if desired, and you can use the vector or raster tile sets provided through Natural Earth Tiles as the backbone of any web-mapping project. OpenStreetMap (OSM) provides an openly licensed world map developed with the help of volunteers, GPS tracks, and donated data sources. There are multiple ways of accessing this data, some more friendly than others. You can get OSM shapefile data from OpenStreetMapData, and can manually specify a download area via the OSM site, though the results might be spotty, inconsistent, and difficult to manage or align with existing boundary standards. Daily updated data extracts can be downloaded from Geofabrik GmbH, or weekly updated extracts can be downloaded from BBBike.org. Mike Migurski maintained OSM extracts for major world cities for a time, but Mapzen recently inherited this project, now known as Metro Extracts, and has become its de facto maintainer, updating it weekly and providing data in a variety of formats including shapefiles, GeoJSON, and OSM PBF or OSM XML (Figure 4-2). If you cannot find your area of focus, Mapzen welcomes suggestions for new areas to extract. Extracts agreeably compartmentalize OSM data for quick download and consumption. For bigger projects, you can download the whole planet’s OSM data from Planet OSM, but you should set aside a week or more to download, process, and import this data.

Figure 4-2. Mapzen’s Metro Extracts page for New York (Source: Mapzen and OSM)

For thematic data, freely available GIS data categorized by topic and focus can be found in the Free GIS Data online catalog of over 300 geospatial sets and tools. Some cities take the extra step of publishing their GIS data as open data for the public to use. For example, Palo Alto, California, publishes much of its municipal data—the locations of trees lining streets, building permits, and more. Data about building permits comes in handy: Ciaran Gilsenan has been collecting the rights to building permit data across the world, published in aggregate as a customized search engine called buildingeye. Are you a building contractor who wants to evaluate market demand by zip code? Are you a homeowner wondering how much a kitchen remodel typically costs in your neighborhood? Are you an inventory manager at Home Depot estimating regional demand for DIY building projects? Better yet, when considering a home purchase, how about doing a join of the permits issued versus the history of inspections? Faulty foundations pop out instantaneously.

It’s important to note that OSM data can also be downloaded for thematic interest areas. OpenStreetMap tags are structured as key/value pairs, and defined in the OpenStreetMap wiki; you can search for thematic interest areas among the tags using TagFinder, which is based on the Taginfo usage analytics for OSM. OSM data is structured as nodes (points, actual points of interest, or parts of ways), ways (linear features like roads or rivers and area boundaries like forests and buildings), and relations (data structures that define relationships between two elements). You can read more about Map Features and naming conventions in the OSM wiki. Let’s take an example of some node or point data, and determine how best to map this data using some conventional open source tools.
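As a rough illustration of the tag model, the sketch below filters a list of OSM elements by tag key and value. It assumes the elements are laid out as in Overpass JSON output (a `type`, an `id`, and an optional `tags` dictionary); the function name and the sample records are hypothetical and exist only for illustration.

```python
def elements_with_tag(elements, key, value=None, element_type=None):
    """Return OSM elements carrying a tag key (and, optionally, a given value)."""
    matches = []
    for element in elements:
        if element_type is not None and element.get("type") != element_type:
            continue
        tags = element.get("tags", {})  # untagged elements have no "tags" entry
        if key not in tags:
            continue
        if value is not None and tags[key] != value:
            continue
        matches.append(element)
    return matches


# Hypothetical sample: IDs and coordinates are made up for this sketch
sample = [
    {"type": "node", "id": 1, "lat": 50.62, "lon": 9.44,
     "tags": {"geological": "outcrop", "name": "Sonnberg"}},
    {"type": "way", "id": 2, "nodes": [1, 3, 4], "tags": {"highway": "path"}},
    {"type": "node", "id": 3, "lat": 50.63, "lon": 9.45},
]

outcrop_nodes = elements_with_tag(sample, "geological", "outcrop",
                                  element_type="node")
```

Only the first record survives the filter: the way carries a different tag, and the second node is an untagged vertex that merely forms part of a way.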

Use Case: Mapping Outcrops

Say you have point data about geological phenomena across the United States. Outcrops are places where the bedrock or superficial deposits have become locally exposed and are directly accessible to analysis in OSM. You can extract outcrop data from OSM by searching for the key/value tag pair in Overpass Turbo (Figure 4-3), a service that allows you to query and extract data from OSM based on tags, also featured in the OSM wiki.

Figure 4-3. Overpass Turbo data extract: geological=outcrop shown (Source: Martin Raifer)

This service is ideal for thematic extracts for medium-sized data. To export data from Overpass Turbo, follow these steps:

  1. Pan manually to an area in Overpass Turbo.

  2. Zoom out/in to the desired bounding box or geoquadrant you would like to select data from; if it’s the entire world, zoom out entirely.

  3. Go to the Wizard (Figure 4-4).

  4. Look up the appropriate OSM tag in the OSM wiki.

  5. Search for “geological=outcrop” or whatever tag pair you wish.

  6. Export your data as GeoJSON or KML.

  7. Upload the data into a GUI tool like geojson.io or CARTO, or pull it directly into your code.

Figure 4-4. Overpass Turbo Wizard “geological=outcrop” search (Source: Martin Raifer)

For outcrop information, this kind of query works quickly with the Wizard because the landscape of resulting points is not as dense as those resulting from more popular tags in OpenStreetMap. For more popular tags like “highway” or “building,” the query might take much longer or even crash. To avoid this for small to medium datasets, you can also structure your query in the left pane of Overpass Turbo, specifying only nodes, ways, or relations to save yourself processing time and power. Likewise, you can restrict your bounding box to a smaller search area by zooming in on a smaller portion of the map, as larger queries can freeze Overpass Turbo.
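A hand-written query of the kind described above might be assembled as in the following sketch. It builds an Overpass QL string restricted to one element type and one bounding box, which Overpass QL expects in south, west, north, east order. The helper name, the default timeout, and the example bounding box are assumptions for illustration, not output from the Wizard.

```python
def build_overpass_query(key, value, bbox, element="node", timeout=25):
    """Build an Overpass QL query for one tag pair within a bounding box.

    bbox is (south, west, north, east) in decimal degrees.
    """
    if element not in ("node", "way", "relation"):
        raise ValueError("element must be 'node', 'way', or 'relation'")
    south, west, north, east = bbox
    if south >= north:
        raise ValueError("bbox must be (south, west, north, east)")
    # Nodes carry their own coordinates; ways and relations need 'out geom'
    # to return geometry along with the tags.
    out_statement = "out body;" if element == "node" else "out geom;"
    return (
        f"[out:json][timeout:{timeout}];\n"
        f'{element}["{key}"="{value}"]({south},{west},{north},{east});\n'
        f"{out_statement}"
    )


# Hypothetical one-degree box around the Sonnberg outcrop
query = build_overpass_query("geological", "outcrop", (50.0, 9.0, 51.0, 10.0))
```

Pasting the resulting string into the left pane of Overpass Turbo limits the search to nodes in that box, which keeps both the query time and the result size small.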

Now let’s say you want to lead people to all of the outcrop stations globally. You can use the outcrop.kml file on GitHub. It’s helpful to check on initial test renderings of your geodata, which you can do for no cost and little effort if you load the data into a GUI program like CARTO (Figure 4-5). For a quick list of geoformats that CARTO supports, see the documentation.

Figure 4-5. Outcrops data in CARTO: “geological=outcrop” search (Source: Aurelia Moser)

Google Maps products happily ingest .kml files, as do many other industry tools like CARTO. Keyhole Markup Language (KML) allows you to package styling information with your geospatial data and is both machine- and human-readable, though more verbose than GeoJSON or other data formats. The following shows how the locations are described:

<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
<Document><name>overpass-turbo.eu export</name>
<description>Filtered OSM data converted to KML by overpass turbo.
Copyright: The data included in this document is from
www.openstreetmap.org. The data is made available under ODbL.
Timestamp: 2015-09-14T14:05:02Z</description>
<Placemark><name>Sonnberg</name>
<Point><coordinates>9.43865,50.6228007</coordinates></Point>
<ExtendedData>
<Data name="@id"><value>node/428215780</value></Data>
<Data name="geological"><value>outcrop</value></Data>
<Data name="name"><value>Sonnberg</value></Data>
<Data name="note"><value>Rocky promontory, subject to
erosion. Middle Triassic shallow marine limestone,
Jena Formation (U.-Muschelkalk, Wellenkalk). Small outcrop
within E-W trending graben structure.
Contact to MIttlerer Buntsandstein exposed in river bed
southwest of crag.</value></Data>
</ExtendedData></Placemark>
<Placemark><name>Monkey rock</name>
<Point><coordinates>177.127952,-17.7323069</coordinates></Point>
...

Notice that the point coordinates are in the format <coordinates>longitude,latitude</coordinates>. Some geo-extracts feature notation as lat, long and others as long, lat; other times poor geocoding will route locations to so-called Null Island, the point at 0° latitude, 0° longitude. For complex datasets, you might encounter multiple KML files, packaged together in a compressed ZIP format called KMZ.
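One way to catch coordinate-order and Null Island problems early is to parse the placemarks and run a few sanity checks, as in this sketch using Python's standard `xml.etree.ElementTree`. The function names and the range checks are our own assumptions; a swapped pair only shows up here when the value in the latitude slot falls outside ±90, so this is a heuristic rather than a guarantee.

```python
import xml.etree.ElementTree as ET

KML_URI = "http://www.opengis.net/kml/2.2"
KML_NS = {"kml": KML_URI}


def placemark_points(kml_text):
    """Return (name, lon, lat) for each Point placemark in a KML document."""
    root = ET.fromstring(kml_text)
    points = []
    for placemark in root.iter(f"{{{KML_URI}}}Placemark"):
        name = placemark.findtext("kml:name", default="", namespaces=KML_NS)
        coords = placemark.findtext("kml:Point/kml:coordinates",
                                    namespaces=KML_NS)
        if coords is None:
            continue  # not a point placemark
        parts = coords.strip().split(",")
        points.append((name, float(parts[0]), float(parts[1])))  # lon, lat
    return points


def coordinate_problems(lon, lat):
    """List likely problems with a lon/lat pair (heuristic checks only)."""
    problems = []
    if lon == 0 and lat == 0:
        problems.append("null island")
    if not -180 <= lon <= 180:
        problems.append("longitude out of range")
    if not -90 <= lat <= 90:
        problems.append("latitude out of range (possibly lat/lon swapped)")
    return problems


SAMPLE_KML = """<kml xmlns="http://www.opengis.net/kml/2.2"><Document>
<Placemark><name>Sonnberg</name>
<Point><coordinates>9.43865,50.6228007</coordinates></Point></Placemark>
<Placemark><name>Monkey rock</name>
<Point><coordinates>177.127952,-17.7323069</coordinates></Point></Placemark>
</Document></kml>"""

points = placemark_points(SAMPLE_KML)
```

If the Monkey rock pair were read in the wrong order, its "latitude" would be 177.127952, which the range check flags immediately.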

As we’ve touched on in previous chapters, KML, GeoJSON, and shapefiles are possibly some of the most common geodata formats. Each has its pros and cons and your preference will likely depend on the amount of data you have and your chosen tools. Shapefiles, introduced in Chapter 2, were developed by Esri and are commonly used with government datasets like US Census data, as well as by real estate data providers like Zillow and corporate entities. As shapefiles imply multiple files (.shp, .shx, .dbf) packaged together, data management in this format can get unwieldy, is resistant to easy edits, and presents complications for web applications where a single data file to pass and post is ideal. Often, it’s preferable to convert to other formats—.kml or .geojson, for example—when you are building small- to medium-scale applications, as they solve for some of shapefiles’ kludginess. The following listing is the GeoJSON version of the preceding outcrop.kml file example. Notice that any styling information that might be packaged in your KML file will not be added in GeoJSON, so you’d need to add stylistic customizations in your JavaScript. Still, the GeoJSON format is an open standard, human-legible, and often more concise:

{
  "type": "FeatureCollection",
  "generator": "overpass-turbo",
  "copyright": "The data included in this document is from www.openstreetmap.org. The data is made available under ODbL.",
  "timestamp": "2015-09-14T14:05:02Z",
  "features": [
    {
      "type": "Feature",
      "id": "node/428215780",
      "properties": {
        "@id": "node/428215780",
        "geological": "outcrop",
        "name": "Sonnberg",
        "note": "Rocky promontory, subject to erosion. Middle Triassic shallow marine limestone, Jena Formation (U.-Muschelkalk, Wellenkalk). Small outcrop within E-W trending graben structure. Contact to MIttlerer Buntsandstein exposed in river bed southwest of crag."
      },
      "geometry": {
        "type": "Point",
        "coordinates": [
          9.43865,
          50.6228007
        ]
      }
    },
    {
      "type": "Feature",
      "id": "node/568331113",
      "properties": {
        "@id": "node/568331113",
        "geological": "outcrop",
        "name": "Monkey rock"
      },
      "geometry": {
        "type": "Point",
        "coordinates": [
          177.127952,
          -17.7323069
        ]
      }
    },

Many modern web tools and utilities partner well with GeoJSON datasets. GitHub renders files with the .geojson suffix as maps once they are committed, giving committers and viewers access to both the data and the visualized product. GeoJSONLint also validates and displays GeoJSON data, making editing and generating well-formed files easier. From the outcrop.kml file, you can apply GDAL’s ogr2ogr command-line tool or geojson.io to convert your data into GeoJSON. Beyond that, you can further convert GeoJSON to TopoJSON, which encodes topologies and saves space, à la LilJSON, for projects where storage efficiency is a concern. Ogr2ogr works for reciprocal file conversions across many geoformats, and has a simple syntax for specifying desired output and input files. For example, this line converts back from GeoJSON to KML:

$ ogr2ogr -f KML outcrop.kml outcrop.geojson
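ogr2ogr handles the general case, but for point-only data the GeoJSON target structure is simple enough to assemble by hand. The following sketch, using only Python's standard `json` module, builds a FeatureCollection from (name, longitude, latitude) tuples. The function name is hypothetical, and the sketch deliberately ignores styling, extended data, and non-point geometries.

```python
import json


def points_to_feature_collection(points):
    """Build a GeoJSON FeatureCollection from (name, lon, lat) tuples."""
    features = []
    for name, lon, lat in points:
        features.append({
            "type": "Feature",
            "properties": {"name": name},
            # GeoJSON positions are ordered [longitude, latitude]
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
        })
    return {"type": "FeatureCollection", "features": features}


outcrops = [
    ("Sonnberg", 9.43865, 50.6228007),
    ("Monkey rock", 177.127952, -17.7323069),
]
collection = points_to_feature_collection(outcrops)
geojson_text = json.dumps(collection, indent=2)
```

Writing `geojson_text` to a file with the .geojson suffix gives you something you can drop into geojson.io or commit to GitHub for a quick visual check.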

Open Tools and Toolkits for Medium-Scale Projects

To review quickly, the previous chapter touched on the distinction between GUI-based tools for mapmaking and scripting languages for a more custom experience, and both options might fall in the domain of small- to medium-scale tools and approaches. We’ve also discussed PostGIS, which works with both desktop and cloud-based outlets, such as ArcGIS and CARTO. With a PostgreSQL backend, PostGIS extends the provisioning of an object-relational database, enabling complex queries with a suite of sophisticated functions meant for manipulating geospatial information. For modest to moderate-sized datasets, PostGIS interfaces with other tools of the OpenGeo Suite, including QGIS and GeoServer, together building a package suited to many geospatial projects. However, these do not necessarily scale as distributed systems, and if a project supports more varied or voluminous data it might benefit from a NoSQL store or something beyond the tabular design of a relational database management system (RDBMS).

Colloquially, geospatial tool defaults, college curricula, and corporate reflexes still tightly align with desktop and GUI tools. As desktop tools, both ArcMap and QGIS have a GUI interface, and both are often used for spatial queries and automating macros for search and analysis that might otherwise require coding proficiency. All manner of spatial operations are possible in ArcGIS, and (with plug-ins/extensions) in QGIS as well. The bulk of geospatial operations fit into three categories. The first category is feature services: manipulating points, lines, and polygons with queries, additions, deletions, and updates. The second category is geometric services: calculating areas, lengths, or densities; making joins and unions; and creating intersections and convex hull requests. The third category is spatial analysis services: creating buffers and models, collecting points as they intersect polygons, and building extensions with Python. Even off-desktop, there are web services like geocoding, basemaps, directions, and routing that any developer with an account can leverage.
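To give a feel for what the geometric and spatial analysis categories involve under the hood, here is a small pure-Python sketch of two classic operations: polygon area via the shoelace formula and a ray-casting point-in-polygon test. This is not how ArcGIS or QGIS implement them. It assumes planar (already projected) coordinates and a simple polygon given as a list of (x, y) tuples, and points exactly on the boundary are not handled specially.

```python
def polygon_area(ring):
    """Planar area of a simple polygon (shoelace formula)."""
    total = 0.0
    # Pair each vertex with the next one, wrapping around to the first
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def point_in_polygon(point, ring):
    """Ray-casting test: True if the point lies inside the ring."""
    x, y = point
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing to the right flips the state
    return inside


square = [(0, 0), (4, 0), (4, 4), (0, 4)]
```

Collecting points as they intersect polygons, the third category above, amounts to running a test like `point_in_polygon` for every point, which is why spatial indexes matter once the data grows.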

In QGIS, you can load vector, raster, Open Geospatial Consortium (OGC), OSM, and GPS data, as well as spatial information from PostGIS, MSSQL, and Oracle. You can then manipulate, reproject, layer with other data, render, and measure distances, as well as exporting the data in different formats. You can also structure labels, style the map, create annotations, and execute queries or perform spatial operations with the built-in Python console.

One of the most popular QGIS applications is the easy execution of fairly pedestrian geospatial functions, like translating between file types or reorienting data to a new projection. When you are unpacking otherwise opaque geospatial formats (like SHP or KML files), it’s helpful to have a GUI viewer to preview your file data, and QGIS performs this function as well as facilitating translations between files (conversions from KML to SHP, for example). Be sure to note some of the QGIS default settings so you understand the control in your system before starting to experiment. For example, the default projection is WGS84, which might warp your data if your focus area does not suit that orientation, or if your data is beyond the confines of the terrestrial world.
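To see what reprojection does to coordinates, consider this sketch, which converts WGS84 longitude/latitude to spherical Web Mercator (the projection used by most web basemaps) and back. It is not QGIS's implementation; it assumes the standard spherical formulas with a radius of 6378137 m, and the function names are our own.

```python
import math

R = 6378137.0                 # sphere radius used by Web Mercator, in metres
MAX_LAT = 85.0511287798066    # latitude limit where the projected map is square


def lonlat_to_web_mercator(lon, lat):
    """Project WGS84 lon/lat (degrees) to Web Mercator x/y (metres)."""
    lat = max(-MAX_LAT, min(MAX_LAT, lat))  # the poles project to infinity
    x = R * math.radians(lon)
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


def web_mercator_to_lonlat(x, y):
    """Inverse projection: Web Mercator x/y (metres) back to lon/lat (degrees)."""
    lon = math.degrees(x / R)
    lat = math.degrees(2 * math.atan(math.exp(y / R)) - math.pi / 2)
    return lon, lat
```

Longitude maps linearly to x, while latitude is stretched more and more toward the poles, which is exactly the kind of warping to keep in mind when your focus area sits at high latitudes.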

A profusion of software tools and utilities with open orientation continues to disrupt the exclusive nature of “traditional” desktop GIS work, where formal training in geovisualization and cartography might have been a requirement. Command-line utilities and libraries like GDAL pride themselves on “user-oriented documentation,” but often the documentation is still inaccessible to non-programmers. Tutorials like Derek Watkins’s GDAL Cheat Sheet compile some of the most common operations with their definitions or output descriptions. Likewise, an online tool called broc-cli-geo combines the container tech of Docker instances with an open source GUI to simulate geo command-line utilities through a series of web-based chapterized lessons. It helps users learn the basics of geo command-line operations in the browser, without having to worry about installs or dependencies for GDAL. The documentation for GeoPandas is quite robust, but perhaps even more usefully, there are many tutorials to support learning GeoPandas or SciKit tools for geographic work.

The success of these open toolkits among mapmakers has even inspired the desktop GUI world of ArcMap to adopt more friendly open source features to echo its open source partner, QGIS. Tyler Bell maintains that “open source development is the rule rather than the exception, and is a core component of the Mapbox ethos.” Companies like Mapbox maintain and contribute to the Mapbox Vector Tile Spec, Carmen and Mapnik for tile serving, TileReduce for spacing and efficiency, and tools like Turf.js and the iD editor to replicate in the open source arena some of the analysis capabilities and efficiencies of desktop and proprietary GIS tools.

Open libraries for medium-scale analysis of data capably packaged in a Postgres database are increasingly important at a time when Big Data solutions might not be the right fit for everyone; Tom Faulhaber of Planet OS notes that “the rise of the Big Data ecosystems (Hadoop, Spark, etc.) is mostly not yet affecting that space.” The bridge toward an open source stack that can capably accommodate all data scales and sources is slowly being built via the Strata and Spark Summit crowdbase, and at conferences like NACIS, FOSS4G, or anything by OSGeo and LocationTech.

Even when the data volume expands beyond small-scale work, among independent mappers, developers, and web cartographers, open source software remains core to many operations and use cases. For a practical case, to return to our outcrop mapping example, you might choose a few common tools that will process and plot your data in a similar number of steps. With the outcrop.kml data, you could make a map with several web-mapmaking tools, most fitting into the same process, with options for self-hosting the finished product. For example:

Using Google Maps
  1. Create an account if you don’t already have a Gmail account.

  2. Click on Import Map in top-lefthand menu (or My Maps → Create Map in some Google Maps UIs).

  3. Upload outcrop.kml.

  4. Explore changing the map features if you’d like.

Using Mapbox
  1. Create an account if you don’t already have one.

  2. Click on the Data tab at the top-righthand corner of the screen.

  3. Click on Import.

  4. Upload outcrop.kml.

  5. Select map features if you would like, then click on Import Features.

  6. Explore changing the map features if you’d like.

Using CARTO (with slight workflow differences for the newer interface)
  1. Create an account if you don’t already have one; use https://carto.com/signup?plan=academy to get boosted features.

  2. Click on Create Map; select Map View at the top of the screen.

  3. Click on the + or Add Layer option at the top of the righthand menu.

  4. Upload outcrop.kml.

  5. Explore changing the map features if you’d like.

In Table 4-1, we reference popular tools in both the open source and commercial spaces, as both types are in common use and they are often coordinated for the same mapping projects, dependent on needs. An anecdote illustrating the mixed architecture of modern web maps can be found in Spark Notebook creator and Data Fellas developer Andy Petrella’s admitted stack, which includes GeoTrellis for raster data processing, GeoServer/Network, OpenLayers, Leaflet, OpenGIS, PostGIS, Oracle + Geo cartridge, as well as JSTS; he notes the unfortunate absence of a vector processing library for any geometries more complex than points. Though there is no prescriptive stack shared by all geo developers, the use of open source technologies and a general understanding of but lack of use case for grid aggregation, “kriging,” or other integrations provided by ArcGIS are common. In most cases, the circumstances and datasets are so unique that there are no one-size-fits-all solutions, particularly when it comes to large-scale datasets. The problem that Petrella alludes to for vector data processing, for example, might partially be solved by tools like Magellan, an open source library for geospatial analytics atop Apache Spark; however, Magellan focuses on point data, so it might not be appropriate for complex geometries beyond points, with topological challenges and processing needs. Often the tools are ill-fitting for the task in geo, or they require customizations when the data scales beyond the typical small to moderate scope.

For most raster projects, geospatial development requires a geospatially enabled database, leveraged by a tile server that generates image tiles from converted data. As explained by Spotify’s Stefan Avesand, tiles relevant to a particular view (dependent on zoom level and window boundaries) are reassembled in a feature layer on top of a basemap, or the frontend user view, in response to a client request. For many in open source, a coordination of Postgres and PostGIS for backend storage and persistence, GeoServer for data publishing, and OpenLayers for client-facing visualization is part of the process of building a standard-scale geospatial project, though the particulars of the technologies and use cases vary.
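The mapping from a view (zoom level plus window boundaries) to a set of tiles can be sketched as follows. This assumes the common XYZ "slippy map" tiling scheme on Web Mercator, as used by OpenStreetMap tile servers; other tile servers may number tiles differently, and the function names are our own.

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y) index of the XYZ tile containing a lon/lat point."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    lat_rad = math.radians(lat)
    x = math.floor((lon + 180.0) / 360.0 * n)
    y = math.floor((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that edge cases (lon = 180, extreme latitudes) stay in range
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tiles_for_view(west, south, east, north, zoom):
    """List the (zoom, x, y) tiles covering a bounding box at one zoom level."""
    x_min, y_min = lonlat_to_tile(west, north, zoom)  # y grows southward
    x_max, y_max = lonlat_to_tile(east, south, zoom)
    return [(zoom, x, y)
            for x in range(x_min, x_max + 1)
            for y in range(y_min, y_max + 1)]
```

A client typically requests exactly this list from the tile server each time the user pans or zooms, then stitches the returned images together on top of the basemap.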

Table 4-1. Mapping tools by license and development focus, often used together for mapping projects [a]
Commercial tools Open source tools

Backend/DB

IBM DB2

ArcGIS Server/ArcSDE

Oracle Spatial and Graph

PostGIS/PostgreSQL

MySQL + Spatial

gsTiles

Application server

GeoServer

ArcGIS Server

Autodesk MapGuide

ERDAS APOLLO

Intergraph GeoMedia

GeoWebCache

Tomcat

Frontend/UI

Google Maps

Tableau

Bing Maps

Mapbox

geoScore API

ArcGIS Desktop

OpenLayers

OpenStreetMap

QGIS

Boundless Desktop

CARTO Classic + CARTOBuilder

Google Earth

uDig

Simple Tiles

[a] It’s important to note that this table loosely buckets tools into an area of primary functional focus; some tools, like CARTO, include utilities that serve backend, frontend, and server functions, but the CARTO team emphasizes how their software allows developers to work with a backend store (PostgreSQL) while using front-end and Node.js technologies.

Mapping Software in Scripting Languages

Most software available for developers who work outside of GUIs also comes in a variety of scripting languages for web mapping, at times within and at times beyond this author’s expertise.

In the JavaScript camp, we’ve already discussed the OpenLayers framework and Leaflet library, two tools that support the use of OpenStreetMap data in building maps across browsers and devices; both are equipped with a series of plug-ins and are appropriately featured for most web-mapping projects. In addition, there is gmaps.js, a Google Maps API under the MIT license, providing the same services as the Google Maps process noted earlier. Finally, there is D3.js for “data-driven documents,” built for DOM manipulation with data and SVG and supported by DataMaps, a plug-in designed for map customization in D3.

Where these libraries struggle with legacy browsers or Windows OS, there exist multiple alternatives built on Raphael.js with a jQuery dependency, like jHERE, Mapael, Maplace.js, and the particularly robust but lightweight Kartograph (also for Python!). A profusion of “niche” map projects with specific geographic areas, fantastic themes, or otherworldly geographies are also available for creative mappers. Stately focuses on generating US maps with HTML/CSS, solving specifically for the awkward projection issues posed by geometries that accurately display the size and placement of Alaska and Hawaii. GeoChart and Highmaps provide visualization services for maps abstracted from place names and basemap details, and are thus appropriate for choropleths or thematic maps. Among the notable JavaScript plug-ins are Geocomplete, a plug-in for jQuery that provides Google Maps API geocoding and autocomplete in a place names search bar, and map-tools.js, which supports TopoJSON/GeoJSON and Google Maps features.

In the Python camp there are various tools specifically designed for geospatial work, like Shapely for manipulating geometries, Descartes for plotting those geometries in Matplotlib, Rtree for querying, the Fiona API and fio cli for writing geodata formats, the geopy wrapper for geocoding, pyshp for reading and writing shapefiles, and even PyQGIS for mirroring your desired GIS features. When reference or geodesic calculations are necessary, pyproj can help you convert between projections, proj.4 will transform coordinate reference systems, and GeographicLib hosts a PyPI package (also in JavaScript!) for measuring geodesic routines. Additionally, some popular libraries, like Pandas for data handling and Matplotlib for anything plotting-related, provide many quick mapping plots and shader tests like those illustrated in Figure 4-6. When coupled with IPython or Jupyter notebooks, they provide a playground of opportunities in a coding and rendering development environment.

Figure 4-6. Topographic shader demo with NumPy plus Matplotlib (Source: Matplotlib Development Team)
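A shader of the kind shown in Figure 4-6 can be approximated in a few lines of NumPy. The sketch below is not the code behind the figure; it is a simple hillshade that takes the dot product of each cell's surface normal with the light direction. It assumes a regular elevation grid whose first row is the northern edge, and the function name, parameters, and sample ramp are hypothetical.

```python
import numpy as np


def hillshade(elevation, azimuth_deg=315.0, altitude_deg=45.0, cell_size=1.0):
    """Return per-cell illumination in [0, 1] for an elevation grid."""
    # np.gradient returns the derivative along rows first, then columns.
    dz_drow, dz_dcol = np.gradient(elevation.astype(float), cell_size)
    dz_dx = dz_dcol    # x points east (increasing column index)
    dz_dy = -dz_drow   # y points north (row 0 assumed to be the north edge)

    # Unit surface normals: (-dz/dx, -dz/dy, 1), normalized.
    norm = np.sqrt(dz_dx ** 2 + dz_dy ** 2 + 1.0)
    nx, ny, nz = -dz_dx / norm, -dz_dy / norm, 1.0 / norm

    # Unit vector toward the light; azimuth is measured clockwise from north.
    az, alt = np.radians(azimuth_deg), np.radians(altitude_deg)
    lx = np.cos(alt) * np.sin(az)
    ly = np.cos(alt) * np.cos(az)
    lz = np.sin(alt)

    return np.clip(nx * lx + ny * ly + nz * lz, 0.0, 1.0)


# Hypothetical terrain: a ramp that rises toward the east, so it faces west.
ramp = np.tile(np.arange(5.0), (5, 1))
lit_from_west = hillshade(ramp, azimuth_deg=270.0)
lit_from_east = hillshade(ramp, azimuth_deg=90.0)
print(lit_from_west.mean() > lit_from_east.mean())
```

The resulting array can be passed to Matplotlib's `imshow` with a grayscale colormap to view the shaded relief.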

For complex analytics on high-resolution satellite images, tiles, and data, utilities like OpenCV for autodetection and recognition of objects and NumPy for data analysis offer enough features to accommodate most geospatial projects. Developer Robin Kraft of Global Forest Watch, a project that incorporates Landsat and satellite layers into a complex map tool for monitoring global forest change, echoes this: “OpenCV and NumPy are my main tools. I’ve also started using Kubernetes² to make data processing workflows work in a distributed environment.” Even beyond the geo-specific toolset, many Python utilities are quite useful for geospatial analyses and visualization, regardless of the dataset or datatype.
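As a small illustration of NumPy applied to satellite bands, the sketch below computes a vegetation index for two dates and flags pixels where it dropped sharply. NDVI differencing is a technique chosen here purely for illustration; it is not a description of Global Forest Watch's method. The band values, the threshold, and the function names are all hypothetical.

```python
import numpy as np


def ndvi(red, nir):
    """Normalized Difference Vegetation Index: (NIR - red) / (NIR + red)."""
    red = np.asarray(red, dtype=float)
    nir = np.asarray(nir, dtype=float)
    total = nir + red
    # Leave pixels at 0 where both bands are empty, to avoid dividing by zero.
    return np.divide(nir - red, total, out=np.zeros_like(total), where=total != 0)


def vegetation_loss_mask(ndvi_before, ndvi_after, drop=0.3):
    """Flag pixels whose NDVI fell by more than `drop` between two dates."""
    return (ndvi_before - ndvi_after) > drop


# Hypothetical 2x2 reflectance tiles for two acquisition dates.
red_before = np.array([[50, 60], [40, 0]])
nir_before = np.array([[200, 180], [160, 0]])
red_after = np.array([[55, 150], [42, 0]])
nir_after = np.array([[190, 160], [150, 0]])

before = ndvi(red_before, nir_before)
after = ndvi(red_after, nir_after)
loss = vegetation_loss_mask(before, after)
print(int(loss.sum()), "of", loss.size, "pixels flagged")
```

Because every step is a whole-array operation, the same code works unchanged on full-size tiles.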

More Companies Producing Geospatial Analysis Tools

Supporting the software producers who develop and maintain the aforementioned libraries are a host of companies producing a range of geospatial analytics and visualization tools aimed at citizen cartographers, storytellers, developers, and journalists. Founded on libraries like Leaflet and supportive of OpenStreetMap integration, CARTO is both a platform for creating web maps in the browser and a suite of open source libraries and APIs for doing the same in JavaScript. Likewise, Mapbox allows you to design and publish beautiful maps, in the browser or on mobile, and maintains a strong link to OpenStreetMap tiling data. Mapbox maintains OSM’s associated metadata through its support of the iD map editor, the development of OpenStreetMap.org, and its “improve this map” link for soliciting contributions to OpenStreetMap. Inspired by the bedrock of open data curation and production and the architecture of existing scripting libraries, these companies produce a mix of tools to support the evolution of mapping communities globally; each provides tools for journalism and storytelling projects and is building an increasingly diverse community of contributors to geospatial visualization.

There are several alternatives for interested parties who seek more or less complexity in their mapping process—for example, CGG GeoSoftware for scientific and algorithmic analytics, or OpenHeatMap, which with one upload and click will convert your spreadsheet into a geospatial analytics tool of its own. And there are even more experiment- and design-focused groups that can help power a singular and sophisticated look and feel for your project: Mapzen—an open source company funded by Samsung and focused on display, search, and navigation—hosts Tangram, a project for 2D and 3D graphics rendering with shaders and styles, and Turn-by-Turn is an open source routing service for client-side applications. Stamen Design, discussed in Chapter 3, is a consulting firm that collaborates with NGOs and institutions to design basemaps and projects that push the boundaries of the static map toward more unique and creative visual stories, augmented not only by data but by a thoughtful approach to design. Each company contributes to a growing ecosystem of open mapping tools aimed at open source in geospatial work.

Geo Tools in the Government Sector

Outside the experimental web-tech space, government entities still remain wedded to traditional GIS, and it’s here that the need for more robust tools to handle a variety of data scopes becomes more apparent. While governments, nonprofits, and NGOs seem to push for open data initiatives, the adoption of modern nonproprietary technologies remains slow. In fact, though the US Department of Defense invests much in Big Data solutions for geospatial analysis and surveillance, the investment of local government in nonproprietary technologies is weak. As John Thayer, senior technologist at the Palo Alto, California, Department of Information Technology, notes, “local government is responsible for the creation of geospatial and big data within the jurisdiction. State and federal government should support the local government data gathering and creation efforts.” The data focus means that most local agencies are comfortable with the importance of open data release and hosting, but perhaps less so with the more experimental web tech and Big Data engines in development of late. At present, the City of Palo Alto, according to Thayer, “uses a Windows operating system, client server, and services, along with Google Fusion Tables and Google Maps. [They] do not contribute code to operating systems.”

Though progress might seem slow in the government sector, it brings up some of the most important scaling issues with geotechnologies, and promotes an interest in scalable solutions that accommodate contemporary needs. In commenting on government tools outside the GUI space, Spotify’s Stefan Avesand notes that “most developers use simple web-based tools, but these scale very badly due to the low performance of JavaScript, lack of caching, and the time it takes to transfer the raw data.”

These projects point toward increased support for geo software that can handle Big Data, based on popular open source frameworks like Apache Spark, Accumulo, and Cassandra.

Meanwhile, the industry demand for geospatial analytics tools for Big Data has expanded to de rigueur “unicorn” tech ventures such as Uber and Airbnb, the sharing economy mainstays of monetizing data science in Silicon Valley, perhaps as harbingers for what open government could provide for our shared infrastructure. Riley Newman of Airbnb describes the company’s data infrastructure as decidedly open source, consisting of Hadoop, Hive, Presto, and Spark. Their use case focuses on large-scale property and localization data, and their methods understandably focus on consistency and mitigating risk. Open source projects like the Aerosolve machine learning framework, which includes features to “automatically create local neighborhoods” based on k-d trees, kriging, and Airflow (a popular data pipeline coordination tool recently integrated into the Apache suite), have evolved into the stacks of many open source geo projects since their kickstart at Airbnb. As for Uber, their use case focuses on tons of driver trajectory data, and their methods understandably focus on analytics that optimize location-based planning and routing. Their cloud infrastructure recently transitioned from Postgres to MySQL, and the blog post about this decision solicited thoughtful community feedback and discussion on the merits of both data storage tools. Arguably, each use case has superior collection and analysis capabilities compared to most governments, and despite their for-profit business models, their projects and blogs are pushing the industry toward accommodating “bigger” data workflows for open source development.
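The idea of deriving local neighborhoods from a k-d tree can be sketched in plain Python. This is a generic k-d tree, not Aerosolve's implementation: it splits a set of points at the median, alternating between longitude and latitude, and treats each leaf cell as one neighborhood. The listing coordinates, the leaf size, and all function names are hypothetical.

```python
def build_kdtree(points, max_leaf_size=2, depth=0):
    """Recursively split points at the median, alternating between axes."""
    if len(points) <= max_leaf_size:
        return {"leaf": True, "points": points}
    axis = depth % 2  # 0 splits on longitude (x), 1 splits on latitude (y)
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return {
        "leaf": False,
        "axis": axis,
        "split": ordered[mid][axis],
        "left": build_kdtree(ordered[:mid], max_leaf_size, depth + 1),
        "right": build_kdtree(ordered[mid:], max_leaf_size, depth + 1),
    }


def neighborhoods(node):
    """Collect the leaf cells of the tree; each leaf is one neighborhood."""
    if node["leaf"]:
        return [node["points"]]
    return neighborhoods(node["left"]) + neighborhoods(node["right"])


def locate(node, point):
    """Walk down to the leaf cell containing a query point (ties go right)."""
    while not node["leaf"]:
        side = "left" if point[node["axis"]] < node["split"] else "right"
        node = node[side]
    return node["points"]


# Hypothetical (longitude, latitude) listings around the San Francisco Bay.
listings = [
    (-122.42, 37.77), (-122.41, 37.78), (-122.27, 37.80), (-122.26, 37.81),
    (-122.03, 37.32), (-122.02, 37.33), (-121.89, 37.34), (-121.88, 37.35),
]
tree = build_kdtree(listings)
print(len(neighborhoods(tree)), "neighborhoods")
print(locate(tree, (-122.40, 37.76)))
```

Because each split is at the median, dense areas end up with small cells and sparse areas with large ones, which is what makes the approach attractive for unevenly distributed property data.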

With calls for scalable Postgres and an “Apache web server” for geospatial analytics on Big Data, the conversation in the open source community is progressing with a promising trajectory. Rainer Sternfeld of Planet OS confirms in reference to Big Data developments: “I think that the dynamics and speed within the open source community will be able to handle this when the time comes, as long as the community is large enough.” Sternfeld continues, “This might very well be based on existing Big Data platforms such as Hadoop or Spark. Already Esri/ArcGIS has open-sourced their geospatial tools for Hive. Performance needs to improve greatly, but the adherence to the SQL specification from the Open Geospatial Consortium [OGC] greatly helps with the integration with visualization software.”

The method to your mapmaking can vary and will often require flexibility and tool adoption beyond your comfort zone, defined by the constraints of your data and the expectations of your audience. As blogger Andy Woodruff wrote in “Web Cartography…That’s Like Google Maps, Right?”: “Cartography is in the thoughtful design of maps, no matter how they are built or delivered.”

Conclusion

The application of analysis technologies to geospatial use cases brings up another point that we’ll tackle in the next chapter: many of the most interesting things happening in geospatial analysis and visualization aren’t achieved with strictly geo tools. Many apply algorithms, computer vision techniques, or methodologies borrowed from math and scalable tech.

The optimism for progressive enhancements is clear among contemporary mapmakers; Director of Data Science Riley Newman of Airbnb notes that “geography is such a critical lens for understanding the world around us that I expect geo tools and datasets to continue to improve.” In the next chapter, we’ll talk about the breakdown of these enhancements as they apply to locative technologies, and new solutions for operating on geospatial data.

1 Fun fact: Descartes etymologically breaks down into “des” and “cartes,” the latter of which is the plural of maps in French. So, the library name translates to “of the maps.” Delightfully appropriate origin story.

2 Kubernetes is an open source system for automating the deployment, scaling, and management of containerized applications.
