
Python for Geospatial Data Analysis

Author: Bonny P. McClain
ISBN: 9781098104795
Publisher: O'Reilly Media, Inc.
Published: October 2022
Level: Beginner to intermediate
Length: 279 pages
Estimated reading time: 6h 51m
Language: English

Contents

Preface
Why Python?; How This Book Works; Who Is This Book For?; A Few Tips on Tooling; Finding Your Way; Conventions Used in This Book; Using Code Examples; O’Reilly Online Learning; How to Contact Us; Acknowledgments
1. Introduction to Geospatial Analytics
Democratizing Data; Asking Data Questions; A Conceptual Framework for Spatial Data Science; Map Projections; Vector Data: Places as Objects; Raster Data: Understanding Spatial Relationships; Evaluating and Selecting Datasets; Summary
2. Essential Facilities for Spatial Analysis
Exploring Spatial Data in QGIS; Installing QGIS; Adding Basemaps to QGIS; Exploring Data Resources; Visualizing Environmental Complaints in New York City; Uploading Data to QGIS; Setting the Project CRS; Using the Query Editor to Filter Data; Visualizing Population Data; The QGIS Python Console; Loading a Raster Layer; Redlining: Mapping Inequalities; Summary
3. QGIS: Exploring PyQGIS and Native Algorithms for Spatial Analytics
Exploring the QGIS Workspace: Tree Cover and Inequality in San Francisco; The Python Plug-in; Accessing the Data; Working with Layer Panels; Addressing the Research Question; Web Feature Service: Identifying Environmental Threats in Massachusetts; Accessing the Data; Discovering Attributes; Working with Iterators; Layer Styling; Using Processing Algorithms in the Python Console; Working with Algorithms; Extract by Expression; Buffer; Extract by Location; Summary
4. Geospatial Analytics in the Cloud: Google Earth Engine and Other Tools
Google Earth Engine Setup; Using the GEE Console and geemap; Creating a Conda Environment; Opening the Jupyter Notebook; Installing geemap and Other Packages; Navigating geemap; Layers and Tools; Basemaps; Exploring the Landsat 9 Image Collection; Working with Spectral Bands; The National Land Cover Database Basemap; Accessing the Data; Building a Custom Legend; Leafmap: An Alternative to Google Earth Engine; Summary
5. OpenStreetMap: Accessing Geospatial Data with OSMnx
A Conceptual Model of OpenStreetMap; Tags; Multidigraphs; Installing OSMnx; Choosing a Location; Understanding Arguments and Parameters; Calculating Travel Times; Basic Statistical Measures in OSMnx; Circuity; Network Analysis: Circuity in Paris, France; Betweenness Centrality; Network Types; Customizing Your Neighborhood Maps; Geometries from Place; Geometries from Address; Working with QuickOSM in QGIS; Summary
6. The ArcGIS Python API
Setup; Modules Available in the ArcGIS Python API; Installing ArcGIS Pro; Setting Up Your Environment; Installing Packages; Connecting to the ArcGIS Python API; Connecting to ArcGIS Online as an Anonymous User; Connecting to an ArcGIS User Account with Credentials; Exploring Imagery Layers: Urban Heat Island Maps; Raster Functions; Exploring Image Attributes; Improving Images; Comparing a Location over Multiple Points in Time; Filtering Layers; Summary
7. GeoPandas and Spatial Statistics
Installing GeoPandas; Working with GeoJSON files; Creating a GeoDataFrame; Working with US Census Data: Los Angeles Population Density Map; Accessing Tract and Population Data Through the Census API and FTP; Accessing Data from the Census API in Your Browser; Using Data Profiles; Creating the Map; Summary
8. Data Cleaning
Checking for Missing Data; Uploading to Colab; Nulls and Non-Nulls; Data Types; Metadata; Summary Statistics; Replacing Missing Values; Visualizing Data with Missingno; Mapping Patterns; Latitude and Longitude; Shapefiles; Summary
9. Exploring the Geospatial Data Abstraction Library (GDAL)
Setting Up GDAL; Installing Spyder; Installing GDAL; Working with GDAL at the Command Line; Editing Your Data with GDAL; The Warp Function; Capturing Input Raster Bands; Working with the GDAL Library in Python; Getting Oriented in Spyder; Exploring Your Data in Spyder; Transforming Files in GDAL; Using the Binmask in GDAL; The Complete Script; Exploring Open Source Raster Files; USGS EarthExplorer; Copernicus Open Access Hub; Google Earth Engine; Summary
10. Using Python to Measure Climate Data
Example 1: Examining Climate Prediction with Precipitation Data; Goals; Downloading Your Data; Working in Xarray; Combining Your 2015 and 2021 Datasets; Generating the Images; More Exploration; Example 2: Deforestation and Carbon Emissions in the Amazon Rain Forest Using WTSS Series; Setup; Creating Your Map; Analysis; Refinements; Example 3: Modeling and Forecasting Deforestation in Guadeloupe with Forest at Risk; Setup; Plotting the Data; Sampling the Data; Correlation Plots; Modeling the Probability of Deforestation with the iCAR Model; The MCMC Distance Matrix; Modeling Deforestation Probability with predict_raster_binomial_iCAR; Carbon Emissions; Analysis; Summary
A. Additional Resources
Python Libraries for Geospatial Analysis; Resources for Further Exploration
Bibliography
Index
About the Author

Chapter 8. Data Cleaning

A universal problem when working with data is understanding how complete your data is. Data engineering depends on the ability to clean, process, and visualize data. Now that you’re familiar with the basic functionality of notebook-based code editors and with loading data into them, either locally in a Jupyter Notebook or in the cloud with Google Colab, it’s time to learn how to clean your data. Data is frequently incomplete (missing), inconsistently formatted, or otherwise inaccurate; these problems are often grouped under the label messy data. Data cleaning is the process of addressing these problems and preparing the data for analysis.
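
To make the three kinds of mess concrete, here is a minimal pandas sketch on a few hypothetical rows. The column names BORO_NM, Latitude, and Longitude are assumptions modeled loosely on NYPD complaint data, and the cleaning rules (trimming and uppercasing borough names, treating latitudes outside -90 to 90 as missing) are illustrative choices rather than the book's own procedure.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows showing the three kinds of messy data
raw = pd.DataFrame(
    {
        "BORO_NM": ["BROOKLYN", " brooklyn", "Queens", None],  # inconsistent formatting + a gap
        "Latitude": [40.65, 40.68, 140.7, 40.73],              # 140.7 is not a valid latitude
        "Longitude": [-73.95, -73.94, -73.80, None],           # missing value
    }
)

clean = raw.copy()

# Inconsistent formatting: trim whitespace and standardize capitalization
clean["BORO_NM"] = clean["BORO_NM"].str.strip().str.upper()

# Inaccurate values: latitudes must lie in [-90, 90]; mark anything else as missing
clean.loc[~clean["Latitude"].between(-90, 90), "Latitude"] = np.nan

# Missing data: count the gaps that remain in each column
print(clean.isna().sum())
```

Converting impossible values to NaN, rather than dropping rows immediately, keeps every record visible so that all gaps can be reviewed together before you decide how to fill or remove them.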

In this chapter, we’ll explore some publicly available datasets, finding and cleaning up messes with a few packages that you can load into a Colab notebook. You’re going to work with NYPD_Complaint_Data_Historic, a dataset from New York City’s open data portal, NYC Open Data, updated on July 7, 2021. I filtered the data to 2020 to make it more manageable to view and manipulate; you can filter the data based on your own data question and export it as a CSV file. This chapter shows you how to manage, remove, update, and consolidate data and how to process it with a few useful Python packages. An analysis is only as accurate as the dataset or database behind it, and this chapter provides tools to assess and address common inconsistencies.
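
The filtering described above can be done in the portal before export; if you prefer to do the same step in code, the following is a minimal pandas sketch. It assumes the complaint start date is stored in a column named CMPLNT_FR_DT as MM/DD/YYYY text, which you should verify against your own download; the rows and the output filename are hypothetical.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical rows; CMPLNT_FR_DT and OFNS_DESC are assumed column names
complaints = pd.DataFrame(
    {
        "CMPLNT_FR_DT": ["12/31/2019", "01/15/2020", "07/04/2020", "not a date"],
        "OFNS_DESC": ["ROBBERY", "PETIT LARCENY", "FELONY ASSAULT", "BURGLARY"],
    }
)

# Parse dates; unparseable strings become NaT instead of raising an error
complaints["CMPLNT_FR_DT"] = pd.to_datetime(
    complaints["CMPLNT_FR_DT"], format="%m/%d/%Y", errors="coerce"
)

# Keep only complaints dated 2020 (NaT rows fail the comparison and drop out)
complaints_2020 = complaints[complaints["CMPLNT_FR_DT"].dt.year == 2020]

# Export the filtered subset as CSV, e.g., for uploading to Colab later
out_path = Path(tempfile.gettempdir()) / "nypd_complaints_2020.csv"
complaints_2020.to_csv(out_path, index=False)
print(len(complaints_2020), "rows written to", out_path)
```

Using `errors="coerce"` means malformed dates surface as NaT rather than stopping the script, which is itself a data-quality signal worth checking before you filter.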

Checking for Missing Data

If you’ve ever participated in a data competition, like those ...
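
A common first pass at checking for missing data is to count nulls and non-nulls in each column. The sketch below does this with pandas on a small hypothetical DataFrame; the column names are assumptions, and in practice you would run the same lines on the DataFrame you loaded from your exported CSV.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the loaded complaint data
df = pd.DataFrame(
    {
        "CMPLNT_FR_DT": ["01/15/2020", None, "07/04/2020", "09/09/2020"],
        "PARKS_NM": [None, None, None, "CENTRAL PARK"],
        "Latitude": [40.65, 40.68, np.nan, 40.73],
    }
)

# Per-column counts of non-null and null values, plus percentage missing
missing_report = pd.DataFrame(
    {
        "non_null": df.notna().sum(),
        "null": df.isna().sum(),
        "pct_missing": (df.isna().mean() * 100).round(1),
    }
).sort_values("pct_missing", ascending=False)

print(missing_report)
```

Sorting by the percentage missing puts the sparsest columns first, which helps you decide whether a column should be filled, investigated further, or dropped.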


Publisher Resources

ISBN: 9781098104788