
Data Visualization with Python and JavaScript

Name: Data Visualization with Python and JavaScript
Author: Kyran Dale
ISBN: 9781491920510

by Kyran Dale

July 2016

Beginner to intermediate

589 pages

11h 54m

English

O'Reilly Media, Inc.


Preface
Conventions Used in This Book · Using Code Examples · O’Reilly Safari · How to Contact Us · Acknowledgments
Introduction
Who This Book Is For · Minimal Requirements to Use This Book · Why Python and JavaScript? · Why Not Python on the Browser? · Why Python for Data Processing · Python’s Getting Better All the Time · What You’ll Learn · The Choice of Libraries · Preliminaries · The Dataviz Toolchain · 1. Scraping Data with Scrapy · 2. Cleaning Data with Pandas · 3. Exploring Data with Pandas and Matplotlib · 4. Delivering Your Data with Flask · 5. Transforming Data into Interactive Visualizations with D3 · Smaller Libraries · Using the Book · A Little Bit of Context · Summary · Recommended Books
1. Development Setup
The Accompanying Code · Python · Anaconda · Checking the Anaconda Install · Installing Extra Libraries · Virtual Environments · JavaScript · Content Delivery Networks · Installing Libraries Locally · Databases · Installing MongoDB · Integrated Development Environments · Summary
I. Basic Toolkit
2. A Language-Learning Bridge Between Python and JavaScript
Similarities and Differences · Interacting with the Code · Python · JavaScript · Basic Bridge Work · Style Guidelines, PEP 8, and use strict · CamelCase Versus Underscore · Importing Modules, Including Scripts · Keeping Your Namespaces Clean · Outputting “Hello World!” · Simple Data Processing · String Construction · Significant Whitespace Versus Curly Brackets · Comments and doc-strings · Declaring Variables, var · Strings and Numbers · Booleans · Data Containers: Dicts, Objects, Lists, Arrays · Functions · Iterating: for Loops and Functional Alternatives · Conditionals: if, else, elif, switch · File Input and Output · Classes and Prototypes · Differences in Practice · Method Chaining · Enumerating a List · Tuple Unpacking · Collections · Underscore · Functional Array Methods and List Comprehensions · Map, Reduce, and Filter with Python’s Lambdas · JavaScript Closures and the Module Pattern · This Is That · A Cheat Sheet · Summary
3. Reading and Writing Data with Python
Easy Does It · Passing Data Around · Working with System Files · CSV, TSV, and Row-Column Data Formats · JSON · Dealing with Dates and Times · SQL · Creating the Database Engine · Defining the Database Tables · Adding Instances with a Session · Querying the Database · Easier SQL with Dataset · MongoDB · Dealing with Dates, Times, and Complex Data · Summary
4. Webdev 101
The Big Picture · Single-Page Apps · Tooling Up · The Myth of IDEs, Frameworks, and Tools · A Text-Editing Workhorse · Browser with Development Tools · Terminal or Command Prompt · Building a Web Page · Serving Pages with HTTP · The DOM · The HTML Skeleton · Marking Up Content · CSS · JavaScript · Data · Chrome’s Developer Tools · The Elements Tab · The Sources Tab · Other Tools · A Basic Page with Placeholders · Filling the Placeholders with Content · Scalable Vector Graphics · The <svg> Element · The <g> Element · Circles · Applying CSS Styles · Lines, Rectangles, and Polygons · Text · Paths · Scaling and Rotating · Working with Groups · Layering and Transparency · JavaScripted SVG · Summary
II. Getting Your Data
5. Getting Data off the Web with Python
Getting Web Data with the requests Library · Getting Data Files with requests · Using Python to Consume Data from a Web API · Using a RESTful Web API with requests · Getting Country Data for the Nobel Dataviz · Using Libraries to Access Web APIs · Using Google Spreadsheets · Using the Twitter API with Tweepy · Scraping Data · Why We Need to Scrape · BeautifulSoup and lxml · A First Scraping Foray · Getting the Soup · Selecting Tags · Crafting Selection Patterns · Caching the Web Pages · Scraping the Winners’ Nationalities · Summary
6. Heavyweight Scraping with Scrapy
Setting Up Scrapy · Establishing the Targets · Targeting HTML with Xpaths · Testing Xpaths with the Scrapy Shell · Selecting with Relative Xpaths · A First Scrapy Spider · Scraping the Individual Biography Pages · Chaining Requests and Yielding Data · Caching Pages · Yielding Requests · Scrapy Pipelines · Scraping Text and Images with a Pipeline · Specifying Pipelines with Multiple Spiders · Summary

III. Cleaning and Exploring Data with Pandas
7. Introduction to NumPy
The NumPy Array · Creating Arrays · Array Indexing and Slicing · A Few Basic Operations · Creating Array Functions · Calculating a Moving Average · Summary
8. Introduction to Pandas
Why Pandas Is Tailor-Made for Dataviz · Why Pandas Was Developed · Heterogeneous Data and Categorizing Measurements · The DataFrame · Indices · Rows and Columns · Selecting Groups · Creating and Saving DataFrames · JSON · CSV · Excel Files · SQL · MongoDB · Series into DataFrames · Panels · Summary
9. Cleaning Data with Pandas
Coming Clean About Dirty Data · Inspecting the Data · Indices and Pandas Data Selection · Selecting Multiple Rows · Cleaning the Data · Finding Mixed Types · Replacing Strings · Removing Rows · Finding Duplicates · Sorting Data · Removing Duplicates · Dealing with Missing Fields · Dealing with Times and Dates · The Full clean_data Function · Saving the Cleaned Dataset · Merging DataFrames · Summary
10. Visualizing Data with Matplotlib
Pyplot and Object-Oriented Matplotlib · Starting an Interactive Session · Interactive Plotting with Pyplot’s Global State · Configuring Matplotlib · Setting the Figure’s Size · Points, Not Pixels · Labels and Legends · Titles and Axes Labels · Saving Your Charts · Figures and Object-Oriented Matplotlib · Axes and Subplots · Plot Types · Bar Charts · Scatter Plots · Seaborn · FacetGrids · Pairgrids · Summary
11. Exploring Data with Pandas
Starting to Explore · Plotting with Pandas · Gender Disparities · Unstacking Groups · Historical Trends · National Trends · Prize Winners per Capita · Prizes by Category · Historical Trends in Prize Distribution · Age and Life Expectancy of Winners · Age at Time of Award · Life Expectancy of Winners · Increasing Life Expectancies over Time · The Nobel Diaspora · Summary
IV. Delivering the Data
12. Delivering the Data
Serving the Data · Organizing Your Flask Files · Serving Data with Flask · Delivering Static Files · Dynamic Data with Flask · A Simple RESTful API with Flask · Using Static or Dynamic Delivery · Summary
13. RESTful Data with Flask
A RESTful, MongoDB API with Eve · Using AJAX to Access the API · Delivering Data to the Nobel Prize Visualization · RESTful SQL with Flask-Restless · Creating the API · Adding CORS Support · Querying the API · Summary
V. Visualizing Your Data with D3
14. Imagining a Nobel Visualization
Who Is It For? · Choosing Visual Elements · Menu Bar · Prizes by Year · A Map Showing Selected Nobel Countries · A Bar Chart Showing Number of Winners by Country · A List of the Selected Winners · A Mini-Biography Box with Picture · The Complete Visualization · Summary
15. Building a Visualization
Preliminaries · Core Components · Organizing Your Files · Serving the Data · The HTML Skeleton · CSS Styling · The JavaScript Engine · Importing the Scripts · Basic Data Flow · The Core Code · Initializing the Nobel Prize Visualization · Ready to Go · Data-Driven Updates · Filtering Data with Crossfilter · Running the Nobel Prize Visualization App · Summary
16. Introducing D3—The Story of a Bar Chart
Framing the Problem · Working with Selections · Adding DOM Elements · Leveraging D3 · Measuring Up with D3’s Scales · Quantitative Scales · Ordinal Scales · Unleashing the Power of D3 with Data Binding · The enter Method · Accessing the Bound Data · The Update Pattern · Axes and Labels · Transitions · Summary
17. Visualizing Individual Prizes
Building the Framework · Scales · Axes · Category Labels · Nesting the Data · Adding the Winners with a Nested Data-Join · A Little Transitional Sparkle · Summary
18. Mapping with D3
Available Maps · D3’s Mapping Data Formats · GeoJSON · TopoJSON · Converting Maps to TopoJSON · D3 Geo, Projections, and Paths · Projections · Paths · Graticules · Putting the Elements Together · Updating the Map · Adding Value Indicators · Our Completed Map · Building a Simple Tooltip · Summary
19. Visualizing Individual Winners
Building the List · Building the Bio-Box · Summary
20. The Menu Bar
Creating HTML Elements with D3 · Building the Menu Bar · Building the Category Selector · Adding the Gender Selector · Adding the Country Selector · Wiring Up the Metric Radio Button · Summary
21. Conclusion
Recap · Part I, Basic Toolkit · Part II, Getting Your Data · Part III, Cleaning and Exploring Data with Pandas · Part IV, Delivering the Data · Part V, Visualizing Your Data with D3 · Future Progress · Visualizing Social Media Networks · Interactive Mapping with Leaflet and Folium · Machine-Learning Visualizations · Final Thoughts
A. Moving from Development to Production
The Starting Directory · Configuration · Configuring Flask · Configuring the JavaScript App · Authentication · Testing Flask Apps · Testing JavaScript Apps · Deploying Flask Apps · Configuring Apache · Logging and Error Handling
Index

Content preview from Data Visualization with Python and JavaScript

Chapter 6. Heavyweight Scraping with Scrapy

As your scraping goals get more ambitious, hacking solutions with BeautifulSoup and requests can get very messy very fast. Managing the scraped data as requests spawn more requests gets tricky, and if your requests are being made synchronously, things start to slow down rapidly. A whole load of problems you probably hadn’t anticipated start to make themselves known. It’s at this point that you want to turn to a powerful, robust library that solves all these problems and more. And that’s where Scrapy comes in.

Where BeautifulSoup is a very handy little penknife for fast and dirty scraping, Scrapy is a Python library that can do large-scale data scrapes with ease. It has all the things you’d expect, like built-in caching (with expiration times), asynchronous requests via Python’s Twisted networking framework, User-Agent randomization, and a whole lot more. The price for all this power is a fairly steep learning curve, which this chapter is intended to smooth, using a simple example. I think Scrapy is a powerful addition to any dataviz toolkit and really opens up possibilities for web data collection, but if you don’t have any need for heavyweight scraping fu right now, it’s fine to assume we’ve collected our Nobel Prize data and proceed to Part III. Otherwise, let’s buckle our seat belts and see what a real scraping engine can do.
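
As a rough illustration of the features just listed, the fragment below sketches how caching with an expiration time and request concurrency might be switched on in a Scrapy project's settings.py. The setting names are standard Scrapy settings, but every value here (the one-day expiry, the delay, and the bot name and contact URL) is an assumption chosen for illustration, not taken from the book. It sets a single fixed User-Agent string and does not show rotation.

```python
# settings.py -- a minimal sketch of a Scrapy project configuration.
# All values are illustrative assumptions, not the book's settings.

# Cache every downloaded page on disk so repeated runs do not hit the site again.
HTTPCACHE_ENABLED = True

# Treat cached pages as stale after one day (0 would mean "never expire").
HTTPCACHE_EXPIRATION_SECS = 60 * 60 * 24

# Directory, relative to the project's data directory, where the cache is stored.
HTTPCACHE_DIR = "httpcache"

# Upper bound on requests Scrapy keeps in flight at once (asynchronously).
CONCURRENT_REQUESTS = 16

# Pause, in seconds, between requests to the same site, to stay polite.
DOWNLOAD_DELAY = 0.5

# Identify the crawler; this name and URL are hypothetical placeholders.
USER_AGENT = "nobel-dataviz-bot (+https://example.com/contact)"
```

With caching enabled like this, a spider under development can be rerun many times while only the first run fetches each page from the network.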

In “Scraping Data”, we managed to scrape a dataset containing all the Nobel Prize winners by name, year, and category. ...




Publisher Resources

ISBN: 9781491920565

