Intro to Python for Computer Science and Data Science: Learning to Program with AI, Big Data and The Cloud

ISBN: 9780135404799

by Paul J. Deitel, Harvey M. Deitel

May 2019

Beginner

528 pages

29h 51m

English

Pearson

Intro to Python® for Computer Science and Data Science
Deitel® Series Page
Contents
Preface
Python for Computer Science and Data Science Education
Modular Architecture
Audiences for the Book
Key Features
Chapter Dependencies
Computing and Data Science Curricula
Data Science Overlaps with Computer Science
Jobs Requiring Data Science Skills
Jupyter Notebooks
Docker
Class Tested
“Flipped Classroom”
Special Feature: IBM Watson Analytics and Cognitive Computing
Teaching Approach
Software Used in the Book
Python Documentation
Getting Your Questions Answered
Student and Instructor Supplements
Instructor Supplements on Pearson’s Instructor Resource Center
Instructor Examination Copies
Keeping in Touch with the Authors
Acknowledgments
About the Authors
About Deitel® & Associates, Inc.
Before You Begin
1 Introduction to Computers and Python
Objectives
Outline
1.1 Introduction
1.2 Hardware and Software
1.2.1 Moore’s Law
1.2.2 Computer Organization
Input Unit
Output Unit
Memory Unit
Arithmetic and Logic Unit (ALU)
Central Processing Unit (CPU)
Secondary Storage Unit
Self Check for Section 1.2

1.3 Data Hierarchy
Self Check
1.4 Machine Languages, Assembly Languages and High-Level Languages
Self Check
1.5 Introduction to Object Technology
Self Check for Section 1.5
1.6 Operating Systems
Self Check for Section 1.6
1.7 Python
Self Check
1.8 It’s the Libraries!
1.8.1 Python Standard Library
1.8.2 Data-Science Libraries
Self Check for Section 1.8
1.9 Other Popular Programming Languages
Self Check
1.10 Test-Drive: Using IPython and Jupyter Notebooks
1.10.1 Using IPython Interactive Mode as a Calculator
Entering IPython in Interactive Mode
Evaluating Expressions
Exiting Interactive Mode
Self Check
1.10.2 Executing a Python Program Using the IPython Interpreter
Changing to This Chapter’s Examples Folder
Executing the Script
Creating Scripts
Problems That May Occur at Execution Time
Self Check
1.10.3 Writing and Executing Code in a Jupyter Notebook
Opening JupyterLab in Your Browser
Creating a New Jupyter Notebook
Renaming the Notebook
Evaluating an Expression
Adding and Executing Another Cell
Saving the Notebook
Notebooks Provided with Each Chapter’s Examples
Opening and Executing an Existing Notebook
Closing JupyterLab
JupyterLab Tips
More Information on Working with JupyterLab
Self Check
1.11 Internet and World Wide Web
1.11.1 Internet: A Network of Networks
1.11.2 World Wide Web: Making the Internet User-Friendly
1.11.3 The Cloud
Mashups
1.11.4 Internet of Things
Self Check for Section 1.11
1.12 Software Technologies
Self Check
1.13 How Big Is Big Data?
Self Check
1.13.1 Big Data Analytics
1.13.2 Data Science and Big Data Are Making a Difference: Use Cases
1.14 Case Study—A Big-Data Mobile Application
1.15 Intro to Data Science: Artificial Intelligence—at the Intersection of CS and Data Science
Self Check
Exercises
2 Introduction to Python Programming
Objectives
Outline
2.1 Introduction
2.2 Variables and Assignment Statements
Self Check
2.3 Arithmetic
Self Check
2.4 Function print and an Intro to Single- and Double-Quoted Strings
Self Check
2.5 Triple-Quoted Strings
Self Check
2.6 Getting Input from the User
Self Check
2.7 Decision Making: The if Statement and Comparison Operators
Self Check
2.8 Objects and Dynamic Typing
Self Check
2.9 Intro to Data Science: Basic Descriptive Statistics
Self Check
2.10 Wrap-Up
Exercises
3 Control Statements and Program Development
Objectives
Outline
3.1 Introduction
3.2 Algorithms
Self Check
3.3 Pseudocode
Self Check
3.4 Control Statements
Self Check
3.5 if Statement
Self Check
3.6 if…else and if…elif…else Statements
Self Check
3.7 while Statement
Self Check
3.8 for Statement
3.8.1 Iterables, Lists and Iterators
3.8.2 Built-In range Function
Off-By-One Errors
Self Check
3.9 Augmented Assignments
Self Check
3.10 Program Development: Sequence-Controlled Repetition
3.10.1 Requirements Statement
3.10.2 Pseudocode for the Algorithm
3.10.3 Coding the Algorithm in Python
Execution Phases
Initialization Phase
Processing Phase
Termination Phase
3.10.4 Introduction to Formatted Strings
Self Check
3.11 Program Development: Sentinel-Controlled Repetition
Self Check
3.12 Program Development: Nested Control Statements
Self Check
3.13 Built-In Function range: A Deeper Look
Self Check
3.14 Using Type Decimal for Monetary Amounts
Self Check
3.15 break and continue Statements
3.16 Boolean Operators and, or and not
Self Check
3.17 Intro to Data Science: Measures of Central Tendency—Mean, Median and Mode
Self Check
3.18 Wrap-Up
Exercises
4 Functions
Objectives
Outline
4.1 Introduction
4.2 Defining Functions
Self Check
4.3 Functions with Multiple Parameters
Self Check
4.4 Random-Number Generation
Self Check
4.5 Case Study: A Game of Chance
Self Check
4.6 Python Standard Library
Self Check
4.7 math Module Functions
4.8 Using IPython Tab Completion for Discovery
Self Check
4.9 Default Parameter Values
Self Check
4.10 Keyword Arguments
Self Check
4.11 Arbitrary Argument Lists
Self Check
4.12 Methods: Functions That Belong to Objects
4.13 Scope Rules
Self Check
4.14 import: A Deeper Look
Self Check
4.15 Passing Arguments to Functions: A Deeper Look
Self Check
4.16 Function-Call Stack
Self Check
4.17 Functional-Style Programming
Pure Functions
4.18 Intro to Data Science: Measures of Dispersion
Self Check
4.19 Wrap-Up
Exercises
5 Sequences: Lists and Tuples
Objectives
Outline
5.1 Introduction
5.2 Lists
Self Check
5.3 Tuples
Self Check
5.4 Unpacking Sequences
Self Check
5.5 Sequence Slicing
Self Check
5.6 del Statement
Self Check
5.7 Passing Lists to Functions
Self Check
5.8 Sorting Lists
Self Check
5.9 Searching Sequences
Self Check
5.10 Other List Methods
Self Check
5.11 Simulating Stacks with Lists
Self Check
5.12 List Comprehensions
Self Check
5.13 Generator Expressions
Self Check
5.14 Filter, Map and Reduce
Self Check
5.15 Other Sequence Processing Functions
Self Check
5.16 Two-Dimensional Lists
Self Check
5.17 Intro to Data Science: Simulation and Static Visualizations
5.17.1 Sample Graphs for 600, 60,000 and 6,000,000 Die Rolls
Self Check
5.17.2 Visualizing Die-Roll Frequencies and Percentages
Launching IPython for Interactive Matplotlib Development
Importing the Libraries
Rolling the Die and Calculating Die Frequencies
Creating the Initial Bar Plot
Setting the Window Title and Labeling the x- and y-Axes
Finalizing the Bar Plot
Rolling Again and Updating the Bar Plot—Introducing IPython Magics
Saving Snippets to a File with the %save Magic
Command-Line Arguments; Displaying a Plot from a Script
Self Check
5.18 Wrap-Up
Exercises
Exercises 5.24 through 5.26 are reasonably challenging. Once you’ve done them, you ought to be able to implement many popular card games.
6 Dictionaries and Sets
Objectives
Outline
6.1 Introduction
6.2 Dictionaries
6.2.1 Creating a Dictionary
Determining if a Dictionary Is Empty
Self Check
6.2.2 Iterating through a Dictionary
Self Check
6.2.3 Basic Dictionary Operations
Accessing the Value Associated with a Key
Updating the Value of an Existing Key–Value Pair
Adding a New Key–Value Pair
Removing a Key–Value Pair
Attempting to Access a Nonexistent Key
Testing Whether a Dictionary Contains a Specified Key
Self Check
6.2.4 Dictionary Methods keys and values
Dictionary Views
Converting Dictionary Keys, Values and Key–Value Pairs to Lists
Processing Keys in Sorted Order
Self Check
6.2.5 Dictionary Comparisons
Self Check
6.2.6 Example: Dictionary of Student Grades
6.2.7 Example: Word Counts
Python Standard Library Module collections
Self Check
6.2.8 Dictionary Method update
6.2.9 Dictionary Comprehensions
Self Check
6.3 Sets
Self Check
6.3.1 Comparing Sets
Self Check
6.3.2 Mathematical Set Operations
Union
Intersection
Difference
Symmetric Difference
Disjoint
Self Check
6.3.3 Mutable Set Operators and Methods
Mutable Mathematical Set Operations
Methods for Adding and Removing Elements
Self Check
6.3.4 Set Comprehensions
6.4 Intro to Data Science: Dynamic Visualizations
Self Check
6.4.1 How Dynamic Visualization Works
Animation Frames
Running RollDieDynamic.py
Sample Executions
Self Check
6.4.2 Implementing a Dynamic Visualization
Importing the Matplotlib animation Module
Function update
Function update: Rolling the Die and Updating the frequencies List
Function update: Configuring the Bar Plot and Text
Variables Used to Configure the Graph and Maintain State
Calling the animation Module’s FuncAnimation Function
Self Check
6.5 Wrap-Up
Exercises
7 Array-Oriented Programming with NumPy
Objectives
Outline
7.1 Introduction
Self Check
7.2 Creating arrays from Existing Data
Self Check
7.3 array Attributes
Self Check
7.4 Filling arrays with Specific Values
7.5 Creating arrays from Ranges
7.6 List vs. array Performance: Introducing %timeit
7.7 array Operators
7.8 NumPy Calculation Methods
7.9 Universal Functions
7.10 Indexing and Slicing
7.11 Views: Shallow Copies
7.12 Deep Copies
7.13 Reshaping and Transposing
7.14 Intro to Data Science: pandas Series and DataFrames
7.14.1 pandas Series
Creating a Series with Default Indices
Displaying a Series
Creating a Series with All Elements Having the Same Value
Accessing a Series’ Elements
Producing Descriptive Statistics for a Series
Creating a Series with Custom Indices
Dictionary Initializers
Accessing Elements of a Series Via Custom Indices
Creating a Series of Strings
Self Check
7.14.2 DataFrames
Creating a DataFrame from a Dictionary
Customizing a DataFrame’s Indices with the index Attribute
Accessing a DataFrame’s Columns
Selecting Rows via the loc and iloc Attributes
Selecting Rows via Slices and Lists with the loc and iloc Attributes
Selecting Subsets of the Rows and Columns
Boolean Indexing
Accessing a Specific DataFrame Cell by Row and Column
Descriptive Statistics
Transposing the DataFrame with the T Attribute
Sorting by Rows by Their Indices
Sorting by Column Indices
Sorting by Column Values
Copy vs. In-Place Sorting
Self Check
7.15 Wrap-Up
Exercises
8 Strings: A Deeper Look
Objectives
Outline
8.1 Introduction
8.2 Formatting Strings
8.2.1 Presentation Types
Integers
Characters
Strings
Floating-Point and Decimal Values
Self Check
8.2.2 Field Widths and Alignment
Explicitly Specifying Left and Right Alignment in a Field
Centering a Value in a Field
Self Check
8.2.3 Numeric Formatting
Formatting Positive Numbers with Signs
Using a Space Where a + Sign Would Appear in a Positive Value
Grouping Digits
Self Check
8.2.4 String’s format Method
Multiple Placeholders
Referencing Arguments By Position Number
Referencing Keyword Arguments
Self Check
8.3 Concatenating and Repeating Strings
8.4 Stripping Whitespace from Strings
8.5 Changing Character Case
8.6 Comparison Operators for Strings
8.7 Searching for Substrings
8.8 Replacing Substrings
8.9 Splitting and Joining Strings
8.10 Characters and Character-Testing Methods
8.11 Raw Strings
8.12 Introduction to Regular Expressions
8.12.1 re Module and Function fullmatch
Matching Literal Characters
Metacharacters, Character Classes and Quantifiers
Other Predefined Character Classes
Custom Character Classes
* vs. + Quantifier
Other Quantifiers
Self Check
8.12.2 Replacing Substrings and Splitting Strings
Function sub—Replacing Patterns
Function split
Self Check
8.12.3 Other Search Functions; Accessing Matches
Function search—Finding the First Match Anywhere in a String
Ignoring Case with the Optional flags Keyword Argument
Metacharacters That Restrict Matches to the Beginning or End of a String
Function findall and finditer—Finding All Matches in a String
Capturing Substrings in a Match
Self Check
8.13 Intro to Data Science: Pandas, Regular Expressions and Data Munging
Self Check
8.14 Wrap-Up
Exercises
Regular Expression Exercises
More Challenging String-Manipulation Exercises
9 Files and Exceptions
Objectives
Outline
9.1 Introduction
9.2 Files
9.3 Text-File Processing
9.3.1 Writing to a Text File: Introducing the with Statement
The with Statement
Built-In Function open
Writing to the File
Contents of accounts.txt File
Self Check
9.3.2 Reading Data from a Text File
File Method readlines
Seeking to a Specific File Position
Self Check
9.4 Updating Text Files
Self Check
9.5 Serialization with JSON
Self Check
9.6 Focus on Security: pickle Serialization and Deserialization
9.7 Additional Notes Regarding Files
Self Check
9.8 Handling Exceptions
9.8.1 Division by Zero and Invalid Input
Division By Zero
Invalid Input
9.8.2 try Statements
try Clause
except Clause
else Clause
Flow of Control for a ZeroDivisionError
Flow of Control for a ValueError
Flow of Control for a Successful Division
Self Check
9.8.3 Catching Multiple Exceptions in One except Clause
9.8.4 What Exceptions Does a Function or Method Raise?
9.8.5 What Code Should Be Placed in a try Suite?
9.9 finally Clause
Self Check
9.10 Explicitly Raising an Exception
Self Check
9.11 (Optional) Stack Unwinding and Tracebacks
Self Check
9.12 Intro to Data Science: Working with CSV Files
9.12.1 Python Standard Library Module csv
Writing to a CSV File
Reading from a CSV File
Caution: Commas in CSV Data Fields
Caution: Missing Commas and Extra Commas in CSV Files
Self Check
9.12.2 Reading CSV Files into Pandas DataFrames
Datasets
Working with Locally Stored CSV Files
9.12.3 Reading the Titanic Disaster Dataset
Loading the Titanic Dataset via a URL
Viewing Some of the Rows in the Titanic Dataset
Customizing the Column Names
9.12.4 Simple Data Analysis with the Titanic Disaster Dataset
9.12.5 Passenger Age Histogram
Self Check
9.13 Wrap-Up
Exercises
10 Object-Oriented Programming
Objectives
Outline
10.1 Introduction
10.2 Custom Class Account
10.2.1 Test-Driving Class Account
Importing Classes Account and Decimal
Create an Account Object with a Constructor Expression
Getting an Account’s Name and Balance
Depositing Money into an Account
Account Methods Perform Validation
Self Check
10.2.2 Account Class Definition
Defining a Class
Initializing Account Objects: Method __init__
Method deposit
10.2.3 Composition: Object References as Members of Classes
Self Check
10.3 Controlling Access to Attributes
Self Check
10.4 Properties for Data Access
10.4.1 Test-Driving Class Time
Creating a Time Object
Displaying a Time Object
Getting an Attribute Via a Property
Setting the Time
Setting an Attribute via a Property
Attempting to Set an Invalid Value
Self Check
10.4.2 Class Time Definition
Class Time: __init__ Method with Default Parameter Values
Class Time: hour Read-Write Property
Class Time: minute and second Read-Write Properties
Class Time: Method set_time
Class Time: Special Method __repr__
Class Time: Special Method __str__
Self Check
10.4.3 Class Time Definition Design Notes
Interface of a Class
Attributes Are Always Accessible
Internal Data Representation
Evolving a Class’s Implementation Details
Properties
Utility Methods
Module datetime
Self Check
10.5 Simulating “Private” Attributes
Self Check
10.6 Case Study: Card Shuffling and Dealing Simulation
10.6.1 Test-Driving Classes Card and DeckOfCards
Creating, Shuffling and Dealing the Cards
Dealing Cards
Class Card’s Other Features
10.6.2 Class Card—Introducing Class Attributes
Class Attributes FACES and SUITS
Card Method __init__
Read-Only Properties face, suit and image_name
Methods That Return String Representations of a Card
10.6.3 Class DeckOfCards
Method __init__
Method shuffle
Method deal_card
Method __str__
10.6.4 Displaying Card Images with Matplotlib
Enable Matplotlib in IPython
Create the Base Path for Each Image
Import the Matplotlib Features
Create the Figure and Axes Objects
Configure the Axes Objects and Display the Images
Maximize the Image Sizes
Shuffle and Re-Deal the Deck
Self Check
10.7 Inheritance: Base Classes and Subclasses
Self Check
10.8 Building an Inheritance Hierarchy; Introducing Polymorphism
10.8.1 Base Class CommissionEmployee
All Classes Inherit Directly or Indirectly from Class object
Testing Class CommissionEmployee
Self Check
10.8.2 Subclass SalariedCommissionEmployee
Declaring Class SalariedCommissionEmployee
Inheriting from Class CommissionEmployee
Method __init__ and Built-In Function super
Overriding Method earnings
Overriding Method __repr__
Testing Class SalariedCommissionEmployee
Testing the “is a” Relationship
Self Check
10.8.3 Processing CommissionEmployees and SalariedCommissionEmployees Polymorphically
Self Check
10.8.4 A Note About Object-Based and Object-Oriented Programming
10.9 Duck Typing and Polymorphism
10.10 Operator Overloading
Operator Overloading Restrictions
Complex Numbers
10.10.1 Test-Driving Class Complex
10.10.2 Class Complex Definition
Method __init__
Overloaded + Operator
Overloaded += Augmented Assignment
Method __repr__
Self Check
10.11 Exception Class Hierarchy and Custom Exceptions
10.12 Named Tuples
Self Check
10.13 A Brief Intro to Python 3.7’s New Data Classes
10.13.1 Creating a Card Data Class
Importing from the dataclasses and typing Modules
Using the @dataclass Decorator
Variable Annotations: Class Attributes
Variable Annotations: Data Attributes
Defining a Property and Other Methods
Variable Annotation Notes
Self Check
10.13.2 Using the Card Data Class
Self Check
10.13.3 Data Class Advantages over Named Tuples
10.13.4 Data Class Advantages over Traditional Classes
More Information
10.14 Unit Testing with Docstrings and doctest
Self Check
10.15 Namespaces and Scopes
10.16 Intro to Data Science: Time Series and Simple Linear Regression
Self Check
10.17 Wrap-Up
Exercises
11 Computer Science Thinking: Recursion, Searching, Sorting and Big O
Objectives
Outline
11.1 Introduction
11.2 Factorials
11.3 Recursive Factorial Example
Self Check
11.4 Recursive Fibonacci Series Example
Self Check
11.5 Recursion vs. Iteration
Self Check
11.6 Searching and Sorting
11.7 Linear Search
Self Check
11.8 Efficiency of Algorithms: Big O
Self Check
11.9 Binary Search
Self Check
11.9.1 Binary Search Implementation
Function binary_search
Function remaining_elements
Function main
11.9.2 Big O of the Binary Search
11.10 Sorting Algorithms
11.11 Selection Sort
11.11.1 Selection Sort Implementation
Function selection_sort
Function main
11.11.2 Utility Function print_pass
11.11.3 Big O of the Selection Sort
Self Check
11.12 Insertion Sort
11.12.1 Insertion Sort Implementation
Function insertion_sort
11.12.2 Big O of the Insertion Sort
Self Check
11.13 Merge Sort
11.13.1 Merge Sort Implementation
Function merge_sort
Recursive Function sort_array
Function merge
Function subarray_string
Function main
11.13.2 Big O of the Merge Sort
Self Check
11.14 Big O Summary for This Chapter’s Searching and Sorting Algorithms
11.15 Visualizing Algorithms
11.15.1 Generator Functions
yield Statements
11.15.2 Implementing the Selection Sort Animation
import Statements
update Function That Displays Each Animation Frame
flash_bars Function That Flashes the Bars About to Be Swapped
selection_sort Generator Function
main Function That Launches the Animation
Sound Utility Functions
11.16 Wrap-Up
Exercises
12 Natural Language Processing (NLP)
Objectives
Outline
12.1 Introduction
12.2 TextBlob
Self Check
12.2.1 Create a TextBlob
Self Check
12.2.2 Tokenizing Text into Sentences and Words
Self Check
12.2.3 Parts-of-Speech Tagging
Self Check
12.2.4 Extracting Noun Phrases
Self Check
12.2.5 Sentiment Analysis with TextBlob’s Default Sentiment Analyzer
Getting the Sentiment of a TextBlob
Getting the polarity and subjectivity from the Sentiment Object
Getting the Sentiment of a Sentence
Self Check
12.2.6 Sentiment Analysis with the NaiveBayesAnalyzer
Self Check
12.2.7 Language Detection and Translation
Self Check
12.2.8 Inflection: Pluralization and Singularization
Self Check
12.2.9 Spell Checking and Correction
Self Check
12.2.10 Normalization: Stemming and Lemmatization
Self Check
12.2.11 Word Frequencies
Self Check
12.2.12 Getting Definitions, Synonyms and Antonyms from WordNet
Getting Definitions
Getting Synonyms
Getting Antonyms
Self Check
12.2.13 Deleting Stop Words
Self Check
12.2.14 n-grams
Self Check
12.3 Visualizing Word Frequencies with Bar Charts and Word Clouds
12.3.1 Visualizing Word Frequencies with Pandas
Loading the Data
Getting the Word Frequencies
Eliminating the Stop Words
Sorting the Words by Frequency
Getting the Top 20 Words
Convert top20 to a DataFrame
Visualizing the DataFrame
12.3.2 Visualizing Word Frequencies with Word Clouds
Installing the wordcloud Module
Loading the Text
Loading the Mask Image that Specifies the Word Cloud’s Shape
Configuring the WordCloud Object
Generating the Word Cloud
Saving the Word Cloud as an Image File
Generating a Word Cloud from a Dictionary
Displaying the Image with Matplotlib
Self Check
12.4 Readability Assessment with Textatistic
Self Check
12.5 Named Entity Recognition with spaCy
Self Check
12.6 Similarity Detection with spaCy
Self Check
12.7 Other NLP Libraries and Tools
12.8 Machine Learning and Deep Learning Natural Language Applications
12.9 Natural Language Datasets
12.10 Wrap-Up
Exercises
13 Data Mining Twitter
Objectives
Outline
13.1 Introduction
Self Check
13.2 Overview of the Twitter APIs
Self Check
13.3 Creating a Twitter Account
13.4 Getting Twitter Credentials—Creating an App
Self Check
13.5 What’s in a Tweet?
Key Properties of a Tweet Object
Sample Tweet JSON
Twitter JSON Object Resources
Self Check
13.6 Tweepy
13.7 Authenticating with Twitter Via Tweepy
Self Check
13.8 Getting Information About a Twitter Account
Self Check
13.9 Introduction to Tweepy Cursors: Getting an Account’s Followers and Friends
13.9.1 Determining an Account’s Followers
Creating a Cursor
Getting Results
Automatic Paging
Getting Follower IDs Rather Than Followers
Self Check
13.9.2 Determining Whom an Account Follows
Self Check
13.9.3 Getting a User’s Recent Tweets
Grabbing Recent Tweets from Your Own Timeline
Self Check
13.10 Searching Recent Tweets
13.11 Spotting Trends: Twitter Trends API
13.11.1 Places with Trending Topics
Self Check
13.11.2 Getting a List of Trending Topics
Worldwide Trending Topics
New York City Trending Topics
Self Check
13.11.3 Create a Word Cloud from Trending Topics
Self Check
13.12 Cleaning/Preprocessing Tweets for Analysis
Self Check
13.13 Twitter Streaming API
13.13.1 Creating a Subclass of StreamListener
Class TweetListener
Class TweetListener: __init__ Method
Class TweetListener: on_connect Method
Class TweetListener: on_status Method
13.13.2 Initiating Stream Processing
Authenticating
Creating a TweetListener
Creating a Stream
Starting the Tweet Stream
Asynchronous vs. Synchronous Streams
Other filter Method Parameters
Twitter Restrictions Note
Self Check
13.14 Tweet Sentiment Analysis
13.15 Geocoding and Mapping
Self Check
13.15.1 Getting and Mapping the Tweets
Get the API Object
Collections Required By LocationListener
Creating the LocationListener
Configure and Start the Stream of Tweets
Displaying the Location Statistics
Geocoding the Locations
Displaying the Bad Location Statistics
Cleaning the Data
Creating a Map with Folium
Creating Popup Markers for the Tweet Locations
Saving the Map
Self Check
13.15.2 Utility Functions in tweetutilities.py
get_tweet_content Utility Function
get_geocodes Utility Function
Self Check
13.15.3 Class LocationListener
13.16 Ways to Store Tweets
13.17 Twitter and Time Series
13.18 Wrap-Up
Exercises
14 IBM Watson and Cognitive Computing
Outline
14.1 Introduction: IBM Watson and Cognitive Computing
Self Check
14.2 IBM Cloud Account and Cloud Console
Self Check
14.3 Watson Services
Watson Assistant
Visual Recognition
Speech to Text
Text to Speech
Language Translator
Natural Language Understanding
Discovery
Personality Insights
Tone Analyzer
Natural Language Classifier
Synchronous and Asynchronous Capabilities
Self Check
14.4 Additional Services and Tools
Watson Studio
Knowledge Studio
Machine Learning
Knowledge Catalog
Cognos Analytics
Self Check
14.5 Watson Developer Cloud Python SDK
Modules We’ll Need for Audio Recording and Playback
SDK Examples
Self Check
14.6 Case Study: Traveler’s Companion Translation App
Self Check
14.6.1 Before You Run the App
Registering for the Speech to Text Service
Registering for the Text to Speech Service
Registering for the Language Translator Service
Retrieving Your Credentials
Self Check
14.6.2 Test-Driving the App
Processing the Question
Processing the Response
Self Check
14.6.3 SimpleLanguageTranslator.py Script Walkthrough
Importing Watson SDK Classes
Other Imported Modules
Main Program: Function run_translator
Function speech_to_text
Function translate
Function text_to_speech
Function record_audio
Function play_audio
Executing the run_translator Function
Self Check
14.7 Watson Resources
Self Check
14.8 Wrap-Up
Exercises
15 Machine Learning: Classification, Regression and Clustering
Outline
15.1 Introduction to Machine Learning
15.1.1 Scikit-Learn
Which Scikit-Learn Estimator Should You Choose for Your Project
15.1.2 Types of Machine Learning
Supervised Machine Learning
Datasets
Classification
Regression
Unsupervised Machine Learning
K-Means Clustering and the Iris Dataset
Big Data and Big Computer Processing Power
15.1.3 Datasets Bundled with Scikit-Learn
15.1.4 Steps in a Typical Data Science Study
Self Check
15.2 Case Study: Classification with k-Nearest Neighbors and the Digits Dataset, Part 1
Self Check
15.2.1 k-Nearest Neighbors Algorithm
Hyperparameters and Hyperparameter Tuning
Self Check
15.2.2 Loading the Dataset
Displaying the Description
Checking the Sample and Target Sizes
A Sample Digit Image
Preparing the Data for Use with Scikit-Learn
Self Check
15.2.3 Visualizing the Data
Creating the Diagram
Displaying Each Image and Removing the Axes Labels
Self Check
15.2.4 Splitting the Data for Training and Testing
Training and Testing Set Sizes
Self Check
15.2.5 Creating the Model
15.2.6 Training the Model
Self Check
15.2.7 Predicting Digit Classes
Self Check
15.3 Case Study: Classification with k-Nearest Neighbors and the Digits Dataset, Part 2
15.3.1 Metrics for Model Accuracy
Estimator Method score
Confusion Matrix
Classification Report
Visualizing the Confusion Matrix
Self Check
15.3.2 K-Fold Cross-Validation
KFold Class
Using the KFold Object with Function cross_val_score
Self Check
15.3.3 Running Multiple Models to Find the Best One
Scikit-Learn Estimator Diagram
Self Check
15.3.4 Hyperparameter Tuning
Self Check
15.4 Case Study: Time Series and Simple Linear Regression
Self Check
15.5 Case Study: Multiple Linear Regression with the California Housing Dataset
15.5.1 Loading the Dataset
Loading the Data
Displaying the Dataset’s Description
15.5.2 Exploring the Data with Pandas
Self Check
15.5.3 Visualizing the Features
Self Check
15.5.4 Splitting the Data for Training and Testing
15.5.5 Training the Model
Self Check
15.5.6 Testing the Model
15.5.7 Visualizing the Expected vs. Predicted Prices
15.5.8 Regression Model Metrics
Self Check
15.5.9 Choosing the Best Model
15.6 Case Study: Unsupervised Machine Learning, Part 1—Dimensionality Reduction
Loading the Digits Dataset
Creating a TSNE Estimator for Dimensionality Reduction
Transforming the Digits Dataset’s Features into Two Dimensions
Visualizing the Reduced Data
Visualizing the Reduced Data with Different Colors for Each Digit
Self Check
15.7 Case Study: Unsupervised Machine Learning, Part 2—k-Means Clustering
Self Check
15.7.1 Loading the Iris Dataset
Checking the Numbers of Samples, Features and Targets
15.7.2 Exploring the Iris Dataset: Descriptive Statistics with Pandas
15.7.3 Visualizing the Dataset with a Seaborn pairplot
Displaying the pairplot in One Color
Self Check
15.7.4 Using a KMeans Estimator
Creating the Estimator
Fitting the Model
Comparing the Computer Cluster Labels to the Iris Dataset’s Target Values
Self Check
15.7.5 Dimensionality Reduction with Principal Component Analysis
Creating the PCA Object
Transforming the Iris Dataset’s Features into Two Dimensions
Visualizing the Reduced Data
Self Check
15.7.6 Choosing the Best Clustering Estimator
15.8 Wrap-Up
Exercises
16 Deep Learning
Objectives
Outline
16.1 Introduction
Self Check
16.1.1 Deep Learning Applications
16.1.2 Deep Learning Demos
16.1.3 Keras Resources
16.2 Keras Built-In Datasets
16.3 Custom Anaconda Environments
Self Check
16.4 Neural Networks
Self Check
16.5 Tensors
Self Check
16.6 Convolutional Neural Networks for Vision; Multi-Classification with the MNIST Dataset
Self Check
16.6.1 Loading the MNIST Dataset
Self Check
16.6.2 Data Exploration
Visualizing Digits
16.6.3 Data Preparation
Reshaping the Image Data
Normalizing the Image Data
One-Hot Encoding: Converting the Labels From Integers to Categorical Data
Self Check
16.6.4 Creating the Neural Network
Adding Layers to the Network
Convolution
Adding a Convolution Layer
Dimensionality of the First Convolution Layer’s Output
Overfitting
Adding a Pooling Layer
Adding Another Convolutional Layer and Pooling Layer
Flattening the Results
Adding a Dense Layer to Reduce the Number of Features
Adding Another Dense Layer to Produce the Final Output
Printing the Model’s Summary
Visualizing a Model’s Structure
Compiling the Model
Self Check
16.6.5 Training and Evaluating the Model
Evaluating the Model
Making Predictions
Locating the Incorrect Predictions
Visualizing Incorrect Predictions
Displaying the Probabilities for Several Incorrect Predictions
Self Check
16.6.6 Saving and Loading a Model
Self Check
16.7 Visualizing Neural Network Training with TensorBoard
Self Check
16.8 ConvnetJS: Browser-Based Deep-Learning Training and Visualization
16.9 Recurrent Neural Networks for Sequences; Sentiment Analysis with the IMDb Dataset
Self Check
16.9.1 Loading the IMDb Movie Reviews Dataset
Self Check
16.9.2 Data Exploration
Movie Review Encodings
Decoding a Movie Review
16.9.3 Data Preparation
Splitting the Test Data into Validation and Test Data
Self Check
16.9.4 Creating the Neural Network
Adding an Embedding Layer
Adding an LSTM Layer
Adding a Dense Output Layer
Compiling the Model and Displaying the Summary
Self Check
16.9.5 Training and Evaluating the Model
16.10 Tuning Deep Learning Models
Self Check
16.11 Convnet Models Pretrained on ImageNet
16.12 Reinforcement Learning
16.12.1 Deep Q-Learning
16.12.2 OpenAI Gym
16.13 Wrap-Up
Exercises
Convolutional Neural Networks
Recurrent Neural Networks
ConvnetJS Visualization
Convolutional Neural Network Projects and Research
Recurrent Neural Network Projects and Research
Automated Deep Learning Project
Reinforcement Learning Projects and Research
Generative Deep Learning
Deep Fakes
Additional Research
17 Big Data: Hadoop, Spark, NoSQL and IoT
Objectives
Outline
17.1 Introduction
Self Check for Section 17.1
17.2 Relational Databases and Structured Query Language (SQL)
Self Check
17.2.1 A books Database
Self Check
17.2.2 SELECT Queries
17.2.3 WHERE Clause
Pattern Matching: Zero or More Characters
Pattern Matching: Any Character
Self Check
17.2.4 ORDER BY Clause
Sorting By Multiple Columns
Combining the WHERE and ORDER BY Clauses
Self Check
17.2.5 Merging Data from Multiple Tables: INNER JOIN
Self Check
17.2.6 INSERT INTO Statement
Note Regarding Strings That Contain Single Quotes
17.2.7 UPDATE Statement
17.2.8 DELETE FROM Statement
Self Check for Section 17.2
17.3 NoSQL and NewSQL Big-Data Databases: A Brief Tour
17.3.1 NoSQL Key–Value Databases
17.3.2 NoSQL Document Databases
17.3.3 NoSQL Columnar Databases
17.3.4 NoSQL Graph Databases
17.3.5 NewSQL Databases
Self Check for Section 17.3
17.4 Case Study: A MongoDB JSON Document Database
17.4.1 Creating the MongoDB Atlas Cluster
Creating Your First Database User
Whitelist Your IP Address
Connect to Your Cluster
17.4.2 Streaming Tweets into MongoDB
Use Tweepy to Authenticate with Twitter
Loading the Senators’ Data
Configuring the MongoClient
Setting up Tweet Stream
Starting the Tweet Stream
Class TweetListener
Counting Tweets for Each Senator
Show Tweet Counts for Each Senator
Get the State Locations for Plotting Markers
Grouping the Tweet Counts by State
Creating the Map
Creating a Choropleth to Color the Map
Creating the Map Markers for Each State
Displaying the Map
Self Check for Section 17.4
17.5 Hadoop
17.5.1 Hadoop Overview
HDFS, MapReduce and YARN
Hadoop Ecosystem
Hadoop Providers
Hadoop 3
17.5.2 Summarizing Word Lengths in Romeo and Juliet via MapReduce
17.5.3 Creating an Apache Hadoop Cluster in Microsoft Azure HDInsight
Creating an HDInsight Hadoop Cluster
17.5.4 Hadoop Streaming
17.5.5 Implementing the Mapper
17.5.6 Implementing the Reducer
17.5.7 Preparing to Run the MapReduce Example
Copying the Script Files to the HDInsight Hadoop Cluster
Copying RomeoAndJuliet into the Hadoop File System
17.5.8 Running the MapReduce Job
Viewing the Word Counts
Deleting Your Cluster So You Do Not Incur Charges
Self Check for Section 17.5
17.6 Spark
17.6.1 Spark Overview
History
Architecture and Components
Providers
17.6.2 Docker and the Jupyter Docker Stacks
Docker
Installing Docker
Jupyter Docker Stacks
Run Jupyter Docker Stack
Opening JupyterLab in Your Browser
Accessing the Docker Container’s Command Line
Stopping and Restarting a Docker Container
17.6.3 Word Count with Spark
Loading the NLTK Stop Words
Configuring a SparkContext
Reading the Text File and Mapping It to Words
Removing the Stop Words
Counting Each Remaining Word
Locating Words with Counts Greater Than or Equal to 60
Sorting and Displaying the Results
17.6.4 Spark Word Count on Microsoft Azure
Create an Apache Spark Cluster in HDInsight Using the Azure Portal
Install Libraries into a Cluster
Copying RomeoAndJuliet.txt to the HDInsight Cluster
Accessing Jupyter Notebooks in HDInsight
Uploading the RomeoAndJulietCounter.ipynb Notebook
Modifying the Notebook to Work with Azure
Self Check for Section 17.6
17.7 Spark Streaming: Counting Twitter Hashtags Using the pyspark-notebook Docker Stack
17.7.1 Streaming Tweets to a Socket
Executing the Script in the Docker Container
starttweetstream.py import Statements
Class TweetListener
Main Application
17.7.2 Summarizing Tweet Hashtags; Introducing Spark SQL
Importing the Libraries
Utility Function to Get the SparkSession
Utility Function to Display a Barchart Based on a Spark DataFrame
Utility Function to Summarize the Top-20 Hashtags So Far
Getting the SparkContext
Getting the StreamingContext
Setting Up a Checkpoint for Maintaining State
Connecting to the Stream via a Socket
Tokenizing the Lines of Hashtags
Mapping the Hashtags to Tuples of Hashtag-Count Pairs
Totaling the Hashtag Counts So Far
Specifying the Method to Call for Every RDD
Starting the Spark Stream
Self Check for Section 17.7
17.8 Internet of Things and Dashboards
17.8.1 Publish and Subscribe
17.8.2 Visualizing a PubNub Sample Live Stream with a Freeboard Dashboard
Signing up for Freeboard.io
Creating a New Dashboard
Adding a Data Source
Adding a Pane for the Humidity Sensor
Adding a Gauge to the Humidity Pane
Adding a Sparkline to the Humidity Pane
Completing the Dashboard
17.8.3 Simulating an Internet-Connected Thermostat in Python
Installing Dweepy
Invoking the simulator.py Script
Sending Dweets
17.8.4 Creating the Dashboard with Freeboard.io
17.8.5 Creating a Python PubNub Subscriber
Message Format
Importing the Libraries
List and DataFrame Used for Storing Company Names and Prices
Class SensorSubscriberCallback
Function Update
Configuring the Figure
Configuring the FuncAnimation and Displaying the Window
Configuring the PubNub Client
Subscribing to the Channel
Ensuring the Figure Remains on the Screen
Self Check for Section 17.8
17.9 Wrap-Up
Exercises
SQL and RDBMS Exercises
NoSQL Database Exercises
Hadoop Exercises
Spark Exercises
IoT and Pub/Sub Exercises
Platform Exercises
Other Exercises
Index

Content preview from Intro to Python for Computer Science and Data Science: Learning to Program with AI, Big Data and The Cloud

17.5 Hadoop

The next several sections show how Apache Hadoop and Apache Spark deal with big-data storage and processing challenges via huge clusters of computers, massively parallel processing, Hadoop MapReduce programming and Spark in-memory processing techniques. Here, we discuss Apache Hadoop, a key big-data infrastructure technology that also serves as the foundation for many recent advancements in big-data processing and an entire ecosystem of software tools that are continually evolving to support today’s big-data needs.
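To make the MapReduce idea mentioned above concrete, here is a minimal single-process sketch in plain Python. It is not Hadoop code and does not use any Hadoop API. The function names (mapper, shuffle, reducer, word_length_counts) and the sample lines are assumptions chosen for illustration. On a real cluster, the map and reduce steps would run in parallel on many machines, and the framework would do the grouping.

```python
from collections import defaultdict


def mapper(line):
    """Map step: emit a (word_length, 1) pair for each word in the line."""
    return [(len(word), 1) for word in line.split()]


def shuffle(pairs):
    """Group mapped values by key (a framework does this between map and reduce)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key, values):
    """Reduce step: total the counts collected for one key."""
    return key, sum(values)


def word_length_counts(lines):
    """Run map, shuffle and reduce in sequence and return {length: count}."""
    mapped = [pair for line in lines for pair in mapper(line)]
    grouped = shuffle(mapped)
    return dict(reducer(key, values) for key, values in sorted(grouped.items()))


# Hypothetical sample input; any iterable of text lines works.
sample = ["but soft what light", "through yonder window breaks"]
print(word_length_counts(sample))  # {3: 1, 4: 2, 5: 1, 6: 3, 7: 1}
```

Because each call to mapper depends only on its own line, and each call to reducer only on its own key, both steps could in principle be distributed across many machines. That independence is what makes this model suit large clusters.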

17.5.1 Hadoop Overview

When Google was launched in 1998, the amount of online data was already enormous, truly big data, with approximately 2.4 million websites²⁰. Today there are nearly two billion websites²¹ (almost ...
