Software Engineering for Data Scientists

Name: Software Engineering for Data Scientists
Author: Catherine Nelson
ISBN: 9781098136208

by Catherine Nelson

April 2024

Intermediate to advanced

260 pages

6h 22m

English

O'Reilly Media, Inc.

Preface
    Who Is This Book For?
    Why Python?
    What Is Not in This Book
    Guide to This Book
    Reading Order
    Conventions Used in This Book
    Using Code Examples
    O’Reilly Online Learning
    How to Contact Us
    Acknowledgments
1. What Is Good Code?
    Why Good Code Matters
    Adapting to Changing Requirements
    Simplicity
    Don’t Repeat Yourself (DRY)
    Avoid Verbose Code
    Modularity
    Readability
    Standards and Conventions
    Names
    Cleaning up
    Documentation
    Performance
    Robustness
    Errors and Logging
    Testing
    Key Takeaways
2. Analyzing Code Performance
    Methods to Improve Performance
    Timing Your Code
    Profiling Your Code
    cProfile
    line_profiler
    Memory Profiling with Memray
    Time Complexity
    How to Estimate Time Complexity
    Big O Notation
    Key Takeaways
3. Using Data Structures Effectively
    Native Python Data Structures
    Lists
    Tuples
    Dictionaries
    Sets
    NumPy Arrays
    NumPy Array Functionality
    NumPy Array Performance Considerations
    Array Operations Using Dask
    Arrays in Machine Learning
    pandas DataFrames
    DataFrame Functionality
    DataFrame Performance Considerations
    Key Takeaways
4. Object-Oriented Programming and Functional Programming
    Object-Oriented Programming
    Classes, Methods, and Attributes
    Defining Your Own Classes
    OOP Principles
    Functional Programming
    Lambda Functions and map()
    Applying Functions to DataFrames
    Which Paradigm Should I Use?
    Key Takeaways
5. Errors, Logging, and Debugging
    Errors in Python
    Reading Python Error Messages
    Handling Errors
    Raising Errors
    Logging
    What to Log
    Logging Configuration
    How to Log
    Debugging
    Strategies for Debugging
    Tools for Debugging
    Key Takeaways
6. Code Formatting, Linting, and Type Checking
    Code Formatting and Style Guides
    PEP8
    Import Formatting
    Automatic Code Formatting with Black
    Linting
    Linting Tools
    Linting in Your IDE
    Type Checking
    Type Annotations
    Type Checking with mypy
    Key Takeaways
7. Testing Your Code
    Why You Should Write Tests
    When to Test
    How to Write and Run Tests
    A Basic Test
    Testing Unexpected Inputs
    Running Automated Tests with Pytest
    Types of Tests
    Unit Tests
    Integration Tests
    Data Validation
    Data Validation Examples
    Using Pandera for Data Validation
    Data Validation with Pydantic
    Testing for Machine Learning
    Testing Model Training
    Testing Model Inference
    Key Takeaways
8. Design and Refactoring
    Project Design and Structure
    Project Design Considerations
    An Example Machine Learning Project
    Code Design
    Modular Code
    A Code Design Framework
    Interfaces and Contracts
    Coupling
    From Notebooks to Scalable Scripts
    Why Use Scripts Instead of Notebooks?
    Creating Scripts from Notebooks
    Refactoring
    Strategies for Refactoring
    An Example Refactoring Workflow
    Key Takeaways
9. Documentation
    Documentation Within the Codebase
    Names
    Comments
    Docstrings
    Readmes, Tutorials, and Other Longer Documents
    Documentation in Jupyter Notebooks
    Documenting Machine Learning Experiments
    Key Takeaways

10. Sharing Your Code: Version Control, Dependencies, and Packaging
    Version Control Using Git
    How Does Git Work?
    Tracking Changes and Committing
    Remote and Local
    Branches and Pull Requests
    Dependencies and Virtual Environments
    Virtual Environments
    Managing Dependencies with pip
    Managing Dependencies with Poetry
    Python Packaging
    Packaging Basics
    pyproject.toml
    Building and Uploading Packages
    Key Takeaways
11. APIs
    Calling an API
    HTTP Methods and Status Codes
    Getting Data from the SDG API
    Creating Your Own API Using FastAPI
    Setting Up the API
    Adding Functionality to Your API
    Making Requests to Your API
    Key Takeaways
12. Automation and Deployment
    Deploying Code
    Automation Examples
    Pre-Commit Hooks
    GitHub Actions
    Cloud Deployments
    Containers and Docker
    Building a Docker Container
    Deploying an API on Google Cloud
    Deploying an API on Other Cloud Providers
    Key Takeaways
13. Security
    What Is Security?
    Security Risks
    Credentials, Physical Security, and Social Engineering
    Third-Party Packages
    The Python Pickle Module
    Version Control Risks
    API Security Risks
    Security Practices
    Security Reviews and Policies
    Secure Coding Tools
    Simple Code Scanning
    Security for Machine Learning
    Attacks on ML Systems
    Security Practices for ML Systems
    Key Takeaways
14. Working in Software
    Development Principles and Practices
    The Software Development Lifecycle
    Waterfall Software Development
    Agile Software Development
    Agile Data Science
    Roles in the Software Industry
    Software Engineer
    QA or Test Engineer
    Data Engineer
    Data Analyst
    Product Manager
    UX Researcher
    Designer
    Community
    Open Source
    Speaking at Events
    The Python Community
    Key Takeaways
15. Next Steps
    The Future of Code
    Your Future in Code
    Thank You
Index
About the Author

Content preview from Software Engineering for Data Scientists

Chapter 4. Object-Oriented Programming and Functional Programming

In this chapter, I want to introduce you to two styles of programming that you’ll likely encounter in your data science career: object-oriented programming (OOP) and functional programming (FP). It’s extremely helpful to have an awareness of both. Even if you never write code in either of these styles, you’ll encounter packages that use one or the other of them extensively. These include standard Python data science packages such as pandas and Matplotlib. I’d like to equip you with an understanding of OOP and FP so that you can use the code you encounter more effectively.

OOP and FP are programming paradigms based on underlying computer science principles. Some programming languages support only one of them or strongly favor one over the other. For example, Java is an object-oriented language. Python supports both. OOP is more popular as an overall style in Python, but you’ll also see the occasional use of FP.
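
As a rough illustration of what "supports both" can look like in practice, the sketch below computes the same value in each style. It is not taken from the book: the `TemperatureLog` class, the `to_fahrenheit` function, and the sample readings are all hypothetical names and data chosen for this example.

```python
from statistics import mean

# Made-up sample data for illustration only.
readings_celsius = [21.5, 19.0, 23.5]


# Object-oriented style: the data and the behavior that acts on it
# live together in a class.
class TemperatureLog:
    def __init__(self, readings):
        self.readings = list(readings)

    def mean_fahrenheit(self):
        return mean(r * 9 / 5 + 32 for r in self.readings)


# Functional style: a small function with no shared state,
# applied to every element with map().
def to_fahrenheit(celsius):
    return celsius * 9 / 5 + 32


oop_result = TemperatureLog(readings_celsius).mean_fahrenheit()
fp_result = mean(map(to_fahrenheit, readings_celsius))
```

Both `oop_result` and `fp_result` hold the same mean temperature. The difference is only in how the code is organized: the class bundles state with methods, while the functional version passes data through standalone functions.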

These styles also give you a framework for breaking down your code. When you’re writing code, you could just write everything you want to do as one single long script. Such a script would still run just fine, but it would be hard to maintain and debug. As discussed in Chapter 1, it’s important to break code down into smaller chunks, and both OOP and FP can suggest good ways to do this.
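
To make the idea of smaller chunks concrete, here is a minimal sketch of a short analysis split into single-purpose functions instead of one long script. The function names and the inline CSV data are assumptions made for this example, not code from the book.

```python
import csv
import io

# Made-up sample data; the second row is deliberately missing a score.
RAW_CSV = "name,score\nana,82\nben,\ncho,91\n"


def load_rows(text):
    """Parse CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def drop_missing_scores(rows):
    """Keep only the rows that have a non-empty score."""
    return [row for row in rows if row["score"]]


def average_score(rows):
    """Compute the mean of the score column."""
    scores = [float(row["score"]) for row in rows]
    return sum(scores) / len(scores)


def main():
    rows = drop_missing_scores(load_rows(RAW_CSV))
    return average_score(rows)


result = main()
```

Each function can be read, tested, and debugged on its own, which is much harder when the same steps are interleaved in a single script.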

In my code, I don’t stick strictly to the principles of either functional or object-oriented programming. I sometimes define my own ...

Publisher Resources

ISBN: 9781098136192
