Chapter 13. Security

As a data scientist, your job is all about data, and some of that data may be very sensitive. It’s important that you keep it secure and that you’re aware of potential security risks in the code you write. Security is a topic that software engineers are generally very familiar with, but it’s usually not included in data science courses. So in this chapter, I’ll give you an overview of the principles and terminology around security.

The data you work with may include people’s personal data, some of which may be personally identifiable information (PII). It could also include data that is important to your company’s business, such as financial data or data about how many customers your company has. Exposing this type of data publicly can harm both users and your company.

Knowledge of security is particularly important if you are writing production code. But even if this isn’t the case, it’s still useful to know the broad principles. In this chapter, I’ll give an introduction to security, then look at some security risks, with a focus on those risks you are more likely to encounter as a data scientist. I’ll also describe practices to mitigate these risks, and I’ll discuss some risks and security practices specific to machine learning.

What Is Security?

Security for software is concerned with protecting a system from theft of information, damage, disruption, or unwanted access to information. An attacker wishes to gain access to a system and ...
