CHAPTER 4CASE Statements
In Chapters 2, “The SELECT Statement,” and 3, “The WHERE Clause,” you learned how to specify which columns and rows you want to pull from a database table into your dataset. We used the WHERE clause to filter rows using conditional statements that must evaluate to TRUE in order for a row to be returned.
But what if, instead of using conditional statements to filter rows, you want a column or value in your dataset to be based on a conditional statement? For example, instead of filtering your results to purchases over $50, say you just want to return all rows and create a new column that flags each purchase as being above or below $50? Or, maybe the machine learning algorithm you want to use can't accept a categorical string column as an input feature, so you want to encode those categories into numeric values. These are a version of what SQL developers call “derived columns” or “calculated fields,” and creating new columns that present the values differently is what data scientists call “feature engineering.” This is where CASE statements come in.
CASE Statement Syntax
You use conditional reasoning in your daily life any time you think “If [one condition] is true, then [take this action]. Otherwise, [take this other action].” “If the weather forecast predicts ...
Get SQL for Data Scientists now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.