How it works...
To select a subset of columns from a DataFrame, pass a list of column names to the indexing operator. For instance, movie[['movie_title', 'director_name']] creates a new DataFrame with only the movie_title and director_name columns. Selecting columns by name is the default behavior of the indexing operator for a pandas DataFrame.
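As a minimal sketch of this selection, the example below builds a tiny stand-in for the movie DataFrame. The rows and the duration column are made up for illustration and are not taken from the recipe's dataset.

```python
import pandas as pd

# Hypothetical stand-in for the movie dataset used in the recipe
movie = pd.DataFrame({
    "movie_title": ["Movie A", "Movie B"],
    "director_name": ["Director A", "Director B"],
    "duration": [120, 95],
})

# Passing a list of names to the indexing operator returns a DataFrame
subset = movie[["movie_title", "director_name"]]

# Passing a single name (not wrapped in a list) returns a Series instead
titles = movie["movie_title"]

print(type(subset).__name__, list(subset.columns))
print(type(titles).__name__)
```

The inner brackets create a Python list and the outer brackets are the indexing operator, which is why the list form always needs the double brackets.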
Step 3 neatly organizes all of the column names into separate lists based on their type (discrete or continuous) and by how similar their data is. The most important columns, such as the title of the movie, are placed first.
Step 4 concatenates all of the lists of column names and validates that the new list contains exactly the same values as the original column names. Python sets are unordered and the equality statement ...
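Steps 3 and 4 might be sketched roughly as follows. The column names, the list names, and the grouping are assumptions chosen for illustration; the recipe's actual lists are not reproduced here.

```python
import pandas as pd

# Hypothetical, empty stand-in with a handful of columns in arbitrary order
movie = pd.DataFrame(
    columns=["duration", "movie_title", "director_name", "budget", "gross"]
)

# Step 3 (sketch): group column names by type and similarity,
# with the most important column (the title) first
disc_core = ["movie_title"]
disc_people = ["director_name"]
cont_fin = ["budget", "gross"]
cont_other = ["duration"]

# Step 4 (sketch): concatenate the lists into one new column order
new_col_order = disc_core + disc_people + cont_fin + cont_other

# Set equality ignores order, so it only checks that the names match
same_names = set(movie.columns) == set(new_col_order)

# Sets also ignore duplicates, so a length check guards against repeats
same_length = len(movie.columns) == len(new_col_order)

# Reorder the columns by passing the new list to the indexing operator
movie2 = movie[new_col_order]
print(same_names, same_length, list(movie2.columns))
```

The length check is an addition to the set comparison rather than part of the recipe; it catches a name that was accidentally listed twice.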