9 Functions and Multicolumn Operations

Most programming languages let users define custom functions. A custom function lets users specify a set of operations once and execute it repeatedly with different data and parameters. Custom functions may implement new functionality, in which case they are typically distributed in independent packages, or they may simply give the code a more modular structure and spare us from rewriting the same sequence of operations for each new set of data and parameters. We will see a few basic examples of general user-defined functions.
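As a minimal sketch of the idea, the Python function below packages one sequence of operations (a linear rescaling) so that it can be reused with different data and parameters. The function name `rescale`, its parameters, and the sample values are hypothetical, chosen only for illustration.

```python
def rescale(values, new_min=0.0, new_max=1.0):
    """Linearly rescale a sequence of numbers to the range [new_min, new_max]."""
    low, high = min(values), max(values)
    if high == low:
        # All values are identical: map them to the lower bound
        # instead of dividing by zero.
        return [new_min for _ in values]
    span = high - low
    return [new_min + (v - low) * (new_max - new_min) / span for v in values]


# Hypothetical sample data.
temperatures = [12.0, 18.0, 24.0]
rainfall = [0.0, 50.0, 200.0]

# Same operations, different data and parameters.
scaled_temperatures = rescale(temperatures)          # [0.0, 0.5, 1.0]
scaled_rainfall = rescale(rainfall, new_max=100.0)   # [0.0, 25.0, 100.0]
```

The body is written once; each call only changes the data and, optionally, the target range.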

A particular case is that of anonymous functions, also called lambda functions; the two terms are synonyms. They are a simplified form of the general user-defined function and prove useful in specific situations where normal constructs or predefined functions cannot efficiently support a certain kind of operation. This is the case, for example, with some types of column creation or special sorting criteria.
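The two situations just mentioned can be sketched in plain Python, with a list of dictionaries standing in for a small table. The records and field names are hypothetical; the point is only that a lambda can supply a one-off sorting rule or derive a new column without defining a named function.

```python
# Hypothetical records; note the inconsistent capitalization.
records = [
    {"name": "Ada", "surname": "Lovelace", "score": 91},
    {"name": "alan", "surname": "Turing", "score": 88},
    {"name": "Grace", "surname": "hopper", "score": 91},
]

# Special sorting criterion: highest score first, ties broken by
# surname, ignoring letter case.
ranked = sorted(records, key=lambda r: (-r["score"], r["surname"].lower()))

# Column creation: derive a normalized "Surname, Name" label for each row.
labels = list(
    map(lambda r: f"{r['surname'].title()}, {r['name'].title()}", ranked)
)
# labels == ["Hopper, Grace", "Lovelace, Ada", "Turing, Alan"]
```

In both cases the lambda is used once, exactly where it is needed, which is why a full function definition would add little.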

Multicolumn operations repeat the same operations across several data frame columns, a common requirement in many situations. As usual in data science, size matters, and the scalability of a solution could be its single most important feature. Repeating a single operation on a few columns is not much of a hassle in terms of time and effort, but there might be dozens or even hundreds of columns, and more than one operation to apply, so time ...
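A minimal sketch of a multicolumn operation, assuming the pandas library is available, is shown below. The data frame, its column names, and the chosen operation (expressing each value as a share of its column total) are hypothetical; the same pattern would apply unchanged to hundreds of columns.

```python
import pandas as pd

# Hypothetical data: one label column and three numeric columns.
df = pd.DataFrame({
    "city": ["Rome", "Milan", "Turin"],
    "q1": [10.0, 20.0, 30.0],
    "q2": [1.0, 2.0, 3.0],
    "q3": [100.0, 200.0, 300.0],
})

# Select the target columns by a rule instead of typing each name.
numeric_cols = df.select_dtypes(include="number").columns

# Apply the same operation to every selected column in one statement.
result = df.copy()
result[numeric_cols] = df[numeric_cols].apply(lambda col: col / col.sum())
```

Selecting columns by data type rather than by name is one possible design choice; it keeps the code the same length regardless of how many columns the data frame has.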
