Chapter 3: Commonly Used Processes and Terms

—William B. Weeks and Juan M. Lavista Ferres

Common Processes

Machine learning is a discipline within the field of artificial intelligence that follows particular methodological processes to develop models and applies statistical methods to evaluate them. This section describes the processes generally used to analyze a dataset.

First, researchers obtain appropriate permissions to use a dataset. Depending on the type of data to be used, this might include verifying that a publicly accessible dataset can be used for the purposes of the study or confirming that an institutional review board has reviewed the use of the data.

Next, researchers evaluate the quality and volume of the dataset and perform an initial assessment of its variables (which might include information obtained from satellite imagery, audio recordings, photographs, videos, geolocators, or medical records, for instance). The data might be cross-sectional (collected at a single point in time) or longitudinal (collected repeatedly over time). Methods that use artificial intelligence or machine learning are very data-hungry. If the volume or quality of the data is inadequate, it might not be possible to complete the study, or less sophisticated analytic approaches might be required.
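
As a rough illustration of the initial assessment described above, the following Python sketch counts the records in a small dataset and reports the fraction of missing values for each variable. The function name `summarize_quality`, the variable names `site` and `reading`, and the sample records are all hypothetical and chosen only for illustration; a real study would use its own variables and would likely apply more extensive quality checks.

```python
def summarize_quality(rows, variables):
    """Return the record count and the fraction of missing values per variable.

    A value counts as missing when it is None or an empty string
    (an assumption made for this sketch).
    """
    n_rows = len(rows)
    missing_fraction = {}
    for var in variables:
        n_missing = sum(1 for row in rows if row.get(var) in (None, ""))
        missing_fraction[var] = n_missing / n_rows if n_rows else 0.0
    return {"n_rows": n_rows, "missing_fraction": missing_fraction}


# Hypothetical records; not taken from any real dataset.
rows = [
    {"site": "A", "reading": 1.2},
    {"site": "B", "reading": None},
    {"site": "", "reading": 3.4},
    {"site": "D", "reading": 0.7},
]

report = summarize_quality(rows, ["site", "reading"])
print(report)
```

A summary like this can help researchers judge whether the volume and completeness of the data are adequate before they commit to a data-hungry method.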

Then, researchers randomly split the dataset into a training dataset, a validation dataset, and a testing dataset. Most of the data would be used for training, ...
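
The random split described above can be sketched in a few lines of Python. The excerpt does not give the proportions, so the 70/15/15 division used here is an assumption made purely for illustration, as are the function name `split_dataset`, the fixed random seed, and the placeholder records.

```python
import random


def split_dataset(records, train_frac=0.70, val_frac=0.15, seed=0):
    """Randomly partition records into training, validation, and testing lists.

    The default fractions are illustrative assumptions; whatever remains
    after the training and validation portions becomes the testing set.
    """
    if train_frac <= 0 or val_frac < 0 or train_frac + val_frac >= 1:
        raise ValueError("fractions must be non-negative and leave room for a testing set")

    shuffled = list(records)               # copy so the caller's data is not reordered
    random.Random(seed).shuffle(shuffled)  # seeded shuffle makes the split reproducible

    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)

    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, validation, test


# Placeholder records standing in for the rows of a real dataset.
records = list(range(100))
train, validation, test = split_dataset(records)
print(len(train), len(validation), len(test))
```

Shuffling before slicing is what makes the split random, and fixing the seed lets other researchers reproduce the same partition. Every record ends up in exactly one of the three subsets.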
