Chapter 2. Data Parsimony

“Data is the new oil” was a common idiom in the early 2010s, used in the context of generating value from digital data. The phrase also unintentionally captures the growing carbon footprint of storing and processing vast amounts of data. The lifecycle emissions for each TB of data on hard drive storage are estimated at anywhere between 2 and 20 kgCO2e per year, as illustrated for commonly used storage devices.1

Figure 2-1. Typical greenhouse gas emissions across the life cycle of storage devices. Data source: Seagate Sustainability Report
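To make the 2 to 20 kgCO2e per TB per year range concrete, the following minimal sketch scales it to a dataset of a given size kept for a given number of years. The function name, the constants, and the assumption that emissions scale linearly with capacity and time are illustrative simplifications, not a method taken from the cited report.

```python
# Range quoted in the text: lifecycle emissions per TB of hard drive storage per year.
LOW_KG_CO2E_PER_TB_YEAR = 2.0
HIGH_KG_CO2E_PER_TB_YEAR = 20.0


def storage_emissions_range(terabytes, years=1.0):
    """Return (low, high) estimated emissions in kgCO2e.

    Assumes emissions scale linearly with stored capacity and retention
    time, which is a simplification made for illustration only.
    """
    if terabytes < 0 or years < 0:
        raise ValueError("terabytes and years must be non-negative")
    low = terabytes * LOW_KG_CO2E_PER_TB_YEAR * years
    high = terabytes * HIGH_KG_CO2E_PER_TB_YEAR * years
    return low, high


if __name__ == "__main__":
    # Hypothetical example: a 500 TB dataset retained for 3 years.
    low, high = storage_emissions_range(500, years=3)
    print(f"Estimated emissions: {low:.0f} to {high:.0f} kgCO2e")
```

Under these assumptions, a hypothetical 500 TB dataset retained for 3 years would account for roughly 3,000 to 30,000 kgCO2e. The tenfold spread between the two bounds is one reason reducing the volume of stored data can matter more than refining the estimate.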

Large-scale computations on massive amounts of data have been essential to the progress in AI model development, with the most recent LLMs being trained on datasets that consist of more than 15 trillion data points (tokens).2 Not all of the data used for training ...
