Sustainable AI

by Raghavendra Selvan
October 2025
Intermediate to advanced
292 pages
8h 9m
English
O'Reilly Media, Inc.
Content preview from Sustainable AI

Chapter 4. Data Parsimony

“Data is the new oil” was a common idiom in the early 2010s, used in the context of generating value from digital data. It also unintentionally captures the growing carbon footprint of storing and processing vast amounts of data. Lifecycle emissions for each terabyte of data on hard drive storage are estimated to be anywhere between 2 and 20 kgCO2e per year, as Figure 4-1 illustrates for commonly used storage devices.

Pie chart illustrating the greenhouse gas emissions of storage devices by life stage, highlighting that 93.8% of emissions occur during the use phase.
Figure 4-1. Typical GHG emissions across the lifecycle of storage devices. (Source: Seagate Sustainability Report.)
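To get a feel for the scale of these numbers, the short Python sketch below turns the 2 to 20 kgCO2e per terabyte per year range quoted above into a rough estimate for a whole dataset. It assumes that emissions scale linearly with both the amount of data and the storage duration, which is a simplification made here for illustration. The function name and the 500 TB example are hypothetical and do not come from the book.

```python
# Range quoted in the text: lifecycle emissions per terabyte of
# hard drive storage per year, in kgCO2e.
LOW_KG_CO2E_PER_TB_YEAR = 2.0
HIGH_KG_CO2E_PER_TB_YEAR = 20.0


def storage_emissions_range(terabytes: float, years: float = 1.0) -> tuple[float, float]:
    """Return a (low, high) emissions estimate in kgCO2e.

    Assumption (not from the source): emissions scale linearly with
    both the amount of data stored and the storage duration.
    """
    if terabytes < 0 or years < 0:
        raise ValueError("terabytes and years must be non-negative")
    tb_years = terabytes * years
    return (tb_years * LOW_KG_CO2E_PER_TB_YEAR,
            tb_years * HIGH_KG_CO2E_PER_TB_YEAR)


# Hypothetical example: a 500 TB dataset kept on hard drives for 3 years.
low, high = storage_emissions_range(500, years=3)
print(f"Estimated emissions: {low:.0f} to {high:.0f} kgCO2e")
# prints "Estimated emissions: 3000 to 30000 kgCO2e"
```

The width of the resulting range reflects the uncertainty in the per-terabyte estimate itself, so the output is best read as an order-of-magnitude guide rather than a measurement.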

Large-scale computation on massive amounts of data has been essential to progress in AI model development; the most recent LLMs are trained on datasets of more than 15 trillion data points (tokens).1 Not all of the data used for training ML models is informative, however. Uninformative or duplicative data contributes to the AI waste introduced in Chapter 3. Reducing the amount of data used can considerably reduce the energy consumption and carbon footprint of selecting and developing AI models.
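As one simple illustration of what removing duplicative data can look like, the Python sketch below drops exact duplicates from a list of text samples by hashing a normalized form of each sample. This is a generic technique chosen for illustration, not necessarily one of the methods the chapter goes on to present. The function name, the normalization rule (lowercasing and collapsing whitespace), and the sample strings are all assumptions.

```python
import hashlib


def deduplicate(texts):
    """Keep the first occurrence of each text sample.

    Two samples count as duplicates if they are identical after
    lowercasing and collapsing whitespace (an illustrative choice).
    """
    seen = set()
    unique = []
    for text in texts:
        normalized = " ".join(text.lower().split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(text)
    return unique


# Hypothetical samples: the second differs from the first only in
# letter case and spacing, so it is treated as a duplicate.
samples = [
    "Data is the new oil.",
    "data is  the new oil.",
    "Not all data is informative.",
]
kept = deduplicate(samples)
print(f"Kept {len(kept)} of {len(samples)} samples")
# prints "Kept 2 of 3 samples"
```

Storing only the hash digests rather than the full samples keeps memory use modest on large collections. Near-duplicates that differ by more than case and spacing would need a fuzzier comparison than this sketch provides.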

In this chapter, we introduce methods for identifying informative data points and extracting useful information from them. The chapter offers a paradigm for developing DL models while reducing AI waste from a data perspective, which we refer to as data parsimony. It ...

ISBN: 9781098155506