Fine-Tuning AI

by Laurence Moroney
July 2027
Intermediate to advanced
400 pages
2h 49m
English
O'Reilly Media, Inc.
Content preview from Fine-Tuning AI

Chapter 5. Synthetic Data

Chapter 4 covered the strategies for collecting and cleaning real-world data. But what happens when you’ve exhausted those strategies and still don’t have enough coverage to train a model effectively? Maybe your domain is so specialized that public datasets don’t exist for it, or perhaps the data you need is locked behind privacy regulations (such as real patient questions or financial transactions), and you can’t use it for training even if you do have it!

You’ve tried training with the data you have, and you’ve found that it just doesn’t work. Exploring a different architecture might help, but if you’re honest with yourself, the real problem is that you don’t have enough data. So what happens next?

This is where synthetic data enters the picture.

The core idea is simple: use a large, capable model (the “teacher”) to generate training examples that a smaller, cheaper model (the “student”) will learn from. The teacher already knows how to answer medical questions, write legal analyses, or debug code. ...
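The teacher-to-student flow described above can be sketched in a few lines. This is a minimal illustration rather than the book's own implementation: `build_prompt`, `generate_synthetic_examples`, `stub_teacher`, and `to_jsonl` are hypothetical names, the topics are made up, and the stub stands in for a call to whatever large model you would actually use as the teacher.

```python
import json
from typing import Callable, Dict, Iterable, List


def build_prompt(topic: str) -> str:
    """Build the instruction sent to the teacher (hypothetical template)."""
    return f"Write one question a user might ask about {topic}, then answer it."


def generate_synthetic_examples(
    teacher: Callable[[str], str],
    topics: Iterable[str],
    per_topic: int = 2,
) -> List[Dict[str, str]]:
    """Ask the teacher for per_topic completions on each topic."""
    examples = []
    for topic in topics:
        for _ in range(per_topic):
            prompt = build_prompt(topic)
            # Assumption: the teacher is any callable that maps text to text.
            completion = teacher(prompt)
            examples.append({"prompt": prompt, "completion": completion})
    return examples


def stub_teacher(prompt: str) -> str:
    """Placeholder teacher; swap in a real model client here."""
    return f"[teacher output for: {prompt}]"


def to_jsonl(examples: List[Dict[str, str]]) -> str:
    """Serialize examples one JSON object per line, a common training format."""
    return "\n".join(json.dumps(example) for example in examples)


# Hypothetical topics chosen only for illustration.
examples = generate_synthetic_examples(
    stub_teacher, ["medication dosage", "contract law"]
)
print(to_jsonl(examples))
```

Passing the teacher in as a plain callable keeps the generation loop independent of any particular model provider, so the same loop works with a stub during testing and a real model later.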

Publisher Resources

ISBN: 0642572310455