Chapter 2. The State of AI

In this chapter, I’ll offer a brief overview of how the latest breakthroughs in generative AI apply to video, what the high-level limiting factors are, and how we could identify use cases for real-time video applications, as discussed in Chapter 1.

Sticker Shock!

There’s an AI gold rush in progress. In addition to text-based tools that promise to solve every imaginable writing or programming problem, we’ve also seen lots of startups that use AI to enhance your videos. They can make your face look different, replace it with someone else’s, place you convincingly in a generated virtual background, or show you speaking a foreign language with synchronized lip movements.

There is another group of more speculative video-oriented products. These technologies offer the potential to create short video sequences, complete virtual characters, or even personalized talking avatars that could substitute for your presence in a video. Notably, these products tend to launch with a waiting list, so you don’t get access immediately. Often the reason is not pent-up demand for the product but the exorbitant cost of running a public-facing AI video generator service.

Practically all generative AI requires the massive computing power that GPUs provide. Most of these are built by NVIDIA Corporation, whose chips have been in very high demand for the past few years. Cloud providers have ...
