Chapter 8. Using ML to Create Text
With the release of ChatGPT in 2022, the words generative AI entered the common lexicon. This simple application, which let you chat with a cloud-based AI, seemed almost miraculous in how it could answer your queries with knowledge of almost everything in human experience. It worked using a technique called transformers, a far more advanced evolution of the recurrent neural networks you saw in the last chapter.
A transformer learns the patterns that turn one piece of text into another. With a large enough transformer architecture and a large enough body of text to learn from, the GPT model (GPT stands for generative pre-trained transformer) could predict the tokens most likely to follow a given piece of text. When GPT was wrapped in an application that made it more user-friendly, a whole new industry was born.
While creating models with transformers is beyond the scope of this book, we will look at their architecture in detail in Chapter 15.
The principles involved in training transformer models can be replicated with smaller, simpler architectures like RNNs or LSTMs. We'll explore that approach in this chapter, using a much smaller corpus of text: traditional Irish songs. A minimal sketch of the idea follows.
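To make next-word prediction concrete, here's a minimal training sketch in Keras. The two-line toy corpus, the layer sizes, and the epoch count are all illustrative assumptions, not the chapter's actual example:

# A minimal sketch of next-word prediction with an LSTM in Keras. The
# two-line corpus and every hyperparameter here are illustrative only.
import numpy as np
import tensorflow as tf

corpus = [
    "in dublins fair city where the girls are so pretty",
    "i first set my eyes on sweet molly malone",
]

# Build a word-to-index vocabulary, reserving 0 for padding.
words = sorted({w for line in corpus for w in line.split()})
word_index = {w: i + 1 for i, w in enumerate(words)}
vocab_size = len(word_index) + 1

# Every prefix of a line becomes a training example whose label is the
# word that follows that prefix.
sequences = []
for line in corpus:
    tokens = [word_index[w] for w in line.split()]
    for i in range(2, len(tokens) + 1):
        sequences.append(tokens[:i])

# Left-pad to a common length; the last token of each row is the label.
max_len = max(len(s) for s in sequences)
padded = np.array([[0] * (max_len - len(s)) + s for s in sequences])
xs, ys = padded[:, :-1], padded[:, -1]

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, 8),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(vocab_size, activation="softmax"),
])
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
model.fit(xs, ys, epochs=200, verbose=0)

Trained this way, the model treats text generation as classification: given a sequence of words, pick the most likely next word from the vocabulary.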
So, for example, consider this line of text from a famous TV show:
You know nothing, Jon Snow.
A next-token-prediction model built with RNNs came up with these song lyrics in response:
You know nothing, Jon Snow
the place where he’s stationed
be it Cork or in the blue ...
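Those lyrics came from a model trained on the full Irish-song corpus we'll build later in this chapter, but the generation loop itself is simple enough to show now. Continuing the earlier toy sketch (so the seed line is drawn from that tiny corpus, not from Game of Thrones), you might generate words greedily like this:

# Greedy generation: repeatedly predict the most likely next word and
# append it to the seed. Continues the earlier toy sketch; the seed line
# and the word count are arbitrary choices.
index_word = {i: w for w, i in word_index.items()}

seed = "in dublins fair city"
tokens = [word_index.get(w, 0) for w in seed.split()]
for _ in range(5):
    window = tokens[-(max_len - 1):]
    padded_seed = np.array([[0] * ((max_len - 1) - len(window)) + window])
    probs = model.predict(padded_seed, verbose=0)[0]
    next_id = int(np.argmax(probs))
    tokens.append(next_id)
    seed += " " + index_word.get(next_id, "?")
print(seed)

Picking the argmax every time tends to produce repetitive output; sampling from the predicted probability distribution instead usually gives more varied, song-like text.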