Chapter 9. Multimodal Manipulation

So far, I have discussed the significant risks associated with the latest generation of LLMs. Unfortunately, language is only the tip of the iceberg. The underlying technology, specifically the transformer architecture, has proven useful far beyond natural language processing (NLP), and it is increasingly being used as a multimodal architecture. The term multimodal means that models built on top of the architecture can process multiple types of data concurrently: in the case of transformers, combining text with other types of data such as images, audio, and more obscure data structures. The ability to fold other modes of data into an architecture that is already so well optimized for scaling has significant implications of its own. As these new modalities become increasingly integrated into our language models, the risk of abuse only continues to grow. To understand the additional risks posed by multimodal transformer models, we should first examine a brief history of transformers.
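
As a concrete illustration of what "processing multiple types of data concurrently" can look like, a vision-language model such as OpenAI's CLIP maps an image and a set of candidate captions into a shared embedding space and scores how well each caption matches the image. The snippet below is a minimal sketch, assuming the Hugging Face transformers library, the publicly available openai/clip-vit-base-patch32 checkpoint, and a placeholder image URL; it is meant only to show a transformer handling text and image inputs in the same forward pass.

from PIL import Image
import requests
from transformers import CLIPModel, CLIPProcessor

# Load a pretrained vision-language (multimodal) model and its processor.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Placeholder image URL; any accessible image would work here.
image = Image.open(requests.get("https://example.com/photo.jpg", stream=True).raw)
captions = ["a photo of a cat", "a photo of a dog"]

# The processor tokenizes the text and preprocesses the image so that both
# modalities can be fed to the model together.
inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)

# logits_per_image holds an image-to-text similarity score for each caption.
probs = outputs.logits_per_image.softmax(dim=1)
print(dict(zip(captions, probs[0].tolist())))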

Converging on Transformers

The transformer model was introduced in a research paper entitled “Attention Is All You Need” as a neural network architecture built around attention mechanisms rather than recurrence, designed to make deep learning more effective for NLP. As discussed throughout this book, transformers have worked exceptionally well for language. But in the years that followed that publication, ...
