Chapter 2. Learning the Language of Proteins
Life as we know it operates on proteins. The human genome contains about 20,000 protein-coding genes, stretches of DNA that serve as blueprints for building different proteins. Some proteins have simple, well-understood functions: collagen provides structural support and elasticity to tissues, and hemoglobin transports oxygen and carbon dioxide between the lungs and the rest of the body. Others play more abstract roles, acting as messengers, modulators, or signal carriers that transmit information within and between cells. For example, insulin is a protein hormone that signals cells to absorb sugar from the bloodstream.
We’ll dive into how DNA and proteins work in more detail soon. But for now, imagine a protein as a blobby molecular machine bumping around in the crowded cell environment, occasionally making productive collisions. Its shape and movement may seem chaotic, but both have been fine-tuned by millions of years of evolution to carry out very specific molecular functions.
One key detail for this chapter: a protein can be represented as a sequence of its constituent building blocks, called amino acids. Just as English uses 26 letters to form words, proteins use an alphabet of 20 amino acids to form long chains with specific shapes and jobs. With that in mind, the goal of this chapter is simple: we’ll train a model to predict a protein’s function given its amino acid sequence. For example:
- Given the sequence of ...
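To make the "20-letter alphabet" idea concrete, here is a minimal sketch of how an amino acid sequence can be turned into numbers a model can consume. It assumes the 20 standard one-letter amino acid codes in alphabetical order, and the names `AMINO_ACIDS`, `encode_sequence`, and `one_hot`, as well as the example peptide, are hypothetical illustrations rather than code from this chapter.

```python
# Assumption: the 20 standard amino acids as one-letter codes, in alphabetical order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_TO_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def encode_sequence(sequence: str) -> list[int]:
    """Hypothetical helper: map each amino acid letter to an integer index in [0, 20)."""
    try:
        return [AA_TO_INDEX[aa] for aa in sequence.upper()]
    except KeyError as err:
        # Nonstandard or ambiguous codes (e.g. "X") are rejected rather than guessed.
        raise ValueError(f"Unknown amino acid code: {err.args[0]!r}") from err


def one_hot(indices: list[int], alphabet_size: int = len(AMINO_ACIDS)) -> list[list[int]]:
    """Hypothetical helper: turn integer indices into one-hot vectors of length alphabet_size."""
    return [[1 if j == i else 0 for j in range(alphabet_size)] for i in indices]


# A short, made-up peptide used purely for illustration.
example = "MKTAYIAK"
indices = encode_sequence(example)
print(indices)  # [10, 8, 16, 0, 19, 7, 0, 8]
```

Each position in the protein becomes one row of a (sequence length × 20) matrix via `one_hot(indices)`, which is the kind of fixed-vocabulary input a sequence model can learn from, much as a text model reads characters or tokens.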