
Genomics in the AWS Cloud

Name: Genomics in the AWS Cloud
ISBN: 9781119573371

by David Wall, Catherine Vacher

May 2023

Intermediate to advanced

336 pages

8h 39m

English

Wiley


Cover
Title Page
Introduction
Who Should Read This Book; Genomics; Cloud Computing and AWS; What You'll Learn from This Book; Our Story; Getting Under Way
CHAPTER 1: Why Do Genome Analysis Yourself When Commercial Offerings Exist?
Commercial Sequencing Services; Typical Results; Summary
CHAPTER 2: A Crash Course in Molecular Biology
DNA; DNA at Work: RNA and Proteins; Inheritance; Summary
CHAPTER 3: Obtaining Your Genome
Preparing to Have Your Genome Sequenced; Specifying Lab Work; Engaging a Laboratory; Getting a Tissue Sample for DNA Extraction; Shipping the Sample; Receiving the Results; Summary
CHAPTER 4: The Bioinformatics Workflow
Extraction of DNA; FASTA Files; FASTQ Files; Alignment to a Reference Genome; Reference Genomes; Quality Control; Trimming; The Alignment Process; Marking Duplicates; Recalibrating Base Quality Score; Calling SNVs and Indel Variants; Annotating SNVs and Indel Variants; Prioritizing Variants; Inheritance Analysis; Identifying SVs and CNVs; Bioinformatics Workflow; Summary
CHAPTER 5: AWS Services for Genome Analysis
General Concepts; Custom Environments; Summary
CHAPTER 6: Building Your Environment in the AWS Cloud
Setting Up a Virtual Private Cloud; Setting Up and Launching an EC2 Instance; Setting Up S3 Buckets; Configuring Your Account Securely; Creating Groups; Creating Users; Setting Up Your Client Environment; Summary
CHAPTER 7: Linux and AWS Command-Line Basics for Genomics
Selecting a Linux Distribution; Accessing Your AWS Linux Instance from Your Local Computer; Getting Familiar with the Command Line; Transferring Files to and from Your AWS Instance; Running Programs in the Background; Understanding File Permissions; Compressing and Archiving Files; Managing Linux; The AWS Command-Line Interface; AWS CLI Essentials; An Alternative Approach: AWS Systems Manager; Summary

CHAPTER 8: Processing the Sequencing Data
Getting from Data to Information; Setting Up AWS Services and Data Storage; Summary
CHAPTER 9: Visualizing the Genome
Introducing Genome Visualizers; Installing the IGV Desktop Visualizer; Analyzing Variants in IGV; Summary
CHAPTER 10: Containerizing Your Workflow on the Desktop
Introducing Containerization; Understanding and Using Docker; Summary
CHAPTER 11: Variants and Applications
Polygenic Risk Scores; Metagenomics; AlphaFold; Summary
CHAPTER 12: Cancer Genomics
Somatic Genomes; Cancer; The Promise and Reality of Cancer Precision Medicine; Samples; Somatic Variant Analysis; Copy Number Changes; Measuring Tumor Genomic Instability; Summary; Notes
Index
Copyright
Dedication
Acknowledgments
About the Authors
End User License Agreement

Content preview from Genomics in the AWS Cloud

CHAPTER 4: The Bioinformatics Workflow

This chapter covers the overall bioinformatics workflow, which takes us from a tissue sample to a genome that is as complete and meaningful as possible, and identifies the places where the test subject's genome differs from the rest of the species.

We cover the process of taking some biological material (some cells) and extracting their genetic material. We explain how to convert that extracted genetic material (the DNA, in the case of most cells from complex organisms) into data that can be manipulated by an electronic computer. Following that, we give an overview of the bioinformatics workflow that will enable us to derive meaningful information from the data.

The process begins in a “wet lab,” with manipulation of actual cells. (We'll focus on human blood.) From there, a sequencing machine will convert the DNA extracted from those cells to data: an identified sequence of nucleotides, complete with a statement of how likely each nucleotide identification is to be accurate. After that, it's a matter of cleaning up the sequence to remove as many of its flaws as possible before aligning it to a reference genome. With the alignment complete, it should be possible to see how the person from whom the sample was taken differs from the species average, and that variation is where the value of genomics is.
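To make the idea of a per-nucleotide accuracy statement concrete, here is a minimal Python sketch. It assumes the sequencer output is a four-line FASTQ record using the common Phred+33 quality encoding, where a quality score Q corresponds to an error probability of 10^(-Q/10). The read name, bases, and quality string are made up for illustration, and the helper names are hypothetical rather than taken from the book.

```python
def parse_fastq_record(lines):
    """Split one four-line FASTQ record into (name, sequence, quality)."""
    header, sequence, separator, quality = [ln.strip() for ln in lines]
    if not header.startswith("@") or not separator.startswith("+"):
        raise ValueError("not a well-formed FASTQ record")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality strings differ in length")
    return header[1:], sequence, quality


def phred33_to_error_probs(quality):
    """Convert a Phred+33 quality string to per-base error probabilities."""
    probs = []
    for ch in quality:
        q = ord(ch) - 33               # Phred+33: ASCII code minus 33 is the score Q
        probs.append(10 ** (-q / 10))  # P(error) = 10^(-Q/10)
    return probs


# Hypothetical record: illustrative values only, not real sequencing data.
record = [
    "@read_1",
    "GATTACA",
    "+",
    "II5I+I!",
]

name, sequence, quality = parse_fastq_record(record)
error_probs = phred33_to_error_probs(quality)

for base, p in zip(sequence, error_probs):
    print(f"{base}  P(error) = {p:.4f}")
```

In this sketch the character `I` decodes to Q40 (about a 1 in 10,000 chance that the base call is wrong), while `!` decodes to Q0, meaning the base call carries no confidence at all. Low-quality bases like these are what the cleanup step before alignment is meant to address.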

In this chapter, we will look at the essentials of practical genetics and bioinformatics. We will examine the processes, tools, and forms of data ...



