Data Algorithms

by Mahmoud Parsian
July 2015
Content level: Intermediate to advanced
778 pages
17h 9m
English
O'Reilly Media, Inc.

Chapter 18. DNA Sequencing

Today, genome sequencing machines (such as Illumina’s HiSeq 4000) can generate thousands of gigabases of DNA and RNA sequencing data in a few hours for less than US$1,000 (a few years ago, the price was over US$100,000, and sequencing the first human genome cost about US$3 billion). Success in biology and the life sciences depends on our ability to properly analyze the big data sets generated by these technologies, which in turn requires us to adopt advances in informatics. MapReduce/Hadoop and Spark enable us to process and analyze thousands of gigabytes, or even petabytes, of data in hours (rather than days or weeks). For example, Spark was recently used to sort 100 TB of data using 206 machines in 23 minutes.[1]

In simple terms, DNA sequencing is the process of finding the precise order of nucleotides within a DNA molecule, up to and including whole genomes (such as human genomes). According to http://dnasequencing.com: “if finding DNA was the discovery of the exact substance holding our genetic makeup information, DNA sequencing is the discovery of the process that will allow us to read that information.” Put another way, sequencing determines the order of the four bases, adenine (A), guanine (G), cytosine (C), and thymine (T), in a strand of DNA.
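To make the idea of "the order of the four bases" concrete, here is a minimal Python sketch that treats a sequenced strand as a plain string over the alphabet A, C, G, T and counts each base. The function name `count_bases` and the sample string are illustrative assumptions, not code from the book. The sketch also assumes a strict four-letter alphabet, whereas real sequencer output often includes other symbols (such as N for an uncalled base).

```python
from collections import Counter

# The four DNA bases named in the text: adenine, cytosine, guanine, thymine.
BASES = "ACGT"


def count_bases(sequence: str) -> dict[str, int]:
    """Count occurrences of A, C, G, and T in a DNA string.

    Hypothetical helper for illustration only. It assumes a strict
    four-letter alphabet and raises ValueError on any other symbol.
    """
    seq = sequence.upper()
    unexpected = set(seq) - set(BASES)
    if unexpected:
        raise ValueError(f"unexpected symbols: {sorted(unexpected)}")
    counts = Counter(seq)
    # Always report all four bases, even those that never occur.
    return {base: counts[base] for base in BASES}


if __name__ == "__main__":
    # A made-up sample strand, not real sequencing data.
    print(count_bases("GATTACA"))  # {'A': 3, 'C': 1, 'G': 1, 'T': 2}
```

At the data volumes described above, the same per-base counting would be distributed across many machines (for example, with MapReduce or Spark) rather than run on a single string in memory.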

What are some of the challenges of DNA sequencing? There are many, but here are some of the most important ones:

  • There are several sequencing technologies to generate FASTQ ...

Publisher Resources

ISBN: 9781491906170