Bioinformatics with R Cookbook by Paurush Praveen Sinha

Preprocessing the raw NGS data

FASTQ data holds the sequences (the bases) and the corresponding quality scores (Phred), encoded as ASCII characters, as explained in the introductory part of the chapter. Once read into the R workspace, the data is ready to be analyzed. However, it first needs some preprocessing so that it meets our quality conditions and contains only the data instances of interest. For example, we may want only reads with higher Phred scores, or only reads from a particular strand. This preprocessing involves quality assessment and filtering, and this recipe deals with both.
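To make the ASCII encoding and the idea of quality filtering concrete, here is a minimal base R sketch. It is not the recipe's own code: the three reads are invented toy data, the offset of 33 assumes the common Phred+33 (Sanger) encoding, and the cutoff of 20 on the mean score and the helper name `decode_phred` are illustrative choices only.

```r
# Toy reads, invented for illustration (not taken from the SRA data).
reads <- data.frame(
  id   = c("read1", "read2", "read3"),
  seq  = c("ACGT", "ACGA", "TTGA"),
  qual = c("IIII", "5#!#", "5555"),
  stringsAsFactors = FALSE
)

# Assumption: Phred+33 encoding, so ASCII code minus 33 gives the score.
phred_offset <- 33L

# Hypothetical helper: turn one quality string into integer Phred scores.
decode_phred <- function(qual_string, offset = phred_offset) {
  utf8ToInt(qual_string) - offset
}

# Mean Phred score per read.
reads$mean_q <- vapply(
  reads$qual,
  function(q) mean(decode_phred(q)),
  numeric(1),
  USE.NAMES = FALSE
)

# Keep only reads whose mean score reaches the (arbitrary) cutoff.
min_mean_q <- 20
kept <- reads[reads$mean_q >= min_mean_q, ]
print(kept[, c("id", "mean_q")])
```

With real data read through ShortRead, the quality strings would come from `quality()` applied to the object returned by `readFastq()` rather than from a hand-built data frame.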

Getting ready

For this recipe, we will use the data downloaded from the SRA database. We will also continue to use the ShortRead library.

How to do it…
