How to do it...

Performing quality control and filtering on high-throughput sequence reads can be done using the following steps:

  1. Load the library and connect to a file:
library(ShortRead)fastq_file <- readFastq(file.path(getwd(), "datasets", "ch8", "ERR019652.fastq.gz") )
  1. Filter reads with any nucleotide with quality lower than 20:
qualities <- rowSums(as(quality(fastq_file), "matrix") <= 20) fastq_file <- fastq_file[qualities == 0] 
  1. Trim the right-hand side of the read:
cut_off_txt <- rawToChar(as.raw(40))trimmed <- trimTails(fastq_file, k =2, a= cut_off_txt)
  1. Set up a custom filter to remove N and homomeric runs:
custom_filter_1 <- nFilter(threshold=0)custom_filter_2 <- polynFilter(threshold = 10, nuc = c("A", "T", "C", "G"))custom_filter ...

Get R Bioinformatics Cookbook now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.