Before we start coding, let's take a look at a FASTQ file. It contains many records, each laid out like the following one:
@SRR003258.1 30443AAXX:1:1:1053:1999 length=51
ACCCCCCCCCACCCCCCCCCCCCCCCCCCCCCCCCCCACACACACCAACAC
+
=IIIIIIIII5IIIIIII>IIII+GIIIIIIIIIIIIII(IIIII01&III
Line 1 starts with @, followed by a sequence identifier and a description string. The description string varies between sequencers and database sources, but it is normally amenable to automated parsing.
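Here is a minimal sketch of splitting that first line. It assumes the identifier is the first whitespace-separated token after @ and treats the rest as free-form description. The helper names parse_header and description_fields are hypothetical, and description_fields only helps when a description uses key=value tokens, such as the length=51 token in the record above.

```python
def parse_header(line):
    """Split a FASTQ header line into (identifier, description).

    Assumes the identifier is the first whitespace-separated token after '@'
    and treats everything after it as a free-form description.
    """
    if not line.startswith("@"):
        raise ValueError("FASTQ header must start with '@'")
    # Drop the '@' and any line ending, then split once on whitespace
    parts = line[1:].rstrip("\r\n").split(None, 1)
    if not parts:
        raise ValueError("FASTQ header has no identifier")
    identifier = parts[0]
    description = parts[1] if len(parts) > 1 else ""
    return identifier, description


def description_fields(description):
    """Collect key=value tokens from a description into a dict.

    Hypothetical helper: only useful for descriptions that follow this convention.
    """
    fields = {}
    for token in description.split():
        key, sep, value = token.partition("=")
        if sep:  # skip tokens that contain no '='
            fields[key] = value
    return fields


ident, desc = parse_header("@SRR003258.1 30443AAXX:1:1:1053:1999 length=51")
print(ident)                     # SRR003258.1
print(desc)                      # 30443AAXX:1:1:1053:1999 length=51
print(description_fields(desc))  # {'length': '51'}
```

Everything in the description after the identifier stays a plain string here, because its layout depends on where the file came from.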
The second line contains the DNA sequence, just as in a FASTA file. The third line is a +, sometimes followed by a repeat of the identifier and description from the first line.
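Putting the four lines together, a minimal record reader might look like the sketch below. It assumes every record spans exactly four lines, meaning the sequence and quality strings are not wrapped over several lines. This holds for most sequencer output but not for every FASTQ file. The function name read_fastq is hypothetical. For real work, an established parser such as Biopython's Bio.SeqIO.parse(handle, "fastq") is the more robust choice.

```python
import io


def read_fastq(handle):
    """Yield (identifier, description, sequence, quality) for each record.

    Minimal sketch: assumes every record spans exactly four lines, i.e. the
    sequence and quality strings are not wrapped over several lines.
    """
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # skip blank lines, e.g. a trailing one at end of file
        header = header.rstrip("\r\n")
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record: " + header)
        # If the '+' line repeats the title, it must match line 1 exactly
        if len(plus) > 1 and plus[1:] != header[1:]:
            raise ValueError("'+' line does not match header: " + header)
        # Every base needs exactly one quality character
        if len(seq) != len(qual):
            raise ValueError("sequence/quality length mismatch: " + header)
        identifier, _, description = header[1:].partition(" ")
        yield identifier, description, seq, qual


data = "@read1 sample A\nACGT\n+\nIIII\n@read2\nGGC\n+read2\n!!#\n"
for record in read_fastq(io.StringIO(data)):
    print(record)
# ('read1', 'sample A', 'ACGT', 'IIII')
# ('read2', '', 'GGC', '!!#')
```

The length check between lines two and four is cheap and catches truncated files early, since each base must have exactly one quality character.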
The fourth line contains a quality value for each base in the sequence on line two. Each letter ...
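As a sketch of decoding such a quality line, the code below assumes the common Phred+33 encoding used by Sanger and Illumina 1.8+ files. Under that encoding, a character's ASCII code minus 33 gives a Phred score Q, and Q relates to the probability P of a wrong base call by Q = -10 log10(P). Older Illumina pipelines used an offset of 64 instead, so check which encoding your data uses. The function names are hypothetical.

```python
def phred33_scores(quality):
    """Convert a FASTQ quality string into integer Phred scores.

    Assumes the Phred+33 (Sanger / Illumina 1.8+) encoding, where each
    character's ASCII code minus 33 is the Phred quality score.
    """
    return [ord(char) - 33 for char in quality]


def error_probability(q):
    """Probability that a base call is wrong, given Phred score q."""
    return 10 ** (-q / 10)


print(phred33_scores("=I5#"))   # [28, 40, 20, 2]
print(error_probability(20))    # approximately 0.01, i.e. 1 wrong call in 100
```

Under this assumption, the many I characters in the record above correspond to Q = 40, roughly one expected error in 10,000 base calls.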