Let's build a dataset of DNA sequences by executing the following steps:
- First, we generate a list of the DNA sequences, loop through them, and split each sequence into individual nucleotides, because the nucleotides will be the input for our algorithm.
- Next, we remove the tab characters, append each sequence's class assignment, and add the resulting list of nucleotides to the dataset, as follows:
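```python
# Assumes `data` (a pandas DataFrame with a 'Sequence' column) and
# `classes` (one class label per sequence) were created in earlier steps.

# Generate a list of DNA sequences
sequences = list(data.loc[:, 'Sequence'])
dataset = {}

# Loop through the sequences and split them into individual nucleotides
for i, seq in enumerate(sequences):
    nucleotides = list(seq)
    # Remove tab characters
    nucleotides = [x for x in nucleotides if x != '\t']
    # Append the class assignment
    nucleotides.append(classes[i])
    # Add the nucleotides to the dataset
    dataset[i] = nucleotides

print(dataset[0])
```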
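The loop above relies on `data` and `classes` from earlier steps. As a minimal, self-contained sketch, the same logic can be wrapped in a helper and run on two short made-up sequences; the function name `split_sequences` and the toy data are hypothetical, not part of the original project. Transposing the resulting rows with `zip` shows how each position in the sequence becomes a column, with the appended class as the last column.

```python
def split_sequences(sequences, classes):
    """Split each sequence into nucleotides, drop tabs, and append its class.

    Hypothetical helper mirroring the loop in the main listing.
    """
    dataset = {}
    for i, seq in enumerate(sequences):
        # Keep every character except tab characters
        nucleotides = [x for x in seq if x != '\t']
        # The class label becomes the final element of the row
        nucleotides.append(classes[i])
        dataset[i] = nucleotides
    return dataset


# Toy stand-ins for the real data (hypothetical, much shorter than real sequences)
sequences = ["\t\tatgc", "\tggca"]
classes = ["+", "-"]

dataset = split_sequences(sequences, classes)
print(dataset[0])  # ['a', 't', 'g', 'c', '+']

# Transpose rows into columns: column j holds position j of every sequence,
# and the last column holds the class labels.
columns = list(zip(*dataset.values()))
print(columns[0])   # ('a', 'g')
print(columns[-1])  # ('+', '-')
```

Note that `zip` stops at the shortest row, so this column view is only meaningful when all sequences have the same length, as they do in a fixed-length promoter dataset.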
We now have all of our different columns. Each column holds the nucleotide found at one position in the sequence, except the last column, which holds the class assignment. The nucleotides are thymine ...