Get full access to R Bioinformatics Cookbook and 60K+ other titles, with a free 10-day trial of O'Reilly.

How it works...

The first step is simple: we load the sequences we're interested in and the classes they belong to. Because the ecoli_protein_classes.txt file is loaded into a dataframe, but later steps need a plain vector, we use the $ subset operator to extract the classes column from the dataframe. This returns that single column as the vector we need. From here, the workflow proceeds as follows:
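The extraction step can be sketched as follows. This assumes the file is a whitespace-delimited table with a header row and a column named `classes`. The inline `file_contents` string and its class labels are hypothetical stand-ins for the real file, used only so the example runs on its own.

```r
# Hypothetical stand-in for ecoli_protein_classes.txt (assumed layout:
# one column with a header row named "classes")
file_contents <- "classes\nclass_a\nclass_b\nclass_a\n"

# Load the table into a dataframe, as the recipe does with the real file
classes_df <- read.table(text = file_contents, header = TRUE,
                         stringsAsFactors = FALSE)

# $ pulls the single column out as a plain character vector
classes <- classes_df$classes
```

Indexing by name with single brackets, as in `classes_df["classes"]`, keeps the dataframe wrapper. That is why `$` (or the equivalent `classes_df[["classes"]]`) is the right choice when a plain vector is needed.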

Decide how much of the data should be used for training and how much for testing: here, in step 1, we assign 75% of the data to the training set when we create the training_proportion variable. This is used together with num_seqs in the sample() function to randomly choose the indices of the sequences to put into the training ...
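The split can be sketched as shown here. It illustrates the idea under stated assumptions and is not the book's exact code, because the original step is truncated. The `seqs` and `classes` vectors are hypothetical placeholders. The training set size is rounded down with `floor()`, and the test set is assumed to be every index not drawn for training, obtained with `setdiff()`.

```r
set.seed(42)  # make the random split reproducible

# Hypothetical placeholders for the loaded sequences and their classes
seqs <- paste0("seq_", 1:8)
classes <- rep(c("class_a", "class_b"), 4)

training_proportion <- 0.75
num_seqs <- length(seqs)

# Randomly choose which sequence indices go into the training set
training_indices <- sample(seq_len(num_seqs),
                           size = floor(training_proportion * num_seqs))

# Assumption: every index not chosen for training becomes test data
test_indices <- setdiff(seq_len(num_seqs), training_indices)

# Subset sequences and classes with the same indices so they stay paired
training_seqs <- seqs[training_indices]
training_classes <- classes[training_indices]
test_seqs <- seqs[test_indices]
test_classes <- classes[test_indices]
```

`sample()` draws without replacement by default, so no sequence can appear in the training set twice. Subsetting `seqs` and `classes` with the same index vector keeps each sequence aligned with its class label.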
