R Bioinformatics Cookbook

by Dan MacLean
October 2019
Content level: Intermediate to advanced
316 pages
9h 45m
English
Packt Publishing

How it works...

The first step is straightforward: we load the sequences we're interested in and the classes they belong to. Because the ecoli_protein_classes.txt file is loaded into a dataframe and we need a simple vector, we use the $ subset operator to extract the classes column, which returns that single column as a vector. After this, the workflow is as follows:
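As a minimal sketch of the `$` extraction described above, the following builds a small stand-in dataframe rather than reading the real file, since the layout of ecoli_protein_classes.txt is not shown here. The name `class_table` and the class values are invented for illustration; only the `classes` column name comes from the text.

```r
# Hypothetical stand-in for the table loaded from ecoli_protein_classes.txt.
# The column name `classes` follows the text; the values are made up.
class_table <- data.frame(
  classes = c("A", "B", "A", "C"),
  stringsAsFactors = FALSE
)

# `$` pulls the column out of the dataframe as a plain vector.
classes <- class_table$classes
is.vector(classes)

# Single-bracket subsetting, by contrast, keeps the dataframe wrapper.
still_df <- class_table["classes"]
is.data.frame(still_df)
```

The distinction matters when a later function expects a vector of labels and not a one-column dataframe.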

  1. Decide how much of the data to use for training and how much for testing: in step 1, we choose 75% of the data as the training set by creating the training_proportion variable. This is used together with num_seqs in the sample() function to randomly choose the indices of the sequences to put into the training ...
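The split described in this step might look roughly like the sketch below. `training_proportion`, `num_seqs` and `sample()` come from the text; the value of `num_seqs`, the seed, and the names `training_set_indices` and `test_set_indices` are assumptions made for illustration, and deriving the test indices with `setdiff()` is one possible approach, not necessarily the recipe's.

```r
set.seed(42)                 # assumed seed, only to make the sketch reproducible

num_seqs <- 20               # hypothetical number of sequences
training_proportion <- 0.75  # 75% of the data goes to training, as in the text

# Randomly pick indices for the training set (sampling without replacement).
training_set_indices <- sample(
  seq_len(num_seqs),
  size = floor(num_seqs * training_proportion)
)

# Assumption: every index not chosen for training becomes test data.
test_set_indices <- setdiff(seq_len(num_seqs), training_set_indices)

length(training_set_indices)
length(test_set_indices)
```

Because `sample()` draws without replacement by default, no sequence index lands in both sets.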
ISBN: 9781789950694