R Programming for Bioinformatics
by Robert Gentleman
July 2008
Intermediate to advanced content level
328 pages
English
Chapman and Hall/CRC
Working with Character Data
> mmT = matchPattern(TATA, chr22NoN, max.mismatch = 1)
> length(mmT)
[1] 102104
> mismatch(TATA, mmT[1:3])
[[1]]
[1] 2
[[2]]
[1] 5
[[3]]
[1] 7
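The call above asks matchPattern for every window of chr22NoN that differs from TATA in at most one letter. To make that criterion concrete, here is a naive base-R sketch of the same idea: slide the pattern along the target and keep each start position where the number of disagreeing letters (the Hamming distance) is at most max_mismatch. The helper names naive_match_mm and mismatch_positions, the pattern "TATAAA" and the target string are all hypothetical stand-ins (the TATA object used above is defined earlier in the chapter and is not reproduced here), and this is not how Biostrings implements matchPattern. mismatch_positions reports positions within the pattern, in the same spirit as the mismatch() output above.

```r
# Naive approximate matching: an illustration only, not Biostrings' algorithm.
# naive_match_mm() and mismatch_positions() are hypothetical helper names.

# Return every start position in 'subject' where the window of the same length
# as 'pattern' differs from it in at most 'max_mismatch' letters.
naive_match_mm <- function(pattern, subject, max_mismatch = 0L) {
  p <- strsplit(pattern, "")[[1]]
  s <- strsplit(subject, "")[[1]]
  m <- length(p)
  n <- length(s)
  if (m > n) return(integer(0))
  starts <- integer(0)
  for (i in seq_len(n - m + 1L)) {
    window <- s[i:(i + m - 1L)]
    if (sum(window != p) <= max_mismatch) starts <- c(starts, i)
  }
  starts
}

# Positions within the pattern at which the hit starting at 'start' disagrees.
mismatch_positions <- function(pattern, subject, start) {
  p <- strsplit(pattern, "")[[1]]
  window <- strsplit(substr(subject, start, start + length(p) - 1L), "")[[1]]
  which(window != p)
}

subject <- "GTATAAACCTATGAATTTAAAC"   # made-up target sequence
hits <- naive_match_mm("TATAAA", subject, max_mismatch = 1L)
hits                                  # 2 10 16
lapply(hits, function(h) mismatch_positions("TATAAA", subject, h))
# an exact hit (no mismatches), then one mismatch at pattern position 4, then at 2
```

This scan costs time proportional to the target length times the pattern length, which is fine for a toy string but not for a whole chromosome such as chr22NoN.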
5.6.2 Matching many query sequences
Matching a very large number of query sequences against a single target sequence has become a practical problem with the advent of high-throughput sequencing technologies. These technologies typically yield a large number of short reads, sometimes tens of millions, and a common bioinformatic task is to locate these reads in a known genome. The function matchPDict can be used for this. It is based on the Aho-Corasick algorithm, which preprocesses the whole set of query sequences into a single automaton so that the target sequence is scanned once, rather than once per query.
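To illustrate the idea behind the Aho-Corasick algorithm, the following is a minimal base-R sketch; it is not the Biostrings implementation. All query patterns are first inserted into a trie. Each node then receives a failure link to the node for the longest proper suffix of its string that is also a prefix of some pattern, and inherits the patterns recognized at that node. The target is then read once, left to right, following failure links on a mismatch instead of restarting. The function names ac_build and ac_search, and the toy patterns and target, are hypothetical.

```r
# Minimal Aho-Corasick sketch in base R (illustration only; not Biostrings' code).
# ac_build() and ac_search() are hypothetical helper names.

ac_build <- function(patterns) {
  children <- list(integer(0))  # node 1 is the root; each entry maps letter -> child node id
  out <- list(integer(0))       # indices of the patterns recognized at each node
  # 1. Insert every pattern into a trie.
  for (k in seq_along(patterns)) {
    node <- 1L
    for (ch in strsplit(patterns[k], "")[[1]]) {
      nxt <- children[[node]][ch]
      if (is.na(nxt)) {                       # no edge yet: create a new node
        children[[length(children) + 1L]] <- integer(0)
        out[[length(out) + 1L]] <- integer(0)
        nxt <- length(children)
        children[[node]][ch] <- nxt
      }
      node <- unname(nxt)
    }
    out[[node]] <- c(out[[node]], k)
  }
  # 2. Breadth-first pass to set failure links; depth-1 nodes fail to the root.
  fail <- rep(1L, length(children))
  queue <- unname(children[[1L]])
  while (length(queue) > 0L) {
    node <- queue[1L]
    queue <- queue[-1L]
    kids <- children[[node]]
    for (ch in names(kids)) {
      child <- kids[[ch]]
      f <- fail[node]
      # walk up failure links until a node has an edge labelled ch, or the root is reached
      while (f != 1L && is.na(children[[f]][ch])) f <- fail[f]
      target <- children[[f]][ch]
      fail[child] <- if (is.na(target)) 1L else unname(target)
      # a node also recognizes every pattern recognized at its failure node
      out[[child]] <- c(out[[child]], out[[fail[child]]])
      queue <- c(queue, child)
    }
  }
  list(children = children, fail = fail, out = out)
}

ac_search <- function(ac, patterns, text) {
  node <- 1L
  pat_idx <- integer(0)
  ends <- integer(0)
  chars <- strsplit(text, "")[[1]]
  for (i in seq_along(chars)) {
    ch <- chars[i]
    # on a mismatch, fall back along failure links instead of restarting the scan
    while (node != 1L && is.na(ac$children[[node]][ch])) node <- ac$fail[node]
    nxt <- ac$children[[node]][ch]
    node <- if (is.na(nxt)) 1L else unname(nxt)
    for (k in ac$out[[node]]) {               # report every pattern ending at position i
      pat_idx <- c(pat_idx, k)
      ends <- c(ends, i)
    }
  }
  data.frame(pattern = patterns[pat_idx],
             start = ends - nchar(patterns[pat_idx]) + 1L,
             end = ends,
             stringsAsFactors = FALSE)
}

pats <- c("TATA", "ATA", "GATC", "CCGG")
ac_search(ac_build(pats), pats, "ACGTATAGATCCCGGTATA")
# six rows: TATA 4-7, ATA 5-7, GATC 8-11, CCGG 12-15, TATA 16-19, ATA 17-19
```

ATA is reported at positions 5 to 7 alongside TATA at 4 to 7 without rescanning the target: the TATA node inherits ATA through its failure link, which is how nested and overlapping patterns are all found in a single pass.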
The following example is taken from the

You might also like

Computation in BioInformatics
S. Balamurugan, Anand T. Krishnan, Dinesh Goyal, Balakumar Chandrasekaran, Boomi Pandi

R for Data Science Cookbook (n)
Prabhanjan Narayanachar Tattar, Yu-Wei, Chiu (David Chiu)

ISBN: 9781420063677