Genomics in the Cloud (云端基因组学)

by Geraldine A. Van der Auwera, Brian D. O’Connor
April 2022
Beginner to intermediate
486 pages
10h 22m
Chinese
China Electric Power Press Ltd.
Content preview from Genomics in the Cloud (云端基因组学)
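Figure 14-4. We replace the real dataset, which cannot be shared publicly, with a synthetic dataset that mimics the characteristics of the original. (Diagram labels: original access-restricted data; synthetic data whose technical and biological characteristics are equivalent to those of the original data.)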
14.2.1 Overall Methodology
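The idea of using synthetic genomic data is far from new; researchers have been using synthetic data for some time. For example, the ICGC-TCGA DREAM Mutation challenges used synthetic data. These challenges were a series of recurring competitions in which the organizers provided synthetic data engineered to carry specific known variants, and participants were asked to develop analysis methods that could identify those mutations with high accuracy and specificity.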
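Multiple programs can generate synthetic sequence data for purposes like this; in fact, some of them were developed in part to supply data for such challenges. The basic mechanism is to simulate reads based on the reference genome sequence and output them as standard FASTQ or BAM files. These tools typically accept a VCF file of variant calls as a secondary input, which modifies the data simulation algorithm so that the resulting sequence data supports the variants contained in the input VCF. In addition, some programs can introduce (or spike in) variants into sequence data that already exists. You provide the tool with a set of variants, and it modifies a subset of the reads so that they support the desired variant calls.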
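The two mechanisms described above can be illustrated with a minimal Python sketch. It is not the algorithm of any particular simulator: the function names (`apply_snvs`, `simulate_reads`, `spike_in`), the dictionary used in place of a real VCF, and the restriction to error-free reads carrying single-nucleotide variants on one haplotype are all simplifying assumptions made for illustration.

```python
import random


def apply_snvs(reference, snvs):
    """Return a copy of the reference with each SNV applied.

    snvs is a hypothetical stand-in for a VCF: it maps a 0-based
    position to a (ref_base, alt_base) pair.
    """
    seq = list(reference)
    for pos, (ref, alt) in snvs.items():
        if seq[pos] != ref:
            raise ValueError(f"reference mismatch at position {pos}")
        seq[pos] = alt
    return "".join(seq)


def simulate_reads(reference, snvs, n_reads, read_len, seed=0):
    """Sample error-free reads from the variant-carrying sequence.

    Each read is returned as a four-line FASTQ record. The start
    position is kept in the read name so it can be checked later.
    """
    rng = random.Random(seed)
    haplotype = apply_snvs(reference, snvs)
    records = []
    for i in range(n_reads):
        start = rng.randrange(len(haplotype) - read_len + 1)
        bases = haplotype[start:start + read_len]
        quals = "I" * read_len  # constant Phred+33 quality of 40 (assumption)
        records.append(f"@read{i}_pos{start}\n{bases}\n+\n{quals}\n")
    return records


def spike_in(reads, pos, alt, fraction, seed=0):
    """Edit existing reads so that a subset supports a variant.

    reads is a list of (start, bases) tuples already placed on the
    reference. Each read overlapping pos is modified with probability
    `fraction`; all other reads are returned unchanged.
    """
    rng = random.Random(seed)
    edited = []
    for start, bases in reads:
        offset = pos - start
        if 0 <= offset < len(bases) and rng.random() < fraction:
            bases = bases[:offset] + alt + bases[offset + 1:]
        edited.append((start, bases))
    return edited


if __name__ == "__main__":
    reference = "ACGTACGTACGTACGTACGT"
    # Simulate reads that support a C>T variant at position 5.
    for record in simulate_reads(reference, {5: ("C", "T")}, n_reads=3, read_len=8):
        print(record, end="")
    # Spike the same variant into roughly half of the overlapping reads.
    existing = [(0, reference[0:8]), (2, reference[2:10]), (10, reference[10:18])]
    print(spike_in(existing, pos=5, alt="T", fraction=0.5))
```

Real tools additionally model sequencing errors, paired-end reads, indels, and diploid genotypes, and they read and write actual VCF, FASTQ, and BAM files; the sketch only shows how the variant list steers which bases the reads end up carrying.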
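In fact, in an early round of brainstorming, we considered editing real samples from the 1000 Genomes Project to carry the variants we were interested in, leaving the remaining samples in the dataset as controls. That approach would have avoided the step of generating synthetic data. However, our initial testing showed that the exome data from the Low Coverage dataset was not of good enough quality to serve our purposes. At the time, the High Coverage dataset that we introduced in Chapter 13 was not yet available, and because that data was produced using ...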

Publisher Resources

ISBN: 9787519864422