Genomics in the Cloud (云端基因组学)

by Geraldine A. Van der Auwera, Brian D. O’Connor
April 2022
Beginner to intermediate
486 pages
10h 22m
Chinese
China Electric Power Press Ltd.
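are intended for experienced tool developers doing small-scale testing and benchmarking. We are curious how much of the high barrier to using these tools comes from developers building them with experts as the target users, and how much comes from their limited uptake among general users. Either way, we have not yet seen individual biomedical researchers use these tools to produce the kind of reproducible, extensible research output that we have in mind. Given that, it is no surprise that we had to overcome a range of difficulties in our own project. As we discuss later, this further motivated us to think about how to build on what has already been achieved, so that others can more easily adopt the model of providing synthetic data as companion material for a study.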
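In the next section, we reveal the glorious details of how we implemented this part of the work. Along the way, we occasionally pause to describe the problems we ran into, where we feel that doing so gives readers useful insight or adds a bit of comic relief to lighten the discussion.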
14.2.2 Retrieving Variant Data from 1000 Genomes Participants
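As mentioned earlier, we decided to base the synthetic data simulation step on VCF files from 1000 Genomes Project participants. We chose this dataset because it is the largest genomic dataset that is fully public and open to use, and a copy of it is freely available on GCS. That is where the convenience ended, however. We had to overcome obstacles from the very start. The first was that the 1000 Genomes variant calls were provided at the time as multisample VCF files, each holding the variant calls for all of the project's participants and split by chromosome. What we needed was exactly the opposite: single-sample VCF files, each holding the data for all chromosomes of one participant.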
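So the first thing we had to implement was a WDL workflow that takes a participant identifier and uses the GATK SelectVariants tool to extract that participant's variant calls from each chromosome file ...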
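The per-chromosome extraction described above can be sketched roughly as follows. This is a minimal Python illustration, not the authors' WDL workflow: it only builds the `gatk SelectVariants` command lines for one participant, using the tool's `-V` (input VCF), `-sn` (sample name), and `-O` (output) arguments. The chromosome list, the path template, the bucket name, the output naming scheme, and the function name `select_variants_commands` are all assumptions made for illustration; the excerpt does not specify them.

```python
from typing import List

# Assumed chromosome set; the excerpt does not say which chromosomes were processed.
CHROMOSOMES = [str(n) for n in range(1, 23)] + ["X"]


def select_variants_commands(
    sample_id: str, vcf_template: str, out_dir: str = "."
) -> List[List[str]]:
    """Build one GATK SelectVariants command per chromosome for one participant.

    vcf_template is a hypothetical path pattern containing "{chrom}", standing in
    for the per-chromosome multisample VCF files described in the text.
    """
    commands = []
    for chrom in CHROMOSOMES:
        input_vcf = vcf_template.format(chrom=chrom)
        # Hypothetical output naming: one single-sample file per chromosome.
        output_vcf = f"{out_dir}/{sample_id}.chr{chrom}.vcf.gz"
        commands.append(
            [
                "gatk", "SelectVariants",
                "-V", input_vcf,   # multisample, per-chromosome input
                "-sn", sample_id,  # keep only this participant's genotypes
                "-O", output_vcf,  # single-sample, per-chromosome output
            ]
        )
    return commands


if __name__ == "__main__":
    # "NA12878" is used only as an example identifier; the bucket path is made up.
    cmds = select_variants_commands(
        "NA12878", "gs://example-bucket/1000g/chr{chrom}.vcf.gz"
    )
    for cmd in cmds[:2]:
        print(" ".join(cmd))
```

Each returned command could be handed to `subprocess.run` on a machine with GATK installed. In a WDL workflow, the same loop would more naturally be expressed as a scatter over the chromosome files.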
ISBN: 9787519864422