云端基因组学 (Genomics in the Cloud)

by Geraldine A. Van der Auwera, Brian D. O’Connor
April 2022
Beginner to intermediate
486 pages
Chinese
China Electric Power Press Ltd.
10.3 Understanding and Optimizing Workflow Efficiency
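Did you notice how long it took to run scatter-haplotypecaller.wdl through PAPI? About 10 minutes, right? And do you remember how long it took to run on your VM back in Chapter 8? About 2 minutes? In other words, running the same workflow in parallel across multiple machines took five times as long as running the same job on a single machine. That's pretty bad!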
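Fortunately, this workflow is mainly meant as a demonstration, and its workload is very small. The set of intervals we've been using covers only a tiny region of the genome, and HaplotypeCaller itself needs very little time to process such short intervals. So when you run the workflow locally on your VM, Cromwell doesn't actually have much to do: the GATK container image and the files are already there, so all it really needs to do is read the WDL file and launch the GATK commands. As noted earlier, both of those operations are very fast. By contrast, when you tell Cromwell to send the work to PAPI, you set in motion a long chain of events that includes creating VMs, retrieving container images, copying files from GCS to local storage, and so on. All of that overhead adds up, which is why the run ends up taking longer.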
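So for short tasks, the time spent on "real work" is only a small fraction of the total runtime; most of it goes to the behind-the-scenes setup. However, the setup time is roughly fixed, so for longer-running tasks (such as using this workflow to process much larger genomic intervals), the setup time becomes negligible by comparison.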
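To make the fixed-overhead argument concrete, here is a minimal Python sketch of a toy cost model. It assumes that every parallel cloud task pays the same fixed setup cost (VM creation, image retrieval, file localization), that the work splits evenly across shards that all run at once, and that a local run has no setup cost at all. The class `RunModel`, its methods, and all the numbers are hypothetical illustrations, not measurements from the PAPI runs described above.

```python
"""Toy model: fixed per-task setup overhead versus useful work.

Hypothetical sketch. All numbers are illustrative placeholders,
not measurements of Cromwell or PAPI.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class RunModel:
    setup_min: float  # assumed fixed cost per cloud task (VM, image pull, file copy)
    work_min: float   # useful compute time if run serially on one machine

    def local_wall_time(self) -> float:
        """Local run: assume image and files are already present, so no setup."""
        return self.work_min

    def scattered_wall_time(self, shards: int) -> float:
        """Cloud run split evenly across `shards` tasks that all run at once.

        Every shard pays the full setup cost, but because the shards run
        concurrently, wall time is one setup plus one shard's share of the work.
        """
        if shards < 1:
            raise ValueError("shards must be >= 1")
        return self.setup_min + self.work_min / shards

    def overhead_fraction(self, shards: int) -> float:
        """Share of the cloud wall-clock time spent on setup, not real work."""
        return self.setup_min / self.scattered_wall_time(shards)


if __name__ == "__main__":
    SETUP_MIN = 8.0  # hypothetical setup cost per task
    SHARDS = 4       # hypothetical scatter width
    for work in (2.0, 600.0):  # a tiny demo job vs. a large one
        m = RunModel(setup_min=SETUP_MIN, work_min=work)
        print(
            f"work={work:6.1f} min  local={m.local_wall_time():6.1f} min  "
            f"cloud={m.scattered_wall_time(SHARDS):6.1f} min  "
            f"overhead={m.overhead_fraction(SHARDS):.0%}"
        )
```

With these made-up numbers, the 2-minute job takes 8.5 minutes in the cloud and almost all of it is setup, while the 600-minute job finishes in 158 minutes instead of 600 and setup drops to about 5% of the wall time. Real runs also involve queueing and variable transfer times, which this model deliberately ignores.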
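With this example in mind, let's go over a few things to watch out for. They apply whether you plan to develop your own workflows in the future or to run other people's workflows in the cloud.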
10.3.1 Granularity of Operations
The HaplotypeCaller workflow we ran here is designed to process long intervals covering large amounts of data, so ...
ISBN: 9787519864422