Chapter 11. Running Many Workflows Conveniently in Terra

In Chapter 10, we gave you a tantalizing first taste of the power of combining Cromwell with the Google Pipelines API (PAPI). You learned how to dispatch individual workflows to PAPI, both directly through Cromwell and indirectly through the WDL Runner wrapper. Both approaches let you rapidly marshal arbitrary amounts of cloud compute resources without having to administer them directly; that ability is probably the most important lesson you can take from this book. However, as we've discussed, both approaches suffer from limitations that would prevent you from achieving the truly great scalability that the cloud has to offer.

In this chapter, we show you how to use a fully featured Cromwell server within Terra, a cloud-based platform operated by the Broad Institute. We begin by introducing you to the platform and walking you through the basics of running workflows in Terra. Along the way, you’ll have the opportunity to experiment with the call caching feature that allows the Cromwell server to resume failed or interrupted workflows from the point of failure. With that experience in hand, you’ll graduate to finally running a full-scale GATK Best Practices pipeline on a whole genome dataset.

Getting Started with Terra

You are just a few short hops away from experiencing the delights of a fully loaded Cromwell server thanks to Terra, a scalable platform for biomedical research operated by the Broad Institute in collaboration with Verily. ...