Chapter 7. High-Performance Computing and Other Compute Services

Show of hands…how many of you have used your university's high-performance computing (HPC) environment and thought it was amazingly powerful, but despised having to submit jobs to the scheduler and fight over resources with your department? In the previous chapters, we've talked a lot about using Azure Machine Learning or Databricks with Apache Spark for data-intensive jobs as well as some bioinformatics tasks. However, not every piece of bioinformatics software designed for an HPC cluster will scale natively in these services.

Azure offers plenty of other scalable compute options that let us replicate our familiar on-prem HPC environments without the hassle of fighting for resources or battling the dreaded scheduler. This lets us run the wide range of distributed bioinformatics software in the cloud without reinventing the wheel. Plus, instead of investing millions of dollars in HPC equipment, we can turn our compute services on or off and pay only for what we use.

In this chapter, I’ll cover a few additional scalable compute options in Azure and begin talking about cloudifying your existing bioinformatics pipelines using these services.

While we'll cover quite a few options, you likely won't use more than one at a time in practice. Each of the services we'll cover has slightly different features that may fit your particular use case better than the others.

Bring Your ...

