CHAPTER 7 Linux and AWS Command-Line Basics for Genomics

Genomics involves manipulating large amounts of data. The Linux operating system is particularly well suited to working with large files and has become the operating system of choice for manipulating and extracting information from all types of biological data, including genomics, proteomics, and other “omics” data. The Linux command-line interface makes it easy to view and manipulate very large text files containing millions of lines, whereas similar tasks are difficult in Microsoft Excel or other Windows tools. In addition, much bioinformatics software is available only on Linux. Therefore, the first skill required to analyze genomics data is proficiency with Linux. By the end of this chapter, you will be familiar and comfortable with the key Linux concepts necessary to run your genomics analyses.
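To make the point about large text files concrete, here is a minimal sketch that builds a one-million-line, tab-separated file and inspects it with standard Linux tools. The file name reads.txt and its two-column layout are made up for illustration and are not a real genomics format.

```shell
# Create a scratch directory and a stand-in for a large text file.
# reads.txt and its contents are hypothetical, for illustration only.
workdir=$(mktemp -d)
awk 'BEGIN { for (i = 1; i <= 1000000; i++) printf "read_%d\tACGT\n", i }' \
    > "$workdir/reads.txt"

# Count the lines without opening the file in an editor.
line_count=$(wc -l < "$workdir/reads.txt")
echo "lines: $line_count"

# Look at the first three lines and the last line only.
head -n 3 "$workdir/reads.txt"
tail -n 1 "$workdir/reads.txt"

# Extract the first (identifier) column and show the first two values.
cut -f 1 "$workdir/reads.txt" | head -n 2
```

Each command reads the file as a stream rather than loading it whole into a spreadsheet-style view, which is why these tools stay practical at this size.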

You can also install an AWS package, the AWS Command Line Interface (AWS CLI), that enables you to manage your AWS resources directly from the command line. With the AWS CLI installed, you can do such things as copy files from your local filesystem to an S3 bucket, start and stop EC2 instances, and manage user access privileges.
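As a rough sketch of what those operations look like, the commands below use the AWS CLI subcommands s3 cp, ec2 start-instances, ec2 stop-instances, and iam list-users. The bucket name, instance ID, and file name are placeholders, and the small run helper is an assumption added here so that the commands are only printed unless DRY_RUN is set to 0.

```shell
# Placeholder values; replace them with your own resources.
BUCKET="s3://my-genomics-bucket"       # hypothetical bucket
INSTANCE_ID="i-0123456789abcdef0"      # hypothetical EC2 instance ID
LOCAL_FILE="sample.fastq.gz"           # hypothetical local file

# With DRY_RUN=1 (the default) each command is printed, not executed,
# so the sketch can be tried without AWS credentials.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Copy a file from the local filesystem to an S3 bucket.
run aws s3 cp "$LOCAL_FILE" "$BUCKET/raw/$LOCAL_FILE"

# Start, then stop, an EC2 instance.
run aws ec2 start-instances --instance-ids "$INSTANCE_ID"
run aws ec2 stop-instances --instance-ids "$INSTANCE_ID"

# List IAM users, a first step when reviewing access privileges.
run aws iam list-users
```

Running the script with DRY_RUN=0 executes the commands for real, which requires an installed and configured AWS CLI and may incur charges.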

Selecting a Linux Distribution

You are probably aware that Linux comes in many flavors, called Linux distributions, such as Ubuntu and Red Hat. To understand this concept, let's start by defining what an operating system is.

An operating system is a program, or software, that interfaces with all basic ...
