Chapter 4. Working with Remote Machines
Most data-processing tasks in bioinformatics require more computing power than we have on our workstations, which means we must work with large servers or computing clusters. For some bioinformatics projects, it’s likely you’ll work predominantly over a network connection with remote machines. Unsurprisingly, working with remote machines can be quite frustrating for beginners and can hobble the productivity of experienced bioinformaticians. In this chapter, we’ll learn how to make working with remote machines as effortless as possible so you can focus your time and efforts on the project itself.
Connecting to Remote Machines with SSH
There are many ways to connect to another machine over a network, but by far the most common is through the secure shell (SSH). We use SSH because it’s encrypted (which makes it secure to send passwords, edit private files, etc.), and because it’s available on virtually every Unix system. How your server, SSH, and your user account are configured is something you or your system administrator determines; this chapter won’t cover these system administration topics. The material covered in this section should help you answer common SSH questions a sysadmin may ask (e.g., “Do you have an SSH public key?”). You’ll also learn all of the basics you’ll need as a bioinformatician to SSH into remote machines.
To initialize an SSH connection to a host (in this case, biocluster.myuniversity.edu), we use the ssh command:
$ ssh biocluster.myuniversity.edu ...
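As a rough sketch of how this command is commonly varied, ssh also accepts a username in the user@host form and a port through the -p option. The helper below, build_ssh_cmd, is a hypothetical name introduced only for illustration, and the username alice and the port numbers are assumptions rather than details from this chapter. It prints the command instead of running it, so you can check what would be executed before connecting:

```shell
# Hypothetical helper (not part of ssh): print an ssh command for review.
# Arguments: $1 = username, $2 = host, $3 = port (optional; assumed default 22).
build_ssh_cmd() {
  printf 'ssh -p %s %s@%s\n' "${3:-22}" "$1" "$2"
}

# "alice" is an assumed username used only for illustration.
build_ssh_cmd alice biocluster.myuniversity.edu
# prints: ssh -p 22 alice@biocluster.myuniversity.edu

# The same host with an assumed nonstandard port.
build_ssh_cmd alice biocluster.myuniversity.edu 2222
# prints: ssh -p 2222 alice@biocluster.myuniversity.edu
```

Printing the command rather than executing it keeps the example safe to run on any machine; once the output looks right, the printed line can be typed at the prompt as is.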