Welcome to Effective Computation in Physics. By reading this book, you will learn the essential software skills that are needed by anyone in a physics-based field. From astrophysics to nuclear engineering, this book will take you from not knowing how to make a computer add two variables together to being the software development guru on your team.
Physics and computation have a long history together. In many ways, computers and modern physics have co-evolved. Only cryptography can really claim the same timeline with computers as physics. Yet in spite of this shared growth, physicists are not the premier software developers that you would expect. Physicists tend to suffer from two deadly assumptions:
1. Software development and software engineering are easy.
2. Simply by knowing physics, someone knows how to write code.
While it is true that some skills are transferable—for example, being able to reason about abstract symbols is important to both—the fundamental concerns, needs, interests, and mechanisms for deriving the truth of physics and computation are often distinct.
For physicists, computers are just another tool in the toolbox. Computation plays a role in physics that is not unlike the role of mathematics. You can understand physical concepts without a computer, but knowing how to speak the language(s) of computers makes practicing physics much easier. Furthermore, a physical computer is not unlike a slide rule or a photon detector or an oscilloscope. It is an experimental device that can help inform the science at hand when set up properly. Because computers are much more complicated and configurable than any previous experimental device, however, they require more patience, care, and understanding to properly set up.
More and more physicists are being asked to be software developers as part of their work or research. This book aims to make growing as a software developer as easy as possible. In the long run, this will enable you to be more productive as a physicist.
On the other end of the spectrum, computational modeling and simulation have begun to play an important part in physics. When experiments are too big or expensive to perform in statistically significant numbers, or when theoretical parameters need to be clamped down, simulation science fills a vital role. Simulations help tell experimenters where to look and can validate a theory before it ever hits a bench. Simulation is becoming a middle path for physicists everywhere, separate from theory and experiment. Many simulation scientists like to think of themselves as being more theoretical. In truth, though, the methods that are used in simulations are more similar to experimentalism.
All modern physicists, no matter how experimental, rely on a computer in some part of their scientific workflow. Some researchers only use computers as word processing devices. Others may employ computers that tirelessly collect data and churn analyses through the night, outpacing most other members of their research teams. This book introduces ways to harness computers to accomplish and automate nearly any aspect of research, and should be used as a guide during each phase of research.
Reading this book is a great way to learn about computational physics from all angles. It will help you to gain and hone software development skills that will be invaluable in the context of your work as a physicist. To the best of our knowledge, another book like this does not exist. This is not a physics textbook. This book is not the only way to learn about Python and other programming concepts. This book is about what happens when those two worlds inelastically collide. This book is about computational physics. You are in for a treat!
This book is for anyone in a physics-based field who must do some programming as a result of their job or one of their interests. We specifically cast a wide net with the term “physics-based field.” We take this term to mean any of the following fields: physics, astronomy, astrophysics, geology, geophysics, climate science, applied math, biophysics, nuclear engineering, mechanical engineering, materials science, electrical engineering, and more. For the remainder of this book, when the term physics is used, it refers to this broader sense of physics and engineering. It does not simply refer to the single area of study that shares that name.
Even though this book is presented in the Python programming language, the concepts apply to a wide variety of programming languages, both modern and historical. Python was chosen here because it is easy and intuitive to use in a wide variety of situations. While you are trying to learn concepts in computational physics, Python gets out of your way. You can take the skills that you learn here and apply them equally well in other programming contexts.
While anyone is welcome to read this book and learn, it is targeted at people in physics who need to learn computational skills. The examples will draw from a working knowledge of physics concepts. If you primarily work as a linguist or anthropologist, this book is probably not for you. No knowledge of computers or programming is assumed. If you have already been working as a software developer for several years, this book will help you only minimally.
To demonstrate, let’s take the example of a team of physicists using a new detector to measure the decay constants of radium isotopes at higher precision. The physicists will need to access data that holds the currently accepted values. They may also want to write a small program that gives the expected activity of each isotope as a function of time. Next, the scientists will collect experimental data from the detector, store the raw output, compare it to the expected values, and publish a paper on the differences. Since the heroes of this story value the tenets of science and are respectful of their colleagues, they’ll have been certain to test all of their analyses and to carefully document each part of the process along the way. Their colleagues, after all, will need to repeat this process for the thousands of other isotopes in the table of nuclides.
To access a library that holds nuclear data such as currently accepted nuclear decay constants, $\lambda_i$, for each isotope $i$, our heroes may have to install the ENSDF database into their filesystem. Insights about the shell (Chapter 1) and systems for building software (Chapter 14) will be necessary in this simple endeavor.
The expected activity for an isotope as a function of time is very simple ($A_i(t) = \lambda_i N_i(t)$, the decay constant times the number of atoms remaining). No matter how simple the equation, though, no one wants to solve it by hand (or by copying and pasting in Excel) for every second of the experiment. For this step, Chapter 2 provides a guide for creating a simple function in the Python programming language. For more sophisticated mathematical models, object orientation (Chapter 6), numerical Python (Chapter 9), and data structures (Chapter 11) may be needed.
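As a rough illustration of the kind of small function meant here, the following sketch assumes the standard exponential decay law, in which the number of atoms falls as N(t) = N0 exp(-λt) and the activity is λ times the number of atoms remaining. The function names `decay_constant` and `activity` and all of the numbers are hypothetical choices made for this example, not values from any experiment.

```python
import math


def decay_constant(half_life):
    """Return the decay constant (1/s) for a half-life given in seconds."""
    return math.log(2) / half_life


def activity(n0, lam, t):
    """Return the expected activity (decays per second) at time t.

    n0  -- number of atoms at t = 0
    lam -- decay constant, in inverse seconds
    t   -- elapsed time, in seconds
    """
    return lam * n0 * math.exp(-lam * t)


# Illustrative placeholder numbers only, not measured values.
lam = decay_constant(half_life=10.0)
initial = activity(n0=1000.0, lam=lam, t=0.0)
after_one_half_life = activity(n0=1000.0, lam=lam, t=10.0)
print(initial, after_one_half_life)
```

Calling `activity` in a loop over time steps replaces the by-hand or spreadsheet calculation, and after one half-life the result should be half of the initial activity.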
A mature experiment is one that requires no human intervention. Said another way, a happy physicist sleeps at home while the experiment is running unaided all night back at the lab. The skills gained in Chapter 1 and Chapter 2 can help to automate data collection from an experiment. Methods for storing that data can be learned in Chapter 10, which covers HDF5.
Once the currently accepted values are known and the experimental data has been collected, the next step of the experiment is to compare the two datasets. Along with lessons learned from Chapter 1 and Chapter 2, this step will be aided by a familiarity with sophisticated tools for analysis and visualization (Chapter 7). For very complex data analysis, parallelism (the basics of which are discussed in Chapter 12) can speed up the work by employing many processors at once.
Because this is science, reproducibility is paramount. To make sure that they can repeat their results, unwind their analysis to previous versions, and replicate their plots, all previous versions of the scientists’ code and data should be under version control. This tool may be the most essential one in this book. The basics of version control can be found in Chapter 15, and the use of version control within a collaboration is discussed in Chapter 16.
In addition to being reproducible, the theory, data collection, analysis, and plots must be correct. Accordingly, Chapter 17 will cover the basics of how to debug software and how to interpret error messages. Even after debugging, the fear of unnoticed software bugs (and subsequent catastrophic paper retractions) compels our heroes to test the code that’s been written for this project. Language-independent principles for testing code will be covered in Chapter 18, along with specific tools for testing Python code.
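To give a flavor of what testing looks like in practice, here is a minimal sketch that uses only plain `assert` statements. The function `remaining_atoms` is a hypothetical stand-in for analysis code, and the test names are assumptions made for illustration. The tools covered in Chapter 18 may organize tests differently.

```python
import math


def remaining_atoms(n0, lam, t):
    """Number of atoms left at time t, assuming exponential decay."""
    return n0 * math.exp(-lam * t)


def test_nothing_decays_at_time_zero():
    assert remaining_atoms(1000.0, 0.1, 0.0) == 1000.0


def test_half_remain_after_one_half_life():
    lam = 0.1
    half_life = math.log(2) / lam
    assert math.isclose(remaining_atoms(1000.0, lam, half_life), 500.0)


def test_atoms_only_decrease():
    earlier = remaining_atoms(1000.0, 0.1, 1.0)
    later = remaining_atoms(1000.0, 0.1, 2.0)
    assert later < earlier


if __name__ == "__main__":
    # A test runner would normally discover and call these functions.
    test_nothing_decays_at_time_zero()
    test_half_remain_after_one_half_life()
    test_atoms_only_decrease()
    print("all tests passed")
```

Each test checks one physical expectation, so a failure points to the behavior that broke.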
All along, our physicists should have been documenting their computing processes and methods. With the tools introduced in Chapter 19, creating a user manual for code doesn’t have to be its own project. That chapter will demonstrate how a clickable, Internet-publishable manual can be generated in an automated fashion based on comments in the code itself. Even if documentation is left to the end of a project, Chapter 19 can still help forward-thinking physicists to curate their work for posterity. The chapters on licenses (Chapter 22) and collaboration (Chapter 21) will also be helpful when it’s time to share that well-documented code.
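The idea of building documentation from text kept in the code can be sketched with a Python docstring. The example below assumes the reStructuredText field-list style that Sphinx can read. The function `half_life` is hypothetical, and the block only shows that the documentation is attached to the function itself, not how a full manual is generated.

```python
import inspect
import math


def half_life(lam):
    """Return the half-life for a given decay constant.

    :param lam: decay constant, in inverse seconds (must be positive)
    :returns: half-life, in seconds
    """
    return math.log(2) / lam


# The docstring travels with the function object, so documentation tools
# can read it without anyone copying the text into a separate manual.
doc = inspect.getdoc(half_life)
print(doc)
```

Because the description sits next to the code it describes, updating one is a natural prompt to update the other.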
Once the software is complete, correct, and documented, our physicists can then move on to the all-important writing phase. Sharing their work in a peer-reviewed publication is the ultimate reward of this successful research program. When the data is in and the plots are generated, the real challenge has often only begun, however. Luckily, there are tools that help authors be more efficient when writing scientific documents. These tools will be introduced in Chapter 20.
You learn by doing. We want you to learn, so we expect you to follow along with the examples. The examples here are practical, not theoretical. In the chapters on Python, you should fire up a Python session (don’t worry, we’ll show you how). Try the code out for yourself. Try out your own variants of what is presented in the book. Writing out the code yourself makes the software and the physics real.
If you run into problems, try to solve them by thinking about what went wrong. Googling the error messages you see is a huge help. The question and answer website Stack Overflow is your new friend. If you find yourself truly stuck, feel free to contact us. This book can only give you a finite amount of content to study. However, with your goals and imagination, you will be able to practice computational physics until the end of time.
Furthermore, if there are chapters or sections whose topics you already feel comfortable with or that you don’t see as being directly relevant to your work, feel free to skip them! You can always come back to a section if you do not understand something or you need a refresher. We have inserted many back and forward references to topics throughout the course of the text, so don’t worry if you have skipped something that ends up being important later. We’ve tried to tie everything together so that you can know what is happening, while it is happening. This book is one part personal odyssey and one part reference manual. Please use it in both ways.
The following typographical conventions are used in this book:
Italic: Indicates new terms, URLs, email addresses, filenames, and file extensions.
Constant width: Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.
This element signifies a tip or suggestion.
This element signifies a general note.
This element indicates a warning or caution.
This book also makes use of a fair number of “code callouts,” in which coding examples are annotated with numbers in circles.
These are useful for drawing your attention to specific parts of the code and to explain what is happening on a step-by-step basis. You should not type the circled numbers, as they are not part of the code itself.
Supplemental material (code examples, exercises, etc.) is available for download at https://github.com/physics-codes/examples.
This book is here to help you get your job done. In general, if example code is offered with this book, you may use it in your programs and documentation. You do not need to contact us for permission unless you’re reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing a CD-ROM of examples from O’Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product’s documentation does require permission.
We appreciate, but do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: “Effective Computation in Physics by Anthony Scopatz and Kathryn D. Huff (O’Reilly). Copyright 2015 Anthony Scopatz and Kathryn D. Huff, 978-1-491-90153-3.”
If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us at firstname.lastname@example.org.
This book will teach you to use and master many different software projects. That means that you will have to have a lot of software packages on your computer to follow along. Luckily, the process of installing the packages has recently become much easier and more consistent. We will be using the conda package manager for all of our installation needs.
If you have not done so already, please download and install Miniconda. Alternatively, you can install Anaconda. Miniconda is a stripped-down version of Anaconda, so if you already have either of these, you don’t need the other. Miniconda is a Python distribution that comes with Conda, which we will then use to install everything else we need. The Conda website will help you download the Miniconda version that is right for your system. Linux, Mac OS X, and Windows builds are available for 32- and 64-bit architectures. You do not need administrator privileges on your computer to install Miniconda. We recommend that you install the Python 3 version, although all of the examples in this book should work with Python 2 as well.
If you are on Windows, we recommend using Anaconda because it alleviates some of the other package installation troubles. However, on Windows you can install Miniconda simply by double-clicking on the executable and following the instructions in the installation wizard.
If you are on Windows and are not using Anaconda, please download and install msysGit, which you can find on GitHub. This will provide you with the version control system called Git as well as the bash shell, both of which we will discuss at length. Neither is automatically available on Windows or through Miniconda. The default install settings should be good enough for our purposes here.
If you are on Linux or Mac OS X, first open your Terminal application. If you do not know where your Terminal lives, use your operating system’s search functionality to find it. Once you have an open terminal, type in the following after the dollar sign ($). Note that you may have to change the version number in the filename to match the file that you downloaded:
# On Linux, use the following to install Miniconda:
# On Mac OS X, use the following to install Miniconda:
Here, we have downloaded Miniconda into our default download directory, ~/Downloads. The file we downloaded was the 64-bit version; if you’re using the 32-bit version you will have to adjust the filename accordingly.
On Linux, Mac OS X, and Windows, when the installer asks you if you would like to automatically change or update the .bashrc file or the system PATH, say yes. That will make it so that Miniconda is automatically in your environment and will ease further installation. Otherwise, all of the other default installation options should be good enough.
Now that you have Conda installed, you can install the packages that you’ll need for this book. On Windows, open up the command prompt, cmd.exe. On Linux and Mac OS X, open up a terminal. You may need to open up a new terminal window for the installation of Miniconda to take effect. Now, no matter what your operating system is, type the following command:
$ conda install --yes numpy scipy ipython ipython-notebook matplotlib pandas pytables nose setuptools sphinx mpi4py
This may take a few minutes to download. After this, you are ready to go!
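If you would like to confirm that the installation worked, one option is to ask Python whether it can find each package. The sketch below is an assumption-laden convenience, not part of the official instructions: it requires Python 3, the helper `missing_modules` is a hypothetical name, and the list uses import names, which differ from the conda package names in two cases (`pytables` is imported as `tables`, and `ipython` as `IPython`).

```python
import importlib.util

# Import names for the packages requested in the conda command above.
MODULES = ["numpy", "scipy", "IPython", "matplotlib", "pandas",
           "tables", "nose", "setuptools", "sphinx", "mpi4py"]


def missing_modules(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(MODULES)
if missing:
    print("Not found:", ", ".join(missing))
else:
    print("All packages found.")
```

This only checks that each package can be located, not that it works, but it is a quick way to catch a failed or partial install.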
Please address comments and questions concerning this book to the publisher:
We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at http://bit.ly/effective-comp.
To comment or ask technical questions about this book, send email to email@example.com.
For more information about our books, courses, conferences, and news, see our website at http://www.oreilly.com.
This work owes a resounding thanks to Greg Wilson and to Software Carpentry. The work you have done has changed the conversation surrounding computational science. You have set the stage for this book to even exist. The plethora of contributions to the community cannot be overstated.
Equally, we must thank Paul P.H. Wilson and The Hacker Within for continuing to inspire us throughout the years. Independent of age and affiliation, you have always challenged us to learn from each other and unlock what was already there.
Stephen Scopatz and Bruce Rowe also deserve the special thanks afforded only to parents and professors. Without them helping connect key synapses at the right time, this book would never have been proposed.
The African Institute for Mathematical Sciences deserves special recognition for demonstrating the immense value of scientific computing, even to those of us who have been in the field for years. Your work inspired this book, and we hope that we can give back to your students by writing it.
We also owe thanks to our reviewers for keeping us honest: Jennifer Klay, Daniel Wooten, Michael Sarahan, and Denia Djokić.
To baristas all across the world, in innumerable cafés, we salute you.