Brian Granger is an associate professor of physics at Cal Poly San Luis Obispo, where he also teaches in the university’s undergraduate data science program. His primary area of research is interactive computing with data and, specifically, Jupyter. He started the original IPython Notebook in 2011 and is one of the co-founders of Project Jupyter. He is also an active contributor to and co-leader of the JupyterLab project, which aims to take the Jupyter Notebook interface to the next level with its flexible building blocks for interactive and collaborative computing. We recently discussed JupyterLab, how feedback from notebook users informed its design, how its features could benefit the scientific and technical computing communities, and the role of notebooks in academia and even data journalism.
What is JupyterLab and how does it represent what you've called the “evolution of the Jupyter web interface”?
The classic Jupyter Notebook offers a number of different building blocks for interactive computing: the notebook, file browser, text editor, terminal, outputs, etc. We view JupyterLab as the evolution of the classic notebook, as it allows a more flexible and powerful way of working with those same building blocks. From a user’s perspective, we hope that everything in JupyterLab is familiar, but even more delightful and productive to work with.
Before going further, I want to acknowledge the incredible team working on Jupyter on many fronts, including software, design, and organizational aspects. In particular, individuals on the Jupyter Steering Council provide the foundation of the project and are all long-term contributors, without whom the project wouldn’t exist. The team building JupyterLab and PhosphorJS is the driving force behind the work I discuss here. Lastly, we are all grateful to Fernando Perez, creator of IPython and co-founder with us on Jupyter, for setting off into the uncharted wilderness of scientific open source software in 2001.
What need is JupyterLab fulfilling among Jupyter Notebook users, particularly those who work in data science or scientific/technical computing?
Since the IPython Notebook came out in 2011, we have spent a lot of time talking to individual and organizational users to understand what delights them about the notebook, and what remains really painful. Some of those pain points are specific to the Jupyter Notebook and others are more general challenges they face working with code and data. Also, in 2015, we worked with IBM to run a user experience survey (the results of which are posted on GitHub). Based on all of this feedback, three key factors led us to develop JupyterLab.
First, users love the notebook experience, and want it to improve, but without losing the core characteristics that make it the Jupyter notebook. This is important, because we could have re-thought the entire notebook abstraction itself in JupyterLab—but we didn’t. Thus, while we are making improvements to the notebook UI/UX in JupyterLab, notebooks are essentially the same (in fact, it is still 100% the same document format and server).
Second, users want to be able to combine, remix, and integrate the different building blocks to better support their workflows. A classic usage pattern we see is data scientists who begin working in an interactive Python shell, then migrate to a notebook, and eventually build and deploy a service based on that code. In the classic notebook, those transitions are really painful. In JupyterLab, we are trying to address the pain points of such an evolving workflow. For example, in JupyterLab we offer a document-less code console for quick exploration that supports Jupyter’s rich output, and which is integrated with the text file editor, so you can run blocks of code in a text file interactively, outside the notebook.
Third, collaboration is a significant area of need. As more and more groups and organizations adopt Jupyter, they need to work together with others on notebooks. Sure, notebooks can be shared via GitHub, Dropbox, or email, but that is really painful and doesn’t represent how most of us work together. The next step beyond that is full real-time collaboration, in the style of Google Drive or Dropbox Paper. We have built JupyterLab with the abstractions needed to offer real-time collaboration on all documents (notebooks, text files, etc.). One of our postdocs at UC Berkeley has a prototype of real-time collaboration working in JupyterLab, and our designers at Cal Poly are working on the UI/UX of this. We are having a lot of fun working on this right now, and are hopeful it is going to really help our users. It is worth mentioning that CoCalc (formerly SageMathCloud) now also offers a custom notebook front end for Jupyter that supports real-time collaboration.
What has surprised you about the evolution of this project?
We worked on designing the core abstractions of JupyterLab while iterating many times on the implementation. As we have converged on the current architecture and implementation, I have been surprised to see how the different abstractions can be combined to provide features we hadn’t initially envisioned. An example is the support we have for GeoJSON and CSV files. JupyterLab allows third-party extensions to declare handlers that enable users to edit or view different file types. We hadn’t appreciated how that would make it very easy for us and others to improve the user experience of working with common data formats. In JupyterLab, a user can double-click on a GeoJSON file and see a nicely rendered map based on Leaflet, or a nice table view of a CSV file. Those views are integrated with our document models, so a user can edit the files in the text editor and immediately see updated versions of the rendered file. The same abstractions also gave us an integrated Markdown editor/renderer for free.
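As an editorial illustration of the handler idea described above, here is a minimal Python sketch of a registry that maps file extensions to renderers. It is not JupyterLab's actual extension API: the names `register_renderer`, `render`, `render_csv`, and `render_geojson` are hypothetical, and the "rendering" is plain text rather than a Leaflet map or an interactive table.

```python
import csv
import io
import json
from typing import Callable, Dict

# Hypothetical registry: maps a file extension to a function that turns
# the file's text into a human-friendly view (plain text in this sketch).
Renderer = Callable[[str], str]
_RENDERERS: Dict[str, Renderer] = {}


def register_renderer(extension: str) -> Callable[[Renderer], Renderer]:
    """Decorator a third-party 'extension' could use to declare a handler."""
    def decorator(func: Renderer) -> Renderer:
        _RENDERERS[extension.lower()] = func
        return func
    return decorator


@register_renderer(".csv")
def render_csv(text: str) -> str:
    # Parse the CSV text and lay the cells out as a simple table.
    rows = list(csv.reader(io.StringIO(text)))
    return "\n".join(" | ".join(row) for row in rows)


@register_renderer(".geojson")
def render_geojson(text: str) -> str:
    # Summarize the GeoJSON object instead of drawing a map.
    data = json.loads(text)
    features = data.get("features", [])
    return f"{data.get('type', 'unknown')} with {len(features)} feature(s)"


def render(filename: str, text: str) -> str:
    """Pick a renderer by file extension; fall back to the raw text."""
    dot = filename.rfind(".")
    ext = filename[dot:].lower() if dot != -1 else ""
    return _RENDERERS.get(ext, lambda raw: raw)(text)
```

Calling `render("cities.csv", text)` again after the text changes yields a fresh view, which loosely mirrors how an edit in the text editor can update the rendered file; supporting a new format only requires registering one more function.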
There are a number of features planned for the summer release of JupyterLab 1.0, including theme switching, porting nbextensions, and hooking up kernels to output areas. How has the team prioritized features for beta and 1.0?
Originally, the primary characteristic of JupyterLab 1.0 was going to be its feature parity with the classic Jupyter Notebook. That is evolving, and we are starting to make deliberate choices to postpone work on more obscure features of the classic notebook, to make sure that the commonly used features in JupyterLab are well designed, both from the software/API perspective and the UI/UX design. From the UI/UX side, we are attempting to address a number of usability issues that have been with the classic notebook from the beginning, such as collapsible code cells and a better user experience for creating, running, deleting, and moving cells. Users do those things hundreds or thousands of times and they are a bit too clunky. From the software/API perspective, we are working hard to offer clean public APIs that can be used and extended by other developers. Those APIs are, in a sense, UI/UX for developers.
Beyond 1.0, what are some planned features of JupyterLab, and how have they been determined?
A main priority after 1.0 is released is to help users and organizations make the transition from the classic notebook to JupyterLab. We realize the transition will take time and that we will need to provide a lot of support and documentation. Beyond that, real-time collaboration probably won’t be done for the 1.0 release, so that will be a significant area of focus beyond 1.0.
How do notebooks feature in your academic work at Cal Poly?
There are two components of my academic work: teaching and research. On the teaching side I am currently teaching in Cal Poly’s undergraduate Data Science degree program (for Statistics and CS majors). I, and the other instructors in this program, use the notebook (through JupyterHub) for in-class lectures, homework, and projects. The same is true in the Computational Physics course in our department. These courses are really fun to teach and at this point, I can’t imagine teaching about code+data without the notebook.
On the research side, Jupyter is my primary area of research. In very broad terms, academia is struggling to understand how code and data, rather than traditional peer-reviewed papers, can be the primary output of academic research. At Cal Poly, the key phrase is “external validation,” which basically means they want evidence that others find your research significant and impactful. Publications in traditional, peer-reviewed journals are one form of external validation. I assert that open source software, with significant user and developer communities, also has external validation. To their credit, Cal Poly and my department granted me tenure in 2014, largely based on my work in developing the Jupyter Notebook. Today, interactive computing with data remains my main area of research. This includes Jupyter and other open source projects such as Altair.
Notebooks have broad utility in the computer science and data science communities. Are you seeing notebook usage increase in other related communities?
One use that I am particularly pleased with is Jupyter’s adoption in the data journalism community. In the past few years, major news organizations have invested significantly in data journalism. Among others, these include FiveThirtyEight, the LA Times, ProPublica, and, to the surprise of many, BuzzFeed News. Jeremy Singer-Vine and the team at BuzzFeed News have led the way in setting a high bar for open and reproducible data journalism. In their GitHub repository, they provide the dataset and analysis (as a Jupyter Notebook) for all of the data-backed news articles published on BuzzFeed News. Other news organizations are now following this practice as well. Given the highly politicized backlash against science and data-backed reality we are seeing in the U.S., I find this work extremely important and am proud that Jupyter has, in a small way, enabled it.