Paco Nathan leads the Learning group at O’Reilly Media. Known as a “player/coach” data scientist, Nathan led innovative data teams building ML apps at scale for several years and more recently was an evangelist for Apache Spark, Apache Mesos, and Cascading.
Below, Nathan shares his thoughts on the current and future state of Jupyter. He will also be speaking at JupyterCon, August 22-25, 2017, in New York City.
1. How has Jupyter changed the way you work?
Having repeatable work that can be packaged and shared with others provides an enormous boost for how we work together. Jupyter Notebooks give context to the code. When you share your work, you're not just sharing some source files that have to be deciphered; you're sharing a whole train of thought, which makes it much easier to see what's going on. This isn't just true for your team members; it's equally true if you drop something and come back six months later.
2. How does Jupyter change the way your team works? How does it alter the dynamics of collaboration?
I think the human-in-the-loop design pattern we use to manage a large set of ML pipelines at O'Reilly makes possible some AI applications that wouldn't be manageable otherwise. Using the nbtransom package, we've essentially made the machine another collaborator on a set of notebooks. We use several different algorithms to score documents; when the algorithms disagree, we send the results to a human for resolution. Keeping the process within Jupyter Notebooks makes it much more convenient and efficient.
The dynamics involve people sharing this work within a team, while machines may be doing 80-90% of the work, and those machines are also collaborating on documents via Jupyter.
3. How do you expect Jupyter to be extended in the coming year?
Collaborative documents are the big area I'm looking forward to. There have already been experiments with integrating notebooks and Google Docs, and we're looking forward to having full collaboration (multiple authors working on a notebook simultaneously) in an upcoming version of JupyterHub. That would make our human-in-the-loop process even more efficient.
4. What will you be talking about at JupyterCon?
My talk is on the general theme of Jupyter as a front end for AI, in a couple of ways. One, mentioned above, is an "active learning" design pattern for human-in-the-loop ML pipelines. Another is that we're starting to build out conversational interfaces that leverage the Jupyter network protocol.
5. What sessions are you looking forward to seeing at JupyterCon?
I especially want to learn about experiences with large-scale deployments, e.g., large JupyterHub deployments in education.