Live online training

First Steps: Writing Cleaner Code for Data Science using Python and R

Coding best practices for production-ready projects

Topic: Data
Charles Givre

One of the biggest challenges data scientists face is translating their models into production environments, where they’ll actually be used to generate value. Data scientists often hand off their work in the form of notebooks to operations or development teams to deploy and maintain. However, the two groups often have different goals and concerns, leading to misunderstandings that can slow deployment.

Join expert Charles Givre to learn how to bridge the gap between data scientists and the operations teams who’ll deploy and maintain their models. You’ll learn good coding practices that will help you write better, more maintainable code while staying mindful of ethical and legal considerations.

Charles shows you how to transition your model from a Jupyter notebook to production-quality code, walking you through writing useful and effective documentation, efficient coding techniques, clean coding, modularizing your code, and creating microservices to deploy your model. Discover how to create better products, deploy them faster, and create more value for stakeholders.

What you'll learn and how you can apply it

By the end of this live online course, you’ll understand:

  • How to make code clean and reusable
  • How to package code to maximize efficiency
  • The difference between production code and experimental code

And you’ll be able to:

  • Write effective unit tests
  • Deploy machine learning models to a production environment
  • Effectively debug code

This training course is for you because...

  • You’re a data scientist or business analyst.
  • You work with Python, R, or another programming language.
  • You want to write better code and get your models into production more quickly and easily.

Prerequisites

  • Experience building machine learning models and constructing data pipelines
  • Familiarity with Python and R (useful but not required; examples will be in Python and R but generalize to any language)

About your instructor

  • Charles Givre is a lead data scientist in the Cybersecurity Technology and Controls Group at JPMorgan Chase, where he works at the intersection of cybersecurity and data science. Previously, he was a senior lead data scientist at Booz Allen Hamilton on one of the firm's largest analytic programs, where he led data science efforts and worked to expand the role of data science in the program. He also worked as a counterterrorism analyst at the Central Intelligence Agency for five years. One of his research interests is increasing the productivity of data science and analytic teams; to that end, he’s been working extensively to promote the use of Apache Drill in security applications and has contributed to the codebase. He’s also a coauthor of Learning Apache Drill from O’Reilly.

    Charles is passionate about teaching others data science and analytic skills and has led data science classes all over the world for clients, universities, and conferences, including Black Hat and the Center for Research in Applied Cryptography and Cyber Security at Bar-Ilan University. A sought-after speaker, he’s also delivered presentations at major industry conferences such as Strata-Hadoop World, Open Data Science Conference, and others. He recently served as program chair of the Strategic Analytics Program at Brandeis University's Graduate School of Professional Studies and is currently a member of the advisory board. He holds a master’s degree in Middle Eastern studies from Brandeis University as well as both a bachelor of science in computer science and a bachelor of music from the University of Arizona. Charles speaks French reasonably well and plays trombone. He lives in Baltimore with his family and in his nonexistent spare time is restoring a classic British sports car.

Schedule

The timeframes are only estimates and may vary according to how the class is progressing.

Tools of the trade (45 minutes)

  • Presentation: All about Jupyter notebooks; IDEs
  • Group discussion: When to use a notebook and when to use a script
  • Hands-on exercises: Use advanced notebook features; work with an IDE
  • Q&A

Break (5 minutes)

Coding practices (50 minutes)

  • Presentation and group discussion: Clean coding practices; documenting code
  • Hands-on exercise: Identify opportunities to improve and document the provided code samples
  • Katacoda interactive exercise: Given undocumented code, write simple documentation that works with automated documentation generators
  • Q&A
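
To give a sense of what the documentation exercise involves, here is a minimal sketch of a documented Python function. It is an illustration only, not taken from the course materials: the function `normalize_column` is hypothetical, and the Google-style docstring layout is an assumed choice, one of several that generators such as Sphinx (with its napoleon extension) can parse.

```python
def normalize_column(values, lower=0.0, upper=1.0):
    """Rescale numeric values linearly into the range [lower, upper].

    Hypothetical example function, used only to show a docstring layout.

    Args:
        values: A non-empty list of ints or floats.
        lower: Smallest value of the output range.
        upper: Largest value of the output range.

    Returns:
        A list of floats with the same length as ``values``.

    Raises:
        ValueError: If ``values`` is empty or all entries are equal.
    """
    if not values:
        raise ValueError("values must not be empty")
    smallest, largest = min(values), max(values)
    if smallest == largest:
        raise ValueError("values must contain at least two distinct numbers")
    # Map the observed range onto the requested output range.
    scale = (upper - lower) / (largest - smallest)
    return [lower + (v - smallest) * scale for v in values]
```

The docstring states what the function does, what each argument means, what comes back, and when it fails, which is the information a teammate maintaining the code in production is likely to need first.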

Break (5 minutes)

Debugging (45 minutes)

  • Presentation: Kinds of errors; the debugging process
  • Hands-on exercise: Determine the kinds of errors in example code
  • Katacoda interactive exercise: Use the process to debug “buggy” example code
  • Q&A

Testing your code (30 minutes)

  • Presentation: Unit tests and automated testing
  • Katacoda interactive exercise: Write your own unit tests for the example code
  • Q&A
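
As a rough idea of what a unit test looks like, the sketch below uses Python's standard `unittest` module. It is not the course's example code: the function under test, `mean_absolute_error`, and the test class are hypothetical and exist only for illustration.

```python
import unittest


def mean_absolute_error(actual, predicted):
    """Return the mean absolute difference between two equal-length lists.

    Hypothetical function under test, written only for this illustration.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("inputs must not be empty")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


class TestMeanAbsoluteError(unittest.TestCase):
    def test_perfect_prediction_gives_zero(self):
        self.assertEqual(mean_absolute_error([1, 2, 3], [1, 2, 3]), 0)

    def test_known_value(self):
        # Absolute errors are 0, 0, and 2, so the mean is 2/3.
        self.assertAlmostEqual(mean_absolute_error([1, 2, 3], [1, 2, 5]), 2 / 3)

    def test_length_mismatch_raises(self):
        with self.assertRaises(ValueError):
            mean_absolute_error([1, 2], [1])
```

Saved in a file, these tests can be run with `python -m unittest`. Each test checks one behavior (a trivial case, a hand-computed value, and an error path), so a failure points directly at what broke.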