In this episode of the Data Show, I spoke with Francesca Lazzeri, an AI and machine learning scientist at Microsoft, and her colleague Jaya Mathew, a senior data scientist at Microsoft. We conducted a couple of surveys this year (“How Companies Are Putting AI to Work Through Deep Learning” and “The State of Machine Learning Adoption in the Enterprise”) and found that while many companies are still in the early stages of machine learning adoption, there’s considerable interest in moving forward with projects in the near future. Lazzeri and Mathew spend a considerable amount of time working with companies that are beginning to use machine learning, and their experience spans many different industries and applications. I wanted to learn about some of the processes and tools they use when they help companies begin their machine learning journeys.
Here are some highlights from our conversation:
Team data science process
Francesca Lazzeri: The Data Science Process is a framework that we try to apply in our projects. Everything begins with a business problem, so external customers come to us with a business problem or a process they want to optimize. We work with them to translate these into realistic questions, into what we call data science questions. And then we move to the data portion: what are the different relevant data sources, is the data internal or external? After that, you try to define the data pipeline. We start with the core part of the data science process—that is, data cleaning—and proceed to feature engineering, model building, and model deployment and management.
...There are also usually external agents involved. When I say external agents, I mean there are program managers and business experts who follow us during this process. These are individuals who are the data and domain experts. It's a very interactive process because you go back and forth trying to understand if what you are building is something that really can be interesting to the business owners.
What is holding back adoption of machine learning
Jaya Mathew: One of the biggest bottlenecks is lack of talent within the organization. A company really needs to invest in upskilling their existing employee base, which tends to be expensive, and they're trying to figure out if that investment is really worth it. Or they need to try to hire, and hiring specific skill sets is difficult, as there is a talent shortage everywhere.
Then, in addition to that, there's also a little bit of hesitation because some of the AI and machine learning models are “black boxes.” ... I think many governments and many organizations need to be able to explain what's going on before they deploy a model.
Related resources:
Francesca Lazzeri and Jaya Mathew: “A day in the life of a data scientist: How do we train our teams to get started with AI?”
Ashok Srivastava on why “The real value of data requires a holistic view of the end-to-end data pipeline”
Jerry Overton on “Teaching and implementing data science and AI in the enterprise”
Carme Artigas on “Transforming organizations through analytics centers of excellence”
Andrew Burt and Steven Touw on “Managing risk in machine learning models” and how companies can manage models they cannot fully explain