Training in the big data ecosystem
The O'Reilly Radar Podcast: Paco Nathan and Jesse Anderson on the evolution of the data training landscape.
Their discussion focuses on the training landscape in the big data ecosystem, their teaching techniques and particular content they choose, and a look at some expected future trends.
Here are a few snippets from their chat:
Training vs. PowerPoint slides
Anderson: “Often, when you have a startup and somebody says, ‘Well, we need some training,’ what will usually happen is one of the software developers will say, ‘OK, I’ve done some training in the past and I’ll put together some PowerPoints.’ The differences between a training thing and doing some PowerPoints, like at a meetup, is that a training actually has to have hands-on exercises. It has to have artifacts that you use right there in class. You actually need to think through, these are concepts, these are things that the person will need to be successful in that project. It really takes a lot of time and it takes some serious expertise and some experience in how to do that.”
Nathan: “Early on, you would get some committer to go out and do a meetup, maybe talk about an extension to an API or whatever they were working on directly. If there was a client firm that came up and needed training, then they’d peel off somebody. As it evolved, that really didn’t work. That kind of model doesn’t scale. The other thing too is, you really do need people who understand instructional design, who really understand how to manage a classroom. Especially when it gets to any size, it’s not just an afterthought for an engineer to handle.”
Framing the training
Anderson: “[Companies are] starting to look for, ‘OK, we’ve done Hadoop or we have Hadoop already in place, we want to start doing things in more real time.’ So, they’re starting to look at the real-time frameworks like Spark Streaming and, in some cases, Flink, but what we’re generally going to see is that a lot of these new technologies that are coming out that enable more real-time processing of big data, this is what they’re going after. One of those big technologies is Kafka. Part of the reason why I partnered with Confluent to do the Kafka training, I’d go into companies and they’d say ‘Okay, here’s our general architecture diagram and it all starts out with Kafka and goes into either Hadoop or Spark or both.’”
Nathan: “People want to focus on the context. Really, what’s the business context and what kind of architectural patterns work for the right kind of use cases? As Jesse was mentioning, I think that maybe the people who make a given framework aren’t necessarily going to be the ones who are as invested in showing all the different integrations possible. That’s the borderlands. I do find that people come to training to find out more about what integrations are possible, what ones make sense. Like we were mentioning about Spark and Kafka, maybe put Cassandra in that mix. That’s a pattern that shows up in banking, it shows up in genomics. It shows up all over. It’s a good illustration of the integrations that people are really craving.”
Educating the enterprise
Anderson: “One of the big gaps is that we’ve done great things for engineers and for analysts and for some of the operations people, but we really haven’t dealt with the business people. I saw this when I was teaching at Cloudera; I’d have managers or even sometimes director-level people at publicly traded companies sitting through my entire four-day class. I thought, they either have a great love of knowledge or they’re really wasting their time; they could have gotten about what they needed in a day or so. … A business person needs to have a general idea how these technologies work, but more importantly, they need to have an idea of how you make money at it.
I created a course I call the Business of Big Data to address that. We go through the technology at a high level, but it’s the applications of the technology that we focus on. It’s how do you use technology, big data technology, in order to improve things. … How do I use data in order to make a decision? What I call a data augmented decision. I’m going to use data in order to augment whatever decision I’m making but how do you work back from there? How do you create a successful big data project? One of the things I’ve seen is that companies will have difficulty carrying out that big data project because it’s quite a bit different than other ones. It involves more technology. It involves more expertise and I found that some of them would have a difficult time in actually carrying that out.”
Nathan: “I’ve definitely been involved in in-house training for companies where it’s hard to tell where the training stops and where the consulting starts. That’s fair game. I think there’s a lot of room for that because people really want to understand the context before they could ever jump into a contract. One of the things there is if a team internally comes to a training session, then they can start to explore some of those issues that they might want to follow up with a contractor for.
Another thing to embellish on something that Jesse was mentioning earlier, another really good reason why people come to these kinds of training outside the customized environment, though, is to hear what other people ask. There’s a lot of value in that. I’m not sure if I should use the term, but the ‘unknown unknowns.’ If you can hear the pain points that other companies and other teams are experiencing and hear how the experts answer them, you might actually save your company a lot of time.”
- Business of Big Data Workshop — Live online training with Jesse Anderson, November 10, 2015, from 9 a.m. to 3 p.m. PT.
- Learning Path: Architect and Build Big Data Applications — This Learning Path will take you through the entire process of designing and building data applications that can visualize, navigate, and interpret reams of data.