It made me wonder: are these factors essential to building a solid foundation for AI? Does high performance in these areas give AI projects an edge? Overall, my answer was: somewhat, but misleading. Let me explain, block by block:
Massive data. IMHO, this is the red herring of AI. Too many believe “s/he who has the most data wins.” Data is absolutely valuable, but volume alone does not bring value: within volume, you can have data that is generic or redundant. Massive amounts of data only help you if they can be used for differentiation. Specifically, if you can drive better results from that data. And three other V’s define big data alongside volume: variety, velocity, and veracity. Variety and velocity do not require “massive-ness.” As for veracity, you already know the value of massive amounts of garbage data: none. Finally, I’d add that massive data can quickly lead to a tyranny of popularity (i.e., the instances with the most data win). We all have examples of when one nugget of information was the key; sometimes the small data should win. Bottom line: big data is a building block—check; massive data—misleading.
Automatic data tagging systems. The automated tagging systems are AI, so we get caught in an infinite loop if we take this as a building block. Bottom line: automatic data tagging systems are sub-assemblies, not building blocks.
Top scientists. First, none of this is possible without research. None. HT to Bengio(s), LeCun, Ng, Hinton, et al. The WEF article calls out a combination of scientists and engineers, but in more of a waterfall approach than one driven by requirements. The question must be: what are you trying to build, and how important is it for you to create the algorithms versus use algorithms conceived or created by others? You need to decide this for your business—where is science important and where is implementation important? The two are different blocks, and both are critical. And you might have different answers for different parts of your problem. Bottom line: top scientists and/or experienced engineers create the building blocks, but are not building blocks themselves.
Defined industry requirements. Requirements are where we are failing AI. I was recently invited to attend Intel’s AI Day as an industry influencer. The tech track was overflowing, while in the business track—which was fantastic, btw—well, we had plenty of room to stretch out our legs. We, as technologists, are so excited about the technology advancements that we are forgetting the reason we need those advancements. Like the drunk searching for his keys under the streetlight, we look where our technology shines brightest. I would argue against industry requirements in favor of business requirements. While there will be overlap within industries, what is more critical is focusing AI on your business, your customers, your operations. Bottom line: defined business requirements—check; defined industry requirements—misleading.
Highly efficient compute power. I’m going to pick a nit that I’m not even sure I’m interested in picking, but here goes. Highly efficient compute power is the substrate, the land we are building on, rather than a building block. It is the common core. The nit is worth picking because compute power distinguishes you less than the other factors do; for most applications, it is better viewed as commoditized. Bottom line: highly efficient compute power—substrate, not building block, thus misleading.
I propose the following three key building blocks to AI development, what I call the eggs, the chicken, and the bacon:
The eggs. Data are the eggs. We have not found one customer who does not already have enough data to start with AI and do a better job for themselves or their customers. The two biggest challenges we see in customers’ mindsets regarding data are:
Data silos. Organizations draw insane and inane lines around data, and departments act like feudal lords over “their” data.
Unstructured data. Gartner estimates that 80% of an enterprise’s data is unstructured, and in our experience, it is an untapped resource, which can provide valuable variety in data.
Your focus should not be on the amount of data, but the data available that could apply to the problem you have defined. You actually want to start with the smallest amount of data possible when testing, so you and your team have a better opportunity to uncover data issues and dependencies earlier. Bottom line: focus on quality of the data for your unique business problem, not quantity.
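To make "start small and surface data issues early" concrete, here is a minimal sketch in Python. The records, field names, and checks are invented for illustration; a real profiling pass would cover whatever quality issues matter to your defined problem.

```python
from collections import Counter

# Hypothetical sample of customer records (field names are made up).
# Profiling a small slice like this surfaces issues before you scale up.
sample = [
    {"customer_id": 1, "region": "APAC", "spend": 120.0},
    {"customer_id": 2, "region": None,   "spend": 75.5},
    {"customer_id": 2, "region": "EMEA", "spend": 75.5},  # duplicate id
    {"customer_id": 3, "region": "EMEA", "spend": None},
]

def profile(records):
    """Surface basic quality issues: missing values and duplicate keys."""
    missing = Counter()
    for record in records:
        for field, value in record.items():
            if value is None:
                missing[field] += 1
    ids = [record["customer_id"] for record in records]
    dupes = [i for i, n in Counter(ids).items() if n > 1]
    return {"missing": dict(missing), "duplicate_ids": dupes}

report = profile(sample)
print(report)  # {'missing': {'region': 1, 'spend': 1}, 'duplicate_ids': [2]}
```

On four records, the gaps and the duplicate jump out immediately; buried in millions of rows, they would quietly degrade whatever model you train.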
The chicken. The algorithm(s) are the chicken. I often show executives examples in the TensorFlow Playground of the interplay between algorithms and data. Depending on your objective and the data you have available, you will need to choose different algorithms. Depending on the algorithms available, you might need to find different data. Thus, the appropriateness of the chicken and egg reference. Bottom line: you cannot separate the algorithm from the data; they depend on each other.
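The same interplay the TensorFlow Playground demonstrates can be sketched in a few lines of plain Python. This toy example (the grid search and datasets are mine, not from the article) shows that AND-shaped data is handled by a linear model, while XOR-shaped data forces you to a nonlinear algorithm: the data constrains the chicken, and vice versa.

```python
from itertools import product

# Two tiny labeled datasets over binary inputs.
AND = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

def linearly_separable(data):
    """Brute-force search for a separating line w1*x + w2*y + b = 0
    over a small grid of weights (enough for these 2x2 datasets)."""
    grid = [v / 2 for v in range(-4, 5)]  # -2.0 .. 2.0 in steps of 0.5
    for w1, w2, b in product(grid, repeat=3):
        if all((w1 * x + w2 * y + b > 0) == bool(label)
               for (x, y), label in data.items()):
            return True
    return False

print(linearly_separable(AND))  # True  -> a linear model suffices
print(linearly_separable(XOR))  # False -> you need a nonlinear model
```

Swap the data and the right algorithm changes; swap the algorithm and the data you can exploit changes. That is the chicken and the egg.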
The bacon. What is the bacon of business? Better business results. This must come first and last. Define your project around the results you need, then measure to make sure you are getting them. Rinse, repeat. I gave a talk at Strata + Hadoop World in Singapore about how to “hire” AI. The first step is to write the job description. What are the job requirements? Then, you need to evaluate the job done by the requirements you defined. Bottom line: do not forget the bacon!
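The "hire AI" loop above can be sketched as: write the job requirements as measurable bars first, then score the candidate model against them. The metrics, thresholds, and labels below are invented for illustration.

```python
# The "job description": success criteria defined before any modeling.
# These bars are illustrative, not prescriptive.
requirements = {"precision": 0.80, "recall": 0.60}

def evaluate(y_true, y_pred):
    """Score binary predictions against ground truth."""
    tp = sum(t == p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"precision": precision, "recall": recall}

# Hypothetical labels and model predictions.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

scores = evaluate(y_true, y_pred)
hired = all(scores[m] >= bar for m, bar in requirements.items())
print(scores, "hired:", hired)  # precision 0.75 misses the 0.80 bar
```

Here the candidate misses the precision bar, so you do not "hire" it; you rinse and repeat until the results you defined up front are actually delivered.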
These building blocks are highly dependent on each other. Like Lego blocks, they can be combined in a lot of different ways, but they still have to be designed to fit together. Oh, and, yes, you can just end your argument with “AI will bring us more bacon.”