Recognizing and rescuing a failing big data project

Common pitfalls and best practices every manager should know.

By Jim Scott
November 23, 2015
Galleria, the first modern shopping mall of Turkey. (source: SALTOnline, on Flickr)

If you run an analysis, you may discover that there is a good chance your big data project will not proceed according to plan. According to an Infochimps report on CIOs and big data, 55% of big data projects fail because of a lack of communication between the top managers who had the overall project vision and those who were in charge of actually implementing it. The result? Providers were unable to deliver the benefits promised when the project was in its planning stages.

In order to help keep your team’s data analytics project on track, I’ve compiled a list of signs that your big data project is heading toward failure, along with some suggestions based on my own industry experience on how to remediate the problems.

The project is always expanding.

Research by McKinsey, conducted in collaboration with the University of Oxford, suggests that large IT projects typically run 45% over budget and 7% over schedule, while delivering 56% less value than predicted. The typical reasons for stakeholder disenchantment are poorly defined success criteria, often teamed with slippage of project deliverables. Some tips to overcome this obstacle are as follows:

  • When you notice that your project has been “90% done” for an extended period of time, stop building.
  • Define measurable key performance indicators (KPIs) that will signal when the project has met its defined objectives. If you don’t know what success looks like, you won’t be able to recognize it! Examples of KPIs include increased customer satisfaction via profiling, personalization and customer lifetime value analysis, and improved procurement and acquisition productivity through spend analysis.
  • Lock down the scope, define new milestones, and purge any tasks that cannot be delivered in incremental steps from the project process.
  • Prioritize aspects of the project that will yield savings and/or profit for the business.
  • Once the foundation is solid, proceed with the project.
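
One way to make "measurable" concrete is to record each KPI with a baseline, a target, and a current value, then compute how much of the gap has been closed. The following is a minimal sketch under stated assumptions: the `Kpi` class, the KPI names, and all of the numbers are hypothetical illustrations, not figures from any real project.

```python
from dataclasses import dataclass


@dataclass
class Kpi:
    """A hypothetical KPI record; all field values below are made up."""
    name: str
    baseline: float  # value when the project started
    target: float    # value that counts as success (must differ from baseline)
    current: float   # latest measured value

    def progress(self) -> float:
        """Fraction of the baseline-to-target gap closed so far.

        Works for both higher-is-better and lower-is-better KPIs,
        because the sign of the gap cancels out in the division.
        """
        return (self.current - self.baseline) / (self.target - self.baseline)

    def met(self) -> bool:
        """True once the current value has reached or passed the target."""
        return self.progress() >= 1.0


kpis = [
    Kpi("customer satisfaction score", baseline=70.0, target=80.0, current=76.0),
    Kpi("procurement cycle time (days)", baseline=30.0, target=20.0, current=19.0),
]

for kpi in kpis:
    status = "met" if kpi.met() else "not met"
    print(f"{kpi.name}: {kpi.progress():.0%} of the way to target ({status})")
```

A report like this gives stakeholders a shared definition of "done" that does not depend on anyone's sense that the project is "90% done."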

You have very few users.

You built it, but they didn’t come? Take comfort in the fact that you’re not alone. In a Gartner study on enterprise big data projects, 70% of respondents reported having only between 1 and 20 users accessing Hadoop, and 4% reported having no users at all.

Before you can address this problem, you need to understand why there is such a lack of enthusiasm about your big data deployment. Users may not fully see the value in working with big data, or they may feel intimidated by the advanced technical skills they believe are a prerequisite to working with big data. Some suggestions I have for this problem are as follows:

  • Emphasize an iterative process over the end-product, and allow users to explore. Baby steps yield far better results and a faster feedback loop to positive outcomes.
  • Make tools available, such as Apache Drill, that enable users to leverage their existing SQL skills. This will allow users to ask questions quickly without the immediate need to learn new technologies.
  • Allow users to elevate their knowledge through training that can explain the basics of big data technologies to those who are non-technical. Training can also help those who are more technical to see new ideas and approaches to solving problems.
  • If you can gain managerial support, it would be useful to align employee performance incentives and annual goals with the company’s big data project goals.

Your data is still in silos.

Data streams need to converge for big data to deliver its full potential, but in-house politics or policies may stem the flow. If individual departments are still making business decisions based solely (or primarily) on their own data, you have a developing problem. For example, looking at historical sales data alone is significantly less useful than looking at sales trends, customer relationship management (CRM), and social media data together. History alone may soothe a company into thinking a dip in sales is due to a seasonal slowdown and can be ignored — but CRM and social media may point to a developing issue that is impacting brand value and is worth addressing right away. If you recognize your organization’s data is disjointed, here are some ideas to help bring it together:

  • Define a critical common goal that each department can agree to pursue, and use that goal to break data out of its silos. An example may be to pull data from each silo and identify new details for what causes a customer to stick to or bounce from a website. This is obviously not an easy fix, and you will need top-down support to make it happen.
  • Be ready with a plan to ensure that data is suitably scrubbed, verified, and secured. Tools such as Apache Drill or Apache Spark can be utilized to transform data from one format to another, or to clean the fields of data. This type of plan is important, as it will allow your team to deliver a common view of the data from multiple data sources back to the users. It will also help to prevent confusion among data sources where individuals may be accustomed to seeing data presented in one particular way.
  • Ensure users have access to solutions that support analysis of structured, semi-structured, and unstructured data. The goal here is to simplify the process and remove pain points for the users. A tool like Apache Drill, which uses SQL, can help to overcome this barrier.
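
To illustrate what "scrubbing" data into a common view can involve, here is a minimal sketch in plain Python. At scale, a tool such as Apache Drill or Apache Spark would do this work; the sketch only shows the idea. The field names, record layouts, and sample values are all hypothetical.

```python
from datetime import datetime

# Hypothetical extracts from two departmental silos (illustrative data only).
sales_rows = [
    {"CustomerID": " 0042 ", "OrderDate": "11/03/2015", "Amount": "1,250.00"},
    {"CustomerID": "0007", "OrderDate": "11/05/2015", "Amount": "80.50"},
]
crm_rows = [
    {"cust_id": "42", "contacted": "2015-11-04", "sentiment": "Negative "},
    {"cust_id": "7", "contacted": "2015-11-06", "sentiment": "positive"},
]


def clean_sales(row):
    """Map a sales record onto the shared schema (assumed MM/DD/YYYY dates)."""
    return {
        "customer_id": int(row["CustomerID"].strip()),
        "date": datetime.strptime(row["OrderDate"], "%m/%d/%Y").date(),
        "amount": float(row["Amount"].replace(",", "")),
    }


def clean_crm(row):
    """Map a CRM record onto the shared schema (assumed ISO dates)."""
    return {
        "customer_id": int(row["cust_id"].strip()),
        "date": datetime.strptime(row["contacted"], "%Y-%m-%d").date(),
        "sentiment": row["sentiment"].strip().lower(),
    }


def common_view(sales, crm):
    """Group cleaned records from both silos under one customer key."""
    view = {}
    for row in map(clean_sales, sales):
        entry = view.setdefault(row["customer_id"], {"orders": [], "contacts": []})
        entry["orders"].append(row)
    for row in map(clean_crm, crm):
        entry = view.setdefault(row["customer_id"], {"orders": [], "contacts": []})
        entry["contacts"].append(row)
    return view


view = common_view(sales_rows, crm_rows)
for customer_id in sorted(view):
    print(customer_id, view[customer_id])
```

The point of the design is that each silo keeps its own cleaning function while users see a single schema, so no one has to learn how another department formats its IDs or dates.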

Answers to questions are not helpful.

If the answers aren’t useful, it’s possible that your users, or your data scientists, are asking the wrong questions. It is important to manage expectations around your big data project. In the beginning, it is likely that no one will be asking the right questions or querying the right data sets. One example of a bad question would be to aggregate a set of numbers that are themselves averages: averaging averages ignores how many records each one summarizes, so the result can be misleading. Since asking a good question can be difficult, offer an example of the data sources and demo how to ask a question like: how many viewers visited the website in a certain time frame, from a certain region, and what is the demographic makeup of that region? Expect some failures in the beginning stages, but work with people to spread knowledge on how to frame useful queries and fully leverage big data potential. Additional tips include:

  • Encourage users to define a project by first establishing the desired outcome (e.g., we will increase conversions). Much like the scientific process, define your expectations upfront.
  • Determine what business decisions you will need to make to successfully support that outcome (e.g., what channels should we use? What message will resonate? How much will we spend? Historically, what percentage of leads can we expect to convert?).
  • Let the analytic results drive the project. Measure success by defining the benefits of the results.
  • Tweak processes if needed, and repeat with another project.
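
The "bad question" mentioned earlier, aggregating numbers that are already averages, is easy to demonstrate. The sketch below uses hypothetical per-region summaries (the regions, visit counts, and session lengths are made up) to compare a naive average of averages with an average weighted by the number of visits behind each figure.

```python
# Hypothetical per-region summaries:
# (region, number of visits, average session length in minutes)
regional_summaries = [
    ("North", 1000, 2.0),
    ("South", 10, 20.0),
]


def average_of_averages(summaries):
    """Naive approach: treat every regional average as equally important."""
    return sum(avg for _, _, avg in summaries) / len(summaries)


def weighted_average(summaries):
    """Weight each regional average by the number of visits it summarizes."""
    total_visits = sum(visits for _, visits, _ in summaries)
    total_minutes = sum(visits * avg for _, visits, avg in summaries)
    return total_minutes / total_visits


naive = average_of_averages(regional_summaries)
correct = weighted_average(regional_summaries)

print(f"average of averages: {naive:.2f} minutes")
print(f"weighted average:    {correct:.2f} minutes")
```

With these made-up numbers, the naive figure is 11.00 minutes while the weighted figure is about 2.18 minutes, because ten long sessions in one region count as much as a thousand short ones in the other. Showing users a contrast like this is one way to teach them how to frame a useful query.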

Your business has changed.

Significant IT projects, in an enterprise setting, typically take between 18 and 24 months to fully deploy. During that time, business conditions can shift, new compliance regulations may change data governance processes, in-house talent may leave to pursue other opportunities, or you may simply find that the technological decisions made a couple of years ago are not the decisions you’d make today.

Obviously, this situation is difficult to remediate in a cost-effective way after the deployment goes live. If you are still in the development process, you can build in some of the following best practices to help ensure the project meets your shifting needs as you move forward:

  • To avoid getting locked into a limited solution set, choose IT solutions that have robust support from your vendor and the user community. Tools like Apache Drill (which supports ANSI SQL) and Apache Spark (which can be used with common programming languages, including Java, Scala, and Python) are good choices. Look for solutions that are commonly used in the big data space and that leverage skills gained from multiple disciplines.
  • Select software that is designed for users with a variety of data analytics skills, such as Apache Spark. Enabling the largest number of users with the smallest number of technologies is a far simpler way to get started than supporting one technology per person. As experience grows, new technologies can always be added.
  • Bring in representatives across the enterprise in the early stages of project planning and keep them involved as the project progresses. Regular communication will allow your team time to adjust to any major changes in business direction or technologies.
  • Appoint project leaders who can effectively engage stakeholders across all business divisions to ensure alignment between your big data project and the primary business needs.
  • Set checkpoints during the buildout and deployment processes to verify that goals and needs have not changed, and to make sure your big data project remains focused on robust business objectives.

Final takeaways

Promptly implementing small pieces of a plan tends to lead to more successful implementations down the road, as those pieces can quickly prove business value. Proving business value with small pieces of functionality and new information is the best way to gain support and boost morale with projects of scale — all players can see the benefits early and gain assurance that the project is moving forward successfully.

Ensure that communication channels for positive and negative feedback exist. The last thing you want is for an unhappy user to torpedo the efforts of these projects because you didn’t know they were unhappy.

Finally, make sure that users know who to talk to about any technical issues. Having a gatekeeper to moderate technical issues, or having a list of subject matter experts available to consult with users, is paramount to the success of your project.
