Chapter 3 Data—Collect, Clean, and Connect

This chapter discusses how to take the raw data you might find in a corporate environment and turn it into data you can use in a graph. Good insights cannot come from dirty data. Once you have an objective, you need data, and that data must be valid, clean, and properly organized before you proceed to analysis and visualization. The data steps are as follows:

  • Collect: Where is the data coming from? Graph data in corporate environments may be buried in many different data sets. This chapter discusses some of the ways graphs may exist within common data.
  • Clean: What is the quality of this data? Are items identified consistently? Are there many empty values? Are there duplicate entries? Are there any privacy issues? You may need to resolve many such issues before the data can be used with graph software.
  • Connect: How do you turn data into graph data? There are many ways to create graph data. Most require that you create a data set of nodes and edges, which may then be organized into one or more files. Finally, the data is ready to import into graph software.
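
As a rough illustration of the three steps above, the following Python sketch turns a small table of staff records into a node list and an edge list. Everything in it is an assumption made for the example rather than something taken from this chapter: the sample rows, the employee and manager column names, the helper functions clean_name, build_graph, and to_csv, and the choice of CSV as the output format. Real data will usually need more careful cleaning than this.

```python
import csv
import io

# Hypothetical raw records, as they might arrive from an HR export (Collect).
RAW_ROWS = [
    {"employee": "Alice Wong ", "manager": "Bob Smith"},
    {"employee": "alice wong", "manager": "Bob Smith"},  # duplicate, inconsistent case
    {"employee": "Carol Diaz", "manager": ""},           # empty manager value
    {"employee": "Dan Lee", "manager": "bob smith"},
]


def clean_name(value):
    """Trim and collapse whitespace, then title-case the name (Clean).

    Title-casing is a crude normalization chosen for this sketch; it will
    mangle some real names, so treat it as a placeholder.
    """
    return " ".join(value.split()).title()


def build_graph(rows):
    """Return sorted (nodes, edges) built from employee/manager rows (Connect)."""
    nodes = set()
    edges = set()
    for row in rows:
        employee = clean_name(row.get("employee", ""))
        manager = clean_name(row.get("manager", ""))
        if not employee:
            continue  # no usable identifier, so skip the row
        nodes.add(employee)
        if manager:
            nodes.add(manager)
            edges.add((employee, manager))  # edge points from employee to manager
    return sorted(nodes), sorted(edges)


def to_csv(header, records):
    """Serialize records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(records)
    return buffer.getvalue()


nodes, edges = build_graph(RAW_ROWS)
nodes_csv = to_csv(["id"], [[name] for name in nodes])
edges_csv = to_csv(["source", "target"], edges)
```

Using sets for nodes and edges removes duplicate entries once the names have been normalized, and keeping nodes and edges in separate tables matches the node-and-edge data set described above. Many graph tools accept files of roughly this shape, though the exact column names they expect vary.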

Know the Objective

The authors once worked on a project for a senior vice president who said, “Here’s some data about our staff—what can you show me?” We prepared a beautiful interactive graph visualization and he replied, “This tells me nothing I don’t already know.”

This cautionary tale illustrates that ...
