Within Companies

We can think of integrating data across the Web as the big challenge, but microcosms of this challenge appear everywhere. It's especially striking (at least it was to me, when I first noticed it) how often large companies have several databases all referring to the same items and no way to query across them or even make employees aware that data about their area of interest exists in a database maintained by a coworker. This is often called the "information silo problem," referring to the fact that information is cleanly separated and largely inaccessible—like grain in a silo (yes, I always thought that metaphor was a bit of a stretch).

This problem was especially noticeable when I worked in the biotech industry and spoke with people at many pharmaceutical companies about their data integration challenges. In many cases, a company's management structure is divided into therapeutic areas (groups focused on a family of diseases). People in these groups might be working on a particular set of target proteins to hit with a drug, or looking for genetic markers to predict whether a drug would work, all the while conducting expensive experiments and building up large bodies of knowledge about these genes, proteins, and compounds.

At the same time, people in different parts of the company, or perhaps previous researchers on a long-since-finished project, were often studying or had studied the same or very similar genes, proteins, and compounds. Researchers on each project ...
