Sharing Big Data Safely by Ellen Friedman, Ted Dunning

Chapter 1. So Secure It’s Lost

What do buried 17th-century treasure, encoded messages from the Siege of Vicksburg in the US Civil War, tree squirrels, and big data have in common?

Someone buried a massive cache of gemstones, coins, jewelry, and ornate objects under the floor of a cellar in the City of London, and it remained there, undiscovered and undisturbed, for about 300 years. The date of burial can be fixed with considerable confidence to a fairly narrow window, between 1640 and 1666. The latter was the year of the Great Fire of London, and the treasure appears to have been buried before that destructive event. The evidence that the cache was buried after 1640 is a small, chipped, red intaglio bearing the emblem of the newly created 1st Viscount Stafford, an aristocratic title established only that year. Many of the contents of the cache appear to date from somewhat earlier, late in the time of Shakespeare and Queen Elizabeth I. Others, such as a cameo carving from Egypt, were probably already quite ancient when the owner buried the collection in the mid-17th century.

What this treasure represents and the reason for hiding it in the ground in the heart of the City of London are much less certain than its age. The items were of great value even at the time they were hidden (and are of much greater value today). The location where the treasure was buried was beneath a cellar at what was then 30–32 Cheapside. This spot was in a street of goldsmiths, silversmiths, and other jewelers. Because the collection contains a combination of set and unset jewels and because the location of the hiding place was under a building owned at the time by the Goldsmiths’ Company, the most likely explanation is that it was the stock-in-trade of a jeweler operating at that location in London in the early 1600s.

Why did the owner hide it? He may have buried it as part of his normal work, as many of his fellow jewelers may have done from time to time with their own stock, to keep it secure during the regular course of business. In other words, the hiding place may have been functioning as a very inconvenient, primitive safe when something happened to the owner.

Most likely, though, the owner sought security by burying his stock in response to something unusual, a necessity arising from upheavals such as civil war, plague, or an elevated level of activity by thieves. Perhaps the owner was going to be away for an extended time and buried the collection of jewelry to keep it safe until his return. Even if he left to escape the Great Fire, it’s unlikely that the fire itself prevented him from returning to recover the treasure; very few people died in the fire. In any event, something went wrong with the plan. Presumably, if anyone had known where the valuables were, someone would have claimed them.

Another possible but less likely explanation is that the hidden valuables were stolen goods, held by a fence who was looking for a buyer. Or these precious items might have been secreted away and hoarded a few at a time by someone employed by (and stealing from) the jeweler, or by someone hiding stock to obscure shady dealings or to evade paying a debt or taxes. That idea isn’t so far-fetched. The collection is known to contain two counterfeit balas rubies believed to have been made by the jeweler Thomas Sympson of Cheapside, who by 1610 had already been investigated for alleged fraud. These counterfeit stones are composed of egg-shaped quartz treated to accept a reddish dye, making them look like a type of large and very valuable ruby that was highly prized at the time. Whatever the reason the treasure was hidden, something apparently went wrong for it to have remained undiscovered for so many years.

Although the identity of the original owner and his reasons for burying the collection may remain a mystery, the surprising story of the treasure’s recovery is better known. In 1912, workers excavating for building renovations at that address first discovered pieces of treasure, and soon the massive hoard was unearthed beneath a cellar. The workers sold pieces mainly to a man nicknamed “Stony Jack” Lawrence, who in turn sold the trove to several London museums. It is fairly astounding that the now-famous Cheapside Hoard made its way into museum collections rather than disappearing entirely among the men who found it. It is also surprising that apparently no attempt was made to give the treasure (or compensation) to the owners of the land who had authorized the excavation, the Goldsmiths’ Company.1

Today the majority of the hoard is held by the Museum of London, where it has been put on public display. A few other pieces reside with the British Museum and the Victoria and Albert Museum. The Museum of London collection includes spectacular pieces, such as the lighthearted emerald salamander pictured in Figure 1-1.

Figure 1-1. Emerald salamander hat ornament from the Cheapside Hoard, much of which is housed in the Museum of London. This elaborate and whimsical piece of jewelry reflects the international nature of the jewelry business in London in the 17th century when the collection was hidden, presumably for security. The emeralds came from Colombia, the diamonds likely from India, and the gold work is European in style. (Image credit: Museum of London, image ID 65634, used with permission.)

Salamanders were sometimes used as a symbol of renewal because they were believed to be able to emerge unharmed from fire. The symbol seems appropriate for an item that survived the Great Fire of London as well as 300 years in hiding. It was so well hidden, in fact, that along with the rest of the hoard it was lost even to the heirs of the original owner. This lost treasure was a security failure.

It was as important then as it is now to keep valuables in a secure place; otherwise they would likely disappear at the hands of thieves. But in the case of the Cheapside Hoard, the security plan went awry. Although the articles were of great value, no one related to the original owner claimed them over the centuries. Whoever the original owner was, this story illustrates a basic challenge: there is a tension between locking down things of value to keep them secure and doing so in a way that still allows them to be accessed and used appropriately and safely. The next story shows a different version of the problem.

During the American Civil War in the 1860s, both sides used several different cipher systems to encode secret messages. The need to guard information about troop movements, supplies, strategies, and the whereabouts of key officers or political figures is obvious, so encryption was a good idea. However, some of the simpler codes were broken, while others posed a different problem. The widely used Vigenère cipher, for example, was so difficult to use for encrypting or deciphering messages that mistakes were often made. And because the cipher was hard to use correctly, comprehension of an important message was sometimes perilously delayed.2 The Vigenère cipher table is shown in Figure 1-2.

Figure 1-2. The Vigenère square used to encode and decipher messages. While challenging to break, this manual encryption system was reported to be very difficult to use accurately and in a timely manner. (Image by Brandon T. Fields. Public domain via Wikimedia Commons.)

One such problem occurred during the Vicksburg Campaign. A Confederate officer, General Johnson, sent a coded message to General Kirby requesting troop reinforcements. Johnson made errors in encrypting the message with the difficult Vigenère cipher. As a result, Kirby spent 12 hours trying to decode the message, without success. He finally resorted to sending an officer back to Johnson to get a direct message. The delay was too long; no help could be sent in time. A strong security method was needed to prevent the enemy from reading messages, but the security system also needed to allow reasonably functional and timely use by both the sender and the intended recipient.
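To make this failure mode concrete, here is a minimal Python sketch of the Vigenère square in use. It is an illustration under stated assumptions, not a reconstruction of Confederate practice: the function name `vigenere`, the key `LEMON`, the sample message, and the particular slip (one omitted letter) are all hypothetical, and the historical record cited here does not say exactly which errors were made.

```python
from string import ascii_uppercase as ALPHABET


def vigenere(text: str, key: str, decrypt: bool = False) -> str:
    """Encrypt (or decrypt) text with the Vigenère square.

    Each letter is shifted by the alphabet position of the key letter
    lined up with it; the key repeats for as long as the message runs.
    Non-letters are simply ignored in this sketch.
    """
    sign = -1 if decrypt else 1
    letters = [c for c in text.upper() if c in ALPHABET]
    key = key.upper()
    out = []
    for i, c in enumerate(letters):
        # The row of the square is chosen by the i-th key letter (cycling).
        shift = ALPHABET.index(key[i % len(key)])
        out.append(ALPHABET[(ALPHABET.index(c) + sign * shift) % 26])
    return "".join(out)


if __name__ == "__main__":
    key = "LEMON"  # hypothetical key, for illustration only
    sent = vigenere("ATTACKATDAWN", key)
    print(sent)                               # LXFOPVEFRNHR
    print(vigenere(sent, key, decrypt=True))  # ATTACKATDAWN

    # A single clerical slip: the fourth cipher letter is left out.
    garbled = sent[:3] + sent[4:]
    print(vigenere(garbled, key, decrypt=True))  # ATTBITBFZUG
```

The first three letters still decrypt correctly, but every letter after the omission is read against the wrong key letter, so the rest of the message turns to nonsense. A recipient working by hand with the square has no easy way to tell where the slip occurred, which may help explain how a single garbled message could consume hours of effort.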

This Civil War example, like the hidden and lost Cheapside treasure, illustrates the idea that sometimes the problem with security is not a leak but a lock. Keeping valuables or valuable information safe is important, but it must be managed in such a way that it does not lock out the intended user.

In modern times, this delicate balance between security and safe access is a widespread issue. Even individuals face this problem almost daily. Most people are sufficiently savvy to avoid using an obvious or easy-to-remember password such as a birthday, pet name, or company name for access to secure online sites or to access a bank account via a cash point machine or ATM. But the problem with a not-easy-to-remember password is that it’s not easy to remember!

This situation is rather similar to what happens when tree squirrels busily hide nuts in the lawn, presumably to protect their hoard of food. Often the squirrels forget where they’ve put the nuts—you may have seen them digging frantically trying to find a treasure—with the result of many newly sprouted saplings the next year.

When it comes to passwords, it’s likely more common to forget your password than to suffer an attack, but that doesn’t mean it’s a good idea to forgo using an obscure password. For the relatively simple case of passwords, people (unlike tree squirrels) can get help: password-management systems exist to help people handle their obscure passwords. Of course, these systems must themselves be carefully designed in order to remain secure.

These examples all highlight the importance of protecting something of value, even valuable data, but avoiding the problem that it becomes “so secure it’s lost.”

Safe Access in Secure Big Data Systems

Our presumption is that you’ve probably read about 50 books on locking down data. But the issue we’re tackling in this book is quite a different sort of problem: how to safely access or share data after it is secured.

As we begin to see the huge benefits of saving a wide range of data from many sources, including system log files, sensor data, user behavior histories, and more, big data is becoming a standard part of our lives. Of course, many types of big data need to be protected through strong security measures, particularly if the data involves personally identifiable information (PII), government secrets, or the like. The sectors that first come to mind as having serious security requirements are finance, insurance, health care, and government agencies. But even retail merchants and online services hold PII related to customer accounts. The need for tight security measures is therefore widespread in big data systems, involving standard security processes such as authentication, authorization, encryption, and auditing. Emerging big data technologies, including Hadoop- and NoSQL-based platforms, are being equipped with these capabilities, some through integrated features and others through add-on features or external tools. In short, secured big data systems are widespread.

For the purposes of this book, we assume as our starting point that you’ve already got your data locked down securely.

Locking down sensitive data (or hiding valuables) so well that thieves cannot reach it makes sense, but of course you also need to be able to get access when desired, and that in turn can create vulnerability. Consider this analogy: if you want to keep an intruder from entering a door, the safest bet is to weld the door shut. Of course, doing so makes the door almost impossible to use; that’s why people generally use padlocks instead of welding. But as soon as you give out the combination or key to the padlock, you’ve slightly increased the risk of an unwanted intruder getting in. Sharing a way to unlock the door to important data is a necessary part of using what you have, but you want to do so carefully, and in ways that minimize the risk.

So with that thought, we begin our look at what happens when you need to access or share secure data. Doing this safely is not always as easy as it sounds, as shown by the examples we discuss in the next chapter. Then, in Chapter 3 and Chapter 4, we introduce two different solutions to the problem that enable you to safely manage how you use secure data. We also describe some real-world success stories that have already put these ideas into practice. These descriptions, which are non-technical, show you how these approaches work and the basic idea of how you might put them to use in your own situations. The remaining chapters provide a technical deep-dive into the implementation of these techniques, including a link to open source code that should prove helpful.

1 Forsyth, Hazel. Cheapside Hoard: London’s Lost Treasures: The Cheapside Hoard. London: Philip Wilson Publishers, 2013.

2 Civil War Code
