Chapter 8

Playing in the Sandbox

IN THIS CHAPTER

  • Using your sandbox as an analytical development environment
  • Promoting analytics from the sandbox to production
  • Playing around with data in the sandbox

At first glance, the sandbox area of your data lake really messes up the basic understanding of how data flows into, through, and then out of your data lake. Think about it: Part of the reason for using bronze, silver, and gold for the names of the three primary data lake zones is to clearly indicate a progression of your data. If you know anything about the Olympics, you know that gold medals are the best, silver medals are second best, and bronze medals are third. But who ever heard of a sandbox medal?

Never fear. Your sandbox sits alongside your three primary data lake zones and serves three primary purposes:

  • To be a development environment for new analytical models
  • To compare different data lake architectural options
  • To be a place to experiment and “play around” with data

All three of these purposes have one important factor in common: isolating not-yet-ready-for-prime-time experimentation and development from the production side of your organization’s analytics and data usage. ...
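To make the isolation idea concrete, here is a minimal sketch of one way a write rule between zones might look. Everything in it is an assumption made for illustration: the `Zone` names mirror the bronze, silver, gold, and sandbox areas described above, while `PRODUCTION_ZONES` and `can_write` are hypothetical helpers, not part of any data lake product. The sketch also assumes that sandbox work only ever writes back to the sandbox.

```python
from enum import Enum


class Zone(Enum):
    BRONZE = "bronze"
    SILVER = "silver"
    GOLD = "gold"
    SANDBOX = "sandbox"


# Assumption: the three primary zones count as production; the sandbox does not.
PRODUCTION_ZONES = {Zone.BRONZE, Zone.SILVER, Zone.GOLD}


def can_write(workload_zone: Zone, target_zone: Zone) -> bool:
    """Return True if a workload running in workload_zone may write to target_zone."""
    if workload_zone is Zone.SANDBOX:
        # Sandbox experiments write only to the sandbox, never to production zones.
        return target_zone is Zone.SANDBOX
    # Production workloads write only to production zones in this sketch.
    return target_zone in PRODUCTION_ZONES


print(can_write(Zone.SANDBOX, Zone.GOLD))     # sandbox work cannot touch gold
print(can_write(Zone.SANDBOX, Zone.SANDBOX))  # sandbox work stays in the sandbox
```

A real data lake would more likely enforce this kind of rule through storage permissions or access policies than through application code; the function only illustrates the boundary.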
