4 Big Data Storage

After reading this chapter, you should be able to:

  • Identify the differences among Big Data storage patterns
  • Set up on-premises data storage
  • Use cloud data warehousing solutions
  • Design a hybrid data warehouse
  • Explain the pros and cons of different solutions

Chapter 3 addressed a few ways to deal with data. When the business requires more than simple solutions, we have to be ready. Before the emergence of big data, businesses already used solutions for analytics and business intelligence. However, these solutions were extremely expensive. Fortunately, the growth in data volume and advances in distributed computing boosted the development of big data storage solutions. We will go through the types of data storage and then discuss various technologies for different setups.

4.1 Big Data Storage Patterns

Over the years, storage patterns have changed drastically in response to how data is processed. As big data has evolved, many concepts have emerged that are closely related but differ in subtle ways. A big data platform should use the relevant patterns effectively to make the most of its data. In this section, we will explain three storage patterns: data lakes, data warehouses, and data marts.

4.1.1 Data Lakes

The data lake concept is quite new. Dixon (2010) first used the term, likening a data lake to a large body of water in a natural state. The catch is that data lakes may contain raw, unprocessed data that needs further processing ...
