3 A Minimal Data Processing and Management System

After reading this chapter, you should be able to:

  • Use Unix tooling for data processing
  • Build a pipeline to extract information
  • Automate common data tasks
  • Use PostgreSQL for large data

Making a problem more complicated than it needs to be is a common issue in computing. It is hard to “keep it simple, stupid.” It takes experience and time to formulate solutions that are good enough for the problem. In this chapter, we develop solutions for relatively large data using standard tools and open-source software.

3.1 Problem Definition

This chapter focuses on defining a problem and working on it through different strategies. Let us assume we are hosting an online book store.

3.1.1 Online Book Store

Imagine that we run an online book store. The store has a shopping process in which consumers buy books directly from the seller in real time. It has an interface for searching books and a landing page with some recommended books. As users navigate the website, they can add books to the shopping basket. At any time, a user can decide to buy the books and check out. If the user is not registered, the website first attempts to register the user. After that, the user can choose a shipping address. Finally, the website offers payment options.
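The checkout sequence described above can be sketched in a few lines of Python. This is only an illustration of the flow, not the book store's actual implementation: the names `Session`, `Step`, `add_to_basket`, and `checkout` are hypothetical, and the rule that an empty basket cannot be checked out is an assumption added for the sketch.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Step(Enum):
    """Hypothetical names for the stages a user passes through at checkout."""
    REGISTER = "register"
    SHIPPING_ADDRESS = "shipping_address"
    PAYMENT = "payment"


@dataclass
class Session:
    """A visitor's session: registration status plus the shopping basket."""
    registered: bool = False
    basket: List[str] = field(default_factory=list)

    def add_to_basket(self, book_id: str) -> None:
        # Books can be added at any point while navigating the site.
        self.basket.append(book_id)

    def checkout(self) -> List[Step]:
        """Return the ordered steps the website presents at checkout."""
        # Assumption for this sketch: an empty basket cannot be checked out.
        if not self.basket:
            raise ValueError("cannot check out with an empty basket")
        # Unregistered users are asked to register first.
        steps = [] if self.registered else [Step.REGISTER]
        # Every user then chooses a shipping address and a payment option.
        steps += [Step.SHIPPING_ADDRESS, Step.PAYMENT]
        return steps
```

For example, a new visitor who adds one book and checks out goes through registration, shipping address, and payment in that order, while a registered user skips the first step.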

3.1.2 User Flow Optimization

Special attention is given to user flow optimization. We aim to have a better understanding ...
