Data Science and Big Data Analytics: Discovering, Analyzing, Visualizing and Presenting Data
by EMC Education Services
11
Advanced Analytics—Technology and Tools: In-Database Analytics
Key Concepts
MADlib
Regular expressions
SQL
User-defined functions
Window functions
In-database analytics is a broad term that describes the processing of data within its repository. In many of the earlier R examples, data was extracted from a data source and loaded into R. One advantage of in-database analytics is that the need for movement of the data into an analytic tool is eliminated. Also, by performing the analysis within the database, it is possible to obtain almost real-time results. Applications of in-database analytics include credit card transaction fraud detection, product recommendations, and web advertisement selection tailored for a particular user.
A popular open-source database is PostgreSQL. This name references an important in-database analytic language known as Structured Query Language (SQL). This chapter examines basic as well as advanced topics in SQL. The provided examples of SQL code were tested against Greenplum database 4.1.1.1, which is based on PostgreSQL 8.2.15. However, the presented concepts are applicable to other SQL environments.
11.1 SQL Essentials
A relational database, part of a Relational Database Management System (RDBMS), organizes data in tables with established relationships between the tables. Figure 11-1 shows the relationships between five tables used to store details about orders placed at an e-commerce retailer.
FIGURE 11-1 Relationship diagram
The table orders ...
Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Read now
Unlock full access