Extract, transform, and load

Let's consider the task of loading HTTP logs in the database and preparing it for analysis. Such tasks in data warehouse solutions are called extract, transform, and load (ETL).

Suppose the HTTP server that runs the car portal application is writing access logs in files and they are recreated every day. Assuming that the server runs nginx, a popular HTTP server software, lines in such log files by default should look similar to this:

94.31.90.168 - - [01/Jul/2017:01:00:22 +0000] "GET / HTTP/1.1" 200 227 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36"

It has the given fields, separated by a space: remote address, remote user, timestamp, 

Get Learning PostgreSQL 10 - Second Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.