Loading data into a data warehouse

Let's consider the task of loading HTTP logs into the database and preparing them for analysis. In data warehouse solutions, such tasks are called extract, transform, and load (ETL).

Suppose the HTTP server that runs the car portal application writes access logs to files that are rotated every day. Assuming the server runs nginx, a popular HTTP server, lines in these log files will, by default, look similar to the following:

94.31.90.168 - - [01/Jul/2017:01:00:22 +0000] "GET / HTTP/1.1" 200 227 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36"

Each line contains the following fields, separated by spaces: the remote address, the remote user, ...
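A rough sketch of one way to load such lines is to copy each raw line into a single-column staging table and then split it into fields with a regular expression. The table name, file path, and regular expression below are illustrative assumptions rather than part of the car portal schema, and the time-zone offset in the timestamp is ignored for simplicity:

-- Staging table that receives each raw log line as a single text value
-- (hypothetical name, not part of the car portal schema)
CREATE TABLE access_log_raw (
    line text
);

-- Load one day's log file; the path is an assumption, and the file must be
-- readable by the PostgreSQL server process
COPY access_log_raw (line) FROM '/var/log/nginx/access.log';

-- Split each line into fields using a regular expression that matches the
-- default nginx log format shown above; non-matching lines yield NULLs
SELECT m[1]                                         AS remote_addr,
       m[3]                                         AS remote_user,
       to_timestamp(m[4], 'DD/Mon/YYYY:HH24:MI:SS') AS request_time,
       m[5]                                         AS request,
       m[6]::int                                    AS status,
       NULLIF(m[7], '-')::bigint                    AS body_bytes_sent
FROM access_log_raw,
     regexp_match(line,
        '^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\d+|-)') AS m;

The parsed result could then be inserted into a properly typed table for analysis; the remaining fields (referrer and user agent) could be extracted by extending the regular expression in the same way.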
