April 2018
Intermediate to advanced
408 pages
10h 42m
English
Once we have access to all of the lines of each log file, we can extract the details of the access that each line describes. We'll use a regular expression to decompose the line. From the resulting fields, we can build a namedtuple object.
Here is a regular expression to parse lines in a CLF file:
import re
format_pat = re.compile(
    r"(?P<host>[\d\.]+)\s+"
    r"(?P<identity>\S+)\s+"
    r"(?P<user>\S+)\s+"
    r"\[(?P<time>.+?)\]\s+"
    r'"(?P<request>.+?)"\s+'
    r"(?P<status>\d+)\s+"
    r"(?P<bytes>\S+)\s+"
    r'"(?P<referer>.*?)"\s+'  # [SIC]
    r'"(?P<user_agent>.+?)"\s*'
)
We can use this regular expression to break each row into a dictionary of nine individual data elements. The use of [] and " to delimit complex fields such as the time, request, referrer ...
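As a minimal sketch of how the pattern might be applied, the following example matches one line, turns the named groups into a dictionary with `groupdict()`, and builds a namedtuple from it. The `Access` tuple name, the `parse_line` helper, and the sample log line are assumptions made for illustration; they are not taken from the text above.

```python
import re
from collections import namedtuple

# Same pattern as in the text: nine named groups, one per field.
format_pat = re.compile(
    r"(?P<host>[\d\.]+)\s+"
    r"(?P<identity>\S+)\s+"
    r"(?P<user>\S+)\s+"
    r"\[(?P<time>.+?)\]\s+"
    r'"(?P<request>.+?)"\s+'
    r"(?P<status>\d+)\s+"
    r"(?P<bytes>\S+)\s+"
    r'"(?P<referer>.*?)"\s+'  # [SIC]
    r'"(?P<user_agent>.+?)"\s*'
)

# Hypothetical namedtuple; field names mirror the regex group names.
Access = namedtuple(
    "Access",
    ["host", "identity", "user", "time", "request",
     "status", "bytes", "referer", "user_agent"],
)

def parse_line(line):
    """Return an Access for a matching line, or None if it does not match."""
    match = format_pat.match(line)
    if match is None:
        return None
    # groupdict() maps each group name to its matched text (all strings).
    return Access(**match.groupdict())

# Illustrative sample line, invented for this sketch.
sample = (
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
    '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
    '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"'
)

access = parse_line(sample)
print(access)
```

Every field comes back as a string, so values such as `status` and `bytes` would still need converting before any numeric work. Returning `None` for lines that do not match is one possible design choice; raising an exception would be another.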