July 2017
Beginner to intermediate
420 pages
10h 56m
English
I can apply some knowledge about my site here: I happen to know that every legitimate page on it has a URL that ends with a slash. So, let's go ahead and modify this again to strip out anything that doesn't end with a slash:
URLCounts = {}

with open(logPath, "r") as f:
    for line in (l.rstrip() for l in f):
        match = format_pat.match(line)
        if match:
            access = match.groupdict()
            agent = access['user_agent']
            # Skip known crawlers, the caching plugin, and empty user agents
            if (not('bot' in agent or 'spider' in agent or
                    'Bot' in agent or 'Spider' in agent or
                    'W3 Total Cache' in agent or agent == '-')):
                request = access['request']
                fields = request.split()
                if (len(fields) == 3):
                    (action, URL, protocol) = fields
                    # Keep only URLs that end with a slash
                    if (URL.endswith("/")):
                        if (action ...
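Since the listing above is cut off, here is a minimal, self-contained sketch of the same trailing-slash filter that can be run without the log file. Several things in it are assumptions rather than the original code: `LOG_PATTERN` is a hypothetical stand-in for `format_pat` (which is defined elsewhere), `is_probably_human` and `count_page_urls` are illustrative helper names, and the sample log lines are made up. It also matches the bot markers case-insensitively, which is slightly broader than the explicit checks above, and it does not filter on `action`, because that condition is truncated in the listing.

```python
import re
from collections import Counter

# ASSUMPTION: a simplified pattern for Apache combined-format lines.
# It stands in for format_pat, which is defined elsewhere and may differ.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

# Substrings that mark a user agent as a crawler or cache plugin.
# ASSUMPTION: compared case-insensitively, unlike the original checks.
BOT_MARKERS = ("bot", "spider", "w3 total cache")


def is_probably_human(agent):
    """Return True if the user agent is non-empty and has no bot marker."""
    lowered = agent.lower()
    return agent != "-" and not any(marker in lowered for marker in BOT_MARKERS)


def count_page_urls(lines):
    """Count requested URLs that end with a slash, skipping bot traffic."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line.rstrip())
        if not match:
            continue
        access = match.groupdict()
        if not is_probably_human(access["user_agent"]):
            continue
        fields = access["request"].split()
        if len(fields) != 3:
            continue  # malformed request line
        _action, url, _protocol = fields
        if url.endswith("/"):
            counts[url] += 1
    return counts


# Made-up sample lines, for illustration only.
sample_lines = [
    '1.2.3.4 - - [29/Nov/2015:03:50:05 +0000] "GET /blog/ HTTP/1.1" 200 1234 "-" "Mozilla/5.0"',
    '1.2.3.4 - - [29/Nov/2015:03:50:06 +0000] "GET /robots.txt HTTP/1.1" 200 55 "-" "Mozilla/5.0"',
    '5.6.7.8 - - [29/Nov/2015:03:50:07 +0000] "GET /blog/ HTTP/1.1" 200 1234 "-" "Googlebot/2.1"',
    '9.9.9.9 - - [29/Nov/2015:03:50:08 +0000] "GET /about/ HTTP/1.1" 200 900 "-" "-"',
    '1.2.3.4 - - [29/Nov/2015:03:50:09 +0000] "POST /blog/ HTTP/1.1" 200 10 "-" "Mozilla/5.0"',
]

url_counts = count_page_urls(sample_lines)
for url, count in url_counts.most_common():
    print(url, count)
```

With these sample lines, only `/blog/` survives the filter: the `robots.txt` request lacks a trailing slash, and the other two requests are dropped because of their user agents. Using a `Counter` instead of a plain dictionary is a convenience choice in this sketch; it removes the need to check whether a URL has been seen before.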