April 2018
Beginner
238 pages
7h 13m
English
We can use a similar script that loads the file and then applies filters to pull out the records of interest. The code is:
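    import pyspark

    # Reuse an existing SparkContext if one is already defined
    if 'sc' not in globals():
        sc = pyspark.SparkContext()

    # Load the log file as an RDD with one element per line
    textFile = sc.textFile("access_log")
    print(textFile.count(), "access records")

    # Keep only the lines that contain "GET"
    gets = textFile.filter(lambda line: "GET" in line)
    print(gets.count(), "GETs")

    # Keep only the lines that contain "POST"
    posts = textFile.filter(lambda line: "POST" in line)
    print(posts.count(), "POSTs")

    # Everything that is neither a GET nor a POST
    other = textFile.subtract(gets).subtract(posts)
    print(other.count(), "Other")
    for x in other.collect():
        print(x)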
This produces the output:

Interesting that so few other HTTP actions take place beyond ...
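One caveat about the script above: the test "GET" in line matches the text anywhere in the record, so a HEAD request for a URL such as /GET-started would be counted as a GET. The following is a minimal sketch of a stricter classification that reads the method from the quoted request field. It assumes the log follows the Apache Common or Combined Log Format. The names REQUEST_RE, http_method and count_methods are hypothetical helpers, and the sample records are invented for illustration, not taken from the access_log file.

```python
import re
from collections import Counter

# Assumption: each record has a quoted request field such as
# "GET /index.html HTTP/1.1" (Apache Common/Combined Log Format).
REQUEST_RE = re.compile(r'"([A-Z]+) \S+(?: HTTP/[\d.]+)?"')


def http_method(line):
    """Return the HTTP method from the request field, or "Other" if none is found."""
    match = REQUEST_RE.search(line)
    return match.group(1) if match else "Other"


def count_methods(lines):
    """Count how many records use each HTTP method."""
    return Counter(http_method(line) for line in lines)


# Invented sample records, for illustration only
sample = [
    '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '127.0.0.1 - - [10/Oct/2000:13:55:40 -0700] "POST /form HTTP/1.0" 200 512',
    '127.0.0.1 - - [10/Oct/2000:13:55:41 -0700] "HEAD /GET-started HTTP/1.0" 200 0',
    '127.0.0.1 - - [10/Oct/2000:13:55:42 -0700] "-" 408 0',
]

counts = count_methods(sample)
for method, total in sorted(counts.items()):
    print(total, method)
```

The same function could probably be applied to the RDD directly, for example with textFile.map(http_method).countByValue(), which would return the per-method totals in one pass instead of running a separate filter for each method.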