Since we now have labeled data for January and February 2019, we can evaluate how the model performs each month. First, we read in the 2019 data from the database:
>>> with sqlite3.connect('logs/logs.db') as conn:
...     logs_2019 = pd.read_sql(
...         """
...         SELECT *
...         FROM logs
...         WHERE datetime BETWEEN "2019-01-01" AND "2020-01-01";
...         """,
...         conn, parse_dates=['datetime'], index_col='datetime'
...     )
>>> hackers_2019 = pd.read_sql(
...     """
...     SELECT *
...     FROM attacks
...     WHERE start BETWEEN "2019-01-01" AND "2020-01-01";
...     """,
...     conn, parse_dates=['start', 'end']
... ).assign(
...     start_floor=lambda x: x.start.dt.floor('min'),
...     end_ceil=lambda x: x.end.dt.ceil('min')
... )
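The `assign()` call above adds two helper columns that widen each attack window to whole minutes. As a minimal sketch of what `dt.floor('min')` and `dt.ceil('min')` do, here is the same pattern applied to a single made-up attack window. The timestamps are illustrative assumptions, not rows from the `attacks` table:

```python
import pandas as pd

# Toy attack window with made-up timestamps (an assumption for illustration;
# these values do not come from logs/logs.db).
attacks = pd.DataFrame({
    'start': pd.to_datetime(['2019-01-05 03:15:27']),
    'end': pd.to_datetime(['2019-01-05 03:19:02']),
})

# Round the start down and the end up to the nearest minute, so the
# widened window fully covers the original one.
attacks = attacks.assign(
    start_floor=lambda x: x.start.dt.floor('min'),
    end_ceil=lambda x: x.end.dt.ceil('min'),
)

print(attacks[['start_floor', 'end_ceil']])
```

Here the start moves back to 03:15:00 and the end moves forward to 03:20:00, so any minute that overlaps the attack even partially falls inside the widened window.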
Next, we isolate the January 2019 data:
>>> X_jan, y_jan ...