December 2018
Beginner to intermediate
682 pages
18h 1m
English
Now that we have found the longest streaks of on-time arrivals, we can easily find the opposite: the longest streak of delayed arrivals. The following function returns two rows for each group passed to it. The first row marks the start of the streak, and the second row marks its end. Each row contains the month and day on which the streak started or ended, along with the total streak length:
>>> def max_delay_streak(df):
        df = df.reset_index(drop=True)
        s = 1 - df['ON_TIME']
        s1 = s.cumsum()
        streak = s.mul(s1).diff().where(lambda x: x < 0) \
                  .ffill().add(s1, fill_value=0)
        last_idx = streak.idxmax()
        first_idx = last_idx - streak.max() + 1
        df_return = df.loc[[first_idx, last_idx], ['MONTH', 'DAY']]
        df_return['streak'] = streak.max()
        ...
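The streak arithmetic inside the function is easier to follow on a tiny example. The following is a minimal sketch, not part of the recipe itself: the `on_time` Series is hypothetical toy data standing in for the `ON_TIME` column, and `resets` is a name introduced here only to split the chained expression into two steps.

```python
import pandas as pd

# Hypothetical toy data: 1 = on-time arrival, 0 = delayed arrival
on_time = pd.Series([1, 0, 0, 1, 0, 0, 0, 1, 0])

s = 1 - on_time    # 1 now marks a delayed arrival
s1 = s.cumsum()    # running count of delayed arrivals

# s * s1 falls back to 0 at every on-time row that follows a delay,
# so diff() is negative exactly at those rows; keep only those values
resets = s.mul(s1).diff().where(lambda x: x < 0)

# Carry the latest negative offset forward and add the running count,
# which leaves the length of the current delay streak at each row
streak = resets.ffill().add(s1, fill_value=0)

last_idx = streak.idxmax()                    # row where the longest streak ends
first_idx = int(last_idx - streak.max() + 1)  # row where it starts

print(streak.tolist())
print(first_idx, last_idx)
```

In this toy data the longest run of delays covers positions 4 through 6, so the sketch reports a streak of length 3 that starts at row 4 and ends at row 6. The `fill_value=0` argument matters for rows before the first reset, where the forward-filled offset is still missing and would otherwise produce `NaN`.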