March 2017
Beginner to intermediate
866 pages
18h 4m
English
In upsampling, the frequency of the time series is increased. As a result, the resampled index contains more sample points than we have data points. One of the main questions is how to account for the entries in the series where we have no measurement.
Let's start with hourly data for a single day:
>>> rng = pd.date_range('4/29/2015 8:00', periods=10, freq='H')
>>> ts = pd.Series(np.random.randint(0, 100, len(rng)), index=rng)
>>> ts.head()
2015-04-29 08:00:00    30
2015-04-29 09:00:00    27
2015-04-29 10:00:00    54
2015-04-29 11:00:00     9
2015-04-29 12:00:00    48
Freq: H, dtype: int64
If we upsample to data points taken every 15 minutes, our time series will be extended with NaN values:
>>> ts.resample('15min').asfreq().head()
2015-04-29 08:00:00    30.0
2015-04-29 ...
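As a minimal sketch of the question raised above (what to do with entries that have no measurement), the following example builds a comparable hourly series and upsamples it three ways. The fixed values from `np.arange`, the `'60min'` alias (equivalent to hourly, chosen here only to sidestep alias differences between pandas versions), and the variable names `upsampled`, `filled`, and `interpolated` are assumptions for illustration, not part of the original example.

```python
import numpy as np
import pandas as pd

# Assumption: deterministic values instead of random integers,
# so that the results below are reproducible.
rng = pd.date_range('2015-04-29 08:00', periods=10, freq='60min')
ts = pd.Series(np.arange(0, 100, 10), index=rng)

# asfreq() materializes the 15-minute grid; new slots are NaN.
upsampled = ts.resample('15min').asfreq()

# Two common ways to account for the missing entries:
filled = ts.resample('15min').ffill()              # carry last value forward
interpolated = ts.resample('15min').interpolate()  # linear interpolation

# 10 hourly points span 9 hours, so the 15-minute grid has 9 * 4 + 1 points.
print(len(ts), len(upsampled), int(upsampled.isna().sum()))  # prints: 10 37 27
```

Forward filling repeats the last known measurement, which suits step-like quantities, while linear interpolation assumes the value changes smoothly between measurements. Which one is appropriate depends on what the series represents.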