pandas Time Series Methods


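Like other kinds of data, time series data often needs to be aggregated, shifted, and computed on. This article introduces some of the pandas methods for working with time series data.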

Shifting

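You may need to move the values in a time series forward or backward in time, that is, to lead or lag them. The shift() method, available on every Series and DataFrame, also works on time series objects.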

import numpy as np
import pandas as pd

rng = pd.date_range('2020-06-01', '2020-06-03')
ts = pd.Series(range(len(rng)), index=rng)
ts
'''
2020-06-01    0
2020-06-02    1
2020-06-03    2
Freq: D, dtype: int64
'''
ts.shift(1)
'''
2020-06-01    NaN
2020-06-02    0.0
2020-06-03    1.0
Freq: D, dtype: float64
'''
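
A common use of a one-step lag is computing the change from one period to the next. The sketch below uses made-up values (10, 12, 15) rather than the series above, and checks the result against Series.diff(), which performs the same subtraction in a single call.

```python
import pandas as pd

rng = pd.date_range("2020-06-01", "2020-06-03")
ts = pd.Series([10, 12, 15], index=rng)  # illustrative values, not from the article

# Day-over-day change: each value minus the previous day's value.
# The first row has no predecessor, so it becomes NaN.
change = ts - ts.shift(1)

# Series.diff() computes the same first difference.
same_as_diff = change.equals(ts.diff())

print(change)
print(same_as_diff)
```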

The shift method accepts a freq argument, which can be a DateOffset class, another timedelta-like object, or an offset alias:

# Business days
ts.shift(3, freq=pd.offsets.BDay())
'''
2020-06-04    0
2020-06-05    1
2020-06-08    2
Freq: B, dtype: int64
'''
# Business month end (pandas 2.2+ deprecates the 'BM' alias in favor of 'BME')
ts.shift(3, freq='BM')
'''
2020-08-31    0
2020-08-31    1
2020-08-31    2
Freq: D, dtype: int64
'''
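
To make the three accepted forms of freq concrete, here is a small sketch that shifts the same series by one day using an offset alias, a DateOffset object, and a standard-library timedelta. The variable names are illustrative; the point is only that all three produce the same labels.

```python
import datetime
import pandas as pd

rng = pd.date_range("2020-06-01", "2020-06-03")
ts = pd.Series(range(len(rng)), index=rng)

by_alias = ts.shift(1, freq="D")                              # offset alias
by_offset = ts.shift(1, freq=pd.offsets.Day())                # DateOffset object
by_timedelta = ts.shift(1, freq=datetime.timedelta(days=1))   # timedelta-like object

# Compare labels and values across the three results.
same_index = (
    list(by_alias.index) == list(by_offset.index) == list(by_timedelta.index)
)
same_values = (
    by_alias.tolist() == by_offset.tolist() == by_timedelta.tolist()
)
print(same_index, same_values)
```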

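Besides changing how data and index are aligned, DataFrame and Series objects used to offer a tshift() convenience method that moves every date in the index by the given offset (only the index moves). tshift() was deprecated in pandas 1.1 and removed in 2.0; shift() with a freq argument does the same thing:

# Equivalent to the former ts.tshift(3, freq='D')
ts.shift(3, freq='D')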
'''
2020-06-04    0
2020-06-05    1
2020-06-06    2
Freq: D, dtype: int64
'''

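Note that when only the index is shifted (tshift, or shift with freq), the leading entries are no longer NaN, because the data is not realigned.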
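
The difference between realigning the data and moving only the index is easiest to see side by side. This is a minimal sketch on the same three-day series; the names realigned and index_moved are illustrative.

```python
import pandas as pd

rng = pd.date_range("2020-06-01", "2020-06-03")
ts = pd.Series(range(len(rng)), index=rng)

# Without freq: values slide down one row, the index stays put,
# and the first row becomes NaN.
realigned = ts.shift(1)

# With freq: values stay on their rows and every label moves one day,
# so no NaN is introduced.
index_moved = ts.shift(1, freq="D")

print(realigned)
print(index_moved)
```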

Frequency Conversion

The primary function for changing frequency is the asfreq() method.

dr = pd.date_range('1/1/2010', periods=3, freq=3 * pd.offsets.BDay())
ts = pd.Series(np.random.randn(3), index=dr)
ts
'''
2010-01-01    1.069063
2010-01-06    0.784018
2010-01-11    0.490291
Freq: 3B, dtype: float64
'''
# Convert from a 3-business-day frequency to a 1-business-day frequency
ts.asfreq(pd.offsets.BDay())
'''
2010-01-01    1.069063
2010-01-04         NaN
2010-01-05         NaN
2010-01-06    0.784018
2010-01-07         NaN
2010-01-08         NaN
2010-01-11    0.490291
Freq: B, dtype: float64
'''

asfreq offers a further convenience: you can specify a fill method for any gaps that appear after the frequency conversion.

ts.asfreq(pd.offsets.BDay(), method='pad')
'''
2010-01-01    1.069063
2010-01-04    1.069063
2010-01-05    1.069063
2010-01-06    0.784018
2010-01-07    0.784018
2010-01-08    0.784018
2010-01-11    0.490291
Freq: B, dtype: float64
'''
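# Fill the rows created by upsampling with a fixed value
# (NaN values already present in the data are not filled).
# pandas 2.2+ spells this alias in lowercase: '30s'.
ts.asfreq(freq='30S', fill_value=9.0)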
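
The two ways of filling gaps behave differently when the original data already contains NaN. The sketch below uses a small hypothetical series with a 2-day frequency and one missing value; neither option fills that pre-existing NaN, and a forward fill copies it into the new row that follows it.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2010-01-01", periods=3, freq="2D")
s = pd.Series([1.0, np.nan, 3.0], index=idx)  # illustrative data with one NaN

# New rows copy the value of the previous original row (even if it is NaN).
padded = s.asfreq("D", method="ffill")

# New rows receive a constant; the NaN that was already there is left alone.
constant = s.asfreq("D", fill_value=9.0)

print(padded)
print(constant)
```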

For a DatetimeIndex, asfreq is essentially a thin convenience wrapper around reindex(): it generates a date_range and calls reindex.
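
As a rough check of that description, the sketch below builds the target index by hand and compares the result with asfreq. It assumes the target range runs from the first to the last existing label, and it uses fixed values instead of random ones so the comparison is reproducible.

```python
import pandas as pd

dr = pd.date_range("2010-01-01", periods=3, freq=3 * pd.offsets.BDay())
ts = pd.Series([1.0, 2.0, 3.0], index=dr)  # fixed values for reproducibility

via_asfreq = ts.asfreq(pd.offsets.BDay())

# Manual equivalent: generate the full business-day range, then reindex onto it.
target = pd.date_range(ts.index.min(), ts.index.max(), freq=pd.offsets.BDay())
via_reindex = ts.reindex(target)

same = via_asfreq.equals(via_reindex)
print(same)
```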

date_index = pd.date_range('1/1/2010', periods=6, freq='D')
df2 = pd.DataFrame({"prices": [100, 101, np.nan, 100, 89, 88]},
                   index=date_index)
df2
'''
            prices
2010-01-01   100.0
2010-01-02   101.0
2010-01-03     NaN
2010-01-04   100.0
2010-01-05    89.0
2010-01-06    88.0
'''

# Suppose we decide to expand the DataFrame to cover a wider date range
date_index2 = pd.date_range('12/29/2009', periods=10, freq='D')
df2.reindex(date_index2)
'''
            prices
2009-12-29     NaN
2009-12-30     NaN
2009-12-31     NaN
2010-01-01   100.0
2010-01-02   101.0
2010-01-03     NaN
2010-01-04   100.0
2010-01-05    89.0
2010-01-06    88.0
2010-01-07     NaN
'''

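By default, index entries that have no value in the original DataFrame (for example, '2009-12-29') are filled with NaN. If needed, the missing values can be filled using one of the method options: {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}.

For example, to propagate the next valid observation backward into the new rows, pass bfill to the method keyword. Only rows added by reindex are filled: 2010-01-03 stays NaN because it was already NaN in the original data, and 2010-01-07 stays NaN because no later value exists to fill from.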

df2.reindex(date_index2, method='bfill')
'''
            prices
2009-12-29   100.0
2009-12-30   100.0
2009-12-31   100.0
2010-01-01   100.0
2010-01-02   101.0
2010-01-03     NaN
2010-01-04   100.0
2010-01-05    89.0
2010-01-06    88.0
2010-01-07     NaN
'''
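
If the remaining NaN values should be filled as well, one option is a second, explicit fill step after reindexing. This is a sketch of one possible policy (forward fill from the previous valid row), not the only reasonable choice.

```python
import numpy as np
import pandas as pd

date_index = pd.date_range("2010-01-01", periods=6, freq="D")
df2 = pd.DataFrame({"prices": [100, 101, np.nan, 100, 89, 88]},
                   index=date_index)
date_index2 = pd.date_range("2009-12-29", periods=10, freq="D")

# Step 1: extend the index; only the newly created rows are back-filled.
extended = df2.reindex(date_index2, method="bfill")

# Step 2: fill whatever is still missing from the previous valid row.
filled = extended.ffill()

print(filled)
```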

Other Methods

# Convert a DatetimeIndex to native Python datetime.datetime objects
df2.index.to_pydatetime()
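
For reference, the conversion returns a NumPy object array whose elements are plain datetime.datetime instances rather than pandas Timestamp objects. A short sketch on a three-day index:

```python
import datetime
import pandas as pd

idx = pd.date_range("2010-01-01", periods=3, freq="D")

# The result is a NumPy array of dtype object holding datetime.datetime values.
py_dates = idx.to_pydatetime()

print(type(py_dates[0]))
print(py_dates[0] == datetime.datetime(2010, 1, 1))
```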


Last updated: 2020-06-13 23:07:55. Tags: pandas, time methods