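A Timedelta is a difference between two times, that is, a duration, expressed in units such as days, hours, minutes, and seconds. It can be positive or negative. Durations come up often in practice: when comparing sprint times in athletics, for example, the difference between two results can be positive (more time taken) or negative (less time taken).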
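The Timedelta data type represents a time increment. Subtracting one fixed point in time from another yields a Timedelta:
import pandas as pd
# Subtract one timestamp from another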
pd.Timestamp('2020-11-01 15') - pd.Timestamp('2020-11-01 14')
# Timedelta('0 days 01:00:00')
pd.Timestamp('2020-11-01 08') - pd.Timestamp('2020-11-02 08')
# Timedelta('-1 days +00:00:00')
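As a small illustration of the sign convention, the sketch below compares two sprint times. The variable names and values are hypothetical and chosen only for this example.

```python
import pandas as pd

# Hypothetical sprint results, for illustration only
personal_best = pd.Timedelta(milliseconds=10500)
todays_run = pd.Timedelta(milliseconds=10200)

# A negative difference means today's run took less time
diff = todays_run - personal_best
print(diff)
print(diff < pd.Timedelta(0))  # True
```

Comparing against `pd.Timedelta(0)` is one simple way to test the sign of a duration.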
Pass a string in one of the following formats:
# One day
pd.Timedelta('1 days')
# Timedelta('1 days 00:00:00')
pd.Timedelta('1 days 00:00:00')
# Timedelta('1 days 00:00:00')
pd.Timedelta('1 days 2 hours')
# Timedelta('1 days 02:00:00')
pd.Timedelta('-1 days 2 min 3us')
# Timedelta('-2 days +23:57:59.999997')
Specify the duration with keyword arguments:
pd.Timedelta(days=5, seconds=10)
# Timedelta('5 days 00:00:10')
pd.Timedelta(minutes=3, seconds=2)
# Timedelta('0 days 00:03:02')
# Work out how many days and hours a given number of minutes amounts to
pd.Timedelta(minutes=3242)
# Timedelta('2 days 06:02:00')
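To read the individual parts of such a duration, one option is the `components` attribute, sketched here for the same 3242 minutes:

```python
import pandas as pd

td = pd.Timedelta(minutes=3242)

# components splits the duration into days, hours, minutes, and so on
parts = td.components
print(parts.days, parts.hours, parts.minutes)  # 2 6 2

# total_seconds() gives the whole duration as a single float
print(td.total_seconds())  # 194520.0
```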
Use an offset alias prefixed with a quantity:
# One day
pd.Timedelta('1D')
# Timedelta('1 days 00:00:00')
# Two weeks
pd.Timedelta('2W')
# Timedelta('14 days 00:00:00')
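# One day, 2 hours, 3 minutes, and 4 seconds (an uppercase 'M' is ambiguous with months, so 'min' is used here)
pd.Timedelta('1D 2h 3min 4s')
# Timedelta('1 days 02:03:04')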
An integer with a unit:
# One day
pd.Timedelta(1, unit='d')
# 100 seconds
pd.Timedelta(100, unit='s')
# Timedelta('0 days 00:01:40')
# 4 weeks
pd.Timedelta(4, unit='w')
# Timedelta('28 days 00:00:00')
Python's built-in datetime.timedelta or NumPy's np.timedelta64:
import datetime
import numpy as np
# One day and 10 minutes
pd.Timedelta(datetime.timedelta(days=1, minutes=10))
# Timedelta('1 days 00:10:00')
# 100 nanoseconds
pd.Timedelta(np.timedelta64(100, 'ns'))
# Timedelta('0 days 00:00:00.000000100')
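The conversion also works in the other direction. The following sketch turns a Timedelta back into the standard-library and NumPy types:

```python
import datetime

import numpy as np
import pandas as pd

td = pd.Timedelta(days=1, minutes=10)

# Convert back to the standard-library and NumPy representations
py_td = td.to_pytimedelta()
np_td = td.to_timedelta64()

print(py_td == datetime.timedelta(days=1, minutes=10))  # True
print(np_td == np.timedelta64(87000, 's'))              # True
```

Note that `to_pytimedelta()` only keeps microsecond precision, so any nanosecond part is lost in that conversion.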
Negative and missing values:
# Negative value
pd.Timedelta('-1min')
# Timedelta('-1 days +23:59:00')
# Null (missing) value
pd.Timedelta('nan')
# NaT
pd.Timedelta('nat')
# NaT
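The representation of a negative duration can look surprising: `-1 days +23:59:00` means minus one day plus 23 hours 59 minutes, which is minus 60 seconds. A quick check, together with a test for missing values:

```python
import pandas as pd

neg = pd.Timedelta('-1min')

# -1 day + 23:59:00 is the same as minus 60 seconds
print(neg.total_seconds())           # -60.0
print(neg == -pd.Timedelta('1min'))  # True

# Missing durations are detected with pd.isna
missing = pd.Timedelta('nat')
print(pd.isna(missing))              # True
```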
Standard ISO 8601 duration strings are also accepted:
# ISO 8601 duration strings
pd.Timedelta('P0DT0H1M0S')
# Timedelta('0 days 00:01:00')
pd.Timedelta('P0DT0H0M0.000000123S')
# Timedelta('0 days 00:00:00.000000123')
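Going the other way, `Timedelta.isoformat()` produces an ISO 8601 duration string. A minimal round-trip sketch:

```python
import pandas as pd

td = pd.Timedelta(days=1, hours=2, minutes=3, seconds=4)

# Serialize to an ISO 8601 duration string and parse it back
iso = td.isoformat()
restored = pd.Timedelta(iso)

print(iso)
print(restored == td)  # True
```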
A Timedelta can also be built from DateOffsets (Day, Hour, Minute, Second, Milli, Micro, Nano):
pd.Timedelta(pd.offsets.Second(2))
# Timedelta('0 days 00:00:02')
pd.to_timedelta can produce a single duration directly:
pd.to_timedelta('1 days 06:05:01.00003')
# Timedelta('1 days 06:05:01.000030')
pd.to_timedelta('15.5us')
# Timedelta('0 days 00:00:00.000015500')
pd.to_timedelta(pd.offsets.Day(3))
# Timedelta('3 days 00:00:00')
pd.to_timedelta('15.5min')
# Timedelta('0 days 00:15:30')
pd.to_timedelta(124524564574835)
# Timedelta('1 days 10:35:24.564574835')
Passing a list-like object produces a TimedeltaIndex:
pd.to_timedelta(['1 days 06:05:01.00003', '15.5us', 'nan'])
# TimedeltaIndex(['1 days 06:05:01.000030', '0 days 00:00:00.000015500', NaT], dtype='timedelta64[ns]', freq=None)
pd.to_timedelta(np.arange(5), unit='s')
# TimedeltaIndex(['0 days 00:00:00', '0 days 00:00:01', '0 days 00:00:02', '0 days 00:00:03', '0 days 00:00:04'], dtype='timedelta64[ns]', freq=None)
pd.to_timedelta(np.arange(5), unit='d')
# TimedeltaIndex(['0 days', '1 days', '2 days', '3 days', '4 days'], dtype='timedelta64[ns]', freq=None)
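When the input may contain values that cannot be parsed, `errors='coerce'` turns them into NaT instead of raising. The input list below is made up for illustration:

```python
import pandas as pd

# Hypothetical raw input containing one unparseable entry
raw = ['1 days', '90min', 'not a duration']

# errors='coerce' replaces unparseable entries with NaT
tdi = pd.to_timedelta(raw, errors='coerce')
print(tdi)
print(tdi.isna().tolist())  # [False, False, True]
```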
Like timestamps, durations have lower and upper storage bounds:
pd.Timedelta.min
# Timedelta('-106752 days +00:12:43.145224193')
pd.Timedelta.max
# Timedelta('106751 days 23:47:16.854775807')
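These bounds come from the underlying storage, a signed 64-bit count of nanoseconds. The sketch below checks this and shows that stepping past the upper bound raises an error. The exact exception type is an assumption that depends on the pandas version, so both likely candidates are caught.

```python
import numpy as np
import pandas as pd

# The maximum duration is the largest int64 number of nanoseconds
print(pd.Timedelta.max.value == np.iinfo(np.int64).max)  # True

# Going past the upper bound raises instead of wrapping around.
# Assumption: the error is an OverflowError or a ValueError subclass,
# depending on the pandas version.
raised = False
try:
    pd.Timedelta.max + pd.Timedelta(1, unit='ns')
except (OverflowError, ValueError) as exc:
    raised = True
    print('out of bounds:', type(exc).__name__)
```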
Durations can be added together:
pd.Timedelta(pd.offsets.Day(2)) + pd.Timedelta(pd.offsets.Second(2)) +\
pd.Timedelta('00:00:00.000123')
# Timedelta('2 days 00:00:02.000123')
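Besides addition, durations support multiplication by a number and division by another duration. A short sketch:

```python
import pandas as pd

one_day = pd.Timedelta('1 days')
one_hour = pd.Timedelta('1h')

# Scaling a duration by a number gives another duration
print(one_hour * 3)                   # 0 days 03:00:00

# Dividing one duration by another gives a plain number
print(one_day / one_hour)             # 24.0

# Floor division and remainder also work
print(one_day // pd.Timedelta('7h'))  # 3
print(one_day % pd.Timedelta('7h'))   # 0 days 03:00:00
```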
Here are some examples of operations:
s = pd.Series(pd.date_range('2012-1-1', periods=3, freq='D'))
td = pd.Series([pd.Timedelta(days=i) for i in range(3)])
df = pd.DataFrame({'A': s, 'B': td})
df
'''
A B
0 2012-01-01 0 days
1 2012-01-02 1 days
2 2012-01-03 2 days
'''
df['C'] = df['A'] + df['B']
df
'''
A B C
0 2012-01-01 0 days 2012-01-01
1 2012-01-02 1 days 2012-01-03
2 2012-01-03 2 days 2012-01-05
'''
df.dtypes
'''
A datetime64[ns]
B timedelta64[ns]
C datetime64[ns]
dtype: object
'''
s - s.max()
'''
0 -2 days
1 -1 days
2 0 days
dtype: timedelta64[ns]
'''
s - datetime.datetime(2011, 1, 1, 3, 5)
'''
0 364 days 20:55:00
1 365 days 20:55:00
2 366 days 20:55:00
dtype: timedelta64[ns]
'''
s + datetime.timedelta(minutes=5)
'''
0 2012-01-01 00:05:00
1 2012-01-02 00:05:00
2 2012-01-03 00:05:00
dtype: datetime64[ns]
'''
s + pd.offsets.Minute(5)
'''
0 2012-01-01 00:05:00
1 2012-01-02 00:05:00
2 2012-01-03 00:05:00
dtype: datetime64[ns]
'''
s + pd.offsets.Minute(5) + pd.offsets.Milli(5)
'''
0 2012-01-01 00:05:00.005
1 2012-01-02 00:05:00.005
2 2012-01-03 00:05:00.005
dtype: datetime64[ns]
'''
# Subtract a scalar timestamp from the series
y = s - s[0]
y
'''
0 0 days
1 1 days
2 2 days
dtype: timedelta64[ns]
'''
# Subtract the shifted series to get the gap between consecutive rows
y = s - s.shift()
y
'''
0 NaT
1 1 days
2 1 days
dtype: timedelta64[ns]
'''
# abs() turns a negative duration into a positive one
td1 = pd.Timedelta('-1 days 2 hours 3 seconds')
abs(td1)
# Timedelta('1 days 02:00:03')
Missing values can be filled with a zero duration:
y.fillna(pd.Timedelta(0))
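Once the gaps are filled, the `.dt` accessor exposes each duration as plain numbers. The sketch rebuilds the same series so that it runs on its own; the name `gaps` is introduced only for this example.

```python
import pandas as pd

s = pd.Series(pd.date_range('2012-1-1', periods=3, freq='D'))

# Differences between consecutive rows, with the leading NaT replaced by zero
gaps = (s - s.shift()).fillna(pd.Timedelta(0))

print(gaps.dt.days.tolist())             # [0, 1, 1]
print(gaps.dt.total_seconds().tolist())  # [0.0, 86400.0, 86400.0]
```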
A series of durations supports statistical calculations:
y2 = pd.Series(pd.to_timedelta(['-1 days +00:00:05',
'nat',
'-1 days +00:00:05',
'1 days']))
y2
'''
0 -1 days +00:00:05
1 NaT
2 -1 days +00:00:05
3 1 days 00:00:00
dtype: timedelta64[ns]
'''
y2.mean()
# Timedelta('-1 days +16:00:03.333333')
y2.median()
# Timedelta('-1 days +00:00:05')
y2.quantile(.1)
# Timedelta('-1 days +00:00:05')
y2.sum()
# Timedelta('-1 days +00:00:10')
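For further numeric work it can help to convert durations to floats in a chosen unit. One way, sketched on the same data, is to divide by a one-hour Timedelta. Missing values become NaN and are skipped by `count()`.

```python
import pandas as pd

y2 = pd.Series(pd.to_timedelta(['-1 days +00:00:05',
                                'nat',
                                '-1 days +00:00:05',
                                '1 days']))

# Dividing by a Timedelta yields floats in that unit (hours here)
hours = y2 / pd.Timedelta('1h')
print(hours.round(3).tolist())

# NaT is ignored when counting
print(y2.count())  # 3
```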
Last updated: 2020-11-04 16:19:34. Tags: pandas, durations