pandas between_time(): select data within a given time-of-day range


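between_time() is a pandas method that selects the data falling within a given time-of-day range, regardless of date. It works on a DataFrame or Series whose index is a DatetimeIndex (any other index type raises a TypeError), and it is handy for time series analysis.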
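The index requirement can be sketched as follows. This is a minimal illustration with made-up data; the column names `ts` and `value` are assumptions, not part of the pandas API.

```python
import pandas as pd

# Hypothetical sample data: timestamps live in an ordinary column.
raw = pd.DataFrame({
    "ts": pd.to_datetime(["2024-01-01 08:30", "2024-01-01 12:00", "2024-01-01 18:45"]),
    "value": [1, 2, 3],
})

# With the default RangeIndex there are no times to compare against,
# so between_time() raises a TypeError.
try:
    raw.between_time("09:00", "17:00")
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Moving the timestamps into the index makes the call valid.
indexed = raw.set_index("ts")
midday = indexed.between_time("09:00", "17:00")
print(midday)
```

Only the 12:00 row survives, since 08:30 and 18:45 fall outside the window.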

Syntax

DataFrame.between_time(start_time, end_time,
	inclusive='both', axis=None)

Applicable data types

DataFrame
Series

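Parameters

start_time:
- Type: datetime.time or a time string (e.g. '09:00')
- Meaning: the start of the time-of-day window.
end_time:
- Type: datetime.time or a time string (e.g. '17:00')
- Meaning: the end of the time-of-day window.
inclusive:
- Type: {'both', 'neither', 'left', 'right'}, default 'both'
- Meaning:
  - 'both': include both start_time and end_time.
  - 'neither': exclude both start_time and end_time.
  - 'left': include only start_time.
  - 'right': include only end_time.
axis:
- Type: {0 or 'index', 1 or 'columns'} or None, default None
- Meaning:
  - None or 0 (default): operate on the row index.
  - axis=1: operate on the column index.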

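The parameter types above can be illustrated with a small sketch. It assumes made-up half-hourly data and shows that strings and datetime.time objects select the same rows, and that axis=1 filters on column labels instead of row labels.

```python
import datetime as dt
import pandas as pd

# Hypothetical half-hourly index: 08:00, 08:30, 09:00, 09:30, 10:00
idx = pd.date_range("2024-01-01 08:00", periods=5, freq="30min")
s = pd.Series(range(5), index=idx)

# A time string and a datetime.time object describe the same window.
by_string = s.between_time("08:30", "09:30")
by_time = s.between_time(dt.time(8, 30), dt.time(9, 30))

# axis=1 filters on the column labels, so the columns must be a DatetimeIndex.
wide = pd.DataFrame([[10, 20, 30, 40, 50]], columns=idx)
cols = wide.between_time("08:30", "09:30", axis=1)

print(by_string)
print(cols)
```

Both Series calls keep the 08:30, 09:00 and 09:30 entries, and the axis=1 call keeps the three matching columns.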
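Return value

Return type:
- Called on a DataFrame: returns a new DataFrame containing only the rows whose index time falls within the window.
- Called on a Series: returns a new Series containing only the elements whose index time falls within the window.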
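A short sketch of the return behavior on a Series, using made-up values: the result is a new object of the same type, the original is left untouched, and a window that matches nothing yields an empty Series.

```python
import pandas as pd

# Hypothetical readings every 4 hours: 06:00, 10:00, 14:00, 18:00, 22:00, 02:00
idx = pd.date_range("2024-03-01 06:00", periods=6, freq="4h")
s = pd.Series([10, 20, 30, 40, 50, 60], index=idx)

# The result is a new Series; s itself keeps all six elements.
daytime = s.between_time("09:00", "18:00")

# A window with no matching timestamps gives an empty Series, not an error.
empty = s.between_time("03:00", "05:00")

print(daytime)
print(len(s), len(empty))
```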

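Use cases

Extracting a specific time window: between_time() pulls every record whose time of day falls within a window, on every date in the series, such as trades during working hours or daytime temperature readings.
Filtering before analysis: to study trends within a particular part of the day, filter with between_time() first and then run the analysis on the result.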
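As a rough sketch of the filter-then-analyze pattern, the following keeps only daytime readings and then averages them per day. The data and the column name `temp` are invented for illustration.

```python
import pandas as pd

# Hypothetical temperature readings every 6 hours over two days
idx = pd.date_range("2024-05-01 00:00", periods=8, freq="6h")
temps = pd.DataFrame({"temp": [10, 12, 20, 16, 11, 14, 22, 18]}, index=idx)

# Step 1: keep only the daytime readings (06:00 to 18:00, both ends included).
daytime = temps.between_time("06:00", "18:00")

# Step 2: aggregate the filtered rows per calendar day.
daily_mean = daytime.groupby(daytime.index.normalize())["temp"].mean()

print(daily_mean)
```

The midnight readings are dropped before averaging, so each daily mean is computed from the 06:00, 12:00 and 18:00 values only.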

Examples

import pandas as pd

i = pd.date_range('2018-04-09', periods=4, freq='1D20min')
ts = pd.DataFrame({'A': [1, 2, 3, 4]}, index=i)
ts
'''
                     A
2018-04-09 00:00:00  1
2018-04-10 00:20:00  2
2018-04-11 00:40:00  3
2018-04-12 01:00:00  4
'''

ts.between_time('0:15', '0:45')
'''
                     A
2018-04-10 00:20:00  2
2018-04-11 00:40:00  3
'''

Setting start_time later than end_time returns the rows whose times are not between the two times (the window wraps around midnight):

ts.between_time('0:45', '0:15')
'''
                     A
2018-04-09 00:00:00  1
2018-04-12 01:00:00  4
'''
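A common use of a start_time later than end_time is an overnight window. The sketch below assumes made-up event counts recorded every two hours and selects 22:00 through 04:00 across midnight.

```python
import pandas as pd

# Hypothetical event log: 20:00, 22:00, 00:00, 02:00, 04:00, 06:00
idx = pd.date_range("2024-06-01 20:00", periods=6, freq="2h")
logs = pd.DataFrame({"events": [5, 3, 8, 1, 4, 7]}, index=idx)

# start_time > end_time: keep times at or after 22:00 OR at or before 04:00.
night = logs.between_time("22:00", "04:00")

print(night)
```

The 20:00 and 06:00 rows are dropped; the four rows from 22:00 to 04:00 remain.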

Worked examples

Example 1: basic usage

Suppose a DataFrame has a DatetimeIndex holding dates and times, with one record per hour.

import pandas as pd
import numpy as np

# Build sample data ('h' replaces the 'H' alias, which is deprecated since pandas 2.2)
date_rng = pd.date_range(start='2023-01-01', end='2023-01-02', freq='h')
df = pd.DataFrame(date_rng, columns=['date'])
df['data'] = np.random.randint(0, 100, size=(len(date_rng)))

df.set_index('date', inplace=True)

# Select rows between 9 a.m. and 5 p.m. on every day
df_between_time = df.between_time('09:00', '17:00')

print(df)
print(df_between_time)

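Output (the data values are random, so yours will differ; the middle rows of the full frame are abbreviated):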

'''
                       data
date                       
2023-01-01 00:00:00     15
2023-01-01 01:00:00     50
2023-01-01 02:00:00     95
2023-01-01 03:00:00     73
2023-01-01 04:00:00     19
...                    ...
2023-01-01 21:00:00     63
2023-01-01 22:00:00     24
2023-01-01 23:00:00     78
2023-01-02 00:00:00     31

[25 rows x 1 columns]

                       data
date                       
2023-01-01 09:00:00     60
2023-01-01 10:00:00     72
2023-01-01 11:00:00     45
2023-01-01 12:00:00     88
2023-01-01 13:00:00     33
2023-01-01 14:00:00     49
2023-01-01 15:00:00     28
2023-01-01 16:00:00     80
2023-01-01 17:00:00     56

[9 rows x 1 columns]
'''

In this example, between_time('09:00', '17:00') extracts the rows whose time of day lies between 9 a.m. and 5 p.m., with both endpoints included.

Example 2: using the inclusive parameter

Suppose you want to exclude the rows that fall exactly on the start and end times.

import pandas as pd
import numpy as np

# Build sample data ('h' replaces the 'H' alias, which is deprecated since pandas 2.2)
date_rng = pd.date_range(start='2023-01-01', end='2023-01-02', freq='h')
df = pd.DataFrame(date_rng, columns=['date'])
df['data'] = np.random.randint(0, 100, size=(len(date_rng)))

df.set_index('date', inplace=True)

# Select rows between 9 a.m. and 5 p.m. on every day, excluding both endpoints
df_between_time_exclusive = df.between_time('09:00', '17:00', inclusive='neither')

print(df)
print(df_between_time_exclusive)

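Output (the data values are random, so yours will differ; the middle rows of the full frame are abbreviated):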

'''
                       data
date                       
2023-01-01 00:00:00     15
2023-01-01 01:00:00     50
2023-01-01 02:00:00     95
2023-01-01 03:00:00     73
2023-01-01 04:00:00     19
...                    ...
2023-01-01 21:00:00     63
2023-01-01 22:00:00     24
2023-01-01 23:00:00     78
2023-01-02 00:00:00     31

[25 rows x 1 columns]

                       data
date                       
2023-01-01 10:00:00     72
2023-01-01 11:00:00     45
2023-01-01 12:00:00     88
2023-01-01 13:00:00     33
2023-01-01 14:00:00     49
2023-01-01 15:00:00     28
2023-01-01 16:00:00     80

[7 rows x 1 columns]
'''

In this example, between_time('09:00', '17:00', inclusive='neither') selects the rows between 9 a.m. and 5 p.m. but drops the rows at exactly 09:00 and 17:00.
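The two one-sided options can be sketched the same way. This assumes a pandas version that supports the inclusive parameter and uses deterministic data (the hour number as the value) so the kept rows are easy to read off.

```python
import pandas as pd

# Hypothetical hourly data for one day; each value equals its hour of day.
idx = pd.date_range("2023-01-01", periods=24, freq="h")
df = pd.DataFrame({"data": range(24)}, index=idx)

# 'left' keeps the 09:00 row but drops the 17:00 row.
left = df.between_time("09:00", "17:00", inclusive="left")

# 'right' drops the 09:00 row but keeps the 17:00 row.
right = df.between_time("09:00", "17:00", inclusive="right")

print(left["data"].tolist())
print(right["data"].tolist())
```

Each call returns 8 rows: hours 9 through 16 for 'left' and hours 10 through 17 for 'right'.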

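Summary

between_time() quickly extracts the records of a time series whose time of day falls within a given window, on every date. The inclusive parameter controls whether the start and end times themselves are kept, which covers the usual variations in analysis needs.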

Reference

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.between_time.html


Last updated: 2024-08-09 08:41:54 Tags: pandas python time between