pandas: deleting runs of 6 or more consecutive values below 0.4


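In the data below, every run of 6 or more consecutive values that are less than 0.4 needs to be deleted. First, read the data into a pandas DataFrame and give the columns suitable names.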

'''
2019/1/1 0:00 2.5 
2019/1/1 0:10 2.4 
2019/1/1 0:20 2.6 
2019/1/1 0:30 2.7 
2019/1/1 0:40 2.9 
2019/1/1 0:50 0.3 
2019/1/1 1:00 0.35 
2019/1/1 1:10 1.8 
2019/1/1 1:20 2.9 
2019/1/1 1:30 2.7 
2019/1/1 1:40 0.2 
2019/1/1 1:50 0.1 
2019/1/1 2:00 0.25 
2019/1/1 2:10 0.31 
2019/1/1 2:20 0.24 
2019/1/1 2:30 0.24 
2019/1/1 2:40 0.24 
2019/1/1 2:50 0.21 
2019/1/1 3:00 3.1 
2019/1/1 3:10 3.2 
2019/1/1 3:20 2.9
'''

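# Copy the data above to the clipboard, then run:
import pandas as pd

# Note: the dict form of parse_dates is deprecated since pandas 2.2;
# on newer versions, combine the two columns with pd.to_datetime instead.
df = (pd.read_clipboard(header=None,
                       parse_dates={'time':[0,1]})
      .rename(columns={2: 'x'})
)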
'''
                  time     x
0  2019-01-01 00:00:00  2.50
1  2019-01-01 00:10:00  2.40
2  2019-01-01 00:20:00  2.60
3  2019-01-01 00:30:00  2.70
4  2019-01-01 00:40:00  2.90
5  2019-01-01 00:50:00  0.30
6  2019-01-01 01:00:00  0.35
7  2019-01-01 01:10:00  1.80
8  2019-01-01 01:20:00  2.90
9  2019-01-01 01:30:00  2.70
10 2019-01-01 01:40:00  0.20
11 2019-01-01 01:50:00  0.10
12 2019-01-01 02:00:00  0.25
13 2019-01-01 02:10:00  0.31
14 2019-01-01 02:20:00  0.24
15 2019-01-01 02:30:00  0.24
16 2019-01-01 02:40:00  0.24
17 2019-01-01 02:50:00  0.21
18 2019-01-01 03:00:00  3.10
19 2019-01-01 03:10:00  3.20
20 2019-01-01 03:20:00  2.90
'''
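Reading from the clipboard is hard to reproduce in a script or notebook. The following is a minimal sketch that builds the same DataFrame from an in-memory string instead. The intermediate column names date and clock are placeholders chosen for this example, and the date format is left for pandas to infer.

```python
import io
import pandas as pd

raw = """\
2019/1/1 0:00 2.5
2019/1/1 0:10 2.4
2019/1/1 0:20 2.6
2019/1/1 0:30 2.7
2019/1/1 0:40 2.9
2019/1/1 0:50 0.3
2019/1/1 1:00 0.35
2019/1/1 1:10 1.8
2019/1/1 1:20 2.9
2019/1/1 1:30 2.7
2019/1/1 1:40 0.2
2019/1/1 1:50 0.1
2019/1/1 2:00 0.25
2019/1/1 2:10 0.31
2019/1/1 2:20 0.24
2019/1/1 2:30 0.24
2019/1/1 2:40 0.24
2019/1/1 2:50 0.21
2019/1/1 3:00 3.1
2019/1/1 3:10 3.2
2019/1/1 3:20 2.9
"""

# Split on whitespace into three columns: date, clock time, value
df = pd.read_csv(io.StringIO(raw), sep=r"\s+",
                 names=["date", "clock", "x"])

# Join date and clock into a single datetime column
df["time"] = pd.to_datetime(df["date"] + " " + df["clock"])

# Keep the same two columns as the clipboard version
df = df[["time", "x"]]
```

This avoids both the clipboard dependency and the dict form of parse_dates.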

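Next, test whether each value is below 0.4 and give every consecutive run its own label (the foo column). Then filter on that label to drop the runs of 6 or more rows whose values are below 0.4.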

(
    df.assign(foo=(df.x < 0.4).ne((df.x < 0.4).shift()).cumsum())
)
'''
                  time     x  foo
0  2019-01-01 00:00:00  2.50    1
1  2019-01-01 00:10:00  2.40    1
2  2019-01-01 00:20:00  2.60    1
3  2019-01-01 00:30:00  2.70    1
4  2019-01-01 00:40:00  2.90    1
5  2019-01-01 00:50:00  0.30    2
6  2019-01-01 01:00:00  0.35    2
7  2019-01-01 01:10:00  1.80    3
8  2019-01-01 01:20:00  2.90    3
9  2019-01-01 01:30:00  2.70    3
10 2019-01-01 01:40:00  0.20    4
11 2019-01-01 01:50:00  0.10    4
12 2019-01-01 02:00:00  0.25    4
13 2019-01-01 02:10:00  0.31    4
14 2019-01-01 02:20:00  0.24    4
15 2019-01-01 02:30:00  0.24    4
16 2019-01-01 02:40:00  0.24    4
17 2019-01-01 02:50:00  0.21    4
18 2019-01-01 03:00:00  3.10    5
19 2019-01-01 03:10:00  3.20    5
20 2019-01-01 03:20:00  2.90    5
'''
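The labelling expression packs three steps into one line. As an illustration, here is the same idiom unrolled on a short Series; the variable names below, changed, and run_id are made up for this sketch.

```python
import pandas as pd

s = pd.Series([2.5, 0.3, 0.35, 1.8, 0.2, 0.1])

# Step 1: boolean flag, True where the value is below the threshold
below = s < 0.4

# Step 2: True wherever the flag differs from the previous row.
# The first row is compared with NaN, so it always counts as a change.
changed = below.ne(below.shift())

# Step 3: a running sum of the change points gives each run its own ID
run_id = changed.cumsum()

print(run_id.tolist())  # [1, 2, 2, 3, 4, 4]
```

Rows that share a run_id belong to the same consecutive stretch, which is what makes the later groupby possible.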

(
    df.assign(foo=(df.x < 0.4).ne((df.x < 0.4).shift()).cumsum())
    # keep a run if it is shorter than 6 rows or its values are not below 0.4
    .groupby('foo').filter(lambda g: len(g) < 6 or not (g.x < 0.4).all())
)
'''
                  time     x  foo
0  2019-01-01 00:00:00  2.50    1
1  2019-01-01 00:10:00  2.40    1
2  2019-01-01 00:20:00  2.60    1
3  2019-01-01 00:30:00  2.70    1
4  2019-01-01 00:40:00  2.90    1
5  2019-01-01 00:50:00  0.30    2
6  2019-01-01 01:00:00  0.35    2
7  2019-01-01 01:10:00  1.80    3
8  2019-01-01 01:20:00  2.90    3
9  2019-01-01 01:30:00  2.70    3
18 2019-01-01 03:00:00  3.10    5
19 2019-01-01 03:10:00  3.20    5
20 2019-01-01 03:20:00  2.90    5
'''
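The same idea can be wrapped in a small helper that uses a boolean mask and so leaves no helper column in the result. This is only a sketch: the function name drop_long_low_runs and its parameters are hypothetical, and the demo data is invented to show that a long run of values at or above the threshold is kept.

```python
import pandas as pd

def drop_long_low_runs(df, col="x", threshold=0.4, min_run=6):
    """Drop rows in runs of at least min_run consecutive values below threshold."""
    below = df[col] < threshold
    # Label consecutive runs of equal flags
    run_id = below.ne(below.shift()).cumsum()
    # Length of the run each row belongs to
    run_len = below.groupby(run_id).transform("size")
    # Remove only rows that are both below the threshold and in a long run
    return df[~(below & (run_len >= min_run))]

# Invented demo: 7 high values, 2 low, 1 high, 6 low, 1 high
demo = pd.DataFrame({"x": [1.0] * 7 + [0.1] * 2 + [2.0] + [0.2] * 6 + [3.0]})
result = drop_long_low_runs(demo)
print(result.index.tolist())  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 16]
```

Only the six-row low run (index 10 to 15) is removed; the seven-row high run and the two-row low run both survive.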
