Exploding dictionary-type data in pandas

The explode() method in pandas expands list data into rows. In this example the data to expand is in dictionary form, so let's see how to handle it with pandas.

Data and requirements

Here is our source data:

import pandas as pd

d = {100: ['1A', '2B'], 200: ['2A', '3B'], 300: ['3A', '3B', '4B'], }
df = pd.DataFrame({'a': ['new', 'old'], 'b': [d, ['99A', '99B']]})
df.style  # show the full data in JupyterLab
'''
     a                                                                b
0  new  {100: ['1A', '2B'], 200: ['2A', '3B'], 300: ['3A', '3B', '4B']}
1  old                                                       [99A, 99B]
'''

The expected result is:

'''
     a    b             c
0  new  100      [1A, 2B]
0  new  200      [2A, 3B]
0  new  300  [3A, 3B, 4B]
1  old   -1    [99A, 99B]
'''

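Approach

Looking at the data, the b value in the row where a is old is a list, while the b value in the row where a is new is a dict. We can turn the list into a dict as well, using the -1 from the requirement as its key.

When exploding with explode(), column b should yield the dict keys and column c the dict values. Column c does not exist yet, so we build it from column b so that it holds only the dict values.

Finally, explode columns b and c together with explode().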
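The plan relies on the keys and the values of a dict being iterated in the same order, so that the i-th key exploded from column b lines up with the i-th value exploded from column c. A minimal sketch of that assumption in plain Python (the names keys, values and pairs are only illustrative):

```python
d = {100: ['1A', '2B'], 200: ['2A', '3B'], 300: ['3A', '3B', '4B']}

# Iterating a dict yields its keys; d.values() yields the values
# in the corresponding order (insertion order since Python 3.7).
keys = list(d)
values = list(d.values())

# Pairing them position by position reproduces the original items.
pairs = list(zip(keys, values))
print(pairs)
```

If the two orders could differ, exploding b and c side by side would pair keys with the wrong values.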

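Code

First reassign column b: use map() to check each value and, when it is not a dict, wrap it in a dict whose key is -1 and whose value is the original value: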

(
    df.assign(b=df.b.map(lambda x: x if isinstance(x, dict) else {-1: x}))
)
'''
     a                                                                b
0  new  {100: ['1A', '2B'], 200: ['2A', '3B'], 300: ['3A', '3B', '4B']}
1  old                                             {-1: ['99A', '99B']}
'''

Both rows now hold dicts. Next, add column c, which contains only the dict values:

(
    df.assign(b=df.b.map(lambda x: x if isinstance(x, dict) else {-1: x}))
    .assign(c=lambda x: [i.values() for i in x.b])
)
'''
     a    b                                   c
0  new  ...  ([1A, 2B], [2A, 3B], [3A, 3B, 4B])
1  old  ...                        ([99A, 99B])
# the display of column b is omitted
'''

Column c holds the values view of each dict in column b. Finally, explode columns b and c:

(
    df.assign(b=df.b.map(lambda x: x if isinstance(x, dict) else {-1: x}))
    .assign(c=lambda x: [i.values() for i in x.b])
    .explode(['b', 'c'])
)
'''
     a    b             c
0  new  100      [1A, 2B]
0  new  200      [2A, 3B]
0  new  300  [3A, 3B, 4B]
1  old   -1    [99A, 99B]
'''
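Passing a list of columns to explode() requires pandas 1.3 or later. As a hedged alternative for older versions, and not the approach used above, the sketch below explodes a single column of (key, value) pairs taken from dict.items() and then splits each pair into b and c. The names pairs and result are hypothetical.

```python
import pandas as pd

d = {100: ['1A', '2B'], 200: ['2A', '3B'], 300: ['3A', '3B', '4B']}
df = pd.DataFrame({'a': ['new', 'old'], 'b': [d, ['99A', '99B']]})

# Normalize every value to a dict, then turn it into a list of (key, value) pairs
pairs = df.assign(
    b=df.b.map(lambda x: list((x if isinstance(x, dict) else {-1: x}).items()))
).explode('b')

# Split each pair: the key stays in column b, the value goes to new column c
result = pairs.assign(
    c=pairs.b.map(lambda kv: kv[1]),
    b=pairs.b.map(lambda kv: kv[0]),
)
print(result)
```

Keeping each key and its value in one tuple until after the explode means the pairing does not depend on two columns being exploded in step.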

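This gives the required result. Note that explode() does not only handle lists: it accepts any list-like (duck-typed) value, including lists, tuples, sets, Series and np.ndarray.

In the example above, column b holds dicts, which explode into their keys, while column c holds dict values views, which explode into the values.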
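To see this behaviour in isolation, the following small sketch explodes one-element Series holding different list-like objects. The sample objects and the names samples and exploded are made up for illustration.

```python
import pandas as pd

samples = {
    'list': [1, 2],
    'tuple': (1, 2),
    'dict': {1: 'x', 2: 'y'},                  # explodes into its keys
    'dict_values': {'x': 1, 'y': 2}.values(),  # explodes into the values
}

# Wrap each object in a one-element Series and explode it into rows
exploded = {
    name: pd.Series([obj]).explode().tolist()
    for name, obj in samples.items()
}
for name, rows in exploded.items():
    print(name, rows)
```

In this sketch all four objects explode into the same two rows, 1 and 2.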

(End)
