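The 《pandas 教程》 (pandas Tutorial) is continuously updated. For suggestions, corrections, or update requests, add the author on WeChat: gairuo123 (note: pandas教程), and follow the WeChat official account 「盖若」 (ID: gairuo). To study with the author, see the Python learning course. Books by the author: 《深入浅出Pandas》 and 《Python之光》.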
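stack rotates a level of the column index into the row index, and unstack does the reverse, rotating a level of the row index into the column index. They are most often used with hierarchical (MultiIndex) row and column indexes, although, as the examples below show, they also work on single-level indexes.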
(Figure: the logic of stacking)
(Figure: an example of unstacking)
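In essence, stack turns columns into rows and unstack turns rows into columns.
In the example above, the original dataset's index has two levels: stacking moves the innermost column level into the innermost row level, and unstacking moves the innermost row level into the innermost column level.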
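As a quick check that the two operations undo each other, the minimal sketch below stacks a small DataFrame and unstacks it again. It assumes a frame with no missing values (stacking can drop all-NaN rows, which would break the round trip); the variable names are illustrative.

```python
import pandas as pd

# A small frame with a single-level row index and single-level columns
df = pd.DataFrame([[0, 1], [2, 3]],
                  index=['cat', 'dog'],
                  columns=['weight', 'height'])

# stack: the column labels become a new, innermost row-index level
stacked = df.stack()
print(stacked.index.nlevels)  # 2

# unstack: the innermost row-index level moves back into the columns
restored = stacked.unstack()

# Same labels and values as the original (order-insensitive comparison)
pd.testing.assert_frame_equal(restored, df, check_like=True)
```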
Single-level column index:
import numpy as np
import pandas as pd

df_single_level_cols = pd.DataFrame([[0, 1], [2, 3]],
                                    index=['cat', 'dog'],
                                    columns=['weight', 'height'])
df_single_level_cols
'''
     weight  height
cat       0       1
dog       2       3
'''
df_single_level_cols.stack()
'''
cat  weight    0
     height    1
dog  weight    2
     height    3
dtype: int64
'''
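Because stacking single-level columns yields a Series with a two-level index, it is a convenient route from a wide table to a "long" one. A minimal sketch, assuming you want ordinary columns rather than index levels; the level names `animal` and `measure` and the column name `value` are hypothetical labels chosen for this example.

```python
import pandas as pd

df = pd.DataFrame([[0, 1], [2, 3]],
                  index=['cat', 'dog'],
                  columns=['weight', 'height'])

# stack() on single-level columns returns a Series with a 2-level index
long_series = df.stack()

# Name the two index levels (illustrative names), then turn them
# into ordinary columns to get a "long" table
long_df = (long_series
           .rename_axis(['animal', 'measure'])
           .reset_index(name='value'))
print(long_df)
```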
Multi-level column index:
multicol1 = pd.MultiIndex.from_tuples([('weight', 'kg'),
                                       ('weight', 'pounds')])
df_multi_level_cols1 = pd.DataFrame([[1, 2], [2, 4]],
                                    index=['cat', 'dog'],
                                    columns=multicol1)
df_multi_level_cols1
'''
     weight
         kg  pounds
cat       1       2
dog       2       4
'''
df_multi_level_cols1.stack()
'''
            weight
cat kg           1
    pounds       2
dog kg           2
    pounds       4
'''
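When the columns have two levels, `stack()` moves only the innermost one (here `kg`/`pounds`), so the result stays a DataFrame whose columns are the remaining outer level. The sketch below simply inspects that shape. It rebuilds the same data under shorter, illustrative variable names.

```python
import pandas as pd

cols = pd.MultiIndex.from_tuples([('weight', 'kg'), ('weight', 'pounds')])
df = pd.DataFrame([[1, 2], [2, 4]], index=['cat', 'dog'], columns=cols)

# Only the innermost column level (kg/pounds) moves into the rows
stacked = df.stack()
print(type(stacked).__name__)  # DataFrame
print(list(stacked.columns))   # ['weight']
print(stacked.index.nlevels)   # 2

# The outer row level can be used to select one animal's rows
print(stacked.loc['cat'])
```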
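Missing values: if a label of the stacked level does not appear under every outer column label, the missing combinations are filled with NaN: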
multicol2 = pd.MultiIndex.from_tuples([('weight', 'kg'),
                                       ('height', 'm')])
df_multi_level_cols2 = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]],
                                    index=['cat', 'dog'],
                                    columns=multicol2)
df_multi_level_cols2
'''
     weight  height
         kg       m
cat     1.0     2.0
dog     3.0     4.0
'''
df_multi_level_cols2.stack()
'''
        height  weight
cat kg     NaN     1.0
    m      2.0     NaN
dog kg     NaN     3.0
    m      4.0     NaN
'''
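The NaN values above are not missing data in the original frame: `kg` only occurs under `weight` and `m` only under `height`, so after stacking each row has a value in just one column. A small sketch that confirms this by counting, using the same data under illustrative variable names:

```python
import pandas as pd

cols = pd.MultiIndex.from_tuples([('weight', 'kg'), ('height', 'm')])
df = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]], index=['cat', 'dog'], columns=cols)

print(int(df.isna().sum().sum()))       # 0: no NaN before stacking

stacked = df.stack()
# 2 animals x 2 units -> 4 rows, 2 measure columns
print(stacked.shape)                    # (4, 2)
# Each row has exactly one filled cell, so 4 of the 8 cells are NaN
print(int(stacked.isna().sum().sum()))  # 4
```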
Specifying the column level(s) to stack, by position or as a list of positions:
df_multi_level_cols2.stack(0)
'''
             kg    m
cat height  NaN  2.0
    weight  1.0  NaN
dog height  NaN  4.0
    weight  3.0  NaN
'''
df_multi_level_cols2.stack([0, 1])
'''
cat  height  m     2.0
     weight  kg    1.0
dog  height  m     4.0
     weight  kg    3.0
dtype: float64
'''
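The level argument also accepts level names. A sketch, assuming the column levels are given the hypothetical names `measure` and `unit` (the original example leaves them unnamed):

```python
import pandas as pd

# Same columns as above, but with (illustrative) level names
cols = pd.MultiIndex.from_tuples([('weight', 'kg'), ('height', 'm')],
                                 names=['measure', 'unit'])
df = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]], index=['cat', 'dog'], columns=cols)

# stack('measure') refers to the same level as stack(0)
by_name = df.stack('measure')
by_pos = df.stack(0)
print(by_name.equals(by_pos))  # True

# A list of names stacks both column levels and returns a Series
both = df.stack(['measure', 'unit'])
print(both.dropna())
```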
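Dropping missing values: with `dropna=True` (the default in the original implementation), rows whose values are all NaN are removed after stacking. Note: pandas 2.1 added `future_stack=True`, a new implementation slated to become the default in pandas 3.0, which does not accept `dropna`; the outputs in this section follow the original implementation.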
df_multi_level_cols3 = pd.DataFrame([[None, 1.0], [2.0, 3.0]],
                                    index=['cat', 'dog'],
                                    columns=multicol2)
df_multi_level_cols3
'''
     weight  height
         kg       m
cat     NaN     1.0
dog     2.0     3.0
'''
df_multi_level_cols3.stack(dropna=False)
'''
        height  weight
cat kg     NaN     NaN
    m      1.0     NaN
dog kg     NaN     2.0
    m      3.0     NaN
'''
df_multi_level_cols3.stack(dropna=True)
'''
        height  weight
cat m      1.0     NaN
dog kg     NaN     2.0
    m      3.0     NaN
'''
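Another way to express the same intent, which does not depend on the `dropna` argument of `stack` itself, is to stack and then remove all-NaN rows with `DataFrame.dropna(how='all')`. A sketch under that assumption, rebuilding the same data:

```python
import pandas as pd

cols = pd.MultiIndex.from_tuples([('weight', 'kg'), ('height', 'm')])
df = pd.DataFrame([[None, 1.0], [2.0, 3.0]], index=['cat', 'dog'], columns=cols)

# Stack, then drop only the rows in which every column is NaN;
# rows with at least one value are kept
stacked = df.stack().dropna(how='all')
print(stacked)
```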
Unstacking (rows to columns):
index = pd.MultiIndex.from_tuples([('one', 'a'), ('one', 'b'),
                                   ('two', 'a'), ('two', 'b')])
s = pd.Series(np.arange(1.0, 5.0), index=index)
s
'''
one  a    1.0
     b    2.0
two  a    3.0
     b    4.0
dtype: float64
'''
s.unstack(level=-1)
'''
       a    b
one  1.0  2.0
two  3.0  4.0
'''
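If some combination of labels is absent, `unstack` fills that cell with NaN; its `fill_value` parameter substitutes another value. A sketch, using a variant of `s` (hypothetically named `s_partial`) with the `('two', 'b')` entry left out:

```python
import numpy as np
import pandas as pd

# Like s above, but without the ('two', 'b') entry
index = pd.MultiIndex.from_tuples([('one', 'a'), ('one', 'b'), ('two', 'a')])
s_partial = pd.Series(np.arange(1.0, 4.0), index=index)

print(s_partial.unstack())                # the ('two', 'b') cell is NaN
print(s_partial.unstack(fill_value=0.0))  # the same cell is 0.0
```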
s.unstack(level=0)
'''
   one  two
a  1.0  3.0
b  2.0  4.0
'''
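With a two-level index, unstacking the outer level gives the transpose of unstacking the inner level, and levels can also be selected by name. A sketch assuming the hypothetical level names `group` and `letter`:

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_tuples([('one', 'a'), ('one', 'b'),
                                   ('two', 'a'), ('two', 'b')],
                                  names=['group', 'letter'])  # illustrative names
s = pd.Series(np.arange(1.0, 5.0), index=index)

# Unstacking the outer level equals the transpose of unstacking the inner one
print(s.unstack(level=0).equals(s.unstack(level=-1).T))  # True
# Levels can also be given by name
print(s.unstack('group').equals(s.unstack(level=0)))     # True
```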
df = s.unstack(level=0)
df.unstack()
'''
one  a    1.0
     b    2.0
two  a    3.0
     b    4.0
dtype: float64
'''
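When the row index is not a MultiIndex, `DataFrame.unstack()` returns a Series whose outer level comes from the columns. For this data that Series matches the original `s`, which the sketch below checks (it rebuilds `s` and `df` as in the example above; `back` is an illustrative name):

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_tuples([('one', 'a'), ('one', 'b'),
                                   ('two', 'a'), ('two', 'b')])
s = pd.Series(np.arange(1.0, 5.0), index=index)

df = s.unstack(level=0)   # rows a/b, columns one/two
back = df.unstack()       # single-level row index: the result is a Series

print(type(back).__name__)  # Series
print(back.equals(s))       # True: the column labels became the outer level
```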
Updated: 2024-08-11 10:56:31  Tags: pandas, stacking