pandas itertuples(): iterate over rows as namedtuples


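The *pandas Tutorial* (《pandas 教程》) is continuously updated. For suggestions, corrections, or update requests, add the author on WeChat: gr99123 (note: pandas教程) and follow the official account 「盖若」, ID: gairuo. To study with the author, see the Python learning course. You are also welcome to check out the author's books: 《深入浅出Pandas》 and 《Python之光》.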

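The itertuples() method iterates over a DataFrame row by row and yields one namedtuple per row holding that row's values. Unlike iterrows(), which packs every row into a Series, itertuples() is generally faster and keeps each column's own data type, so it is the better choice when a row loop is needed on larger data.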
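The dtype difference is easy to see on a frame that mixes an integer and a float column. The sketch below uses a hypothetical two-column frame (`qty`, `price` are illustrative names): iterrows() has to put both values into one Series, so the integer is upcast to float, while itertuples() hands back each value with its column's type.

```python
import pandas as pd

# Hypothetical frame: one integer column and one float column
df = pd.DataFrame({"qty": [1, 2], "price": [1.5, 2.5]})

# iterrows(): each row becomes a Series, so both values share one dtype
_, first_series = next(df.iterrows())
print(first_series.dtype)   # float64
print(first_series["qty"])  # 1.0 (the integer has been upcast)

# itertuples(): values come out column by column, so each keeps its own type
first_tuple = next(df.itertuples())
print(type(first_tuple.qty).__name__)  # int
print(first_tuple.qty)                 # 1
```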

Syntax

DataFrame.itertuples(index=True, name='Pandas')

Parameters

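index (optional): bool, whether to include the row index as the first field of each tuple (the field is named Index). Defaults to True.
name (optional): str, the type name of the returned namedtuples. Defaults to 'Pandas'. If set to None, plain tuples are returned instead.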
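A quick way to see what each parameter does is to inspect the first row's `_fields` and type name. This is a small sketch on a two-row frame; "Person" is just an arbitrary example name.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})

# Default: the index is the first field and the tuple type is called "Pandas"
row = next(df.itertuples())
print(type(row).__name__, row._fields)  # Pandas ('Index', 'Name', 'Age')

# index=False drops the Index field
row_no_index = next(df.itertuples(index=False))
print(row_no_index._fields)  # ('Name', 'Age')

# A custom name only changes the namedtuple's type name ("Person" is arbitrary)
person = next(df.itertuples(name="Person"))
print(type(person).__name__)  # Person

# name=None yields plain tuples without field names
plain = next(df.itertuples(name=None))
print(plain)  # (0, 'Alice', 25)
```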

Returns

An iterator that yields one tuple per row: a namedtuple by default, or a plain tuple when name is None.
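Because the result is an iterator rather than a list, it can be consumed only once. The sketch below pulls rows by hand and shows that a second pass over the same object yields nothing; call df.itertuples() again if you need another pass.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})

rows = df.itertuples()
print(iter(rows) is rows)  # True: it is an iterator, not a list

first = next(rows)  # pull one row by hand
rest = list(rows)   # consume the remaining rows
print(len(rest))    # 1
print(list(rows))   # []: the iterator is already exhausted
```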

Use cases

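itertuples() suits tasks that need per-row Python logic, such as data cleaning or row-specific calculations. Since it is generally much faster than iterrows(), it is the recommended way to loop over rows. It is still a Python-level loop, though: when a computation can be written as column-wise (vectorized) operations, those are usually faster than any row loop, especially on large data.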
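As an illustration of that trade-off, here is a sketch on hypothetical order data (`orders`, `qty`, `price` are made-up names) that computes the same total twice: once with an itertuples() loop and once column-wise. Both give the same number; the vectorized form does the multiplication inside NumPy instead of in a Python loop.

```python
import pandas as pd

# Hypothetical order data
orders = pd.DataFrame({"qty": [2, 1, 5], "price": [9.5, 20.0, 3.0]})

# Row by row with itertuples(): one Python-level iteration per row
total_loop = 0.0
for row in orders.itertuples(index=False):
    total_loop += row.qty * row.price

# Column-wise equivalent: multiply whole columns, then sum
total_vec = (orders["qty"] * orders["price"]).sum()

print(total_loop, total_vec)  # 54.0 54.0
```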

Examples

Example 1: Basic usage

Build a simple DataFrame and print each row.

import pandas as pd

# Build sample data
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)

print("DataFrame:")
print(df)

# Iterate row by row with itertuples()
for row in df.itertuples():
    print(f"Row: {row}")

Output:

DataFrame:
      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago

Row: Pandas(Index=0, Name='Alice', Age=25, City='New York')
Row: Pandas(Index=1, Name='Bob', Age=30, City='Los Angeles')
Row: Pandas(Index=2, Name='Charlie', Age=35, City='Chicago')
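Each row can be read by field name, by position, or converted to a dict. The sketch below reuses the same data; note that with the default index=True, position 0 is the Index field, so position 2 is Age.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
})

for row in df.itertuples():
    # row.Name: by field name; row[2]: by position (position 0 is Index)
    # row._asdict(): a dict keyed by field name
    print(row.Index, row.Name, row[2], row._asdict()['City'])
```

This prints "0 Alice 25 New York" for the first row, and so on. Attribute access (row.Name) is usually the most readable of the three.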

Example 2: Excluding the index

Iterate without including the row index in each tuple.

import pandas as pd

# Build sample data
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)

# Iterate with itertuples(), leaving out the index
for row in df.itertuples(index=False):
    print(f"Row: {row}")

Output:

Row: Pandas(Name='Alice', Age=25, City='New York')
Row: Pandas(Name='Bob', Age=30, City='Los Angeles')
Row: Pandas(Name='Charlie', Age=35, City='Chicago')
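With index=False each tuple holds exactly the column values in column order, so it can be unpacked straight into loop variables. A small sketch (the `summaries` list is just an illustrative name):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
})

# Unpack each row into one variable per column
summaries = []
for name, age, city in df.itertuples(index=False):
    summaries.append(f"{name} ({age}) lives in {city}")

print(summaries[0])  # Alice (25) lives in New York
```

Unpacking depends on the column order, so it is best kept to frames whose columns you control.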

Example 3: Using plain tuples

Return plain tuples instead of namedtuples during iteration.

import pandas as pd

# Build sample data
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)

# name=None makes itertuples() return plain tuples
for row in df.itertuples(name=None):
    print(f"Row: {row}")

Output:

Row: (0, 'Alice', 25, 'New York')
Row: (1, 'Bob', 30, 'Los Angeles')
Row: (2, 'Charlie', 35, 'Chicago')
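Plain tuples have no field names, so values are taken by position: (index, Name, Age, City). A sketch that builds a hypothetical name-to-city lookup from them:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
})

# Unpack by position and ignore the index and Age with "_"
city_by_name = {name: city for _, name, _, city in df.itertuples(name=None)}
print(city_by_name)  # {'Alice': 'New York', 'Bob': 'Los Angeles', 'Charlie': 'Chicago'}
```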

Example 4: Processing data

Process data during iteration, for example converting every name to uppercase.

import pandas as pd

# Build sample data
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)

# Read each row with itertuples() and write the new value back with df.at
for row in df.itertuples():
    df.at[row.Index, 'Name'] = row.Name.upper()

print("Modified DataFrame:")
print(df)

Output:

Modified DataFrame:
      Name  Age         City
0    ALICE   25     New York
1      BOB   30  Los Angeles
2  CHARLIE   35      Chicago
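For a single-column transformation like this, a row loop is not actually required. The pandas string accessor applies upper() to the whole column at once and gives the same result:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
})

# Vectorized string method: uppercase the whole column in one call
df['Name'] = df['Name'].str.upper()
print(df)
```

The itertuples() version remains useful when the new value depends on logic that is awkward to express column-wise.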

Example 5: Computing a new column

Compute a new column during iteration, for example deriving the birth year from the age.

import pandas as pd

# Build sample data
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)

# Compute the new column row by row with itertuples()
current_year = 2024
df['Birth Year'] = None
for row in df.itertuples():
    df.at[row.Index, 'Birth Year'] = current_year - row.Age

print("DataFrame with Birth Year:")
print(df)

Output:

DataFrame with Birth Year:
      Name  Age         City Birth Year
0    Alice   25     New York       1999
1      Bob   30  Los Angeles       1994
2  Charlie   35      Chicago       1989
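One side effect of pre-filling the column with None is that it stays an object column even after integers are written into it. The sketch below runs the loop from this example on a copy (`looped` is an illustrative name) and compares it with a vectorized subtraction, which yields the same values with an int64 dtype.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
})
current_year = 2024

# Loop version from this example: the column starts as None, so its dtype is object
looped = df.copy()
looped['Birth Year'] = None
for row in looped.itertuples():
    looped.at[row.Index, 'Birth Year'] = current_year - row.Age

# Vectorized version: integer arithmetic on the whole column keeps int64
df['Birth Year'] = current_year - df['Age']

print(looped['Birth Year'].dtype)  # object
print(df['Birth Year'].dtype)      # int64
```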

Notes

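itertuples() is generally faster than iterrows(), which matters most on large DataFrames.
The returned namedtuples make column access convenient (row.Name), but they are immutable: to change data, write back to the DataFrame itself, for example with df.at as in Example 4.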
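Two related details are shown in the sketch below. Assigning to a field raises AttributeError, and _replace() only builds a new tuple without touching the DataFrame. Also, as the pandas documentation notes, column names that are not valid Python identifiers (such as 'Birth Year', which contains a space), are repeated, or start with an underscore are renamed to positional names, so such a column has to be read by its positional name or by index.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice'], 'Age': [25], 'Birth Year': [1999]})

row = next(df.itertuples())

# Fields cannot be reassigned: namedtuples are immutable
try:
    row.Name = 'ALICE'
except AttributeError:
    print("cannot modify a namedtuple field")

# _replace() builds a new tuple; the DataFrame itself is unchanged
updated = row._replace(Name='ALICE')
print(updated.Name, df.at[0, 'Name'])  # ALICE Alice

# 'Birth Year' is not a valid identifier, so it gets a positional name
print(row._fields)  # ('Index', 'Name', 'Age', '_3')
```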

These examples walk through the basic usage of itertuples() and several practical applications, to help you understand and use the method.


Updated: 2024-07-19 21:32:29 Tags: pandas python itertuples iteration