
Split Pandas Dataframe by Column Value

Original: https://www.geesforgeks.org/split-pandas-data frame-by-column-value/

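Sometimes, to analyze a dataframe more precisely, we need to split it into 2 or more parts. Pandas can split a dataframe by column index, by row index, or by the values in a column.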
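As a quick orientation, the three kinds of split mentioned above can be sketched as follows. The toy data and the variable names (`top`, `bottom`, `left`, `right`, `older`, `younger`) are assumptions made up for this illustration only; they are not part of the original article.

```python
import pandas as pd

# hypothetical toy data, used only for illustration
df = pd.DataFrame({'Name': ['A', 'B', 'C', 'D'],
                   'Age': [25, 32, 41, 38],
                   'Salary': [1000, 2000, 3000, 4000]})

# split by row position: first two rows vs. the remaining rows
top, bottom = df.iloc[:2], df.iloc[2:]

# split by column position: first column vs. the remaining columns
left, right = df.iloc[:, :1], df.iloc[:, 1:]

# split by column value: rows with Age >= 35 vs. all other rows
older, younger = df[df['Age'] >= 35], df[df['Age'] < 35]
```

The rest of this article deals only with the last case, splitting by column value.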

Let's see how to split a Pandas dataframe by column value in Python.

First, let's create a dataframe:

Python3

# importing pandas library
import pandas as pd

# Initializing the nested list with Data-set
player_list = [['M.S.Dhoni', 36, 75, 5428000],
               ['A.B.D Villers', 38, 74, 3428000],
               ['V.Kholi', 31, 70, 8428000],
               ['S.Smith', 34, 80, 4428000],
               ['C.Gayle', 40, 100, 4528000],
               ['J.Root', 33, 72, 7028000],
               ['K.Peterson', 42, 85, 2528000]]

# creating a pandas dataframe
df = pd.DataFrame(player_list, 
                  columns = ['Name', 'Age', 
                             'Weight', 'Salary'])

# show the dataframe
df

Output:

[Image: the resulting dataframe]

Method 1: Using boolean masking

Boolean masking keeps only those rows of the dataframe for which the mask holds the boolean value True.
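To make the idea of a mask concrete, here is a minimal sketch on a three-row frame (toy data assumed for illustration). It shows that the comparison itself produces a boolean Series, and that indexing with that Series keeps only the rows marked True.

```python
import pandas as pd

# hypothetical toy data, used only for illustration
df = pd.DataFrame({'Name': ['A', 'B', 'C'],
                   'Age': [36, 38, 31]})

# the comparison is evaluated row by row and yields a boolean Series
mask = df['Age'] >= 37

# indexing with the mask keeps the rows where the mask is True
selected = df[mask]
```

Here `mask` holds False, True, False, so `selected` contains only the row for 'B', and that row keeps its original index label 1.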

Example 1:

Python3

# importing pandas library
import pandas as pd

# Initializing the nested list with Data-set
player_list = [['M.S.Dhoni', 36, 75, 5428000],
               ['A.B.D Villers', 38, 74, 3428000],
               ['V.Kholi', 31, 70, 8428000],
               ['S.Smith', 34, 80, 4428000],
               ['C.Gayle', 40, 100, 4528000],
               ['J.Root', 33, 72, 7028000],
               ['K.Peterson', 42, 85, 2528000]]

# creating a pandas dataframe
df = pd.DataFrame(player_list, 
                  columns = ['Name', 'Age', 
                             'Weight', 'Salary'])

# splitting the dataframe into 2 parts
# on basis of 'Age' column values
# using Relational operator
df1 = df[df['Age'] >= 37]

# printing df1
df1

Output:

Python3

df2 = df[df['Age'] < 37]

# printing df2
df2

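Output:

In the above example, the dataframe 'df' is split into 2 parts, 'df1' and 'df2', based on the values of the 'Age' column: 'df1' holds the rows with Age >= 37 and 'df2' holds the rows with Age < 37.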
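The same technique extends to more than two parts by combining comparisons with `&`. The sketch below splits the players into three age bands; the boundaries 34 and 40 and the names `young`, `middle`, and `senior` are arbitrary choices made for illustration.

```python
import pandas as pd

player_list = [['M.S.Dhoni', 36, 75, 5428000],
               ['A.B.D Villers', 38, 74, 3428000],
               ['V.Kholi', 31, 70, 8428000],
               ['S.Smith', 34, 80, 4428000],
               ['C.Gayle', 40, 100, 4528000],
               ['J.Root', 33, 72, 7028000],
               ['K.Peterson', 42, 85, 2528000]]

df = pd.DataFrame(player_list,
                  columns=['Name', 'Age', 'Weight', 'Salary'])

# rows with Age below 34
young = df[df['Age'] < 34]

# rows with 34 <= Age < 40; each comparison needs its own parentheses
# because & binds more tightly than the comparison operators
middle = df[(df['Age'] >= 34) & (df['Age'] < 40)]

# rows with Age of 40 or more
senior = df[df['Age'] >= 40]
```

Because the three conditions do not overlap and together cover every possible age, each row of 'df' lands in exactly one of the three parts.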

Example 2:

Python3

# importing pandas library
import pandas as pd

# Initializing the nested list with Data-set
player_list = [['M.S.Dhoni', 36, 75, 5428000],
               ['A.B.D Villers', 38, 74, 3428000],
               ['V.Kholi', 31, 70, 8428000],
               ['S.Smith', 34, 80, 4428000],
               ['C.Gayle', 40, 100, 4528000],
               ['J.Root', 33, 72, 7028000],
               ['K.Peterson', 42, 85, 2528000]]

# creating a pandas dataframe
df = pd.DataFrame(player_list, 
                  columns = ['Name', 'Age', 
                             'Weight', 'Salary'])

# splitting the dataframe into 2 parts
# on basis of 'Weight' column values
mask = df['Weight'] >= 80

df1 = df[mask]

# invert the boolean values
df2 = df[~mask]

# printing df1
df1

Output:

Python3

# printing df2
df2

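Output:

In the above example, the dataframe 'df' is split into 2 parts, 'df1' and 'df2', based on the values of the 'Weight' column: 'df1' holds the rows with Weight >= 80, and the inverted mask '~mask' puts all remaining rows into 'df2'.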
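A mask and its inverse always divide the rows without overlap, but each part keeps the index labels of the original frame. The following sketch checks both points and shows `reset_index(drop=True)` as one optional way to renumber each part from 0. The names `kept`, `heavy`, and `light` are assumptions for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['M.S.Dhoni', 'A.B.D Villers', 'V.Kholi',
                            'S.Smith', 'C.Gayle', 'J.Root', 'K.Peterson'],
                   'Weight': [75, 74, 70, 80, 100, 72, 85]})

mask = df['Weight'] >= 80

# plain masking keeps the original index labels of the selected rows
kept = df[mask]

# reset_index(drop=True) renumbers the rows from 0 and discards the old labels
heavy = df[mask].reset_index(drop=True)
light = df[~mask].reset_index(drop=True)

# every row of df ends up in exactly one of the two parts
assert len(heavy) + len(light) == len(df)
```

Whether to reset the index depends on the use case: keeping the original labels makes it easy to trace rows back to 'df', while a fresh index is often more convenient for positional access.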

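Method 2: Using DataFrame.groupby()

This method splits the data into groups according to some criterion, producing one group per distinct value of the grouping key.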
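As a minimal sketch of grouping by the distinct values of a column, the snippet below builds one sub-dataframe per value and stores them in a dictionary. The 'Team' column and its values are hypothetical; the article's own data has no such column.

```python
import pandas as pd

# hypothetical toy data with a categorical 'Team' column
df = pd.DataFrame({'Name': ['A', 'B', 'C', 'D', 'E'],
                   'Team': ['X', 'Y', 'X', 'Z', 'Y']})

# iterating over a groupby object yields (key, sub-dataframe) pairs,
# so a dict comprehension gives one dataframe per distinct 'Team' value
parts = {team: group for team, group in df.groupby('Team')}
```

With this layout, `parts['X']` is the dataframe of all rows whose 'Team' is 'X', and the number of parts follows the data rather than being fixed in advance.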

Example:

Python3

# importing pandas library
import pandas as pd

# Initializing the nested list with Data-set
player_list = [['M.S.Dhoni', 36, 75, 5428000],
               ['A.B.D Villers', 38, 74, 3428000],
               ['V.Kholi', 31, 70, 8428000],
               ['S.Smith', 34, 80, 4428000],
               ['C.Gayle', 40, 100, 4528000],
               ['J.Root', 33, 72, 7028000],
               ['K.Peterson', 42, 85, 2528000]]

# creating a pandas dataframe
df = pd.DataFrame(player_list, 
                  columns = ['Name', 'Age', 
                             'Weight', 'Salary'])

# splitting the dataframe into 2 parts
# on basis of 'Salary' column values
# using dataframe.groupby() function
df1, df2 = [x for _, x in df.groupby(df['Salary'] < 4528000)]

# printing df1
df1

Output:

Python3

# printing df2
df2

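Output:

In the above example, the dataframe 'df' is split into two parts, 'df1' and 'df2', based on the values of the 'Salary' column. Because groupby() sorts the group keys and False sorts before True, 'df1' holds the rows where the condition is False (Salary >= 4528000) and 'df2' holds the rows where it is True (Salary < 4528000).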
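Unpacking the groups into exactly two names fails with a ValueError when every row falls on the same side of the threshold, because groupby() then yields a single group. One way to guard against that, sketched here with a hypothetical helper `split_by_mask` that is not part of pandas, is to collect the groups in a dictionary keyed by True and False and fall back to an empty frame.

```python
import pandas as pd

def split_by_mask(frame, mask):
    """Hypothetical helper: return (rows where mask is True, rows where it is False)."""
    # one entry per group that actually occurs in the data
    parts = {bool(key): group for key, group in frame.groupby(mask)}
    # empty frame with the same columns, used when a group is missing
    empty = frame.iloc[0:0]
    return parts.get(True, empty), parts.get(False, empty)

player_list = [['M.S.Dhoni', 36, 75, 5428000],
               ['A.B.D Villers', 38, 74, 3428000],
               ['V.Kholi', 31, 70, 8428000],
               ['S.Smith', 34, 80, 4428000],
               ['C.Gayle', 40, 100, 4528000],
               ['J.Root', 33, 72, 7028000],
               ['K.Peterson', 42, 85, 2528000]]

df = pd.DataFrame(player_list,
                  columns=['Name', 'Age', 'Weight', 'Salary'])

# same threshold as in the example above
below, at_or_above = split_by_mask(df, df['Salary'] < 4528000)

# a threshold that no row satisfies: the True part is simply empty
none_below, everyone = split_by_mask(df, df['Salary'] < 0)
```

Looking the groups up by key also makes it explicit which part corresponds to True and which to False, instead of relying on the order in which groupby() returns them.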


