




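  • None: None is a Python singleton object that is often used for missing data in Python code.
  • NaN: NaN (short for Not a Number) is a special floating-point value that is recognized by all systems using the standard IEEE floating-point representation.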
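
Once these markers are stored in pandas, they mostly behave the same. The sketch below uses made-up values. It shows that a numeric Series stores None as NaN, that NaN cannot be found with ==, and that isnull() detects both markers:

```python
import numpy as np
import pandas as pd

# In a numeric Series, pandas stores None as NaN and upcasts to float64
s = pd.Series([1, None, 3])
print(s.dtype)              # float64
print(s.isnull().tolist())  # [False, True, False]

# NaN never compares equal to itself, so == cannot be used to find it
print(np.nan == np.nan)     # False

# With dtype=object, None is kept as-is, yet isnull() still flags it
obj = pd.Series(["a", None, np.nan], dtype=object)
print(obj.isnull().tolist())  # [False, True, True]
```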


This article uses a CSV file (employees.csv). To download the CSV file used, click here




To check for null values in a Pandas DataFrame, we use the isnull() function. It returns a DataFrame of boolean values that are True for NaN values.


# importing pandas as pd
import pandas as pd

# importing numpy as np
import numpy as np

# dictionary of lists
dict = {'First Score':[100, 90, np.nan, 95],
        'Second Score': [30, 45, 56, np.nan],
        'Third Score':[np.nan, 40, 80, 98]}

# creating a dataframe from dictionary
df = pd.DataFrame(dict)

# using isnull() function  
df.isnull()
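
Because isnull() returns booleans and True counts as 1, chaining .sum() gives the number of missing values in each column. A minimal sketch on the same score data. It is stored under the illustrative name scores so that the built-in dict is not shadowed:

```python
import numpy as np
import pandas as pd

scores = {'First Score': [100, 90, np.nan, 95],
          'Second Score': [30, 45, 56, np.nan],
          'Third Score': [np.nan, 40, 80, 98]}
df = pd.DataFrame(scores)

# True is counted as 1, so the column sums are the per-column NaN counts
missing_per_column = df.isnull().sum()
print(missing_per_column)

# Summing twice gives the total number of NaN cells in the DataFrame
total_missing = int(df.isnull().sum().sum())
print(total_missing)  # 3
```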



# importing pandas package 
import pandas as pd 

# making data frame from csv file 
data = pd.read_csv("employees.csv") 

# creating bool series True for NaN values 
bool_series = pd.isnull(data["Gender"]) 

# filtering data 
# displaying data only with Gender = NaN 
data[bool_series]

Output: As the output image shows, only the rows with Gender = NULL are displayed.
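
employees.csv may not be at hand, so the following self-contained sketch applies the same filter to a small hypothetical DataFrame. The names and values are invented for illustration. Indexing with the boolean Series keeps only the rows where it is True:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for employees.csv
data = pd.DataFrame({"First Name": ["Ann", "Bob", "Cid", "Dee"],
                     "Gender": ["Female", np.nan, "Male", np.nan]})

# True wherever Gender is missing
bool_series = pd.isnull(data["Gender"])

# Boolean indexing keeps only the rows with a missing Gender
missing_gender = data[bool_series]
print(missing_gender)
```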


To check for non-null values in a Pandas DataFrame, we use the notnull() function. It returns a DataFrame of boolean values that are False for NaN values.


# importing pandas as pd
import pandas as pd

# importing numpy as np
import numpy as np

# dictionary of lists
dict = {'First Score':[100, 90, np.nan, 95],
        'Second Score': [30, 45, 56, np.nan],
        'Third Score':[np.nan, 40, 80, 98]}

# creating a dataframe using dictionary
df = pd.DataFrame(dict)

# using notnull() function 
df.notnull()
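
notnull() is the element-wise negation of isnull(). Combined with .all(axis=1), it selects the rows that have no missing value at all. A minimal check on the same score data, stored under the illustrative name scores:

```python
import numpy as np
import pandas as pd

scores = {'First Score': [100, 90, np.nan, 95],
          'Second Score': [30, 45, 56, np.nan],
          'Third Score': [np.nan, 40, 80, 98]}
df = pd.DataFrame(scores)

not_null = df.notnull()

# notnull() is exactly the inverse (~) of isnull()
print(not_null.equals(~df.isnull()))  # True

# Keep only the rows in which every column holds a value
complete_rows = df[not_null.all(axis=1)]
print(complete_rows.index.tolist())  # [1]
```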



# importing pandas package 
import pandas as pd 

# making data frame from csv file 
data = pd.read_csv("employees.csv") 

# creating bool series True for non-NaN values 
bool_series = pd.notnull(data["Gender"]) 

# filtering data 
# displaying data only with Gender = Not NaN 
data[bool_series]

Output: As the output image shows, only the rows with Gender = NOT NULL are displayed.


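To fill null values in a dataset, we use the fillna(), replace(), and interpolate() functions, which replace NaN values with values of their own. All of these functions help fill null values in a DataFrame. The interpolate() function also fills NA values in a DataFrame, but instead of hard-coding the replacement value, it computes the missing values using one of several interpolation techniques.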
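
A minimal side-by-side sketch of the three functions on a made-up Series with one gap. fillna() and replace() both insert a fixed value, while interpolate() (linear by default) estimates the gap from its neighbours:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

# fillna(): replace every NaN with a fixed value
filled = s.fillna(0)
print(filled.tolist())        # [1.0, 0.0, 3.0]

# replace(): treat NaN as an ordinary value to be swapped out
replaced = s.replace(np.nan, 0)
print(replaced.tolist())      # [1.0, 0.0, 3.0]

# interpolate(): estimate the gap from the neighbouring values
interpolated = s.interpolate()
print(interpolated.tolist())  # [1.0, 2.0, 3.0]
```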

Code #1: Filling null values with a single value

# importing pandas as pd
import pandas as pd

# importing numpy as np
import numpy as np

# dictionary of lists
dict = {'First Score':[100, 90, np.nan, 95],
        'Second Score': [30, 45, 56, np.nan],
        'Third Score':[np.nan, 40, 80, 98]}

# creating a dataframe from dictionary
df = pd.DataFrame(dict)

# filling missing values with 0 (an example value) using fillna()  
df.fillna(0)
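
fillna() also accepts a Series or dict keyed by column label, so each column can get its own fill value. The following minimal sketch fills each column with its own mean. Using the mean is an illustrative choice, not something the example above requires:

```python
import numpy as np
import pandas as pd

scores = {'First Score': [100, 90, np.nan, 95],
          'Second Score': [30, 45, 56, np.nan],
          'Third Score': [np.nan, 40, 80, 98]}
df = pd.DataFrame(scores)

# df.mean() skips NaN by default and returns one mean per column
column_means = df.mean()

# fillna() aligns the Series on column labels, filling each column separately
filled = df.fillna(column_means)
print(filled)
```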


Code #2: Filling null values with the previous ones

# importing pandas as pd
import pandas as pd

# importing numpy as np
import numpy as np

# dictionary of lists
dict = {'First Score':[100, 90, np.nan, 95],
        'Second Score': [30, 45, 56, np.nan],
        'Third Score':[np.nan, 40, 80, 98]}

# creating a dataframe from dictionary
df = pd.DataFrame(dict)

# filling a missing value with
# the previous one in the same column
# (same result as the older fillna(method='pad'), which is deprecated)
df.ffill()
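
Forward filling can only copy a value that appears earlier in the same column. A NaN in the first row therefore has nothing to copy and stays NaN. A minimal check on the same score data:

```python
import numpy as np
import pandas as pd

scores = {'First Score': [100, 90, np.nan, 95],
          'Second Score': [30, 45, 56, np.nan],
          'Third Score': [np.nan, 40, 80, 98]}
df = pd.DataFrame(scores)

# Each NaN takes the last valid value above it in the same column
forward = df.ffill()
print(forward)

# Row 0 of 'Third Score' has no earlier value, so it is still NaN
print(int(forward['Third Score'].isnull().sum()))  # 1
```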


Code #3: Filling null values with the next ones

# importing pandas as pd
import pandas as pd

# importing numpy as np
import numpy as np

# dictionary of lists
dict = {'First Score':[100, 90, np.nan, 95],
        'Second Score': [30, 45, 56, np.nan],
        'Third Score':[np.nan, 40, 80, 98]}

# creating a dataframe from dictionary
df = pd.DataFrame(dict)

# filling null values with the next one in the same column
# (same result as the older fillna(method='bfill'), which is deprecated)
df.bfill()
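
Backward filling mirrors forward filling. Each NaN copies the next valid value below it, so a NaN in the last row of a column has nothing to copy and remains NaN. A minimal check on the same data:

```python
import numpy as np
import pandas as pd

scores = {'First Score': [100, 90, np.nan, 95],
          'Second Score': [30, 45, 56, np.nan],
          'Third Score': [np.nan, 40, 80, 98]}
df = pd.DataFrame(scores)

# Each NaN takes the next valid value below it in the same column
backward = df.bfill()
print(backward)

# The last row of 'Second Score' has no later value, so it is still NaN
print(int(backward['Second Score'].isnull().sum()))  # 1
```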


Code #4: Filling null values in a CSV file

# importing pandas package 
import pandas as pd 

# making data frame from csv file 
data = pd.read_csv("employees.csv")

# printing rows 10 to 24 of
# the data frame for visualization   
data[10:25]


# importing pandas package 
import pandas as pd 

# making data frame from csv file 
data = pd.read_csv("employees.csv") 

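# filling null values using fillna() and assigning the result back;
# calling fillna(..., inplace = True) on data["Gender"] may not update
# data in recent pandas versions (Copy-on-Write)
data["Gender"] = data["Gender"].fillna("No Gender") 
data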
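
The same fill can be tried without employees.csv. The following minimal sketch uses a small hypothetical DataFrame, with names and values invented for illustration, and assigns the filled column back instead of relying on inplace=True:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for employees.csv
data = pd.DataFrame({"First Name": ["Ann", "Bob", "Cid"],
                     "Gender": ["Female", np.nan, "Male"]})

# Fill the missing Gender values and assign the column back
data["Gender"] = data["Gender"].fillna("No Gender")
print(data)
```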


Output:

Code #5: Filling null values using the replace() method

# importing pandas package 
import pandas as pd 

# making data frame from csv file 
data = pd.read_csv("employees.csv")

# printing rows 10 to 24 of
# the data frame for visualization   
data[10:25]

Output:

Now we are going to replace all the NaN values in the DataFrame with -99.

# importing pandas and numpy packages 
import pandas as pd 
import numpy as np 

# making data frame from csv file 
data = pd.read_csv("employees.csv") 

# will replace NaN values in dataframe with value -99  
data.replace(to_replace = np.nan, value = -99) 
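
The next self-contained sketch runs the same replace() call on a small hypothetical numeric DataFrame, since employees.csv may not be available. The column names and values are invented. replace() returns a new DataFrame and leaves the original unchanged:

```python
import numpy as np
import pandas as pd

# Hypothetical numeric data with a few gaps
data = pd.DataFrame({"Salary": [50000, np.nan, 62000],
                     "Bonus": [np.nan, 1500, 2000]})

# replace() returns a new DataFrame; data itself keeps its NaN values
replaced = data.replace(to_replace=np.nan, value=-99)
print(replaced)
```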


Code #6: Using the interpolate() function to fill missing values with the linear method.

# importing pandas as pd 
import pandas as pd 

# Creating the dataframe  
df = pd.DataFrame({"A":[12, 4, 5, None, 1], 
                   "B":[None, 2, 54, 3, None], 
                   "C":[20, 16, None, 3, 8], 
                   "D":[14, 3, None, None, 6]}) 

# display the dataframe 
df


# to interpolate the missing values 
df.interpolate(method ='linear', limit_direction ='forward')

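Output: As the output shows, the value in the first row could not be filled: the fill direction is forward, and there is no earlier value that interpolation could use.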
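
If the leading gap should be filled as well, limit_direction='both' lets interpolation work in both directions. With the linear method, a gap at either edge of a column is filled with the nearest valid value. A minimal sketch on the same DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"A": [12, 4, 5, None, 1],
                   "B": [None, 2, 54, 3, None],
                   "C": [20, 16, None, 3, 8],
                   "D": [14, 3, None, None, 6]})

# Forward only: the leading NaN in column B is preserved
forward = df.interpolate(method='linear', limit_direction='forward')

# Both directions: the leading NaN in B is filled from its nearest valid value
both_ways = df.interpolate(method='linear', limit_direction='both')

print(forward.loc[0, "B"])    # nan
print(both_ways.loc[0, "B"])  # 2.0
print(both_ways.loc[2, "C"])  # 9.5 (midway between 16 and 3)
```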



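To drop null values from a DataFrame, we use the dropna() function, which can drop rows or columns containing null values in several ways.

Code #1: Dropping rows with at least 1 null value.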

# importing pandas as pd
import pandas as pd

# importing numpy as np
import numpy as np

# dictionary of lists
dict = {'First Score':[100, 90, np.nan, 95],
        'Second Score': [30, np.nan, 45, 56],
        'Third Score':[52, 40, 80, 98],
        'Fourth Score':[np.nan, np.nan, np.nan, 65]}

# creating a dataframe from dictionary
df = pd.DataFrame(dict)

# displaying the dataframe
df


Now we drop the rows that have at least one NaN (null) value

# importing pandas as pd
import pandas as pd

# importing numpy as np
import numpy as np

# dictionary of lists
dict = {'First Score':[100, 90, np.nan, 95],
        'Second Score': [30, np.nan, 45, 56],
        'Third Score':[52, 40, 80, 98],
        'Fourth Score':[np.nan, np.nan, np.nan, 65]}

# creating a dataframe from dictionary
df = pd.DataFrame(dict)

# using dropna() function  
df.dropna()
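
dropna() also has a thresh parameter. It keeps a row only if that row has at least the given number of non-null values, which is less aggressive than dropping every row that contains a single NaN. A minimal sketch on the same score data:

```python
import numpy as np
import pandas as pd

scores = {'First Score': [100, 90, np.nan, 95],
          'Second Score': [30, np.nan, 45, 56],
          'Third Score': [52, 40, 80, 98],
          'Fourth Score': [np.nan, np.nan, np.nan, 65]}
df = pd.DataFrame(scores)

# Default (how='any'): drop a row if it has even one NaN
any_dropped = df.dropna()
print(any_dropped.index.tolist())  # [3]

# thresh=3: keep a row only if it has at least 3 non-null values
at_least_three = df.dropna(thresh=3)
print(at_least_three.index.tolist())  # [0, 3]
```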

Output:

Code #2: Dropping a row if all of its values are missing.

# importing pandas as pd
import pandas as pd

# importing numpy as np
import numpy as np

# dictionary of lists
dict = {'First Score':[100, np.nan, np.nan, 95],
        'Second Score': [30, np.nan, 45, 56],
        'Third Score':[52, np.nan, 80, 98],
        'Fourth Score':[np.nan, np.nan, np.nan, 65]}

# creating a dataframe from dictionary
df = pd.DataFrame(dict)

# displaying the dataframe
df



# importing pandas as pd
import pandas as pd

# importing numpy as np
import numpy as np

# dictionary of lists
dict = {'First Score':[100, np.nan, np.nan, 95],
        'Second Score': [30, np.nan, 45, 56],
        'Third Score':[52, np.nan, 80, 98],
        'Fourth Score':[np.nan, np.nan, np.nan, 65]}

df = pd.DataFrame(dict)

# using dropna() function    
df.dropna(how = 'all')


Code #3: Dropping columns with at least 1 null value.

# importing pandas as pd
import pandas as pd

# importing numpy as np
import numpy as np

# dictionary of lists
dict = {'First Score':[100, np.nan, np.nan, 95],
        'Second Score': [30, np.nan, 45, 56],
        'Third Score':[52, np.nan, 80, 98],
        'Fourth Score':[60, 67, 68, 65]}

# creating a dataframe from dictionary 
df = pd.DataFrame(dict)

# displaying the dataframe
df


Now we drop the columns that have at least 1 missing value

# importing pandas as pd
import pandas as pd

# importing numpy as np
import numpy as np

# dictionary of lists
dict = {'First Score':[100, np.nan, np.nan, 95],
        'Second Score': [30, np.nan, 45, 56],
        'Third Score':[52, np.nan, 80, 98],
        'Fourth Score':[60, 67, 68, 65]}

# creating a dataframe from dictionary  
df = pd.DataFrame(dict)

# using dropna() function     
df.dropna(axis = 1)
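
axis=1 and axis='columns' are equivalent ways of asking dropna() to drop columns instead of rows. The sketch below checks this on the same score data. It also shows that how='all' would keep every column here, since no column is entirely empty:

```python
import numpy as np
import pandas as pd

scores = {'First Score': [100, np.nan, np.nan, 95],
          'Second Score': [30, np.nan, 45, 56],
          'Third Score': [52, np.nan, 80, 98],
          'Fourth Score': [60, 67, 68, 65]}
df = pd.DataFrame(scores)

# Drop every column that contains at least one NaN
by_number = df.dropna(axis=1)
by_name = df.dropna(axis='columns')
print(by_number.columns.tolist())  # ['Fourth Score']
print(by_number.equals(by_name))   # True

# how='all' drops a column only if every value in it is missing
only_empty_dropped = df.dropna(axis=1, how='all')
print(only_empty_dropped.shape)    # (4, 4)
```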


Code #4: Dropping rows with at least 1 null value in a CSV file

# importing pandas module 
import pandas as pd 

# making data frame from csv file 
data = pd.read_csv("employees.csv") 

# making new data frame with dropped NA values 
new_data = data.dropna(axis = 0, how ='any') 


Output:

Now we compare the lengths of the two data frames to find out how many rows have at least 1 null value

print("Old data frame length:", len(data))
print("New data frame length:", len(new_data)) 
print("Number of rows with at least 1 NA value: ", (len(data)-len(new_data)))


Old data frame length: 1000
New data frame length: 764
Number of rows with at least 1 NA value:  236

Since the difference is 236, there are 236 rows that have at least 1 null value in some column.
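
The same count can be obtained without building a second DataFrame. isnull().any(axis=1) marks each row that contains at least one null, and summing those flags counts the rows. A sketch on a small hypothetical DataFrame standing in for employees.csv (values invented):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for employees.csv
data = pd.DataFrame({"Gender": ["Female", np.nan, "Male", "Male"],
                     "Salary": [50000, 62000, np.nan, 58000],
                     "Team": ["Sales", np.nan, np.nan, "Legal"]})

# True for every row with at least one null value
rows_with_na = int(data.isnull().any(axis=1).sum())

# Same answer via the length difference used above
new_data = data.dropna(axis=0, how="any")
print(rows_with_na)                # 2
print(len(data) - len(new_data))   # 2
```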
