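Parameters | Description |
---|---|
axis | 0 drops rows, 1 drops columns; default 0. The axis along which data is dropped |
how | {'any', 'all'}, default 'any'. 'any': drop rows that contain any NaN; 'all': drop only rows in which every value is NaN |
thresh | int; keep only rows that have at least `thresh` non-NaN values |
subset | list; look for missing values only in these labels along the other axis (e.g. these columns when dropping rows) |
inplace | bool, default False; if True, modify the original DataFrame in place and return None |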
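The following minimal sketch checks the `axis` and `how` rows of the table, assuming a reasonably recent pandas version. The two-column frame is made up for illustration. Besides 0 and 1, `axis` also accepts the string aliases `'index'` (same as 0) and `'columns'` (same as 1).

```python
import numpy as np
import pandas as pd

# Made-up frame: row 0 and column "b" each contain one NaN
df = pd.DataFrame({"a": [1.0, 2.0], "b": [np.nan, 3.0]})

# axis=0 ('index') drops rows, axis=1 ('columns') drops columns
rows_dropped = df.dropna(axis="index")    # same as axis=0
cols_dropped = df.dropna(axis="columns")  # same as axis=1

# how='any' is the default, so these two calls give the same result
default_call = df.dropna()
explicit_any = df.dropna(how="any")

print(rows_dropped.index.tolist())    # [1]
print(cols_dropped.columns.tolist())  # ['a']
```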
Examples:

    >>> import numpy as np
    >>> import pandas as pd
    >>> df = pd.DataFrame({"name": ['Alfred', 'Batman', 'Catwoman'],
    ...                    "toy": [np.nan, 'Batmobile', 'Bullwhip'],
    ...                    "born": [pd.NaT, pd.Timestamp("1940-04-25"),
    ...                             pd.NaT]})
    >>> df
           name        toy       born
    0    Alfred        NaN        NaT
    1    Batman  Batmobile 1940-04-25
    2  Catwoman   Bullwhip        NaT

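Both `NaN` (in the `toy` column) and `NaT` (in the datetime `born` column) count as missing. The short sketch below uses the same `df` to count the missing cells in each row. These counts explain the results of the calls that follow.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ["Alfred", "Batman", "Catwoman"],
                   "toy": [np.nan, "Batmobile", "Bullwhip"],
                   "born": [pd.NaT, pd.Timestamp("1940-04-25"), pd.NaT]})

# isna() marks NaN and NaT alike as missing
missing = df.isna()

# Missing cells per row: Alfred has 2, Batman 0, Catwoman 1
per_row = missing.sum(axis=1)
print(per_row.tolist())  # [2, 0, 1]
```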
Drop the rows where at least one element is missing:

    >>> df.dropna()
         name        toy       born
    1  Batman  Batmobile 1940-04-25

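With the defaults (`axis=0`, `how='any'`), `dropna()` keeps only the rows in which every value is present. The same filter can be written as an explicit boolean mask. This sketch is meant to build intuition and makes no claim about how pandas implements `dropna` internally.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ["Alfred", "Batman", "Catwoman"],
                   "toy": [np.nan, "Batmobile", "Bullwhip"],
                   "born": [pd.NaT, pd.Timestamp("1940-04-25"), pd.NaT]})

# True for rows in which all values are non-missing
keep = df.notna().all(axis=1)
manual = df[keep]

print(manual.index.tolist())       # [1]
print(manual.equals(df.dropna()))  # True
```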
Drop the columns where at least one element is missing:

    >>> df.dropna(axis=1)
           name
    0    Alfred
    1    Batman
    2  Catwoman

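For `axis=1` the check runs down each column instead of across each row. Here is a mask-based sketch using the example `df`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ["Alfred", "Batman", "Catwoman"],
                   "toy": [np.nan, "Batmobile", "Bullwhip"],
                   "born": [pd.NaT, pd.Timestamp("1940-04-25"), pd.NaT]})

# True for columns in which all values are non-missing; only "name" qualifies
keep_cols = df.notna().all(axis=0)
manual = df.loc[:, keep_cols]

print(manual.columns.tolist())           # ['name']
print(manual.equals(df.dropna(axis=1)))  # True
```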
Drop only the rows in which all elements are missing:

    >>> df.dropna(how='all')
           name        toy       born
    0    Alfred        NaN        NaT
    1    Batman  Batmobile 1940-04-25
    2  Catwoman   Bullwhip        NaT

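In the example `df` no row is entirely missing, so `how='all'` removes nothing. The sketch below adds a hypothetical fourth row in which every value is missing, which makes the effect visible. It also shows the equivalent mask, which keeps rows with at least one present value.

```python
import numpy as np
import pandas as pd

# Hypothetical variant of df with an extra, completely empty row 3
df = pd.DataFrame({"name": ["Alfred", "Batman", "Catwoman", np.nan],
                   "toy": [np.nan, "Batmobile", "Bullwhip", np.nan],
                   "born": [pd.NaT, pd.Timestamp("1940-04-25"), pd.NaT, pd.NaT]})

# how='all' drops only row 3, where every value is missing
kept = df.dropna(how="all")
print(kept.index.tolist())  # [0, 1, 2]

# Equivalent mask: keep rows with at least one non-missing value
manual = df[df.notna().any(axis=1)]
print(manual.equals(kept))  # True
```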
Keep only the rows with at least 2 non-NA values:

    >>> df.dropna(thresh=2)
           name        toy       born
    1    Batman  Batmobile 1940-04-25
    2  Catwoman   Bullwhip        NaT

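`thresh` counts present values, not missing ones. The sketch below computes, for the example `df`, how many non-missing values each row has. This shows why row 0, where only `name` is present, is dropped.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ["Alfred", "Batman", "Catwoman"],
                   "toy": [np.nan, "Batmobile", "Bullwhip"],
                   "born": [pd.NaT, pd.Timestamp("1940-04-25"), pd.NaT]})

# Non-missing values per row: 1, 3, 2
non_missing = df.notna().sum(axis=1)
print(non_missing.tolist())  # [1, 3, 2]

# thresh=2 keeps rows whose count is at least 2
manual = df[non_missing >= 2]
print(manual.equals(df.dropna(thresh=2)))  # True
```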
Look for missing values only in specific columns:

    >>> df.dropna(subset=['name', 'born'])
         name        toy       born
    1  Batman  Batmobile 1940-04-25

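`subset` limits the check to the listed columns. The other columns are ignored when deciding which rows to drop, but they still appear in the result. Here is a mask-based sketch of `subset=['name', 'born']`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ["Alfred", "Batman", "Catwoman"],
                   "toy": [np.nan, "Batmobile", "Bullwhip"],
                   "born": [pd.NaT, pd.Timestamp("1940-04-25"), pd.NaT]})

cols = ["name", "born"]
# Only the listed columns decide whether a row is kept
keep = df[cols].notna().all(axis=1)
manual = df[keep]

print(manual.index.tolist())                  # [1]
print(manual.columns.tolist())                # ['name', 'toy', 'born']
print(manual.equals(df.dropna(subset=cols)))  # True
```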
Modify the original DataFrame in place:

    >>> df.dropna(inplace=True)
    >>> df
         name        toy       born
    1  Batman  Batmobile 1940-04-25

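With `inplace=True`, `dropna` changes `df` itself and returns `None`. Writing `df = df.dropna(inplace=True)` would therefore replace the DataFrame with `None`. The sketch below compares this with the reassignment idiom. `make_df` is a hypothetical helper that just rebuilds the example frame.

```python
import numpy as np
import pandas as pd

def make_df():
    # Hypothetical helper: rebuilds the example DataFrame
    return pd.DataFrame({"name": ["Alfred", "Batman", "Catwoman"],
                         "toy": [np.nan, "Batmobile", "Bullwhip"],
                         "born": [pd.NaT, pd.Timestamp("1940-04-25"), pd.NaT]})

df = make_df()
result = df.dropna(inplace=True)  # modifies df and returns None
print(result)                     # None
print(df.index.tolist())          # [1]

# Default inplace=False: dropna returns a new DataFrame, so reassign it
df2 = make_df()
df2 = df2.dropna()
print(df2.equals(df))             # True
```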
That's all.

Supplement: filtering out missing data with pandas dropna

Conventions used below:

    import pandas as pd
    import numpy as np
    from numpy import nan as NaN

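Filtering out missing data
One of pandas' design goals is to make working with missing data as painless as possible. pandas uses NaN as its marker for missing data (for datetime values the equivalent marker is NaT, as in the example above).
dropna makes filtering out missing data especially convenient.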
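pandas also treats `None` as missing. The small sketch below assumes a numeric Series. In that case the `None` is stored as `NaN` (so the dtype becomes `float64`), and `dropna` removes it like any other NaN.

```python
import pandas as pd

# None in numeric data is converted to NaN
s = pd.Series([4, None, 8])

print(s.dtype)              # float64
print(s.isna().tolist())    # [False, True, False]
print(s.dropna().tolist())  # [4.0, 8.0]
```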
Filter out missing data with **dropna()**:

    se1 = pd.Series([4, NaN, 8, NaN, 5])
    print(se1)
    se1.dropna()

Output:

    0    4.0
    1    NaN
    2    8.0
    3    NaN
    4    5.0
    dtype: float64

    0    4.0
    2    8.0
    4    5.0
    dtype: float64

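The output shows that `dropna` keeps the original index labels (0, 2, 4) and does not renumber them. If consecutive labels are needed, one option is `reset_index(drop=True)`, sketched here:

```python
import pandas as pd
from numpy import nan as NaN

se1 = pd.Series([4, NaN, 8, NaN, 5])
cleaned = se1.dropna()
print(cleaned.index.tolist())     # [0, 2, 4]

# drop=True discards the old labels instead of turning them into a column
renumbered = cleaned.reset_index(drop=True)
print(renumbered.index.tolist())  # [0, 1, 2]
print(renumbered.tolist())        # [4.0, 8.0, 5.0]
```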
Filtering with a boolean Series works too:

    se1[se1.notnull()]

Output:

    0    4.0
    2    8.0
    4    5.0
    dtype: float64

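`notnull()` is an alias of `notna()`, and `~se1.isnull()` builds the same mask. This sketch checks that all three produce the same mask, and that filtering with it matches `dropna()`:

```python
import pandas as pd
from numpy import nan as NaN

se1 = pd.Series([4, NaN, 8, NaN, 5])

mask = se1.notnull()
print(mask.equals(se1.notna()))        # True
print(mask.equals(~se1.isnull()))      # True
print(se1[mask].equals(se1.dropna()))  # True
```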
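Handling a DataFrame is more involved, because you may want to drop only the rows that are entirely NaN, or every row that contains at least one NaN.

    df1 = pd.DataFrame([[1, 2, 3], [NaN, NaN, 2], [NaN, NaN, NaN], [8, 8, NaN]])
    df1

Output:

| | 0 | 1 | 2 |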
---|---|---|---|
0 | 1.0 | 2.0 | 3.0 |
1 | NaN | NaN | 2.0 |
2 | NaN | NaN | NaN |
3 | 8.0 | 8.0 | NaN |
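
Counting the NaNs in each row of `df1` predicts what the next two calls do. The default drops every row with at least one NaN. `how='all'` drops only a row whose NaN count equals the number of columns. Here is a quick sketch:

```python
import pandas as pd
from numpy import nan as NaN

df1 = pd.DataFrame([[1, 2, 3], [NaN, NaN, 2], [NaN, NaN, NaN], [8, 8, NaN]])

# NaNs per row: row 2 is entirely NaN
nan_per_row = df1.isna().sum(axis=1)
print(nan_per_row.tolist())                  # [0, 2, 3, 1]

print(df1.dropna().index.tolist())           # [0]
print(df1.dropna(how="all").index.tolist())  # [0, 1, 3]
```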

By default, dropna() drops every row that contains a NaN:

    df1.dropna()

Output:

| | 0 | 1 | 2 |
---|---|---|---|
0 | 1.0 | 2.0 | 3.0 |

Pass **how='all'** to drop only the rows that are entirely NaN:

    df1.dropna(how='all')

Output:

| | 0 | 1 | 2 |
---|---|---|---|
0 | 1.0 | 2.0 | 3.0 |
1 | NaN | NaN | 2.0 |
3 | 8.0 | 8.0 | NaN |

To drop columns instead of rows, pass axis=1. First, add a column 3 that is entirely NaN:

    df1[3] = NaN
    df1

Output:

| | 0 | 1 | 2 | 3 |
---|---|---|---|---|
0 | 1.0 | 2.0 | 3.0 | NaN |
1 | NaN | NaN | 2.0 | NaN |
2 | NaN | NaN | NaN | NaN |
3 | 8.0 | 8.0 | NaN | NaN |

Then drop the columns in which every value is NaN:

    df1.dropna(axis=1, how="all")

Output:

| | 0 | 1 | 2 |
---|---|---|---|
0 | 1.0 | 2.0 | 3.0 |
1 | NaN | NaN | 2.0 |
2 | NaN | NaN | NaN |
3 | 8.0 | 8.0 | NaN |
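
Unless `inplace=True` is passed, `dropna` returns a new DataFrame, so `df1` itself still has column 3 after this call. That is why column 3 appears again in the `thresh` examples that follow. A sketch confirming this:

```python
import pandas as pd
from numpy import nan as NaN

df1 = pd.DataFrame([[1, 2, 3], [NaN, NaN, 2], [NaN, NaN, NaN], [8, 8, NaN]])
df1[3] = NaN  # add an all-NaN column, as above

trimmed = df1.dropna(axis=1, how="all")
print(trimmed.columns.tolist())  # [0, 1, 2]
print(df1.columns.tolist())      # [0, 1, 2, 3]  (df1 is unchanged)
```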

Pass thresh=n to keep only the rows that have at least n non-NaN values:

    df1.dropna(thresh=1)

Output:

| | 0 | 1 | 2 | 3 |
---|---|---|---|---|
0 | 1.0 | 2.0 | 3.0 | NaN |
1 | NaN | NaN | 2.0 | NaN |
3 | 8.0 | 8.0 | NaN | NaN |

With thresh=3, only row 0 remains, because it is the only row with three non-NaN values:

    df1.dropna(thresh=3)

Output:

| | 0 | 1 | 2 | 3 |
---|---|---|---|---|
0 | 1.0 | 2.0 | 3.0 | NaN |
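
The `thresh` results follow from the number of non-NaN values in each row of the four-column `df1`: row 0 has 3, row 1 has 1, row 2 has 0, and row 3 has 2. This sketch computes those counts and checks both calls:

```python
import pandas as pd
from numpy import nan as NaN

df1 = pd.DataFrame([[1, 2, 3], [NaN, NaN, 2], [NaN, NaN, NaN], [8, 8, NaN]])
df1[3] = NaN  # the all-NaN column added earlier

non_nan_per_row = df1.notna().sum(axis=1)
print(non_nan_per_row.tolist())                # [3, 1, 0, 2]

# thresh=n keeps rows whose count is at least n
print(df1.dropna(thresh=1).index.tolist())     # [0, 1, 3]
print(df1.dropna(thresh=3).index.tolist())     # [0]
print(df1[non_nan_per_row >= 3].equals(df1.dropna(thresh=3)))  # True
```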