NaN in Pandas
In this class, we discuss NaN in pandas.
The reader should already know how to create a data frame from a CSV file.
NaN in Pandas
In pandas, NaN (Not a Number) marks missing data in a data frame.
Let us take an example to understand NaN in pandas better.
import pandas as pd
df = pd.read_csv('Test.csv')
print(df)
Output:
   Sno     name gender   amount
0    1   rajesh      m  2000.00
1    2   suresh      m  3000.00
2    3   mahesh    NaN  4000.78
3    5  mahitha      f  8000.00
4    6   mohith      m      NaN
5    7      NaN      m    28.98
6    8  moulesh      m      NaN
7    9   murali      m     2.89
print(df.dtypes)
Output:
Sno         int64
name       object
gender     object
amount    float64
dtype: object
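Pandas stores the missing data in the data frame as NaN.
We can see the NaN values in the output.
When we display the data types of the columns of our data frame, we have int64, float64, and object.
Point to understand: in this example, NaN does not change the type of any column, because an object column can hold any value and the amount column is already float64.
NaN itself is a floating-point value, so an integer column that contains missing values is stored as float64 by default.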
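The effect of NaN on an integer column can be checked with a short sketch. It does not use Test.csv; the two Series below are made-up values for illustration only.

```python
import numpy as np
import pandas as pd

# NaN itself is an ordinary Python float.
nan_is_float = isinstance(np.nan, float)

# An integer column with no missing values keeps an integer dtype.
complete = pd.Series([1, 2, 3])

# The same kind of data with one missing entry is stored as float64,
# because NaN cannot be represented in a plain integer column.
with_missing = pd.Series([1, 2, None])

print(nan_is_float)
print(complete.dtype)
print(with_missing.dtype)
```

The Sno column in our example stays int64 only because none of its values are missing.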
Important:
Python treats two NaN values as unequal, even when both mark missing data.
The code below shows that two NaN values do not compare equal.
# Two NaN values never compare equal with ==
if df.loc[4, 'amount'] == df.loc[6, 'amount']:
    print("equal")
else:
    print("not equal")
Output:
not equal
In the code, we compare two NaN values from the amount column.
The output shows that the two NaN values are not equal.
Reason: pandas follows the IEEE 754 floating-point standard, in which NaN compares unequal to every value, including itself.
So use the isna method, not ==, to test whether a value is missing.
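A minimal sketch of testing for missing values without ==. The Series below is a stand-in for the amount column, with values copied from the output above rather than read from Test.csv.

```python
import numpy as np
import pandas as pd

# Stand-in for the 'amount' column of the example data frame.
amount = pd.Series([2000.00, 3000.00, 4000.78, 8000.00, np.nan, 28.98, np.nan, 2.89])

a = amount.loc[4]
b = amount.loc[6]

# The == operator reports the two NaN values as different.
equal_by_operator = bool(a == b)

# pd.isna checks each value for "missing" and works as expected.
both_missing = bool(pd.isna(a) and pd.isna(b))

print(equal_by_operator)
print(both_missing)
```

Here equal_by_operator is False while both_missing is True, which is why pd.isna is the safer test.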
isna Method
# isna method in data frame
print(df.isna())
Output:
     Sno   name  gender  amount
0  False  False   False   False
1  False  False   False   False
2  False  False    True   False
3  False  False   False   False
4  False  False   False    True
5  False   True   False   False
6  False  False   False    True
7  False  False   False   False
The isna method returns a data frame of the same shape as the original.
Each position holds True if the original value is NaN and False otherwise.
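Because True counts as 1 when summed, the result of isna can also be used to count the missing values in each column. The sketch below rebuilds the example data frame in memory from the printed output, as a stand-in for reading Test.csv.

```python
import numpy as np
import pandas as pd

# In-memory stand-in for Test.csv, copied from the output shown earlier.
df = pd.DataFrame({
    "Sno": [1, 2, 3, 5, 6, 7, 8, 9],
    "name": ["rajesh", "suresh", "mahesh", "mahitha", "mohith", np.nan, "moulesh", "murali"],
    "gender": ["m", "m", np.nan, "f", "m", "m", "m", "m"],
    "amount": [2000.00, 3000.00, 4000.78, 8000.00, np.nan, 28.98, np.nan, 2.89],
})

# Sum the True values column by column to count the NaN entries.
missing_per_column = df.isna().sum()
print(missing_per_column)
```

For this data the counts are 0 for Sno, 1 for name, 1 for gender, and 2 for amount.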
Removing Rows with NaN Values
The code below finds the rows that contain NaN values so that they can be removed.
# collect the index labels of the rows that contain NaN values
rowsnan = []
for index, row in df.iterrows():
    isnanpresent = row.isna()
    if isnanpresent.any():
        rowsnan.append(index)
print(rowsnan)
Output:
[2, 4, 5, 6]
In the code, we use the isna method to mark the NaN values in each row and the any method to check whether the row contains at least one True value.
We discussed the any method in our previous class.
The list holds the index labels of the rows that contain NaN values. We can delete those rows with the drop method.
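The deletion step itself can be sketched as follows. The data frame is an in-memory stand-in for Test.csv, and the loop is replaced by a vectorized isna().any(axis=1) call, which is a swapped-in technique that should give the same index list.

```python
import numpy as np
import pandas as pd

# In-memory stand-in for Test.csv, copied from the output shown earlier.
df = pd.DataFrame({
    "Sno": [1, 2, 3, 5, 6, 7, 8, 9],
    "name": ["rajesh", "suresh", "mahesh", "mahitha", "mohith", np.nan, "moulesh", "murali"],
    "gender": ["m", "m", np.nan, "f", "m", "m", "m", "m"],
    "amount": [2000.00, 3000.00, 4000.78, 8000.00, np.nan, 28.98, np.nan, 2.89],
})

# True for every row that has at least one NaN value.
row_has_nan = df.isna().any(axis=1)

# Index labels of those rows, as a plain list.
rows_with_nan = df.index[row_has_nan].tolist()

# drop returns a new data frame without the listed rows.
cleaned = df.drop(index=rows_with_nan)

print(rows_with_nan)
print(cleaned)
```

The drop method leaves df itself unchanged, so the result has to be assigned to a new name or back to df.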
fillna Method
To replace NaN with another value, we use the fillna method.
In the example below, we use zero to replace every NaN value.
# fillna method
print(df.fillna(0))
Output:
   Sno     name gender   amount
0    1   rajesh      m  2000.00
1    2   suresh      m  3000.00
2    3   mahesh      0  4000.78
3    5  mahitha      f  8000.00
4    6   mohith      m     0.00
5    7        0      m    28.98
6    8  moulesh      m     0.00
7    9   murali      m     2.89
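Filling every column with 0 also puts the number 0 into the text columns name and gender. A minimal sketch of one alternative is to pass fillna a dictionary with a replacement per column. The data frame is an in-memory stand-in for Test.csv, and "unknown" is a hypothetical placeholder chosen only for illustration.

```python
import numpy as np
import pandas as pd

# In-memory stand-in for Test.csv, copied from the output shown earlier.
df = pd.DataFrame({
    "Sno": [1, 2, 3, 5, 6, 7, 8, 9],
    "name": ["rajesh", "suresh", "mahesh", "mahitha", "mohith", np.nan, "moulesh", "murali"],
    "gender": ["m", "m", np.nan, "f", "m", "m", "m", "m"],
    "amount": [2000.00, 3000.00, 4000.78, 8000.00, np.nan, 28.98, np.nan, 2.89],
})

# One replacement value per column: text for text columns, a number for amount.
filled = df.fillna({"name": "unknown", "gender": "unknown", "amount": 0})
print(filled)
```

Whether 0 is a sensible replacement for a missing amount depends on how the column is used later, so treat the values here as examples only.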
dropna Method
To delete the rows or columns that contain NaN values, we use the dropna method.
With axis=0, dropna deletes the rows that contain a NaN value.
With axis=1, it deletes the columns that contain a NaN value.
The code below shows the dropna method. It returns a new data frame and leaves df unchanged.
# dropna method
print(df.dropna(axis=0))
Output:
   Sno     name gender   amount
0    1   rajesh      m  2000.00
1    2   suresh      m  3000.00
3    5  mahitha      f  8000.00
7    9   murali      m     2.89
# dropna method
print(df.dropna(axis=1))
Output:
   Sno
0    1
1    2
2    3
3    5
4    6
5    7
6    8
7    9
# dropna method with the how parameter ('any' or 'all')
print(df.dropna(axis=0, how='any'))
The how parameter controls when a row or column is deleted and takes the value 'any' or 'all'.
'all' deletes only the rows or columns in which every value is NaN.
'any', the default, deletes the rows or columns that contain at least one NaN value.
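The difference between the two options is easier to see on a data frame that has a row made entirely of NaN values. The example data has no such row, so the sketch below uses a small made-up data frame with hypothetical columns a and b.

```python
import numpy as np
import pandas as pd

# Row 0 is complete, row 1 has one NaN, row 2 has only NaN values.
df = pd.DataFrame({
    "a": [1.0, np.nan, np.nan],
    "b": [2.0, 5.0, np.nan],
})

# how='any' drops rows 1 and 2, since each has at least one NaN.
any_dropped = df.dropna(axis=0, how="any")

# how='all' drops only row 2, where every value is NaN.
all_dropped = df.dropna(axis=0, how="all")

print(any_dropped)
print(all_dropped)
```

The same two options apply to columns when axis=1 is used.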