NaN in Pandas

In this class, we discuss NaN in pandas.

The reader should have prior knowledge of creating a data frame from a CSV file.

NaN in Pandas

In pandas, NaN (Not a Number) is the value used to mark missing data in a data frame.

Let us take an example to understand NaN in pandas better.

import pandas as pd
df = pd.read_csv('Test.csv')
print(df)

Output:
   Sno     name gender   amount
0    1   rajesh      m  2000.00
1    2   suresh      m  3000.00
2    3   mahesh    NaN  4000.78
3    5  mahitha      f  8000.00
4    6   mohith      m      NaN
5    7      NaN      m    28.98
6    8  moulesh      m      NaN
7    9   murali      m     2.89

print(df.dtypes)

Output:
Sno         int64
name       object
gender     object
amount    float64
dtype: object

Missing data in the data frame is represented by the value NaN.

We can see the NaN values in the output.

When we display the data types of the columns of our data frame, we have int64, float64, and object.

The point to understand: in this example, NaN does not affect the type of the columns.

NaN is not a separate pandas data type. It is a floating-point value, so an integer column that contains NaN is stored as float64.
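To illustrate how NaN relates to column types, here is a minimal sketch. The two series are made up for illustration and are not taken from Test.csv.

```python
import numpy as np
import pandas as pd

# An integer column without missing values is stored as int64.
ints = pd.Series([1, 2, 3])

# The same column with one NaN is stored as float64, because NaN is a float.
ints_with_nan = pd.Series([1, 2, np.nan])

print(type(np.nan))         # <class 'float'>
print(ints.dtype)           # int64
print(ints_with_nan.dtype)  # float64
```

In the Test.csv example, amount is already float64 and name and gender are object, so the NaN values there leave the column types unchanged.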

Important:

Python treats two NaN values as not equal.

The code below shows that two NaN values are not equal.

# Important: Python treats two NaN values as not equal
if df.loc[4, 'amount'] == df.loc[6, 'amount']:
    print("equal")
else:
    print("not equal")

Output:
not equal

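In the code, we compare two NaN values from the column amount.

The output shows that the two NaN values are not equal.

Reason: NaN stands for a missing, unknown value, and two unknown values cannot be assumed to be the same. Under the floating-point rules that Python follows, NaN does not compare equal to anything, not even to itself.

So two NaN values are considered different, and we use isna rather than == to test for NaN.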
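The comparison behaviour can also be checked without a data frame. The sketch below uses two hypothetical variables, a and b, and shows the usual ways to test for NaN: pd.isna and math.isnan.

```python
import math
import numpy as np
import pandas as pd

a = np.nan
b = float("nan")

print(a == b)         # False: NaN never compares equal
print(a == a)         # False: not even to itself
print(a != a)         # True

# Use a dedicated check instead of == to detect NaN.
print(pd.isna(a))     # True
print(math.isnan(b))  # True
```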

isna Method

# isna method in data frame
print(df.isna())

Output:
     Sno   name  gender  amount
0  False  False   False   False
1  False  False   False   False
2  False  False    True   False
3  False  False   False   False
4  False  False   False    True
5  False   True   False   False
6  False  False   False    True
7  False  False   False   False

The isna method returns a data frame of the same shape, with True wherever the value is NaN and False everywhere else.

In the output above, True appears at exactly the positions that hold NaN.
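A common follow-up is to count the NaN values in each column by summing the result of isna. This is a minimal sketch: the data frame is rebuilt inline to mirror the Test.csv output shown earlier (an assumption, since the file itself is not included here), and nan_per_column and total_nan are names chosen for illustration.

```python
import numpy as np
import pandas as pd

# Inline copy of the example data, standing in for Test.csv.
df = pd.DataFrame({
    "Sno": [1, 2, 3, 5, 6, 7, 8, 9],
    "name": ["rajesh", "suresh", "mahesh", "mahitha",
             "mohith", np.nan, "moulesh", "murali"],
    "gender": ["m", "m", np.nan, "f", "m", "m", "m", "m"],
    "amount": [2000.00, 3000.00, 4000.78, 8000.00,
               np.nan, 28.98, np.nan, 2.89],
})

# True counts as 1 and False as 0, so sum() counts NaN values per column.
nan_per_column = df.isna().sum()
print(nan_per_column)

# Summing again gives the total number of NaN values in the data frame.
total_nan = int(nan_per_column.sum())
print(total_nan)  # 4
```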

Removing Rows with NaN Values

The code below finds the rows that contain NaN values so that they can be removed.

# finding rows with nan values
rowsnan = []
for index, row in df.iterrows():
    isnanpresent = row.isna()
    if isnanpresent.any():
        rowsnan.append(index)

print(rowsnan)

Output:
[2, 4, 5, 6]

In the code, we use the isna method to identify NaN values and the any method to check whether a row contains at least one True value.

We discussed the any method in our previous class.

This gives us the index of every row that contains a NaN value. We can then delete those rows using the drop method.
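The drop step could look like the sketch below. The data frame is rebuilt inline to mirror the Test.csv output shown earlier (an assumption, since the file itself is not included here), and cleaned is a name chosen for illustration. The row indexes are found with a vectorized expression that gives the same list as the loop.

```python
import numpy as np
import pandas as pd

# Inline copy of the example data, standing in for Test.csv.
df = pd.DataFrame({
    "Sno": [1, 2, 3, 5, 6, 7, 8, 9],
    "name": ["rajesh", "suresh", "mahesh", "mahitha",
             "mohith", np.nan, "moulesh", "murali"],
    "gender": ["m", "m", np.nan, "f", "m", "m", "m", "m"],
    "amount": [2000.00, 3000.00, 4000.78, 8000.00,
               np.nan, 28.98, np.nan, 2.89],
})

# any(axis=1) is True for each row that holds at least one NaN.
rowsnan = df.index[df.isna().any(axis=1)].tolist()
print(rowsnan)  # [2, 4, 5, 6]

# drop returns a new data frame without those rows; df is not modified.
cleaned = df.drop(rowsnan)
print(cleaned)
```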

fillna Method

To replace NaN with another value, we use the fillna method.

In the example given below, we use zero to replace the NaN values. fillna returns a new data frame, and df itself is not changed.

# fillna method 
print(df.fillna(0))

Output:
   Sno     name gender   amount
0    1   rajesh      m  2000.00
1    2   suresh      m  3000.00
2    3   mahesh      0  4000.78
3    5  mahitha      f  8000.00
4    6   mohith      m     0.00
5    7        0      m    28.98
6    8  moulesh      m     0.00
7    9   murali      m     2.89
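Filling every column with 0 puts a number into the text columns name and gender. One alternative is to pass a dictionary so that each column gets its own replacement value. This is a minimal sketch: the data frame is rebuilt inline to mirror the Test.csv output (an assumption), and the replacement value "unknown" and the name filled are chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Inline copy of the example data, standing in for Test.csv.
df = pd.DataFrame({
    "Sno": [1, 2, 3, 5, 6, 7, 8, 9],
    "name": ["rajesh", "suresh", "mahesh", "mahitha",
             "mohith", np.nan, "moulesh", "murali"],
    "gender": ["m", "m", np.nan, "f", "m", "m", "m", "m"],
    "amount": [2000.00, 3000.00, 4000.78, 8000.00,
               np.nan, 28.98, np.nan, 2.89],
})

# Keys are column names, values are the replacement for NaN in that column.
filled = df.fillna({"name": "unknown", "gender": "unknown", "amount": 0.0})
print(filled)
```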

dropna Method

To delete the rows or columns that contain NaN values, we use the dropna method.

With axis=0, dropna deletes the rows that contain NaN values.

With axis=1, it deletes the columns that contain NaN values.

The code below shows the dropna method with each axis.

# dropna method
print(df.dropna(axis=0))

Output:
   Sno     name gender   amount
0    1   rajesh      m  2000.00
1    2   suresh      m  3000.00
3    5  mahitha      f  8000.00
7    9   murali      m     2.89

# dropna method
print(df.dropna(axis=1))

Output:
   Sno
0    1
1    2
2    3
3    5
4    6
5    7
6    8
7    9

# dropna method with the how parameter ('any' or 'all')
print(df.dropna(axis=0, how='any'))
print(df.dropna(axis=0, how='all'))

The how parameter decides when a row or column is deleted. It takes the value 'any' or 'all'.

'all' deletes only the rows or columns in which every value is NaN.

'any' deletes the rows or columns that contain at least one NaN value. This is the default.
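The Test.csv data has no row in which every value is NaN, so the difference between the two options does not show up there. The sketch below uses a small, made-up data frame (not taken from Test.csv) in which the last row is entirely NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 1 has one NaN, row 2 has only NaN values.
df = pd.DataFrame({
    "name": ["rajesh", np.nan, np.nan],
    "amount": [2000.0, 28.98, np.nan],
})

# how='any' removes every row with at least one NaN: rows 1 and 2.
any_dropped = df.dropna(axis=0, how="any")
print(any_dropped.index.tolist())  # [0]

# how='all' removes only rows where every value is NaN: row 2.
all_dropped = df.dropna(axis=0, how="all")
print(all_dropped.index.tolist())  # [0, 1]
```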