Pandas Data Types

In this class, we discuss Pandas data types.


The reader should already be familiar with data frame attributes.

In our last class, we used the dtypes attribute to display the column types of a data frame.

Data Types in Pandas

Let us take an example to understand the data types present in pandas.

The example code is given below.

import pandas as pd
df = pd.read_csv('Test.csv',names=['x','y','z','l','m'],header=0)
print(df)
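Test.csv itself is not included here, so the sketch below reads two of the rows shown in the output from an in-memory string with io.StringIO instead of a file. The header line c1 to c5 is a hypothetical placeholder, since the file's real header is not shown; with header=0, pandas discards that line and uses the names list instead.

```python
import io
import pandas as pd

# Hypothetical stand-in for the first rows of Test.csv; the real header
# names are unknown, so "c1".."c5" are placeholders that get replaced.
csv_text = """c1,c2,c3,c4,c5
1,rajesh,m,Rs2000.0,2012-05-12
2,suresh,m,Rs3000,2013-05-12
"""

# header=0: the first line is a header row; names= supplies new column names.
df_demo = pd.read_csv(io.StringIO(csv_text), names=['x', 'y', 'z', 'l', 'm'], header=0)
print(df_demo)
print(list(df_demo.columns))   # ['x', 'y', 'z', 'l', 'm']
```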

Output:
   x        y  z          l           m
0  1   rajesh  m   Rs2000.0  2012-05-12
1  2   suresh  m     Rs3000  2013-05-12
2  3   mahesh  m  Rs4000.78  2014-06-18
3  5  mahitha  f     Rs8000  2018-07-19
4  6   mohith  m  Rs6000.66  2019-09-15
5  7  mallesh  m    Rs28.98  2020-04-04
6  8  moulesh  m     Rs1.22  2011-06-19
7  9   murali  m     Rs2.89  2016-05-05


print(df.dtypes)

Output:
x     int64
y    object
z    object
l    object
m    object
dtype: object

In our example, we have five columns. Pandas inferred the first one as int64 and the remaining four as object.

First, let us understand the object data type.

object

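The object type is similar to the string type (str) in Python. An object column can hold any Python objects, but strings are the most common case.

Whenever a column contains strings, pandas shows its type as object.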
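A quick way to check this, as a small sketch using names from the example data: each element of such a column is an ordinary Python str. The reported column type may depend on the pandas version; newer releases that enable a dedicated string dtype by default may show str instead of object.

```python
import pandas as pd

names = pd.Series(['rajesh', 'suresh', 'mahesh'])

# Each element is an ordinary Python str.
print(type(names[0]))     # <class 'str'>

# object in pandas 2.x; may be a dedicated string dtype in newer versions.
print(names.dtype)

# Python string methods work on the individual elements.
print(names[0].upper())   # RAJESH
```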

The example below shows that the object type behaves like strings in Python.

# column l is object: + performs string concatenation
print(df.loc[0,'l']+df.loc[1,'l'])
print("------------------")
# column x is int64: + performs numeric addition
print(df.loc[0,'x']+df.loc[1,'x'])

Output:
Rs2000.0Rs3000
------------------
3

In this example, we took two elements from column l and applied the plus operator.

The plus operator performs string concatenation because column l is of type object.

int64

In the same example, we also took two elements from column x, which is of type int64.

Here the plus operator performs addition because the elements are integers.
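The same operator also works on whole columns, element by element. A minimal sketch using values from columns l and x:

```python
import pandas as pd

amounts = pd.Series(['Rs2000.0', 'Rs3000'])   # string values (object column)
ids = pd.Series([1, 2])                        # integer values

# For string values, + concatenates element by element.
print((amounts + amounts).tolist())   # ['Rs2000.0Rs2000.0', 'Rs3000Rs3000']

# For integer values, + adds element by element.
print((ids + ids).tolist())           # [2, 4]
```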

float64

In the same way, the float64 type is similar to float in Python. The 64 is the number of bits used to store each value.

Our example above does not have any column of type float.

We will show the float data type in a later example.
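In the meantime, a small standalone sketch with a few values taken from column l shows a float64 column, confirms that each value uses 8 bytes (64 bits), and shows that mixing integers and floats in one column also gives float64:

```python
import pandas as pd

prices = pd.Series([2000.0, 4000.78, 28.98])

print(prices.dtype)                 # float64
# itemsize is in bytes; 8 bytes * 8 = 64 bits per value.
print(prices.dtype.itemsize * 8)    # 64

# Integers and floats together are stored as float64.
mixed = pd.Series([1, 2.5])
print(mixed.dtype)                  # float64
```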

bool

The bool data type is similar to bool in Python.

A bool column contains only True or False values.
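A bool column usually appears as the result of a comparison. The sketch below uses values from column z; comparing them with 'f' gives a bool column, which can then be used to select rows:

```python
import pandas as pd

gender = pd.Series(['m', 'm', 'm', 'f', 'm'])

# Comparing a column with a value produces a bool column.
is_female = gender == 'f'
print(is_female.dtype)     # bool
print(is_female.tolist())  # [False, False, False, True, False]

# A bool column keeps only the rows where it is True.
print(gender[is_female].index.tolist())   # [3]
```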

datetime64

In our example shown above, column m holds dates.

However, pandas treated it as an object column.

To convert column m from object to datetime64, we use the astype method.

The astype method converts data to the required type.

The example is shown below.

# converting date column from object type to datetime64
df['m']=df['m'].astype('datetime64[ns]')
print(df.dtypes)

Output:
x             int64
y            object
z            object
l            object
m    datetime64[ns]
dtype: object

The astype call in the code above converts column m to datetime64. The [ns] in the output means the values are stored with nanosecond precision.

After conversion, the data is stored back in the same column of the data frame.

In astype, we pass the type to which the data should be converted.
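Recent pandas versions reject a bare 'datetime64' in astype and ask for a unit such as 'datetime64[ns]'. The sketch below, using dates from column m, passes the unit explicitly and also shows pd.to_datetime, a commonly used alternative for parsing date strings (it is not what the example above used):

```python
import pandas as pd

dates = pd.Series(['2012-05-12', '2013-05-12', '2014-06-18'])

# Pass the unit explicitly; a unit-less 'datetime64' is rejected by recent pandas.
via_astype = dates.astype('datetime64[ns]')

# pd.to_datetime parses the strings as well; its default unit may vary by version.
via_to_datetime = pd.to_datetime(dates)

print(via_astype.dtype)                   # datetime64[ns]
print(via_to_datetime.dt.year.tolist())   # [2012, 2013, 2014]
```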

Use of data type conversion

Once the column is converted from object to datetime64, we can use the date properties available through the .dt accessor.

A few examples are shown below.

To extract only the year, we use dt.year; head() then shows the first five rows.

# use of the conversion: extract the year
print(df['m'].dt.year.head())

Output:
0    2012
1    2013
2    2014
3    2018
4    2019
Name: m, dtype: int64

To extract only the month, we use dt.month.

print(df['m'].dt.month.head())

Output:
0    5
1    5
2    6
3    7
4    9
Name: m, dtype: int64

Similarly, we can extract the day, the hour, and the weekday name.

print(df['m'].dt.day.head())

Output:
0    12
1    12
2    18
3    19
4    15
Name: m, dtype: int64

print(df['m'].dt.hour.head())

Output:
0    0
1    0
2    0
3    0
4    0
Name: m, dtype: int64

print(df['m'].dt.day_name().head())

Output:
0     Saturday
1       Sunday
2    Wednesday
3     Thursday
4       Sunday
Name: m, dtype: object

For example, if we are working with sales data, we can group sales by weekday, weekend, year, month, and so on.
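As a rough sketch of that idea, the hypothetical sales table below (dates from column m, made-up amounts) groups amounts by weekday name and flags weekend rows with dt.dayofweek, which counts Monday as 0 and Sunday as 6:

```python
import pandas as pd

# Hypothetical sales records: dates from the example, amounts made up.
sales = pd.DataFrame({
    'date': pd.to_datetime(['2012-05-12', '2013-05-12', '2014-06-18', '2018-07-19']),
    'amount': [100.0, 250.0, 80.0, 120.0],
})

# Total amount per weekday name.
by_weekday = sales.groupby(sales['date'].dt.day_name())['amount'].sum()
print(by_weekday)

# Saturday (5) and Sunday (6) count as the weekend.
sales['is_weekend'] = sales['date'].dt.dayofweek >= 5
print(sales)
```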

In the hour example above, our data contains no time, so the output is 0 for every row.

If our data included a time after the date, for example 2012-05-12 hh:mm:ss, dt.hour would return the actual hour.
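A minimal sketch with made-up timestamps that include a time of day; once parsed, dt.hour and dt.minute return the actual values:

```python
import pandas as pd

# Made-up timestamps with a time after the date.
stamps = pd.to_datetime(pd.Series(['2012-05-12 14:30:00', '2013-05-12 09:05:00']))

print(stamps.dt.hour.tolist())     # [14, 9]
print(stamps.dt.minute.tolist())   # [30, 5]
```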

In the same way, column l is converted from object to float by removing the Rs prefix from the data.

The code is given below.

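# strip the "Rs" prefix from a value and convert the rest to float
def func(val):
    newdata=val.replace('Rs','')
    return float(newdata)

# call func on every value of column l and store the result back
df['l']=df['l'].apply(func)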

print(df)
print(df.dtypes)

Output:
   x        y  z        l          m
0  1   rajesh  m  2000.00 2012-05-12
1  2   suresh  m  3000.00 2013-05-12
2  3   mahesh  m  4000.78 2014-06-18
3  5  mahitha  f  8000.00 2018-07-19
4  6   mohith  m  6000.66 2019-09-15
5  7  mallesh  m    28.98 2020-04-04
6  8  moulesh  m     1.22 2011-06-19
7  9   murali  m     2.89 2016-05-05
x             int64
y            object
z            object
l           float64
m    datetime64[ns]
dtype: object

To convert the data from object to float, we used the apply method.

The apply method takes a function as its argument.

It calls that function on each value of the column, and the returned values form the new column.
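To make the mechanism concrete, the sketch below applies a copy of func to a few values from column l and compares the result with an explicit Python loop that calls func on each value:

```python
import pandas as pd

def func(val):
    # Copy of the example's func: strip "Rs", then convert to float.
    return float(val.replace('Rs', ''))

amounts = pd.Series(['Rs2000.0', 'Rs3000', 'Rs4000.78'])

# apply calls func once per element and collects the results in a new Series.
converted = amounts.apply(func)

# The same result written as an explicit loop over the values.
manual = pd.Series([func(v) for v in amounts])

print(converted.tolist())          # [2000.0, 3000.0, 4000.78]
print(converted.equals(manual))    # True
```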

In our example, we wrote a function func that replaces Rs with an empty string and converts the remaining text to float.

Because func returns floats, pandas stores column l as float64.
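As an alternative to apply (not the approach used above), the .str accessor can remove the prefix from the whole column at once before casting with astype(float). A sketch, assuming every value contains the literal text Rs only as its prefix:

```python
import pandas as pd

amounts = pd.Series(['Rs2000.0', 'Rs3000', 'Rs4000.78', 'Rs28.98'])

# regex=False treats 'Rs' as plain text; then cast the remaining digits.
as_float = amounts.str.replace('Rs', '', regex=False).astype(float)

print(as_float.dtype)      # float64
print(as_float.tolist())   # [2000.0, 3000.0, 4000.78, 28.98]
```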