Pandas Data Types
In this class, we discuss pandas data types.
The reader should have prior knowledge of data frame attributes.
In our last class, we used the dtypes attribute to display the column types of a data frame.
Data Types in Pandas
Let us take an example to understand the data types present in pandas.
The example code is given below.
import pandas as pd
df = pd.read_csv('Test.csv',names=['x','y','z','l','m'],header=0)
print(df)
Output:
   x        y  z          l           m
0  1   rajesh  m   Rs2000.0  2012-05-12
1  2   suresh  m     Rs3000  2013-05-12
2  3   mahesh  m  Rs4000.78  2014-06-18
3  5  mahitha  f     Rs8000  2018-07-19
4  6   mohith  m  Rs6000.66  2019-09-15
5  7  mallesh  m    Rs28.98  2020-04-04
6  8  moulesh  m     Rs1.22  2011-06-19
7  9   murali  m     Rs2.89  2016-05-05
print(df.dtypes)
Output:
x int64
y object
z object
l object
m object
dtype: object
In our example, we have five columns. Pandas reads the first one as int64 and the remaining four as object.
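Since Test.csv is not included here, the minimal sketch below rebuilds a few rows of the same data from an in-memory string and checks which columns pandas reads as numeric. The sample text and the names csv_text, sample, and numeric_cols are assumptions made only for illustration.

```python
import io
import pandas as pd

# Hypothetical stand-in for Test.csv: a header row plus three data rows.
csv_text = """id,name,gender,salary,date
1,rajesh,m,Rs2000.0,2012-05-12
2,suresh,m,Rs3000,2013-05-12
3,mahesh,m,Rs4000.78,2014-06-18
"""

# header=0 discards the header row of the data; names= supplies the new labels.
sample = pd.read_csv(io.StringIO(csv_text), names=['x', 'y', 'z', 'l', 'm'], header=0)

print(sample.dtypes)

# Only column x is numeric; the other columns contain text.
numeric_cols = [c for c in sample.columns if pd.api.types.is_numeric_dtype(sample[c])]
print(numeric_cols)
```

Column l is not numeric because of the Rs prefix, and column m is not parsed as a date unless we ask for it.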
First, let us understand the object data type.
object
The object type can hold any Python object, and columns of strings are typically stored with it, so it behaves much like str in Python.
Whenever pandas reads strings, it shows the type as object.
The example below shows that the object type works like strings in Python.
# string concatenation, because column l is of type object
print(df.loc[0,'l']+df.loc[1,'l'])
print("------------------")
print(df.loc[0,'x']+df.loc[1,'x'])
Output:
Rs2000.0Rs3000
------------------
3
In our example, we took two elements from column l and applied the plus operator.
The plus operator works as string concatenation because column l is of type object.
int64
In the above example, we also took two elements from column x, which is of type integer.
When we apply the plus operator to them, it performs addition because the elements are integers.
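The same distinction holds when the plus operator is applied to whole columns instead of single elements. The sketch below uses a small made-up data frame (the name small and its values are assumptions for illustration).

```python
import pandas as pd

# Hypothetical two-row frame mirroring columns x and l from the example.
small = pd.DataFrame({'x': [1, 2], 'l': ['Rs2000.0', 'Rs3000']})

# Integer column: + adds element by element.
int_sum = small['x'] + small['x']

# String column: + concatenates element by element.
str_sum = small['l'] + small['l']

print(int_sum.tolist())
print(str_sum.tolist())
```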
float64
In the same way, the float64 type is similar to float in Python. Here, 64 is the number of bits used to store each value.
In the example shown above, we do not have any column of type float.
We will see the float data type in a later example.
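As a quick preview, the sketch below shows two common ways a float64 column can arise: from decimal values, and from an explicit conversion with astype. The series names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical price values; the decimal points make pandas choose float64.
prices = pd.Series([2000.0, 3000.0, 4000.78])
print(prices.dtype)

# An integer column can be converted to float64 with astype.
counts = pd.Series([1, 2, 3])
as_float = counts.astype('float64')
print(as_float.dtype)
print(as_float.tolist())
```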
bool
The bool data type is similar to bool in Python.
A bool column holds only the values True and False.
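A bool column most often appears as the result of a comparison. The sketch below assumes a small made-up series similar to column z and shows the comparison result being used to filter rows.

```python
import pandas as pd

# Hypothetical values similar to column z in the example above.
z = pd.Series(['m', 'm', 'f', 'm'])

# Comparing a column with a value gives a column of True/False.
is_female = z == 'f'
print(is_female.dtype)
print(is_female.tolist())

# A bool column can be used to select the matching rows.
print(z[is_female].tolist())
```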
datetime64
In the example shown above, column m holds dates.
Pandas read it as object type.
To convert column m from object type to datetime64, we use the astype method.
The astype method converts data to the required type.
The example is shown below.
# converting date column from object type to datetime64
df['m']=df['m'].astype('datetime64[ns]')
print(df.dtypes)
Output:
x int64
y object
z object
l object
m datetime64[ns]
dtype: object
The astype call above converts column m to datetime64. Recent pandas versions require the unit to be given, as in 'datetime64[ns]'.
After conversion, the data is stored back in the same column of the data frame.
In the astype method, we specify the type to which the data should be converted.
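An alternative that is often used for the same conversion is pd.to_datetime, which parses date strings directly. The sketch below is not part of the original example; the series names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical date strings in the same format as column m.
dates = pd.Series(['2012-05-12', '2013-05-12', '2014-06-18'])

# format= states the expected layout of the strings explicitly.
converted = pd.to_datetime(dates, format='%Y-%m-%d')
print(converted.dtype)

# errors='coerce' turns values that cannot be parsed into NaT instead of raising.
messy = pd.Series(['2012-05-12', 'not a date'])
coerced = pd.to_datetime(messy, errors='coerce')
print(coerced.isna().tolist())
```

The errors='coerce' option is useful when a column may contain a few bad entries that should not stop the conversion.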
Use of data type conversion.
Once the column has been converted from object to datetime64, we can use the date properties available through the dt accessor.
A few examples are shown below.
To extract only the year, we use dt.year. Here, head() limits the output to the first five rows.
# use of conversion
print(df['m'].dt.year.head())
Output:
0 2012
1 2013
2 2014
3 2018
4 2019
Name: m, dtype: int64
To extract only the month, we use dt.month.
print(df['m'].dt.month.head())
Output:
0 5
1 5
2 6
3 7
4 9
Name: m, dtype: int64
Similarly, we can select days, hours, and weekday names.
print(df['m'].dt.day.head())
Output:
0 12
1 12
2 18
3 19
4 15
Name: m, dtype: int64
print(df['m'].dt.hour.head())
Output:
0 0
1 0
2 0
3 0
4 0
Name: m, dtype: int64
print(df['m'].dt.day_name().head())
Output:
0 Saturday
1 Sunday
2 Wednesday
3 Thursday
4 Sunday
Name: m, dtype: object
Suppose we are working on sales data. We can then arrange sales according to weekday, weekend, year, month, and so on.
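The idea of arranging sales by weekday or weekend can be sketched with groupby. The sales frame below, including its column names date and amount, is entirely made up for illustration.

```python
import pandas as pd

# Hypothetical sales records.
sales = pd.DataFrame({
    'date': pd.to_datetime(['2012-05-12', '2013-05-12', '2014-06-18', '2019-09-15']),
    'amount': [100.0, 250.0, 75.0, 50.0],
})

# Total sales for each weekday name.
by_day = sales.groupby(sales['date'].dt.day_name())['amount'].sum()
print(by_day)

# dt.dayofweek numbers Monday as 0, so 5 and 6 are Saturday and Sunday.
sales['weekend'] = sales['date'].dt.dayofweek >= 5
by_weekend = sales.groupby('weekend')['amount'].sum()
print(by_weekend)
```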
In the dt.hour example above, our data does not contain a time, so the output is 0 for every row.
We can include the time after the date in our data, for example 2012-05-12 hh:mm:ss.
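To illustrate, the sketch below parses two made-up timestamps that include a time part and reads the hour and minute back through the dt accessor.

```python
import pandas as pd

# Hypothetical timestamps in the form date followed by hh:mm:ss.
stamps = pd.to_datetime(pd.Series(['2012-05-12 09:30:00', '2013-05-12 18:45:10']))

hours = stamps.dt.hour.tolist()
minutes = stamps.dt.minute.tolist()

print(hours)
print(minutes)
```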
In the same way, column l can be converted from object type to float by removing Rs from the data.
The code is given below.
def func(val):
    # remove the Rs prefix and convert the remaining text to float
    newdata=val.replace('Rs','')
    return float(newdata)
df['l']=df['l'].apply(func)
print(df)
print(df.dtypes)
Output:
   x        y  z        l           m
0  1   rajesh  m  2000.00  2012-05-12
1  2   suresh  m  3000.00  2013-05-12
2  3   mahesh  m  4000.78  2014-06-18
3  5  mahitha  f  8000.00  2018-07-19
4  6   mohith  m  6000.66  2019-09-15
5  7  mallesh  m    28.98  2020-04-04
6  8  moulesh  m     1.22  2011-06-19
7  9   murali  m     2.89  2016-05-05
x int64
y object
z object
l float64
m datetime64[ns]
dtype: object
To convert the data from object to float, we used the apply method.
The apply method takes a function as input.
It passes each value of the column to that function and collects the returned values.
In our example, we have written a function func, which replaces Rs with an empty string and converts the result to float.
After Rs is removed and each value is converted, pandas stores column l as float64.
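For comparison, the same cleanup can also be written without a helper function by using the str accessor followed by astype. This is an alternative sketch, not the approach used above; the series amounts is made up for illustration.

```python
import pandas as pd

# Hypothetical values in the same form as column l.
amounts = pd.Series(['Rs2000.0', 'Rs3000', 'Rs4000.78'])

# regex=False treats 'Rs' as plain text rather than as a pattern.
cleaned = amounts.str.replace('Rs', '', regex=False).astype(float)

print(cleaned.dtype)
print(cleaned.tolist())
```

Both versions give a float64 column. The str accessor works on the whole column at once, while apply calls the function once per value.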