Adding and Removing Columns in a Data Frame
In this class, we discuss adding and removing columns in a data frame.
The reader should have prior knowledge of the data frame and of how to add and delete rows in a data frame.
Adding and Removing Columns
We take an example to understand adding and removing columns in a data frame.
The example data frame is created in the code below.
import pandas as pd
students={'name':['rajesh','suresh','mahesh','j'],'age':[25,35,40,1],'marks':[85,45,65,20]}
df=pd.DataFrame(data=students)
print(df)
Output:
name age marks
0 rajesh 25 85
1 suresh 35 45
2 mahesh 40 65
3 j 1 20
insert Method to Add a Column to the Data Frame
DataFrame.insert(loc, column, value, allow_duplicates=False)
The insert method has four parameters.
loc: the integer position at which the new column is inserted. It must lie between 0 and the current number of columns.
column: the name of the column to be added.
value: the values for the new column, for example a list with one element per row.
allow_duplicates: if True, a column whose name already exists in the data frame can be inserted.
The example program to add a new column using the insert method is shown below.
# To add column to the data frame we use insert method
# DataFrame.insert(loc,column, value, allow_duplicates=False)
df.insert(1,"qualification",["B-Tech","B.sc","B-Tech","MCA"])
print(df)
Output:
name qualification age marks
0 rajesh B-Tech 25 85
1 suresh B.sc 35 45
2 mahesh B-Tech 40 65
3 j MCA 1 20
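As a small supplementary sketch of the loc and value parameters described above: the variable name demo, the column name section, and the value "A" are made up for illustration and are not part of the original example. It assumes loc may equal the number of columns (which appends the column) and that a single scalar value is repeated for every row.

```python
import pandas as pd

# 'demo' is a hypothetical data frame, kept separate from df used in this class
demo = pd.DataFrame({"name": ["rajesh", "suresh"], "age": [25, 35]})

# loc=0 places the new column first; the scalar "A" is repeated for every row
demo.insert(0, "section", "A")

# loc equal to the current number of columns appends the column at the end
demo.insert(len(demo.columns), "marks", [85, 45])

print(list(demo.columns))  # ['section', 'name', 'age', 'marks']
```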
By default, allow_duplicates is False.
If we try to insert a column whose name already exists, we get an error. The example is shown below.
# again trying to insert qualification
df.insert(1,"qualification",["B-Tech","B.sc","B-Tech","MCA"])
print(df)
Output:
ValueError: cannot insert qualification, already exists
An example that allows a duplicate column name is given below.
# allow duplicate columns
df.insert(1,"qualification",["B-Tech","B.sc","B-Tech","MCA"],allow_duplicates=True)
print(df)
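The result of allowing duplicates can be checked with a short sketch like the one below. The data frame demo is a hypothetical two-row version of the example, used only to keep the output small.

```python
import pandas as pd

# 'demo' is a hypothetical, shortened copy of the example data
demo = pd.DataFrame({"name": ["rajesh", "suresh"], "age": [25, 35]})
demo.insert(1, "qualification", ["B-Tech", "B.sc"])

# the second insert reuses the same name, which is accepted because of allow_duplicates=True
demo.insert(1, "qualification", ["B-Tech", "B.sc"], allow_duplicates=True)

print(list(demo.columns))  # ['name', 'qualification', 'qualification', 'age']

# selecting a duplicated name returns both columns as a DataFrame, not a single Series
print(type(demo["qualification"]).__name__)  # DataFrame
```

Because a duplicated name no longer identifies one column, later code that expects demo["qualification"] to be a single column may behave differently, so duplicates are usually best avoided.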
Another Way to Add a Column
# 2nd way to add a column to data Frame
import pandas as pd
students={'name':['rajesh','suresh','mahesh','j'],'age':[25,35,40,1],'marks':[85,45,65,20]}
df=pd.DataFrame(data=students)
print(df)
df["qualification"]=["B-Tech","B.sc","B-Tech","MCA"]
print(df)
Output:
name age marks
0 rajesh 25 85
1 suresh 35 45
2 mahesh 40 65
3 j 1 20
name age marks qualification
0 rajesh 25 85 B-Tech
1 suresh 35 45 B.sc
2 mahesh 40 65 B-Tech
3 j 1 20 MCA
In the above program, we use df["qualification"]=["B-Tech","B.sc","B-Tech","MCA"] to add a new column to the data frame.
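The same bracket syntax also accepts values computed from an existing column. The sketch below is illustrative only: the column name passed and the pass mark of 50 are assumptions, not part of the original example. It also shows that a list whose length does not match the number of rows is rejected with a ValueError.

```python
import pandas as pd

demo = pd.DataFrame({"name": ["rajesh", "suresh", "mahesh", "j"],
                     "marks": [85, 45, 65, 20]})

# new column computed from an existing one (the pass mark 50 is a made-up threshold)
demo["passed"] = demo["marks"] >= 50
print(demo["passed"].tolist())  # [True, False, True, False]

# a list with too few elements does not match the four rows and raises ValueError
raised = False
try:
    demo["qualification"] = ["B-Tech", "B.sc"]
except ValueError:
    raised = True
print(raised)  # True
```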
assign Method to Add Multiple Columns
The example program is given below.
# adding multiple new columns using assign method
import pandas as pd
students={'name':['rajesh','suresh','mahesh','j'],'age':[25,35,40,1],'marks':[85,45,65,20]}
df=pd.DataFrame(data=students)
print(df)
list1=["B-Tech","B.sc","B-Tech","MCA"]
list2=["F","M","F","M"]
df=df.assign(qualification=list1,Gender=list2) # the assign method returns a new object
print(df)
Output:
name age marks
0 rajesh 25 85
1 suresh 35 45
2 mahesh 40 65
3 j 1 20
name age marks qualification Gender
0 rajesh 25 85 B-Tech F
1 suresh 35 45 B.sc M
2 mahesh 40 65 B-Tech F
3 j 1 20 MCA M
The assign method takes each new column name as a keyword argument.
The list assigned to a name supplies the elements of that column.
In the above example, we add two columns, qualification and Gender.
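Since assign returns a new object, the original data frame stays unchanged unless the result is assigned back. A minimal sketch, using the hypothetical names demo and new_demo:

```python
import pandas as pd

demo = pd.DataFrame({"name": ["rajesh", "suresh"], "marks": [85, 45]})

# assign builds and returns a new data frame; demo itself is not modified
new_demo = demo.assign(qualification=["B-Tech", "B.sc"], Gender=["F", "M"])

print(list(demo.columns))      # ['name', 'marks']
print(list(new_demo.columns))  # ['name', 'marks', 'qualification', 'Gender']
```

This is why the program above writes df=df.assign(...) rather than calling df.assign(...) on its own.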
drop Method to Remove a Column from the Data Frame
The drop method was discussed in our previous class.
The drop method is used to remove columns and rows.
To remove columns, we use axis=1.
If axis=1, the labels parameter is interpreted as column names.
The example program to remove columns is shown below.
#removing a column from the data frame
import pandas as pd
students={'name':['rajesh','suresh','mahesh','j'],'age':[25,35,40,1],'marks':[85,45,65,20],'qualification':["B-Tech","B.sc","B-Tech","MCA"]}
df=pd.DataFrame(data=students)
print(df)
#drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')
df.drop(labels="marks",axis=1,inplace=True)
print(df)
Output:
name age marks qualification
0 rajesh 25 85 B-Tech
1 suresh 35 45 B.sc
2 mahesh 40 65 B-Tech
3 j 1 20 MCA
name age qualification
0 rajesh 25 B-Tech
1 suresh 35 B.sc
2 mahesh 40 B-Tech
3 j 1 MCA
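The program above uses inplace=True, which modifies df directly. As a sketch of the alternative, leaving inplace at its default of False returns a new data frame, and errors="ignore" skips a label that does not exist instead of raising an error. The names demo, trimmed, unchanged, and the missing column grade are hypothetical.

```python
import pandas as pd

demo = pd.DataFrame({"name": ["rajesh", "suresh"], "age": [25, 35], "marks": [85, 45]})

# without inplace=True, drop returns a new data frame and leaves demo as it was
trimmed = demo.drop(labels="marks", axis=1)
print(list(trimmed.columns))  # ['name', 'age']
print(list(demo.columns))     # ['name', 'age', 'marks']

# 'grade' is not a column; errors="ignore" skips it instead of raising KeyError
unchanged = demo.drop(labels="grade", axis=1, errors="ignore")
print(list(unchanged.columns))  # ['name', 'age', 'marks']
```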
Removing Multiple Columns
If the labels parameter is assigned a list of column names, the drop method removes all the listed columns.
The example program is shown below.
# removing multiple columns using names
import pandas as pd
students={'name':['rajesh','suresh','mahesh','j'],'age':[25,35,40,1],'marks':[85,45,65,20],'qualification':["B-Tech","B.sc","B-Tech","MCA"]}
df=pd.DataFrame(data=students)
print(df)
df.drop(labels=["marks","qualification"],axis=1,inplace=True)
print(df)
Output:
name age marks qualification
0 rajesh 25 85 B-Tech
1 suresh 35 45 B.sc
2 mahesh 40 65 B-Tech
3 j 1 20 MCA
name age
0 rajesh 25
1 suresh 35
2 mahesh 40
3 j 1
Delete Columns Using the Column Index
The columns attribute of the data frame holds the column names, which can be selected by their integer positions.
df.columns[[2,3]] gives the column names at positions 2 and 3.
We use the columns attribute to get the column names and pass them as the labels parameter of the drop method.
The example is shown below.
# removing columns using index
import pandas as pd
students={'name':['rajesh','suresh','mahesh','j'],'age':[25,35,40,1],'marks':[85,45,65,20],'qualification':["B-Tech","B.sc","B-Tech","MCA"]}
df=pd.DataFrame(data=students)
print(df)
print()
print(df.columns[[2,3]])
print()
df.drop(df.columns[[2,3]],axis=1,inplace=True)
print(df)
Output:
name age marks qualification
0 rajesh 25 85 B-Tech
1 suresh 35 45 B.sc
2 mahesh 40 65 B-Tech
3 j 1 20 MCA
Index(['marks', 'qualification'], dtype='object')
name age
0 rajesh 25
1 suresh 35
2 mahesh 40
3 j 1
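Positions can also be picked with a slice or a single integer instead of a list of positions. The sketch below assumes a hypothetical two-row data frame named demo; the variable names to_drop, trimmed, and without_first are illustrative.

```python
import pandas as pd

demo = pd.DataFrame({"name": ["rajesh", "suresh"], "age": [25, 35],
                     "marks": [85, 45], "qualification": ["B-Tech", "B.sc"]})

# a slice selects consecutive positions: here position 2 up to the last column
to_drop = demo.columns[2:]
print(list(to_drop))  # ['marks', 'qualification']

trimmed = demo.drop(to_drop, axis=1)
print(list(trimmed.columns))  # ['name', 'age']

# a single integer position gives one column name
without_first = demo.drop(demo.columns[0], axis=1)
print(list(without_first.columns))  # ['age', 'marks', 'qualification']
```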
Deleting Columns Using the columns Parameter
#remove columns using parameter columns
import pandas as pd
students={'name':['rajesh','suresh','mahesh','j'],'age':[25,35,40,1],'marks':[85,45,65,20],'qualification':["B-Tech","B.sc","B-Tech","MCA"]}
df=pd.DataFrame(data=students)
print(df)
df.drop(columns=["marks","qualification"],inplace=True)
print(df)
Output:
name age marks qualification
0 rajesh 25 85 B-Tech
1 suresh 35 45 B.sc
2 mahesh 40 65 B-Tech
3 j 1 20 MCA
name age
0 rajesh 25
1 suresh 35
2 mahesh 40
3 j 1
Instead of using axis=1 and labels, we can directly use the columns parameter to list the column names that are to be dropped.
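To check that the two forms give the same result, a short sketch such as the following can be used. The names demo, by_labels, and by_columns are hypothetical.

```python
import pandas as pd

demo = pd.DataFrame({"name": ["rajesh", "suresh"], "age": [25, 35],
                     "marks": [85, 45], "qualification": ["B-Tech", "B.sc"]})

# form 1: labels together with axis=1
by_labels = demo.drop(labels=["marks", "qualification"], axis=1)

# form 2: the columns parameter, with no axis needed
by_columns = demo.drop(columns=["marks", "qualification"])

print(by_labels.equals(by_columns))  # True
```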