Python for Data Science: Adding and Removing Columns in Data Frame

In this class, we discuss adding and removing columns in a pandas data frame.

The reader should have prior knowledge of the data frame and of how to add and delete rows in a data frame.

Adding and Removing Columns

We take an example to understand adding and removing columns in a data frame.

The example data frame is created in the code below.

import pandas as pd 
students={'name':['rajesh','suresh','mahesh','j'],'age':[25,35,40,1],'marks':[85,45,65,20]}
df=pd.DataFrame(data=students)
print(df)

Output:
     name  age  marks
0  rajesh   25     85
1  suresh   35     45
2  mahesh   40     65
3       j    1     20

insert Method to Add Column to Data Frame

DataFrame.insert(loc, column, value, allow_duplicates=False)

The insert method has four parameters.

loc: The integer position at which the new column is inserted. It must satisfy 0 <= loc <= len(df.columns).

column: The name (label) of the column to be added.

value: The data for the new column. It can be a list (or other array-like) with one element per row, a Series, or a single scalar value.

allow_duplicates: If True, a column whose name already exists in the data frame can be inserted.

The example program to add a new column using the insert method is shown below.

# To add column to the data frame we use insert method
# DataFrame.insert(loc,column, value, allow_duplicates=False)
df.insert(1,"qualification",["B-Tech","B.sc","B-Tech","MCA"])
print(df)

Output:
     name qualification  age  marks
0  rajesh        B-Tech   25     85
1  suresh          B.sc   35     45
2  mahesh        B-Tech   40     65
3       j           MCA    1     20
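
The value parameter does not have to be a list. As a minimal sketch (the column names "country" and "passed" and their values are made up for illustration), the program below inserts a scalar, which is repeated for every row. It also shows that insert changes the data frame in place and returns None.

```python
import pandas as pd

students = {'name': ['rajesh', 'suresh', 'mahesh', 'j'],
            'age': [25, 35, 40, 1],
            'marks': [85, 45, 65, 20]}
df = pd.DataFrame(data=students)

# A scalar value is repeated for every row ("country" is a made-up column).
result = df.insert(0, "country", "India")

# insert works in place, so it returns None rather than a new data frame.
print(result)
print(list(df.columns))

# loc equal to the current number of columns places the new column last.
df.insert(len(df.columns), "passed", [True, False, True, False])
print(list(df.columns))
```

Because insert returns None, writing df = df.insert(...) would replace the data frame with None, so the result should not be assigned back.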

By default, allow_duplicates is False.

If we try to insert a column whose name already exists, we get an error. The example is shown below.

# again trying to insert qualification
df.insert(1,"qualification",["B-Tech","B.sc","B-Tech","MCA"])
print(df)

Output:
ValueError: cannot insert qualification, already exists

Setting allow_duplicates=True lets us insert a column with a duplicate name. The example is given below.

# allow duplicate columns
df.insert(1,"qualification",["B-Tech","B.sc","B-Tech","MCA"],allow_duplicates=True)
print(df)
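
Duplicate names change how column selection behaves. As a minimal self-contained sketch of this, the program below rebuilds the example data frame, inserts "qualification" twice, and then selects the column by name.

```python
import pandas as pd

students = {'name': ['rajesh', 'suresh', 'mahesh', 'j'],
            'age': [25, 35, 40, 1],
            'marks': [85, 45, 65, 20]}
df = pd.DataFrame(data=students)
quals = ["B-Tech", "B.sc", "B-Tech", "MCA"]

# First insert succeeds; the second needs allow_duplicates=True.
df.insert(1, "qualification", quals)
df.insert(1, "qualification", quals, allow_duplicates=True)
print(list(df.columns))

# With a duplicated name, df["qualification"] selects both columns,
# so the result is a DataFrame instead of a Series.
selected = df["qualification"]
print(type(selected).__name__, selected.shape)
```

Since selecting by name no longer returns a single column, duplicate column names are usually worth avoiding unless there is a specific need for them.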

Another Way to Add a Column

# 2nd way to add a column to data Frame
import pandas as pd 
students={'name':['rajesh','suresh','mahesh','j'],'age':[25,35,40,1],'marks':[85,45,65,20]}
df=pd.DataFrame(data=students)
print(df)

df["qualification"]=["B-Tech","B.sc","B-Tech","MCA"]
print(df)

Output:
     name  age  marks
0  rajesh   25     85
1  suresh   35     45
2  mahesh   40     65
3       j    1     20


     name  age  marks qualification
0  rajesh   25     85        B-Tech
1  suresh   35     45          B.sc
2  mahesh   40     65        B-Tech
3       j    1     20           MCA

In the above program, we use df["qualification"]=["B-Tech","B.sc","B-Tech","MCA"] to add a new column to the data frame. The new column is placed after the existing columns.
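
The same syntax also accepts values computed from existing columns. The sketch below is only an illustration: the column name "passed" and the pass mark of 50 are assumptions, not part of the original example.

```python
import pandas as pd

students = {'name': ['rajesh', 'suresh', 'mahesh', 'j'],
            'age': [25, 35, 40, 1],
            'marks': [85, 45, 65, 20]}
df = pd.DataFrame(data=students)

# Build a new column from an existing one (pass mark of 50 is an assumption).
df["passed"] = df["marks"] >= 50

# Assigning to an existing name overwrites that column instead of raising an error.
df["age"] = df["age"] + 1
print(df)
```

Unlike insert, this form silently replaces a column that already exists, so it is worth checking the name first when overwriting is not intended.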

assign Method to Add Multiple Columns

The example program is given below.

# adding multiple new columns using assign method
import pandas as pd 
students={'name':['rajesh','suresh','mahesh','j'],'age':[25,35,40,1],'marks':[85,45,65,20]}
df=pd.DataFrame(data=students)
print(df)

list1=["B-Tech","B.sc","B-Tech","MCA"]
list2=["F","M","F","M"]
df=df.assign(qualification=list1,Gender=list2)  # assign returns a new data frame, so we store the result
print(df)

Output:
     name  age  marks
0  rajesh   25     85
1  suresh   35     45
2  mahesh   40     65
3       j    1     20

     name  age  marks qualification Gender
0  rajesh   25     85        B-Tech      F
1  suresh   35     45          B.sc      M
2  mahesh   40     65        B-Tech      F
3       j    1     20           MCA      M

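The assign method takes the new column names as keyword arguments, and a list is assigned to each name.

Each list should contain the elements of its column, one per row.

The assign method does not change the original data frame. It returns a new data frame, which is why the result is assigned back to df.

In the above example, we add two columns, qualification and Gender.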
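
The assign method also accepts a function in place of a list. As a minimal sketch (the column names "bonus_marks" and "is_adult" are hypothetical, chosen only for illustration), each function receives the data frame and returns the values of the new column.

```python
import pandas as pd

students = {'name': ['rajesh', 'suresh', 'mahesh', 'j'],
            'age': [25, 35, 40, 1],
            'marks': [85, 45, 65, 20]}
df = pd.DataFrame(data=students)

# Each callable receives the data frame and returns the new column's values.
df2 = df.assign(
    bonus_marks=lambda d: d["marks"] + 5,   # hypothetical column
    is_adult=lambda d: d["age"] >= 18,      # hypothetical column
)

print(list(df.columns))   # the original data frame is unchanged
print(list(df2.columns))  # the returned data frame has the new columns
```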

drop Method to Remove Column from the Data Frame

The drop method was discussed in our previous class.

The drop method is used to remove columns and rows.

To remove columns, we use axis=1.

If axis=1, the values given in the labels parameter are treated as column names.

The example program to remove columns is shown below.

#removing a column from the data frame

import pandas as pd 
students={'name':['rajesh','suresh','mahesh','j'],'age':[25,35,40,1],'marks':[85,45,65,20],'qualification':["B-Tech","B.sc","B-Tech","MCA"]}
df=pd.DataFrame(data=students)
print(df)

#drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')
df.drop(labels="marks",axis=1,inplace=True)
print(df)

Output:
     name  age  marks qualification
0  rajesh   25     85        B-Tech
1  suresh   35     45          B.sc
2  mahesh   40     65        B-Tech
3       j    1     20           MCA

     name  age qualification
0  rajesh   25        B-Tech
1  suresh   35          B.sc
2  mahesh   40        B-Tech
3       j    1           MCA
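
The program above uses inplace=True. As a minimal sketch of the other options in the drop signature, the program below drops without inplace and tries a label that does not exist ("grade" is a made-up name used only to trigger the error).

```python
import pandas as pd

students = {'name': ['rajesh', 'suresh', 'mahesh', 'j'],
            'age': [25, 35, 40, 1],
            'marks': [85, 45, 65, 20],
            'qualification': ["B-Tech", "B.sc", "B-Tech", "MCA"]}
df = pd.DataFrame(data=students)

# Without inplace=True, drop leaves df untouched and returns a new data frame.
smaller = df.drop(labels="marks", axis=1)
print(list(df.columns))
print(list(smaller.columns))

# A missing label raises KeyError by default ("grade" does not exist).
try:
    df.drop(labels="grade", axis=1)
except KeyError as exc:
    print("KeyError:", exc)

# With errors="ignore", the missing label is skipped instead.
same = df.drop(labels="grade", axis=1, errors="ignore")
print(list(same.columns))
```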

Removing Multiple Columns

If the labels parameter is assigned a list of column names, the drop method removes all the listed columns.

The example program is shown below.

# removing multiple columns using names
import pandas as pd 
students={'name':['rajesh','suresh','mahesh','j'],'age':[25,35,40,1],'marks':[85,45,65,20],'qualification':["B-Tech","B.sc","B-Tech","MCA"]}
df=pd.DataFrame(data=students)
print(df)

df.drop(labels=["marks","qualification"],axis=1,inplace=True)
print(df)

Output:
     name  age  marks qualification
0  rajesh   25     85        B-Tech
1  suresh   35     45          B.sc
2  mahesh   40     65        B-Tech
3       j    1     20           MCA
     name  age
0  rajesh   25
1  suresh   35
2  mahesh   40
3       j    1

Delete Column using Column Index

The columns attribute of the data frame holds the column names, and each name can be looked up by its integer position.

df.columns[[2,3]] gives the column names at positions 2 and 3 as an Index object.

We use the columns attribute to get the column names and pass them as the labels parameter of the drop method.

The example is shown below.

# removing columns using index
import pandas as pd 
students={'name':['rajesh','suresh','mahesh','j'],'age':[25,35,40,1],'marks':[85,45,65,20],'qualification':["B-Tech","B.sc","B-Tech","MCA"]}
df=pd.DataFrame(data=students)
print(df)
print()
print(df.columns[[2,3]])
print()


df.drop(df.columns[[2,3]],axis=1,inplace=True)
print(df)

Output:
     name  age  marks qualification
0  rajesh   25     85        B-Tech
1  suresh   35     45          B.sc
2  mahesh   40     65        B-Tech
3       j    1     20           MCA

Index(['marks', 'qualification'], dtype='object')

     name  age
0  rajesh   25
1  suresh   35
2  mahesh   40
3       j    1
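
Because df.columns behaves like a sequence, positions can also be given as a slice or as a negative number. The program below is a minimal sketch of this using the same example data. It does not use inplace, so df itself stays unchanged.

```python
import pandas as pd

students = {'name': ['rajesh', 'suresh', 'mahesh', 'j'],
            'age': [25, 35, 40, 1],
            'marks': [85, 45, 65, 20],
            'qualification': ["B-Tech", "B.sc", "B-Tech", "MCA"]}
df = pd.DataFrame(data=students)

# A slice selects a range of positions (2 and 3 here).
last_two = df.columns[2:4]
print(list(last_two))

trimmed = df.drop(last_two, axis=1)
print(list(trimmed.columns))

# A negative position counts from the right, as with Python lists.
no_last = df.drop(df.columns[-1], axis=1)
print(list(no_last.columns))
```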

Deleting Columns using the columns Parameter

#remove columns using parameter columns
import pandas as pd 
students={'name':['rajesh','suresh','mahesh','j'],'age':[25,35,40,1],'marks':[85,45,65,20],'qualification':["B-Tech","B.sc","B-Tech","MCA"]}
df=pd.DataFrame(data=students)
print(df)

df.drop(columns=["marks","qualification"],inplace=True)
print(df)

Output:
     name  age  marks qualification
0  rajesh   25     85        B-Tech
1  suresh   35     45          B.sc
2  mahesh   40     65        B-Tech
3       j    1     20           MCA
     name  age
0  rajesh   25
1  suresh   35
2  mahesh   40
3       j    1

Instead of using axis=1 and labels, we can directly use the columns parameter to list the column names that are to be dropped.
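
As a quick check that the two forms give the same result, the sketch below drops the same columns both ways and compares the data frames.

```python
import pandas as pd

students = {'name': ['rajesh', 'suresh', 'mahesh', 'j'],
            'age': [25, 35, 40, 1],
            'marks': [85, 45, 65, 20],
            'qualification': ["B-Tech", "B.sc", "B-Tech", "MCA"]}
df = pd.DataFrame(data=students)

# Form 1: labels together with axis=1.
by_axis = df.drop(labels=["marks", "qualification"], axis=1)

# Form 2: the columns parameter, with no axis needed.
by_columns = df.drop(columns=["marks", "qualification"])

# equals returns True when both data frames hold the same data.
print(by_axis.equals(by_columns))
```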
