Adding and Removing Rows in a Data Frame

In this class, we discuss adding and removing rows in a data frame.


The reader should already know how to create a data frame from lists, dictionaries, and similar structures.

Adding and Removing Rows

We work through an example to understand adding and removing rows in a data frame.

First, we use the loc attribute of the data frame to access the data.

Once we understand how to access elements of the data frame, we look at how to add and remove rows.

Accessing Elements from Data Frame

The program below shows the example data frame we consider.

import pandas as pd
df = pd.DataFrame([[1, 2], [4, 5], [7, 8]], index=['a', 'c', 'b'], columns=['C1', 'C2'])
print(df)
print("-----------------")

# Accessing rows using index
print(df.loc['a':'b'])

Output:
   C1  C2
a   1   2
c   4   5
b   7   8
-----------------
   C1  C2
a   1   2
c   4   5
b   7   8

In the above program, we define a data frame having three rows and two columns.

We can access elements in a data frame using the attribute loc.

The syntax, as used in the above program, is df.loc['a':'b'].

The row indexes of the data frame are a, c, and b, in that order.

In the example above, we access all the rows from a to b. Because a is the first row and b is the last, all three rows are displayed.

As in slicing, we use a colon to represent the range from a to b.

Note: In ordinary Python slicing, the last value is not included. With loc, the last label is also included.
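
The note above can be checked with a short sketch. It reuses the example data frame and compares a label slice with an ordinary slice of a Python list; the names by_label and letters are introduced here only for illustration.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
                  index=['a', 'c', 'b'], columns=['C1', 'C2'])

# Label slice with loc: the stop label 'c' is included, so two rows come back.
by_label = df.loc['a':'c']
print(by_label)

# Ordinary Python slice on a list: the stop position 1 is excluded,
# so only the first item comes back.
letters = ['a', 'c', 'b']
print(letters[0:1])
```

Here 'c' sits at position 1, yet the label slice returns it while the list slice stops just before it.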

Accessing rows using numeric indexes

# Default indexes
import pandas as pd
df = pd.DataFrame([[1, 2], [4, 5], [7, 8]], columns=['C1', 'C2'])
print(df)
print("-----------------")

# Accessing rows using numeric index
print(df.loc[0:2])

Output:
   C1  C2
0   1   2
1   4   5
2   7   8
-----------------
   C1  C2
0   1   2
1   4   5
2   7   8

In the above program, we used the default indexes assigned by pandas (0, 1, 2, and so on).

We can use either the default indexes or user-defined indexes to access the elements.
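
With default indexes, the numbers passed to loc are still treated as labels, so df.loc[0:2] includes row 2. The sketch below contrasts this with iloc, the position-based attribute of a data frame, which this class does not otherwise cover. The variable names are chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [4, 5], [7, 8]], columns=['C1', 'C2'])

# loc treats 0 and 2 as labels, so the row labelled 2 is included.
rows_by_label = df.loc[0:2]
print(len(rows_by_label))

# iloc treats 0 and 2 as positions, so position 2 is excluded.
rows_by_position = df.iloc[0:2]
print(len(rows_by_position))
```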

Accessing elements using rows and columns

import pandas as pd
df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], index=['a', 'c', 'b'], columns=['C1', 'C2', 'C3'])
print(df)
print("-----------------")

# Accessing elements using rows and columns
print(df.loc['a', 'C1'])

Output:
   C1  C2  C3
a   1   2   3
c   4   5   6
b   7   8   9
-----------------
1

In the above program, we used loc['a', 'C1'].

The value a in loc is the row index, and C1 is the column name.

A comma separates the row part from the column part.

The above example returns a single element because we selected one row, a, and one column, C1.

Some more examples are shown below to understand the concept better.

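print(df.loc['a':'c', 'C1'])

Output:
a    1
c    4
Name: C1, dtype: int64

The above code displays rows a to c of column C1. Because only one column is selected, pandas also prints the column name and data type on the last line.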

print(df.loc['a':'c', 'C1':'C3'])

Output:
   C1  C2  C3
a   1   2   3
c   4   5   6

The above print statement displays rows a to c and columns C1 to C3.

Accessing elements using a list of row and column indexes

print(df.loc[['a', 'c'], ['C1', 'C3']])

Output:
   C1  C3
a   1   3
c   4   6

The rows and the columns to display are each given as a list. Only the listed items are returned, so column C2 does not appear.

Applying Conditions

print(df.loc[df['C3'] > 6])

Output:
   C1  C2  C3
b   7   8   9

In this example, we apply a condition on column C3.

Only the rows whose value in C3 is greater than 6 are displayed.

One more example, which applies the same condition but displays only column C2.

print(df.loc[df['C3'] > 6, 'C2'])

Output:
b    8
Name: C2, dtype: int64
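
Conditions can also be combined. The sketch below is one possible way to do it, using the & operator on two comparisons over the same example data frame. The names mask and selected are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]],
                  index=['a', 'c', 'b'], columns=['C1', 'C2', 'C3'])

# Each comparison yields a column of True/False values; & combines them row by row.
# The parentheses are required because & binds more tightly than > and <.
mask = (df['C3'] > 3) & (df['C1'] < 7)
selected = df.loc[mask]
print(selected)
```

Only row c satisfies both conditions (C3 is 6 and C1 is 4), so it is the only row displayed.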

Adding Rows

# Adding rows to the DataFrame 
import pandas as pd
df = pd.DataFrame([[1, 2], [3, 4]], columns = ["a", "b"])
print(df)

# Adding rows using list
list1 = [5, 6]
length = len(df)
df.loc[length] = list1
print("------------")
print(df)

Output:
   a  b
0  1  2
1  3  4
------------
   a  b
0  1  2
1  3  4
2  5  6

In the above program, we take a list of elements and add it to the data frame using the attribute loc.

The statement used to add the row is df.loc[length] = list1.

length is the index label given to the new row. Because the data frame uses the default indexes 0 and 1, len(df) is 2, which is the next unused label.

If the index already exists, the assignment replaces that row instead of adding one. Replacing row zero with a new row is shown below.

df.loc[0]=list1
print(df)

Output:
   a  b
0  5  6
1  3  4
2  5  6
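
Using len(df) as the new label only works when the labels are the default 0, 1, 2, and so on. The sketch below shows a case where it would overwrite a row, and one possible alternative for integer labels. The variable next_label is introduced here for illustration, and the approach assumes the index contains only integers.

```python
import pandas as pd

# A frame whose integer labels do not start at 0
# (for example, after the first row was removed).
df = pd.DataFrame([[1, 2], [3, 4]], index=[1, 2], columns=["a", "b"])

# len(df) is 2 and label 2 already exists, so the statement below
# would replace the existing row rather than add a new one:
# df.loc[len(df)] = [5, 6]

# One alternative for integer labels: use the largest label plus one.
next_label = df.index.max() + 1
df.loc[next_label] = [5, 6]
print(df)
```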

Adding two Data Frames

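Older versions of pandas provide the method append to add two data frames. It was deprecated in pandas 1.4 and removed in pandas 2.0, where the function pd.concat does the same job.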

Example code is given below.

# Adding two dataframes
import pandas as pd
  
df1 = pd.DataFrame([[1, 2], [3, 4]], columns = ["a", "b"])
print(df1)

df2 = pd.DataFrame([[5, 6], [7, 8]], columns = ["a", "b"])
print(df2)
print("-----------")
print()

# Adding two DataFrames returns a new, combined dataframe.
# DataFrame.append was removed in pandas 2.0; pd.concat is its replacement.
df3 = pd.concat([df1, df2])
print(df3)

Output:
   a  b
0  1  2
1  3  4
   a  b
0  5  6
1  7  8
-----------

   a  b
0  1  2
1  3  4
0  5  6
1  7  8

The two data frames are combined in the above program, and the index values are not changed, so the labels 0 and 1 each appear twice.

To renumber the index values, we pass the option ignore_index=True when combining. Both pd.concat and the older method append accept it.

The sample program below shows the new indexes after adding two data frames.

# Rearranging index values
import pandas as pd
  
df1 = pd.DataFrame([[1, 2], [3, 4]], columns = ["a", "b"])
print(df1)

df2 = pd.DataFrame([[5, 6], [7, 8]], columns = ["a", "b"])
print(df2)
print("-----------")
print()

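# Adding two DataFrames and renumbering the index.
# DataFrame.append was removed in pandas 2.0; pd.concat is its replacement.
df3 = pd.concat([df1, df2], ignore_index=True)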
print(df3)

Output:
   a  b
0  1  2
1  3  4
   a  b
0  5  6
1  7  8
-----------

   a  b
0  1  2
1  3  4
2  5  6
3  7  8

Columns not in order

In the example below, the columns of the two data frames are in a different order.

Still, the columns are matched by name, not by position, when the data frames are combined. This holds for pd.concat and for the older method append.

# columns not in order
import pandas as pd
  
df1 = pd.DataFrame([[1, 2], [3, 4]], columns = ["a", "b"])
print(df1)

df2 = pd.DataFrame([[5, 6], [7, 8]], columns = ["b", "a"])
print(df2)
print("-----------")
print()

# Adding two DataFrames returns a new, combined dataframe.
# DataFrame.append was removed in pandas 2.0; pd.concat is its replacement.
df3 = pd.concat([df1, df2])
print(df3)

Output:
   a  b
0  1  2
1  3  4
   b  a
0  5  6
1  7  8
-----------

   a  b
0  1  2
1  3  4
0  6  5
1  8  7

More columns in a data frame

In the example program given below, we have an extra column in one data frame.

The extra column is kept in the result. The positions that have no value after combining the two data frames are filled with NaN.

We discuss NaN in later classes.

# More columns in a dataframe
import pandas as pd
  
df1 = pd.DataFrame([[1, 2], [3, 4]], columns = ["a", "b"])
print(df1)

df2 = pd.DataFrame([[5, 6,7], [8, 9,10]], columns = ["a", "b","c"])
print(df2)
print("-----------")
print()

# Adding two DataFrames returns a new, combined dataframe.
# DataFrame.append was removed in pandas 2.0; pd.concat is its replacement.
df3 = pd.concat([df1, df2])
print(df3)

Output:
   a  b
0  1  2
1  3  4
   a  b   c
0  5  6   7
1  8  9  10
-----------

   a  b     c
0  1  2   NaN
1  3  4   NaN
0  5  6   7.0
1  8  9  10.0
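
In the output above, the values in column c appear as 7.0 and 10.0 rather than 7 and 10. The sketch below, which assumes pd.concat is used to combine the frames, counts the filled positions and checks the data type of column c. The name missing_in_c is introduced here for illustration.

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2], [3, 4]], columns=["a", "b"])
df2 = pd.DataFrame([[5, 6, 7], [8, 9, 10]], columns=["a", "b", "c"])

df3 = pd.concat([df1, df2])

# isna() marks the positions that were filled with NaN.
missing_in_c = df3["c"].isna().sum()
print(missing_in_c)

# NaN is a floating-point value, so column c is stored as float64
# even though df2 held integers in that column.
print(df3["c"].dtype)
```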

Remove Rows

To remove the rows from the data frame, we use the method drop.

drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')

In the method drop, the labels parameter gives the index or column labels we need to drop.

The axis parameter says whether the labels refer to rows or columns.

axis=0 treats the labels as row indexes.

axis=1 treats the labels as column names.

We can delete both rows and columns using the drop method.

The example program is shown below.

# Remove rows from the dataframe

import pandas as pd
  
df1 = pd.DataFrame([[1, 2], [3, 4],[5,6],[7,8]], columns = ["a", "b"])
print(df1)
print("-----------")
print()

df1.drop([2,3],axis=0,inplace=True)
print(df1)

Output:
   a  b
0  1  2
1  3  4
2  5  6
3  7  8
-----------

   a  b
0  1  2
1  3  4

There is another way to remove rows using the drop method.

We can use the index parameter instead of the labels and axis parameters.

The example program is given below.

# Instead of labels and axis, we can use index
import pandas as pd
  
df1 = pd.DataFrame([[1, 2], [3, 4],[5,6],[7,8]], columns = ["a", "b"])
print(df1)
print("-----------")
print()

df1.drop(index=[2,3],inplace=True)
print(df1)

Output:
   a  b
0  1  2
1  3  4
2  5  6
3  7  8
-----------

   a  b
0  1  2
1  3  4
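
The two remaining parameters used above, inplace and errors, can be illustrated with a short sketch. It assumes the same four-row data frame. The names df2 and df3 are introduced here for illustration.

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2], [3, 4], [5, 6], [7, 8]], columns=["a", "b"])

# With the default inplace=False, drop returns a new data frame
# and leaves df1 unchanged.
df2 = df1.drop(index=[2, 3])
print(len(df1), len(df2))

# Label 9 does not exist. With the default errors='raise' this would raise
# a KeyError; errors='ignore' skips the missing label instead.
df3 = df1.drop(index=[0, 9], errors="ignore")
print(df3)
```

Keeping inplace=False is often convenient when the original data frame is still needed later in the program.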