In Pandas, sometimes you'll need to remove columns from a DataFrame for various reasons, such as cleaning data, reducing memory usage, or simplifying analysis. And in this article, I'll show you how to do it.

I'll start by introducing the .drop() method, which is the primary method for removing columns in Pandas. We'll go through the syntax and parameters of the .drop() method, including how to specify columns to remove and how to control whether the original DataFrame is modified in place or a new DataFrame is returned.

Next, I'll provide an example of how to use the .drop() method to remove columns from a DataFrame.

How to Use the .drop() Method in Pandas

The .drop() method is a built-in function in Pandas that allows you to remove one or more rows or columns from a DataFrame. It returns a new DataFrame with the specified rows or columns removed and does not modify the original DataFrame in place, unless you set the inplace parameter to True.

The syntax for using the .drop() method is as follows:

DataFrame.drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')

Here, DataFrame refers to the Pandas DataFrame that you want to remove rows or columns from. The parameters you can use with the .drop() method include:

  • labels: This parameter specifies the labels or indices of the rows or columns to be removed. You can pass either a single label or index or a list of labels or indices.
  • axis: This parameter specifies whether to remove rows or columns. By default, it is set to 0, which means rows are removed. If you want to remove columns, set it to 1.
  • index and columns: These parameters are alternative to the labels parameter and specify the labels or indices of rows or columns to be removed, respectively.
  • level: This parameter is used to remove a specific level of a hierarchical index.
  • inplace: This parameter is a boolean value that determines whether to modify the original DataFrame in place. By default, it is set to False.
  • errors: This parameter specifies how to handle errors if the specified label(s) or index(es) are not found in the DataFrame. By default, it is set to raise, which means that a KeyError is raised. Other options are ignore and warn, which will respectively ignore or display a warning when the label/index is not found.

How to Remove a Single Column from a Dataframe in Pandas

Let's ease into it by first learning how to remove a single column from a Dataframe before we remove multiple columns.

Code sample:

import pandas as pd

# create a sample dataframe
data = {'name': ['Alice', 'Bob', 'Charlie'],
        'age': [25, 30, 35],
        'gender': ['F', 'M', 'M']
        }
df = pd.DataFrame(data)

# display the original dataframe
print('Original DataFrame:\n', df)

# drop the 'gender' column
df = df.drop(columns=['gender'])

# display the modified dataframe
print('Modified DataFrame:\n', df)

Output:

Original DataFrame:
       name  age gender
0    Alice   25      F
1      Bob   30      M
2  Charlie   35      M

Modified DataFrame:
       name  age
0    Alice   25
1      Bob   30
2  Charlie   35

Code explanation:

In the example above, we first created a sample DataFrame with three columns – name, age, and gender. We then used the .drop() method with the columns parameter to remove the gender column. The resulting DataFrame only contains the name and age columns.

It's important to note that the .drop() method does not modify the original DataFrame in place. Instead, it returns a new DataFrame with the specified column(s) removed. If you want to modify the original DataFrame, you need to assign the result of the .drop() method back to the original variable, as we did in the example above.

In addition to the columns parameter, the .drop() method also has a number of other optional parameters you can use to control how columns are removed.

For example, you can use the inplace parameter to modify the original DataFrame in place instead of returning a new DataFrame. You can also use the axis parameter to remove columns by index instead of name.

How to Remove Multiple Columns from a Dataframe in Pandas

In this section we will remove multiple columns from our dataframe. This approach is similar to removing a single column from the dataframe.

To remove two or more columns from a DataFrame using the .drop() method in Pandas, we can pass a list of column names to the columns parameter of the method.

Code sample:

import pandas as pd

# create a sample dataframe
data = {'name': ['John', 'Mary', 'Peter'],
        'age': [30, 25, 35],
        'gender': ['Male', 'Female', 'Male'],
        'city': ['New York', 'London', 'Paris']}
df = pd.DataFrame(data)

# remove the 'gender' and 'city' columns
df.drop(columns=['gender', 'city'], inplace=True)

# print the modified dataframe
print(df)

Output:

    name  age
0   John   30
1   Mary   25
2  Peter   35

Code explanation:

In this example, we first create a sample DataFrame with four columns – name, age, gender, and city. Then, we use the .drop() method to remove the gender and city columns by passing a list of their names to the columns parameter. Finally, we set the inplace parameter to True to modify the original DataFrame and print the modified DataFrame.

Note that you can also remove columns by their indices by passing a list of indices to the columns parameter. For example, to remove the second and third columns, you can use:

df.drop(columns=df.columns[1:3], inplace=True)

This will remove the columns with indices 1 and 2 (that is the age and gender columns in this example).

Conclusion

I hope this article is a useful resource for anyone working with Pandas DataFrames who needs to remove columns efficiently and effectively.

Let's connect on Twitter and on LinkedIn. You can also subscribe to my YouTube channel.

Happy Coding!