A Pandas dataframe is a two dimensional data structure which allows you to store data in rows and columns. It's very useful when you're analyzing data.
When you have a list of data records in a dataframe, you may need to drop a specific list of rows depending on the needs of your model and your goals when studying your analytics.
In this tutorial, you'll learn how to drop a list of rows from a Pandas dataframe.
To learn how to drop columns, see How to Drop Columns in Pandas.
How to Drop a Row or Column in a Pandas Dataframe
To drop a row or column in a dataframe, you need to use the drop() method available on the dataframe. You can read more about the drop() method in the Pandas docs.
Dataframe Axis
- Rows are denoted using axis=0
- Columns are denoted using axis=1
Dataframe Labels
- Rows are labelled using the index number starting with 0, by default.
- Columns are labelled using names.
Drop() Method Parameters
- index - the list of rows to be deleted
- axis=0 - marks the rows in the dataframe to be deleted
- inplace=True - performs the drop operation in the same dataframe, rather than creating a new dataframe object during the delete operation
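The parameters above can be tried on a small dataframe. The snippet below is a minimal sketch: the dataframe and its values are made up for illustration and are not the tutorial's sample data. It assumes the default behaviour of drop(), where inplace is False and a new dataframe is returned.

```python
import pandas as pd

# Small illustrative dataframe (hypothetical values, not the tutorial's sample data)
df = pd.DataFrame({"name": ["a", "b", "c"], "price": [1.0, 2.0, 3.0]})

# axis=0 drops rows by index label and returns a new dataframe
without_row_1 = df.drop([1], axis=0)

# index=[1] is an equivalent, more explicit spelling of the same operation
same_result = df.drop(index=[1])

# axis=1 drops columns by name instead of rows
without_price = df.drop(["price"], axis=1)

# df itself is unchanged because inplace was not set to True
print(len(df))                       # 3
print(list(without_row_1.index))     # [0, 2]
print(list(without_price.columns))   # ['name']
```

Leaving inplace at its default keeps the original dataframe available, which can be handy while you are still experimenting with which rows to remove.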
Sample Pandas DataFrame
Our sample dataframe contains the columns product_name, Unit_Price, No_Of_Units, Available_Quantity, and Available_Since_Date. It also has rows with NaN and NaT values, which are used to denote missing values.
import pandas as pd
data = {"product_name":["Keyboard","Mouse", "Monitor", "CPU","CPU", "Speakers",pd.NaT],
"Unit_Price":[500,200, 5000.235, 10000.550, 10000.550, 250.50,None],
"No_Of_Units":[5,5, 10, 20, 20, 8,pd.NaT],
"Available_Quantity":[5,6,10,"Not Available","Not Available", pd.NaT,pd.NaT],
"Available_Since_Date":['11/5/2021', '4/23/2021', '08/21/2021','09/18/2021','09/18/2021','01/05/2021',pd.NaT]
}
df = pd.DataFrame(data)
df
The dataframe will look like this:
index | product_name | Unit_Price | No_Of_Units | Available_Quantity | Available_Since_Date |
---|---|---|---|---|---|
0 | Keyboard | 500.000 | 5 | 5 | 11/5/2021 |
1 | Mouse | 200.000 | 5 | 6 | 4/23/2021 |
2 | Monitor | 5000.235 | 10 | 10 | 08/21/2021 |
3 | CPU | 10000.550 | 20 | Not Available | 09/18/2021 |
4 | CPU | 10000.550 | 20 | Not Available | 09/18/2021 |
5 | Speakers | 250.500 | 8 | NaT | 01/05/2021 |
6 | NaT | NaN | NaT | NaT | NaT |
And just like that we've created our sample dataframe.
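If you want to confirm where the missing values are, one option is to count them per column with isna(). This is a small sketch on a cut-down, hypothetical version of the sample data (two columns, three rows), not the full dataframe.

```python
import pandas as pd

# Cut-down version of the sample data, for illustration only
df = pd.DataFrame({
    "product_name": ["Keyboard", "Mouse", pd.NaT],
    "Unit_Price": [500, 200, None],
})

# isna() flags None, NaN and NaT as missing; sum() counts the flags per column
missing_per_column = df.isna().sum().to_dict()
print(missing_per_column)
```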
After each drop operation, you'll display the dataframe by using df, which renders it in a regular HTML table format.
To print the dataframe in different visual formats, see how to Pretty Print a Dataframe.
Next, you'll learn how to drop a list of rows in different use cases.
How to Drop a List of Rows by Index in Pandas
You can delete a list of rows from a Pandas dataframe by passing the list of indices to the drop() method.
df.drop([5,6], axis=0, inplace=True)
df
In this code,
- [5,6] is the list of indices of the rows you want to delete
- axis=0 denotes that rows should be deleted from the dataframe
- inplace=True performs the drop operation in the same dataframe
After dropping rows with the index 5 and 6, you'll have the below data in the dataframe:
index | product_name | Unit_Price | No_Of_Units | Available_Quantity | Available_Since_Date |
---|---|---|---|---|---|
0 | Keyboard | 500.000 | 5 | 5 | 11/5/2021 |
1 | Mouse | 200.000 | 5 | 6 | 4/23/2021 |
2 | Monitor | 5000.235 | 10 | 10 | 08/21/2021 |
3 | CPU | 10000.550 | 20 | Not Available | 09/18/2021 |
4 | CPU | 10000.550 | 20 | Not Available | 09/18/2021 |
This is how you can delete rows with a specific index.
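One thing to watch for when dropping a list of indices is a label that does not exist in the dataframe. The sketch below uses a shortened, illustrative version of the sample data and a deliberately invalid label (7) to show the default behaviour and the errors="ignore" option of drop().

```python
import pandas as pd

# Shortened version of the sample data, for illustration only
df = pd.DataFrame({
    "product_name": ["Keyboard", "Mouse", "Monitor", "CPU"],
    "Unit_Price": [500, 200, 5000.235, 10000.55],
})

rows_to_drop = [2, 3, 7]  # label 7 does not exist in this dataframe

# With the default errors="raise", a missing label raises a KeyError
try:
    df.drop(rows_to_drop, axis=0)
    raised = False
except KeyError:
    raised = True

# errors="ignore" drops the labels that exist and skips the ones that do not
trimmed = df.drop(rows_to_drop, axis=0, errors="ignore")
print(list(trimmed.index))  # [0, 1]
```

Whether to ignore missing labels is a judgment call: ignoring them is convenient in scripts that run repeatedly, while the default error can catch typos in the list.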
Next, you'll learn about dropping a range of indices.
How to Drop Rows by Index Range in Pandas
You can also drop a list of rows within a specific range.
A range is a set of values with a lower limit and an upper limit.
This may be useful in cases where you want to create a sample dataset excluding specific ranges of data.
You can select a range of row labels in a dataframe by slicing the df.index attribute. Then you can pass this range to the drop() method to drop the rows as shown below.
df.drop(df.index[2:4], inplace=True)
df
Here's what this code is doing:
- df.index[2:4] generates a range of rows from 2 to 4. The lower limit of the range is inclusive and the upper limit of the range is exclusive. This means that rows 2 and 3 will be deleted and row 4 will not be deleted.
- inplace=True performs the drop operation in the same dataframe
After dropping rows within the range 2-4, you'll have the below data in the dataframe:
index | product_name | Unit_Price | No_Of_Units | Available_Quantity | Available_Since_Date |
---|---|---|---|---|---|
0 | Keyboard | 500.00 | 5 | 5 | 11/5/2021 |
1 | Mouse | 200.00 | 5 | 6 | 4/23/2021 |
4 | CPU | 10000.55 | 20 | Not Available | 09/18/2021 |
This is how you can drop the list of rows in the dataframe using its range.
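It can help to see that slicing df.index works by position and returns the labels found at those positions. The sketch below assumes a hypothetical dataframe whose index labels start at 10 instead of 0, so positions and labels are easy to tell apart.

```python
import pandas as pd

# Illustrative dataframe with custom index labels 10..14 (an assumption for this example)
df = pd.DataFrame(
    {"product_name": ["Keyboard", "Mouse", "Monitor", "CPU", "Speakers"]},
    index=[10, 11, 12, 13, 14],
)

# Positions 2 and 3 are selected; position 4 is excluded
labels = df.index[2:4]
print(list(labels))  # [12, 13]

# drop() then removes the rows carrying those labels
remaining = df.drop(labels)
print(list(remaining.index))  # [10, 11, 14]
```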
How to Drop All Rows after an Index in Pandas
You can drop all rows after a specific index by using iloc[].
You can use iloc[] to select rows by their integer position. You can specify the start and end position separated by a :, where the end position is excluded. For example, 2:3 selects only the row at position 2. If you want to select all the rows, you can just use : in iloc[].
This may be useful in cases where you want to split the dataset for training and testing purposes.
Use the below snippet to select the rows at positions 0 and 1. This results in dropping every row from position 2 onward.
df = df.iloc[:2]
df
In this code, :2 selects the rows up to, but not including, position 2.
After dropping the rows from position 2 onward, you'll have the below data in the dataframe:
index | product_name | Unit_Price | No_Of_Units | Available_Quantity | Available_Since_Date |
---|---|---|---|---|---|
0 | Keyboard | 500.0 | 5 | 5 | 11/5/2021 |
1 | Mouse | 200.0 | 5 | 6 | 4/23/2021 |
This is how you can drop rows after a specific index.
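The same positional slicing can split a dataframe into two parts, for example for the training and testing use case mentioned above. This is a rough sketch: the dataframe is a shortened version of the sample data and the cut-off position split_at is a hypothetical value chosen for illustration.

```python
import pandas as pd

# Shortened version of the sample data, for illustration only
df = pd.DataFrame({
    "product_name": ["Keyboard", "Mouse", "Monitor", "CPU", "Speakers"],
    "No_Of_Units": [5, 5, 10, 20, 8],
})

split_at = 3  # hypothetical cut-off position

# Rows at positions 0, 1 and 2 (position 3 is excluded)
first_part = df.iloc[:split_at]

# Rows from position 3 to the end
second_part = df.iloc[split_at:]

print(list(first_part.index))   # [0, 1, 2]
print(list(second_part.index))  # [3, 4]
```

Because iloc[] returns a selection rather than modifying df, both parts can be taken from the same original dataframe.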
Next, you'll learn how to drop rows with conditions.
How to Drop Rows with Multiple Conditions in Pandas
You can drop rows in the dataframe based on specific conditions.
For example, you can drop rows where the column value is greater than X and less than Y.
This may be useful in cases where you want to create a dataset that ignores rows with specific column values.
To drop rows based on certain conditions, select the index of the rows which pass the specific condition and pass that index to the drop() method.
df.drop(df[(df['Unit_Price'] >400) & (df['Unit_Price'] < 600)].index, inplace=True)
df
In this code,
- (df['Unit_Price'] >400) & (df['Unit_Price'] < 600) is the condition to drop the rows.
- df[].index selects the index of rows which pass the condition.
- inplace=True performs the drop operation in the same dataframe rather than creating a new one.
After dropping the rows whose Unit_Price is greater than 400 and less than 600, you'll have the below data in the dataframe:
index | product_name | Unit_Price | No_Of_Units | Available_Quantity | Available_Since_Date |
---|---|---|---|---|---|
1 | Mouse | 200.0 | 5 | 6 | 4/23/2021 |
This is how you can drop rows in the dataframe using certain conditions.
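An alternative way to get the same result, sketched here on a shortened and illustrative version of the sample data, is to keep the rows that do not match the condition by negating the boolean mask with ~. The variable names are made up for this example.

```python
import pandas as pd

# Shortened version of the sample data, for illustration only
df = pd.DataFrame({
    "product_name": ["Keyboard", "Mouse", "Monitor"],
    "Unit_Price": [500, 200, 5000.235],
})

# True for rows whose price is strictly between 400 and 600
in_range = (df["Unit_Price"] > 400) & (df["Unit_Price"] < 600)

# Keep only the rows where the condition is False
filtered = df[~in_range]

# The drop()-based approach from this section gives the same rows
dropped = df.drop(df[in_range].index)

print(list(filtered["product_name"]))  # ['Mouse', 'Monitor']
```

Both forms return a new dataframe here. The mask version avoids building an intermediate index, while the drop() version reads closer to the intent of removing rows.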
Conclusion
To summarize, in this article you've learnt what the drop() method is in a Pandas dataframe. You've also seen how dataframe rows and columns are labelled. And finally you've learnt how to drop rows using indices, a range of indices, and based on conditions.