Query pandas DataFrame to select rows based on value and condition matching

Renesh Bedre    3 minute read

In this article, I will discuss how to query a pandas DataFrame to select the rows based on the exact and partial value matching to the column values

First create a random DataFrame,

import pandas as pd
import numpy as np
df = pd.DataFrame({'col1': [1, 2, 3, 4, 5, np.nan], 'col2': ['abc', 'city', 'def', 'ghi', 'ijk', 'cd'], 'col3': [1, 'Y', 'Z', 'Z', 0, 1]})
df
# output
   col1  col2 col3
0   1.0   abc    1
1   2.0  city    Y
2   3.0   def    Z
3   4.0   ghi    Z
4   5.0   ijk    0
5   NaN    cd    1

Based on exact match

Select rows based on the exact match with the one column value,

# select the rows where col1 value is equal to 1
df[df['col1']==1]
# output
   col1 col2 col3
0   1.0  abc    1

# using query method
df.query('col1==1')
# output
   col1 col2 col3
0   1.0  abc    1

Select rows based on the exact match with the multiple column values,

# select the rows where col1 value is equal to 2 and col3 is equal to Y
# using & bitwise operator
df[(df['col1']==2) & (df['col3']=='Y') ]
# output
   col1  col2 col3
1   2.0  city    Y

# select the rows where col1 value is equal to 2 or col3 is equal to Y
# using | bitwise operator
df[(df['col1']==1) | (df['col3']=='Y') ]
# output
   col1  col2 col3
0   1.0   abc    1
1   2.0  city    Y

# using query method
df.query('col1 == 2 and col3 == "Y" ')
# output
   col1  col2 col3
1   2.0  city    Y

Based on partial match using regex

Select rows based on the partial match with the one column value,

# select the rows where specific column contains ty 
df[df['col2'].str.contains("ty")]
# output
   col1  col2 col3
1   2.0  city    Y

# using query method
df.query('col2.str.contains("ty")')
# output
   col1  col2 col3
1   2.0  city    Y

Select rows based on the multiple partial matches with the one column value,

# select the rows where specific column contains ty or de
df[df['col2'].str.contains("ty|de")]
# output
   col1  col2 col3
1   2.0  city    Y
2   3.0   def    Z

Select rows based on the start of value,

# select the rows where specific column value starts with ci
df[df['col2'].str.contains("^ci")]
# output
   col1  col2 col3
1   2.0  city    Y

Select rows based on the character index position in strings,

# select the rows where col2 has j character in second index position
df[df['col2'].str[1] == 'j']
# output
   col1 col2 col3
4   5.0  ijk    0

Based on condition match

Select rows where column values are greater or lesser than some value,

# select rows where col1 values are greater than 2
df[df['col1'] > 2]
# output
   col1 col2 col3
2   3.0  def    Z
3   4.0  ghi    Z
4   5.0  ijk    0

Select rows where column values are between some values,

# select rows where col1 values between 2 and 5
df[df['col1'].between(left=2, right=5, inclusive='both')] # left and right boundaries included
# output
   col1  col2 col3
1   2.0  city    Y
2   3.0   def    Z
3   4.0   ghi    Z
4   5.0   ijk    0

Using isin() function

If you have a list of values and would like to check any one of them matches to column value,

df[df['col2'].isin(['def', 'xyz', 'ijk'])]
# output
   col1 col2 col3
2   3.0  def    Z
4   5.0  ijk    0

Using isnull() function

Select rows where a column contains the null values,

df[df['col1'].isnull()]
# output
   col1 col2 col3
5   NaN   cd    1

# using query method
df.query('col1.isnull()')
# output
   col1 col2 col3
5   NaN   cd    1

Enhance your skills with courses on Python and pandas

Learn more about Python and R

If you have any questions, comments or recommendations, please email me at reneshbe@gmail.com


This work is licensed under a Creative Commons Attribution 4.0 International License