How to use %in% and %notin% operators in R (with examples)

Renesh Bedre    8 minute read

%in% operator in R

Introduction

  • %in% is a built-in infix operator, which is similar to value matching function match. %in% is an infix version of match. User defined infix operators can be created by creating a function and naming it in between two % (e.g. %function_name%).
  • %in% returns logical vector (TRUE or FALSE but never NA) if there is a match or not for its left operand. Output logical vector has the same length as left operand.
  • If there are two vector x and y then the syntax of %in%: x %in% y
  • %in% works only with vectors
  • %notin% is not a built-in operator and can be created by negating the %in% operator (see below)
  • Help syntax for %in% operator: ?"%in%"

Learn about $ operator for data extraction from Data Frame and list in R

Here are few examples of how to use %in% to manipulate vectors and Data Frames in R,

%in% to check the value in a vector

%in% is helpful to check any value in a vector. If there is a match to the value, it returns TRUE, otherwise FALSE

x <- c(1,5,10,20,20,24,45)

# check any number in x vector
20 %in% x
[1] TRUE

%in% to compare two vectors

%in% operator is useful to compare two vectors and identify common values in between the two vectors.

x <- c(1,5,45)
y <- c(5,45)

x %in% y
[1] FALSE  TRUE  TRUE

# find common values
x[x %in% y]
[1]  5 45

Check two sequences of numbers and identify common numbers

x <- 1:5
y <- 3:7

x %in% y
[1] FALSE FALSE  TRUE  TRUE  TRUE

# find common letters
x[x %in% y]
[1] 3 4 5

Similarly, you can check two vectors containing letters and identify common letters

x <- LETTERS[1:5]
y <- LETTERS[4:7]

x %in% y
[1] FALSE FALSE FALSE  TRUE  TRUE

# find common numbers
x[x %in% y]
[1] "D" "E"

If you have a big vector (say vector with 1000 values), you can use any, all, or which functions with %in% operator

x <- 1:1000
y <- 900:2000

# check if there is any common values between a and b vectors
any(x %in% y)
[1] TRUE

# check if there are all values common between a and b vectors
all(x %in% y)
[1] FALSE

# get indexes of common values
a <- 1:10
b <- 6:200

which(a %in% b)
[1]  6  7  8  9 10

%in% to check the value in a Data Frames

%in% can be used for checking any value present in columns of Data Frames

Create a Data Frame,

df <- data.frame(col1 = c("A", "B", "C"),
  col2 = c(1, 2, 3),
  col3 = c(0.1, 0.2, 0.3))
# output
  col1 col2 col3
1    A    1  0.1
2    B    2  0.2
3    C    3  0.3

Check if any value is present in Data Frame columns,

# check if 'B' is present in col1
'B' %in% df$col1
[1] TRUE

# to check if any value in col1  is B
df$col1  %in% 'B'
[1] FALSE  TRUE FALSE

If you want to compare the vector with Data Frame columns, the %in% operator comes in handy. See below example,

# check if values of vector are present in a Data Frame columns
lapply(df, `%in%`, c(1, 4, 0.1))
# output
$col1
[1] FALSE FALSE FALSE

$col2
[1]  TRUE FALSE FALSE

$col3
[1]  TRUE FALSE FALSE

%in% to update the values in a Data Frame

%in% operator is useful when you want to scan Data Frame and update (replace) the existing values in Data Frame where there is a match with a given vector or values.

# search values in vector in Data Frame and replace with 0 where there is match
df[sapply(df, `%in%`, c(1, 4, 0.1))] <- 0
df
# output
  col1 col2 col3
1    A    0  0.0
2    B    2  0.2
3    C    3  0.3

%in% and %notin% to filter (subset) Data Frames based on multiple values (with dplyr)

Pandas rows selection

In filtering the Data Frame based on single or multiple values, the %in% operator is useful. You can use the %in% operator to filter the values in columns in Data Frame,

Filter (subset) Data Frame where multiple values from vector match to the values in col1,

library(dplyr)
df  %>% filter(col1 %in% c('A', 'B'))  # same as df[df$col1 %in% c('A', 'B'),]
# output
  col1 col2 col3
1    A    1  0.1
2    B    2  0.2

Filter (subset) Data Frame where multiple values does not match to the values in col1 using %notin%,

For this example, you need to first create %notin% operator (see below at the end of this article)

library(dplyr)
df  %>% filter(col1 %notin% 'C')
# output
  col1 col2 col3
1    A    1  0.1
2    B    2  0.2

%in% and %notin% to remove column from Data Frames

Remove column in R

%in% and %notin% operators can also be used for removing single or multiple columns from Data Frames

You need to first create the %notin% operator (see below at the end of this article)

# remove col2
df[ , !(names(df) %in% "col2")]
# output
  col1 col3
1    A  0.1
2    B  0.2
3    C  0.3

#  %notin% operator to remove columns. 
#  You need to first create %notin% operator (see below at the end of this article)
df[ , (names(df) %notin% "col2")]
# output
  col1 col3
1    A  0.1
2    B  0.2
3    C  0.3

# to remove multiple column use column names vector such as c("col2", "col3")

%in% to select columns from Data Frames

%in% can be used to select single or multiple columns from Data Frames

# select single column
df[ ,(names(df) %in% "col3"), drop=FALSE]
# output
  col3
1  0.1
2  0.2
3  0.3

# select multiple columns
df[ ,(names(df) %in% c("col1", "col2"))]
# output
  col1 col2
1    A    1
2    B    2
3    C    3

%in% can also be used for selecting specific columns where the column names match on a condition,

Select the columns from dataframe where column names matches row names of other dataframe

# create dataframe with rownames
df_row <- data.frame(M1 = c("X", "Y" ),
  M2 = c(11, 22), row.names = c("col1", "col3"))

# get columns of dataframe df where its column names matches 
# to rownames of dataframe df_row
df[,  names(df) %in% rownames(df_row)]
# output
  col1 col3
1    A  0.1
2    B  0.2
3    C  0.3

%in% to compare two Data Frames

%in% can be used to compare two Data Frames and subset Data Frames based on the column value match. This is more similar like left join query i.e. select all records from one Data Frame where column values match to another Data Frame

Create another Data Frame,

df2 <- data.frame(col1 = c("A", "B", "D", "E"),
  col4 = c(100, 200, 300, 400),
  col5 = c("a", "b", "c", "d"))

df2
# output
  col1 col4 col5
1    A  100    a
2    B  200    b
3    D  300    c
4    E  400    d

Now compare df and df2 to get all records from df2 where col1 values match to col1 values in df (similar to left join of tables),

subset(df2, df2$col1 %in% df$col1)
# output
  col1 col4 col5
1    A  100    a
2    B  200    b

Comparison of %in% and == operators

  • == operator compares the value between two vectors element-wise (the first value of one vector compared with the first value of another vector), whereas %in% compares the value between two vectors one by all (the first value of the first vector compared with all values of the second vector)
  • With == operator, the length of the left and right operands must be the same. It is not necessary to have the same length for left and right operands for %in% operator.
x <- c(1, 2, 3)
y <- c(3, 2, 1)

x %in% y
[1] TRUE TRUE TRUE

x == y
[1] FALSE  TRUE FALSE

# compare and get indexes of two vectors
a <- c(1, 2, 9, 2)
b <- c(1, 2, 3, 4, 5)

# == operator only found first two indexes
which(a == b)
[1] 1 2
Warning message:
In a == b : longer object length is not a multiple of shorter object length

# %in% operator found all matched value indexes
which(a %in% b)
[1] 1 2 4

Create %notin% operator (opposite to %in%)

%notin% operator is not built-in and can be created by applying Negate function to %in%.

%notin% is opposite to %in% operator

You can also use %notin% as by putting ! in front of the %in% expression (!%in%)

`%notin%` <- Negate(`%in%`)

Check the value in a vector using %notin%

x <- c(1,5,10,20,20,24,45)

# check any number in x vector
50 %notin%  x # same as !(50 %in%  x) 
[1] TRUE

Update values of Data Frame to NA where values do not match,

# create data frame
df <- data.frame(col1 = c("A", "B", "C"),
  col2 = c(1, 2, 3),
  col3 = c(0.1, 0.2, 0.3))

# update values of Data Frame to NA where values does not match to c(2, 3)
df[sapply(df, `%notin%`, c(2, 3))] <- NA  # df[sapply(!(df, `%in%`, c(2, 3)))] <- NA
df
# output
  col1 col2 col3
1 <NA>   NA   NA
2 <NA>    2   NA
3 <NA>    3   NA

Enhance your skills with courses R

References

If you have any questions, comments or recommendations, please email me at reneshbe@gmail.com


This work is licensed under a Creative Commons Attribution 4.0 International License

Some of the links on this page may be affiliate links, which means we may get an affiliate commission on a valid purchase. The retailer will pay the commission at no additional cost to you.

Tags:

Updated: