3 efficient ways to read (import) a CSV file into R

Renesh Bedre    5 minute read

Importing data files is an essential step in data analysis and visualization. The CSV (comma-separated values) file format (.csv) is commonly used for storing tabular data as plain text. In this article, you will learn three ways to read CSV files in R.

Download example CSV dataset: testvolcano.csv

1. read.csv()

read.csv() is a base R function that reads a CSV file and converts it to a data frame. By default, read.csv() uses the first line of the file as the header.

You can provide the complete path to the file, or just the file name if the CSV file is in the current working directory. For example, if the file is in the home directory, provide the path as /home/user/file.csv (Linux/Mac) or C:\\Users\\wind\\file.csv (Windows). On Windows, backslashes must be doubled inside an R string; forward slashes also work.
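As a small sketch of building such a path in a portable way, base R's file.path() joins path components with a forward slash, which R also accepts on Windows. The directory name data and the file name file.csv below are placeholders for illustration, not files that come with this article.

```r
# Join path components without hard-coding a platform-specific separator.
# "data" and "file.csv" are placeholder names.
csv_path <- file.path("data", "file.csv")
print(csv_path)

# Read the file only if it exists, so the script does not stop with an error.
if (file.exists(csv_path)) {
  df <- read.csv(csv_path)
} else {
  message("File not found: ", csv_path)
}
```

getwd() prints the current working directory, which is where relative paths such as this one are resolved.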

# R version 4.2.0
# If the file is not in the current directory, 
# add the path to the file
df = read.csv("testvolcano.csv")
head(df)
# output
         GeneNames    log2FC  p.value
1 LOC_Os09g01000.1 -1.886539 1.25e-55
2 LOC_Os12g42876.1  3.231611 1.05e-55
3 LOC_Os12g42884.2  3.179004 2.59e-54
4 LOC_Os03g16920.1  5.290677 4.69e-54
5 LOC_Os05g47540.4  4.096862 2.19e-54
6 LOC_Os09g00999.1 -1.839222 1.95e-54

If there is no header in the CSV file, set header = FALSE. Note that R's logical constants are written in uppercase (TRUE and FALSE).

df = read.csv("testvolcano.csv", header = FALSE)
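To see the effect of turning the header off without needing a file, the sketch below reads a small made-up CSV string through the text argument of read.csv(). The gene names and values are invented for illustration.

```r
# Two data rows and no header row (made-up values).
csv_text <- "geneA,1.5,0.01\ngeneB,-2.0,0.20"

# With header = FALSE, read.csv() generates the names V1, V2, V3.
df <- read.csv(text = csv_text, header = FALSE)
print(names(df))

# col.names supplies your own column names instead.
df_named <- read.csv(text = csv_text, header = FALSE,
                     col.names = c("gene", "fc", "pv"))
print(df_named)
```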

You can specify row names with the row.names parameter while reading a CSV file. The parameter accepts either a column name or a column number. All values in the row.names column must be unique; if the column contains duplicated values, you will get an error.

df = read.csv("testvolcano.csv", row.names = "GeneNames")
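The following sketch illustrates both cases on made-up inline data: a column with unique values used as row names, and a column with a duplicated value that triggers the error. The data and variable names are assumptions for illustration only.

```r
# Unique gene names: the GeneNames column becomes the row names.
csv_text <- "GeneNames,log2FC\ngeneA,1.5\ngeneB,-2.0"
df <- read.csv(text = csv_text, row.names = "GeneNames")
print(rownames(df))

# Rows can now be selected by name.
print(df["geneB", "log2FC"])

# Duplicated gene names: read.csv() stops with an error,
# which is captured here as a message string instead.
dup_text <- "GeneNames,log2FC\ngeneA,1.5\ngeneA,-2.0"
dup_result <- tryCatch(
  read.csv(text = dup_text, row.names = "GeneNames"),
  error = function(e) conditionMessage(e)
)
print(dup_result)
```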

If there are comment lines in the CSV file (for example, if comment lines start with #), then set comment.char = "#".

df = read.csv("testvolcano.csv", comment.char = "#")

If you want to skip the first few lines before reading files, you can use the skip parameter. For example, if you want to skip the first 5 lines, you can set skip = 5.

df = read.csv("testvolcano.csv", skip = 5)
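As a rough comparison of the two options, the sketch below reads the same made-up text once with comment.char and once with skip. The two leading lines stand in for notes that some export tools write above the header; the content is invented for illustration.

```r
# Two note lines, then the header, then the data (made-up values).
csv_text <- paste(
  "# exported by some tool",
  "# second note line",
  "GeneNames,log2FC",
  "geneA,1.5",
  "geneB,-2.0",
  sep = "\n"
)

# Option 1: ignore every line that starts with "#".
df_comment <- read.csv(text = csv_text, comment.char = "#")

# Option 2: skip a fixed number of lines before the header.
df_skip <- read.csv(text = csv_text, skip = 2)

print(df_comment)
print(df_skip)
```

comment.char is the safer choice when the number of note lines varies between files, while skip works even when the lines to drop carry no marker.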

If you do not know the exact location of the file, you can use file.choose() to open a file explorer dialog and select the CSV file.

df = read.csv(file.choose())
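Because file.choose() is meant for interactive sessions, a script that may also run unattended can fall back to a fixed path. The sketch below shows one way to do that; pick_csv() is a hypothetical helper written for this example, not a base R function.

```r
# pick_csv() is a hypothetical helper, not part of base R.
pick_csv <- function(default_path) {
  if (interactive()) {
    file.choose()   # opens the system file dialog
  } else {
    default_path    # used when the script runs without a user
  }
}

csv_path <- pick_csv("testvolcano.csv")

# Read the file only if it is actually there.
if (file.exists(csv_path)) {
  df <- read.csv(csv_path)
}
```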

read.csv() is not efficient for reading big CSV files (several hundred MBs to GBs). If you have big files, it is recommended to use either the fread() or the read_csv() function.

2. read_csv()

The read_csv() function from the readr package (part of the tidyverse) can also be used for reading CSV data files. By default, read_csv() uses the first line of the file as the header.

read_csv() also reports additional information, such as the table dimensions and the data type of each column.

# readr v2.1.2
library(readr)
# If the file is not in the current directory, 
# add the path to the file
df = read_csv("testvolcano.csv")
head(df)
# output
# A tibble: 6 × 3
  GeneNames        log2FC `p-value`
  <chr>             <dbl>     <dbl>
1 LOC_Os09g01000.1  -1.89  1.25e-55
2 LOC_Os12g42876.1   3.23  1.05e-55
3 LOC_Os12g42884.2   3.18  2.59e-54
4 LOC_Os03g16920.1   5.29  4.69e-54
5 LOC_Os05g47540.4   4.10  2.19e-54
6 LOC_Os09g00999.1  -1.84  1.95e-54

If there is no header in the CSV file, set col_names = FALSE. In addition, if you want to set custom column names, you can pass them as a character vector to the col_names parameter.

# if there is no header
df = read_csv("testvolcano.csv", col_names = FALSE)

# set custom column names (when there are no column names in the file)
df = read_csv("testvolcano.csv", col_names = c("gene", "fc", "pv"))
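The sketch below shows both uses of col_names on made-up inline data. It assumes readr 2.0 or later, where wrapping a string in I() tells read_csv() to treat it as literal data rather than a file path; the values are invented for illustration.

```r
library(readr)

# Inline CSV text without a header row; I() marks it as literal data.
csv_text <- I("geneA,1.5,0.01\ngeneB,-2.0,0.20\n")

# With col_names = FALSE, readr generates the names X1, X2, X3.
df_auto <- read_csv(csv_text, col_names = FALSE, show_col_types = FALSE)
print(names(df_auto))

# A character vector supplies custom column names.
df_named <- read_csv(csv_text, col_names = c("gene", "fc", "pv"),
                     show_col_types = FALSE)
print(df_named)
```

show_col_types = FALSE only silences the column type message; it does not change how the data are parsed.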

If there are comment lines in the CSV file (for example, if comment lines start with #), then set comment = "#"

df = read_csv("testvolcano.csv", comment = "#")

If you want to skip the first few lines before reading files, you can use the skip parameter. For example, if you want to skip the first 2 lines, you can set skip = 2.

df = read_csv("testvolcano.csv", skip = 2)

If you do not know the exact location of the file, you can use file.choose() to open a file explorer dialog and select the CSV file.

df = read_csv(file.choose())

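The read_csv() function is roughly 5 to 10 times faster than the base read.csv() function. read_csv() also displays a progress bar, which is useful when importing big files. In addition, read_csv() returns a tibble (a modern data frame) that keeps column names and data types as they appear in the input; for example, it keeps the p-value column name that read.csv() changes to p.value in the output above. read_csv() is also more reproducible than read.csv(), since it depends less on operating system and environment settings.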
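The difference in the returned object can be checked directly. The sketch below reads the same made-up text with both functions and compares the column names and classes; it assumes readr 2.0 or later for the I() literal-data syntax, and the values are invented for illustration.

```r
library(readr)

# Made-up data with a column name that is not a valid R variable name.
csv_text <- "GeneNames,p-value\ngeneA,0.01\ngeneB,0.20\n"

base_df <- read.csv(text = csv_text)
tidy_df <- read_csv(I(csv_text), show_col_types = FALSE)

# read.csv() rewrites the name to p.value; read_csv() keeps p-value.
print(names(base_df))
print(names(tidy_df))

# read_csv() returns a tibble, which is still a data frame.
print(class(tidy_df))

# Data type of each column as parsed by read_csv().
print(sapply(tidy_df, class))
```

If you prefer the original names with read.csv(), its check.names = FALSE argument turns the renaming off.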

3. fread()

The fread() function from the data.table package (which provides an enhanced version of base R’s data.frame) can also be used for importing CSV data files.

fread() is fast (~5X faster than read.csv()) and memory efficient, which makes it especially good for importing big CSV files. fread() also automatically detects the most common delimiters ([,\t |;:]).
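As a quick illustration of the delimiter detection, the sketch below passes made-up lines to fread() through its text argument, once separated by semicolons and once by tabs, without setting sep. The values are invented for illustration.

```r
library(data.table)

# The same made-up table with two different delimiters.
semi_lines <- c("gene;fc", "geneA;1.5", "geneB;-2.0")
tab_lines  <- c("gene\tfc", "geneA\t1.5", "geneB\t-2.0")

# No sep argument is given; fread() works out the delimiter itself.
dt_semi <- fread(text = semi_lines)
dt_tab  <- fread(text = tab_lines)

print(dt_semi)
print(dt_tab)
```

If the detection picks the wrong delimiter for an unusual file, you can set it explicitly, for example sep = ";".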

# data.table v1.14.2
library(data.table)
# If the file is not in the current directory, 
# add the path to the file
df = fread("testvolcano.csv")
head(df)
# output
          GeneNames    log2FC  p-value
1: LOC_Os09g01000.1 -1.886539 1.25e-55
2: LOC_Os12g42876.1  3.231611 1.05e-55
3: LOC_Os12g42884.2  3.179004 2.59e-54
4: LOC_Os03g16920.1  5.290677 4.69e-54
5: LOC_Os05g47540.4  4.096862 2.19e-54
6: LOC_Os09g00999.1 -1.839222 1.95e-54

fread() automatically detects the header. If you do not want the first line to be used as the header, set header = FALSE.

# if there is no header
df = fread("testvolcano.csv", header = FALSE)

If you want to skip the first few lines before reading files, you can use the skip parameter. For example, if you want to skip the first 4 lines, you can set skip = 4.

df = fread("testvolcano.csv", skip = 4)
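The sketch below tries both parameters on made-up inline data passed through the text argument of fread(). The two note lines and the values are invented for illustration.

```r
library(data.table)

# Two note lines, then the header, then the data (made-up values).
all_lines <- c(
  "exported by some tool",
  "second note line",
  "gene,fc",
  "geneA,1.5",
  "geneB,-2.0"
)

# Skip the two note lines; the header is then detected automatically.
dt_skip <- fread(text = all_lines, skip = 2)
print(dt_skip)

# Data rows only: with header = FALSE the columns are named V1, V2.
dt_nohdr <- fread(text = all_lines[4:5], header = FALSE)
print(names(dt_nohdr))
```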

If you do not know the exact location of the file, you can use file.choose() to open a file explorer dialog and select the CSV file.

df = fread(file.choose())

This work is licensed under a Creative Commons Attribution 4.0 International License
