summary() Function in R: How to Use It (With 6 Examples)

Renesh Bedre

summary() is a generic base R function that returns a statistical summary of an object such as a fitted model (ANOVA, regression, etc.), data frame, vector, matrix, or factor.

For example, for a fitted regression model, the summary() function returns the model call (formula), residual quantiles, regression coefficients, F statistic, p value, and R-squared.

The basic syntax for the summary() function is,

summary(object)

In the above syntax, the object can be a fitted model, data frame, data frame column, matrix, or vector.

The following six examples illustrate how to use the summary() function to summarise the results for various objects.
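As a rough illustration of how the result depends on the object, the sketch below calls summary() on three kinds of object and checks the class of each result. It assumes R's built-in cars dataset and small made-up vectors (none of them come from this post), so it runs without downloading anything.

```r
# summary() is generic: R picks the method from the class of the object
num <- c(2, 4, 6, 8)                    # made-up numeric vector
fac <- factor(c("a", "b", "a"))         # made-up factor
fit <- lm(dist ~ speed, data = cars)    # cars is a built-in dataset

res_num <- summary(num)   # six descriptive statistics
res_fac <- summary(fac)   # counts per level
res_fit <- summary(fit)   # regression summary

length(res_num)           # number of statistics reported for the vector
is.integer(res_fac)       # level counts are stored as integers
class(res_fit)            # class of the regression summary object
```

Each result is an ordinary R object, so it can be stored and inspected rather than only printed.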

1. Summary statistics for the regression model

The summary() function is widely used for summarising the statistical results of a fitted regression model.

The following example shows how to use the lm() function to fit the linear regression model and summary() function to summarise the statistical results.

# load blood pressure example dataset
df <- read.csv("https://reneshbedre.github.io/assets/posts/reg/bp.csv")

# fit simple linear regression
model <- lm(BP ~ Age, data = df)

# get summary statistics
summary(model)

Call:
lm(formula = BP ~ Age, data = df)

Residuals:
    Min      1Q  Median      3Q     Max 
-6.7104 -2.9217  0.4276  2.3973  7.8586 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept)  44.4545    18.7277   2.374  0.02894 * 
Age           1.4310     0.3849   3.718  0.00157 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.195 on 18 degrees of freedom
Multiple R-squared:  0.4344,	Adjusted R-squared:  0.403 
F-statistic: 13.82 on 1 and 18 DF,  p-value: 0.001574

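For a regression model, the summary() function returns the residual quantiles, the regression coefficients with their standard errors, t values, and p values, the residual standard error, R-squared and adjusted R-squared, and the overall F statistic with its p value.

Because summary() is a generic function, calling it on an lm object dispatches to summary.lm(), so calling summary.lm(model) directly gives the same result.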
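The printed regression summary is also an object whose parts can be extracted for further use. The following is a minimal sketch, assuming the built-in cars dataset in place of the blood pressure data; the variable names (fit, s, coef_table, and so on) are illustrative only.

```r
# fit a simple linear regression on a built-in dataset (an assumption for illustration)
fit <- lm(dist ~ speed, data = cars)
s <- summary(fit)

coef_table <- coef(s)        # matrix: Estimate, Std. Error, t value, Pr(>|t|)
r2 <- s$r.squared            # multiple R-squared
adj_r2 <- s$adj.r.squared    # adjusted R-squared
fstat <- s$fstatistic        # named vector: value, numdf, dendf

# p value of the overall F test, computed from the F statistic
p_value <- pf(fstat[["value"]], fstat[["numdf"]], fstat[["dendf"]],
              lower.tail = FALSE)

# summary.lm() called directly yields the same coefficient table
same_table <- isTRUE(all.equal(coef_table, coef(summary.lm(fit))))
```

Extracting values this way avoids copying numbers from the console output by hand.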

2. Summary statistics for the ANOVA model

When you run ANOVA in R, the summary() function is used for summarising the statistical results from the ANOVA model.

The following example shows how to use the aov() function to fit the ANOVA model and the summary() function to summarise the statistical results.

# load dataset
df <- read.csv("https://reneshbedre.github.io/assets/posts/anova/anova.csv")

# fit one-way ANOVA
model <- aov(response ~ treatment, data = df)

# get summary statistics
summary(model)

            Df Sum Sq Mean Sq F value   Pr(>F)    
treatment    3   3011  1003.6   17.49 2.64e-05 ***
Residuals   16    918    57.4                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

For an ANOVA model, the summary() function returns an ANOVA table that contains the degrees of freedom, sum of squares, and mean square for the treatment and the residuals (experimental error), along with the F statistic and its p value.

In addition to summary(), you can also use summary.lm() on the ANOVA model. It returns a regression-style coefficient table in which, under R's default treatment contrasts, each treatment level is compared with the reference level.
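The difference between the two summaries of an ANOVA model can be seen side by side. This is a minimal sketch, assuming the built-in PlantGrowth dataset (weight measured in three groups) rather than the dataset used above; the object names are illustrative.

```r
# one-way ANOVA on a built-in dataset (an assumption for illustration)
fit <- aov(weight ~ group, data = PlantGrowth)

summary(fit)       # ANOVA table: Df, Sum Sq, Mean Sq, F value, Pr(>F)
summary.lm(fit)    # coefficient table: intercept plus one row per non-reference group

# the ANOVA table is stored as the first element of the summary object
anova_table <- summary(fit)[[1]]
p_value <- anova_table[["Pr(>F)"]][1]      # p value for the group effect

# the coefficient table can be extracted in the same way as for lm()
coef_table <- coef(summary.lm(fit))
```

With default treatment contrasts, the intercept row corresponds to the mean of the reference group and the other rows to differences from it.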

3. Summary statistics for data frame

The summary() function can be used to get descriptive statistics such as the mean, median, and quartiles for all or specific columns of an R data frame.

If you want additional descriptive statistics such as the standard error (se), standard deviation (sd), sample count, trimmed mean, etc., you can use a function such as describe() from the psych package.
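Some of these additional statistics can also be computed directly with base R functions. The following is a minimal sketch, assuming the built-in mtcars dataset and a 10% trimmed mean; both are illustrative choices and not taken from this post.

```r
# numeric column from a built-in dataset (an assumption for illustration)
x <- mtcars$mpg

n <- sum(!is.na(x))                      # sample count, ignoring missing values
extra <- c(
  n = n,
  sd = sd(x, na.rm = TRUE),              # standard deviation
  se = sd(x, na.rm = TRUE) / sqrt(n),    # standard error of the mean
  trimmed_mean = mean(x, trim = 0.1, na.rm = TRUE)  # drops 10% from each tail
)

round(extra, 3)
```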

Get descriptive statistics for all columns,

# load dataset
df <- read.csv("https://reneshbedre.github.io/assets/posts/anova/anova.csv")

# get summary statistics
summary(df)

  treatment            response    
 Length:20          Min.   :25.00  
 Class :character   1st Qu.:29.00  
 Mode  :character   Median :36.50  
                    Mean   :41.45  
                    3rd Qu.:54.25  
                    Max.   :73.00 

For a numeric variable, the summary() function returns the minimum, first quartile (25th percentile), median, mean, third quartile (75th percentile), and maximum. For a character column such as treatment, it reports only the length, class, and mode.

Now let’s check how to get descriptive statistics for a specific column,

# load dataset
df <- read.csv("https://reneshbedre.github.io/assets/posts/anova/anova.csv")

# get summary statistics for response variable
summary(df$response)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  25.00   29.00   36.50   41.45   54.25   73.00 
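The six values reported for a numeric column correspond to what quantile() and mean() return separately. A small check, assuming a made-up vector rather than the response column:

```r
# made-up values for illustration
x <- c(25, 29, 36, 54, 73)

s <- summary(x)
q <- quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1))

# positions 1, 2, 3, 5, 6 of the summary are the quantiles; position 4 is the mean
quantiles_match <- isTRUE(all.equal(as.numeric(s)[c(1, 2, 3, 5, 6)], as.numeric(q)))
mean_matches <- isTRUE(all.equal(as.numeric(s)[4], mean(x)))
```

This also means the quartiles follow the default interpolation rule of quantile(), so other software may report slightly different quartiles for small samples.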

4. Summary statistics for factor

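The summary() function can be used to get the frequency of each value of a character variable. To get these counts, the character variable must first be converted to a factor; on a plain character vector, summary() reports only the length, class, and mode.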

Get a summary from a character variable,

# load dataset
df <- read.csv("https://reneshbedre.github.io/assets/posts/anova/anova.csv")

# get summary of character variable
summary(as.factor(df$treatment))

A B C D 
5 5 5 5 

For a factor, the summary() function returns the frequency (count) of each level.
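The effect of the factor conversion can be seen by summarising the same values both ways. A minimal sketch with a made-up character vector (grp is a hypothetical name):

```r
# made-up group labels for illustration
grp <- c("A", "B", "A", "C")

summary(grp)                    # character vector: Length, Class, Mode only
counts <- summary(factor(grp))  # factor: count per level
counts

# table() gives the same counts without an explicit conversion
tab <- table(grp)
```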

5. Summary statistics for vector

For a numerical vector, the summary() function returns the descriptive statistical summary.

# create a numeric vector
x <- c(1, 0.5, 3, 4.5, 3, 2)

# summary
summary(x)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  0.500   1.250   2.500   2.333   3.000   4.500 

Note: For a numeric vector, the summary() function drops NA values before computing the statistics and reports how many were dropped in an additional NA's entry.

For a character vector, the summary() function returns the frequency of each distinct value, provided the vector is first converted to a factor.
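The handling of missing values can be checked with a small made-up vector containing two NA values:

```r
# made-up vector with missing values
x <- c(1, 0.5, NA, 4.5, 3, NA)

s <- summary(x)
s                      # statistics are computed on the four non-missing values

na_count <- s[["NA's"]]          # number of missing values that were dropped
mean_no_na <- mean(x, na.rm = TRUE)   # same mean as reported by summary()
```

Unlike summary(), functions such as mean() and sd() return NA for this vector unless na.rm = TRUE is set.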

# create a character vector
x <- c("A", "B", "A", "C", "A", "B")

# summary
summary(as.factor(x))

A B C 
3 2 1

6. Summary statistics for matrix

Similar to a data frame, the summary() function returns descriptive summary statistics for each column of a matrix.

If you convert a data frame to a matrix with data.matrix(), the character and factor columns are converted to integer codes.

# load dataset
df <- read.csv("https://reneshbedre.github.io/assets/posts/anova/anova.csv")

# convert to matrix
df_mat <- data.matrix(df)

# get summary statistics
summary(df_mat)

   treatment       response    
 Min.   :1.00   Min.   :25.00  
 1st Qu.:1.75   1st Qu.:29.00  
 Median :2.50   Median :36.50  
 Mean   :2.50   Mean   :41.45  
 3rd Qu.:3.25   3rd Qu.:54.25  
 Max.   :4.00   Max.   :73.00  
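The integer coding of the treatment column is easier to see on a small example. The following sketch assumes a made-up four-row data frame (small_df is a hypothetical name) rather than the dataset loaded above:

```r
# made-up data frame for illustration
small_df <- data.frame(
  treatment = c("A", "B", "C", "A"),
  response = c(25, 30, 41, 28)
)

m <- data.matrix(small_df)   # character column becomes integer codes
codes <- m[, "treatment"]    # A -> 1, B -> 2, C -> 3

summary(m)                   # one column of statistics per matrix column
```

Because the treatment values are only level codes, the quartiles and mean reported for that column are not meaningful statistics; counts from a factor summary are usually more informative for such a column.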


This work is licensed under a Creative Commons Attribution 4.0 International License
