# `summary()` Function in R: How to Use It (With 6 Examples)

`summary()` is a base R function that returns a statistical summary of an object such as a fitted model (ANOVA, regression, etc.), data frame, vector, matrix, or factor.

For example, for a fitted regression model, the `summary()` function returns the model call (formula), a summary of the residuals, the regression coefficients, the F statistic, the p value, and R-squared.

The basic syntax for the `summary()` function is:

``````
summary(object)
``````

In the above syntax, `object` can be a fitted model, data frame, data frame column, matrix, vector, or factor.
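`summary()` is a generic function, so what it returns depends on the class of `object`. The short sketch below shows the class of the result for a few object types. It uses R's built-in `cars` dataset as an assumed stand-in, since that dataset is not one of those used in this article.

```r
# summary() dispatches on the class of its argument
# (the built-in `cars` dataset is used here only for illustration)
num_summary   <- summary(cars$speed)                      # numeric vector
df_summary    <- summary(cars)                            # data frame
model_summary <- summary(lm(dist ~ speed, data = cars))   # fitted model

class(num_summary)    # "summaryDefault" "table"
class(df_summary)     # "table"
class(model_summary)  # "summary.lm"
```

Because each result has its own class, each also has its own print format, which is why the outputs in the examples look so different from one another.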

The following six examples illustrate how to use the `summary()` function to summarise the results for various objects.

## 1. Summary statistics for the regression model

The `summary()` function is widely used for summarising the statistical results of a fitted regression model.

The following example shows how to use the `lm()` function to fit a linear regression model and the `summary()` function to summarise the statistical results.

``````
# load blood pressure example dataset

# fit simple linear regression
model <- lm(BP ~ Age, data = df)

# get summary statistics
summary(model)

Call:
lm(formula = BP ~ Age, data = df)

Residuals:
    Min      1Q  Median      3Q     Max
-6.7104 -2.9217  0.4276  2.3973  7.8586

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  44.4545    18.7277   2.374  0.02894 *
Age           1.4310     0.3849   3.718  0.00157 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.195 on 18 degrees of freedom
Multiple R-squared:  0.4344,	Adjusted R-squared:  0.403
F-statistic: 13.82 on 1 and 18 DF,  p-value: 0.001574
``````

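For a regression model, the `summary()` function returns a summary of the residuals, the regression coefficients with their standard errors, t values, and p values, the residual standard error, the performance metrics (R-squared and adjusted R-squared), and the overall significance of the regression (F statistic and p value).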

For a model fitted with `lm()`, `summary()` dispatches to the `summary.lm()` method, so calling `summary.lm(model)` directly gives the same result.
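The object returned by `summary()` for a linear model can also be stored and queried directly, which is handy when you need a single number rather than the printed report. Here is a minimal sketch, assuming the built-in `cars` dataset in place of the blood pressure data (whose loading step is not shown above), so the variable names differ from the `BP ~ Age` example.

```r
# fit a simple linear regression on the built-in `cars` data
# (an assumed stand-in for the blood pressure dataset)
fit <- lm(dist ~ speed, data = cars)
fit_summary <- summary(fit)

# coefficient table: estimate, standard error, t value, p value
coef(fit_summary)

# individual components of the summary object
fit_summary$r.squared      # multiple R-squared
fit_summary$adj.r.squared  # adjusted R-squared
fit_summary$sigma          # residual standard error
fit_summary$fstatistic     # F statistic with numerator and denominator df

# summary() and summary.lm() give the same coefficient table for an lm fit
all.equal(coef(fit_summary), coef(summary.lm(fit)))
```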

## 2. Summary statistics for the ANOVA model

When you run ANOVA in R, the `summary()` function is used for summarising the statistical results from the ANOVA model.

The following example shows how to use the `aov()` function to fit the ANOVA model and the `summary()` function to summarise the statistical results.

``````
# load dataset

# fit one-way ANOVA
model <- aov(response ~ treatment, data = df)

# get summary statistics
summary(model)

            Df Sum Sq Mean Sq F value   Pr(>F)
treatment    3   3011  1003.6   17.49 2.64e-05 ***
Residuals   16    918    57.4
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
``````

For an ANOVA model, the `summary()` function returns an ANOVA table that contains the degrees of freedom, sum of squares, and mean squares for the treatment and the residuals (experimental error), along with the statistical significance of the treatment effect (F statistic and p value).

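In addition to `summary()`, you can also call `summary.lm()` on the ANOVA model. This returns a regression-style summary in which, under R's default treatment contrasts, each treatment level gets a coefficient estimate and significance test relative to the reference level.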
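The sketch below shows how the ANOVA table can be pulled out of the summary object, and what `summary.lm()` adds for the same fit. It assumes the built-in `PlantGrowth` dataset (`weight` by `group`) as a stand-in for the treatment/response data, whose loading step is not shown above.

```r
# one-way ANOVA on the built-in `PlantGrowth` data
# (an assumed stand-in for the treatment/response dataset)
fit <- aov(weight ~ group, data = PlantGrowth)

# summary() returns a list; its first element is the ANOVA table
anova_table <- summary(fit)[[1]]
anova_table[["Df"]]           # degrees of freedom for group and residuals
anova_table[["F value"]][1]   # F statistic for the group effect
anova_table[["Pr(>F)"]][1]    # corresponding p value

# summary.lm() treats the same fit as a linear model and reports
# coefficients relative to the first (reference) group
coef(summary.lm(fit))
```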

## 3. Summary statistics for data frame

The `summary()` function can be used to get descriptive statistics such as the mean, median, and quartiles for all or specific columns of an R data frame.

If you want additional descriptive statistics such as the standard error (se), standard deviation (sd), sample count, or trimmed mean, you can use the `describe()` function from the `psych` package.

Get descriptive statistics for all columns:

``````
# load dataset

# get summary statistics
summary(df)

  treatment            response
 Length:20          Min.   :25.00
 Class :character   1st Qu.:29.00
 Mode  :character   Median :36.50
                    Mean   :41.45
                    3rd Qu.:54.25
                    Max.   :73.00
``````

For a numeric column, the `summary()` function returns the minimum, first quartile (25th percentile), median, mean, third quartile (75th percentile), and maximum. For a character column, it reports only the length, class, and mode.

Now let’s check how to get descriptive statistics for a specific column:

``````
# load dataset

# get summary statistics for response variable
summary(df$response)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  25.00   29.00   36.50   41.45   54.25   73.00
``````
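The additional statistics mentioned at the start of this section (sample count, standard deviation, standard error, trimmed mean) can also be computed with base R alone. The following is a rough sketch rather than a replacement for `describe()`. It assumes the built-in `PlantGrowth` dataset as example data, and the 10% trimming fraction is an arbitrary illustrative choice.

```r
# extra descriptive statistics using base R only
# (built-in `PlantGrowth` data used as an assumed example)
x <- PlantGrowth$weight

n  <- sum(!is.na(x))                      # sample count (non-missing values)
s  <- sd(x, na.rm = TRUE)                 # standard deviation
se <- s / sqrt(n)                         # standard error of the mean
tm <- mean(x, trim = 0.1, na.rm = TRUE)   # 10% trimmed mean

c(n = n, sd = s, se = se, trimmed_mean = tm)
```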

## 4. Summary statistics for factor

The `summary()` function can also return the frequency of each value in a character variable, provided the variable is first converted to a factor.

Get a summary from a character variable:

``````
# load dataset

# get summary of character variable
summary(as.factor(df$treatment))

A B C D
5 5 5 5
``````

For a factor, the `summary()` function returns the frequency (number of observations) of each factor level.
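To see why the conversion to a factor matters, the sketch below calls `summary()` on the same labels before and after conversion. The `treatment` vector here is hypothetical illustrative data, not the article's dataset.

```r
# hypothetical treatment labels stored as character strings
treatment <- c("A", "B", "A", "C", "B", "A")

summary(treatment)              # reports length, class, and mode only
summary(as.factor(treatment))   # reports the count for each level
table(treatment)                # counts without converting to a factor first
```

`table()` is a base R alternative when you only need the counts and do not want to change the type of the variable.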

## 5. Summary statistics for vector

For a numerical vector, the `summary()` function returns the descriptive statistical summary.

``````
# create a numeric vector
x <- c(1, 0.5, 3, 4.5, 3, 2)

# summary
summary(x)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  0.500   1.250   2.500   2.333   3.000   4.500
``````

Note: For a numeric vector containing `NA` values, `summary()` computes the statistics from the non-missing values and reports the number of missing values in an additional `NA's` entry.
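The handling of missing values can be checked with a small vector. This is a minimal sketch with made-up values:

```r
# numeric vector with one missing value (illustrative data)
x_na <- c(1, 2, NA, 4)

# statistics are computed from the non-missing values;
# the extra NA's entry counts the missing ones
x_summary <- summary(x_na)
x_summary

# the reported mean matches mean() with na.rm = TRUE
mean(x_na, na.rm = TRUE)
```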

For a character vector, `summary()` returns only the length, class, and mode. To get the frequency of each value, convert the character vector to a factor first.

``````
# create a character vector
x <- c("A", "B", "A", "C", "A", "B")

# summary
summary(as.factor(x))

A B C
3 2 1
``````

## 6. Summary statistics for matrix

Similar to a data frame, the `summary()` function returns descriptive summary statistics for each column of a matrix.

If you convert a data frame to a matrix with `data.matrix()`, character and factor columns are converted to integer codes.

``````
# load dataset

# convert to matrix
df_mat <- data.matrix(df)

# get summary statistics
summary(df_mat)

   treatment       response
 Min.   :1.00   Min.   :25.00
 1st Qu.:1.75   1st Qu.:29.00
 Median :2.50   Median :36.50
 Mean   :2.50   Mean   :41.45
 3rd Qu.:3.25   3rd Qu.:54.25
 Max.   :4.00   Max.   :73.00
``````
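The integer conversion explains why the `treatment` column above has a minimum of 1 and a maximum of 4: the four treatment labels become the codes 1 to 4. A minimal sketch of the conversion, using a small hypothetical data frame (`toy_df`) rather than the article's dataset:

```r
# small hypothetical data frame with a character and a numeric column
toy_df <- data.frame(
  treatment = c("A", "B", "A", "C"),
  response  = c(25, 30, 28, 41)
)

# data.matrix() converts the character column to a factor
# and then to its integer codes
toy_mat <- data.matrix(toy_df)
toy_mat

# the codes follow the sorted order of the levels: A = 1, B = 2, C = 3
toy_mat[, "treatment"]
```

Summary statistics of such codes (for example, a mean of 2.50) describe the coding, not the treatments themselves, so they are usually not meaningful.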