# How to Calculate Descriptive Statistics in Python with Pandas DataFrame

The purpose of descriptive statistics is to summarize the characteristics of a dataset in a meaningful way, without drawing inferences (e.g. about a larger population) from it.

Descriptive statistics summarize the central tendency (mean, median, mode), the spread (range, standard deviation, variance), the shape, and the frequency distribution of the data.
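Each of these summaries has a matching pandas method. The minimal sketch below uses a hypothetical `ages` Series (illustrative values, not data from this article) in which one value repeats, so that the mode is a single number. Note that `mode()` returns a Series rather than a scalar, because a column can have several equally common values.

```python
import pandas as pd

# Hypothetical sample values; 25 appears twice so the mode is unique
ages = pd.Series([25, 30, 25, 35, 38])

# Central tendency
print("mean:  ", ages.mean())            # arithmetic mean
print("median:", ages.median())          # middle value of the sorted data
print("mode:  ", ages.mode().tolist())   # mode() returns a Series (ties are possible)

# Spread
print("range: ", ages.max() - ages.min())  # largest value minus smallest value
print("var:   ", ages.var())               # sample variance (ddof=1 by default)
print("std:   ", ages.std())               # sample standard deviation (ddof=1 by default)
```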

The `describe()` function from pandas calculates the descriptive statistics for a DataFrame.

The basic syntax for the `describe()` function is,

```python
# for all columns
df.describe()

# for a specific column
df["column_name"].describe()
```

Here, `df` is a pandas DataFrame.

The `describe()` function calculates the following descriptive statistics for numeric data from a pandas DataFrame,

• Count
• Mean
• Standard deviation (`std`)
• Minimum (`min`)
• 25% (25th percentile, or first quartile)
• 50% (50th percentile, second quartile, or median)
• 75% (75th percentile, or third quartile)
• Maximum (`max`)
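The 25%, 50% and 75% rows are only the default. As a sketch, the `percentiles` parameter of `describe()` takes a list of fractions between 0 and 1 and replaces the quartile rows. The data here is the same `Age`/`Height` DataFrame used in the numerical example below, and 0.5 is listed explicitly so that the median row is kept.

```python
import pandas as pd

# Same numeric data as the Age/Height example in this article
df = pd.DataFrame({'Age': [25, 30, 20, 35, 38], 'Height': [5.5, 6.2, 5, 4.9, 5.9]})

# Report the 10th, 50th and 90th percentiles instead of the default quartiles
summary = df.describe(percentiles=[0.1, 0.5, 0.9])
print(summary)
```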

The `describe()` function calculates the following descriptive statistics for categorical data (e.g. strings) from a pandas DataFrame,

• Count
• Unique (number of unique values)
• Top (most common value)
• Frequency of the most common value (`freq`)

Note: If a DataFrame contains both numerical and categorical columns, `describe()` by default returns descriptive statistics only for the numerical columns.
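A quick way to see this behaviour: in the sketch below, a small mixed DataFrame mirroring the one used in the last example of this article, the string column `name` is simply missing from the default output.

```python
import pandas as pd

# Mixed DataFrame: one string column and two numeric columns
df = pd.DataFrame({'name': ['A', 'B', 'C', 'D', 'E'],
                   'Age': [25, 30, 20, 35, 38],
                   'Height': [5.5, 6.2, 5, 4.9, 5.9]})

# With no arguments, describe() skips the non-numeric 'name' column
summary = df.describe()
print(summary.columns.tolist())  # ['Age', 'Height']
```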

The following examples explain how to use the `describe()` function to get descriptive statistics from a pandas DataFrame.

### Calculate descriptive statistics for numerical pandas DataFrame

Create a numerical pandas DataFrame,

```python
# load package
import pandas as pd

# create a DataFrame
df = pd.DataFrame({'Age': [25, 30, 20, 35, 38], 'Height': [5.5, 6.2, 5, 4.9, 5.9]})

# view DataFrame
print(df)
#    Age  Height
# 0   25     5.5
# 1   30     6.2
# 2   20     5.0
# 3   35     4.9
# 4   38     5.9
```

Calculate descriptive statistics,

```python
print(df.describe())
#              Age    Height
# count   5.000000  5.000000
# mean   29.600000  5.500000
# std     7.300685  0.561249
# min    20.000000  4.900000
# 25%    25.000000  5.000000
# 50%    30.000000  5.500000
# 75%    35.000000  5.900000
# max    38.000000  6.200000
```

The `describe()` function outputs the count, mean, standard deviation (`std`), minimum (`min`), first quartile (25%), median (50%), third quartile (75%), and maximum (`max`) of each numerical column.
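Because `describe()` returns an ordinary DataFrame, individual statistics can be picked out by row label with `.loc`. The sketch below also cross-checks the quartile rows against `DataFrame.quantile()`, which `describe()` relies on for its percentile rows (using the default linear interpolation).

```python
import pandas as pd

df = pd.DataFrame({'Age': [25, 30, 20, 35, 38], 'Height': [5.5, 6.2, 5, 4.9, 5.9]})

summary = df.describe()  # an ordinary DataFrame indexed by statistic name

# Select single statistics by row label (and optionally column name)
print(summary.loc['mean'])          # mean of every column
print(summary.loc['std', 'Age'])    # standard deviation of Age only

# The 25%/50%/75% rows should match quantile() with its default interpolation
print(df.quantile([0.25, 0.5, 0.75]))
```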

Calculate the variance using the pandas `var()` function,

```python
print(df.var())
# Age       53.300
# Height     0.315
# dtype: float64
```
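By default, pandas' `var()` and `std()` compute the *sample* statistics (`ddof=1`, dividing by n - 1); passing `ddof=0` gives the population versions (dividing by n). The sketch below shows both and checks that the variance is the square of the `std` value reported by `describe()`.

```python
import pandas as pd

df = pd.DataFrame({'Age': [25, 30, 20, 35, 38], 'Height': [5.5, 6.2, 5, 4.9, 5.9]})

sample_var = df.var()            # divides by n - 1 (ddof=1, the pandas default)
population_var = df.var(ddof=0)  # divides by n

# The sample variance equals the squared sample standard deviation
print(sample_var)
print(df.std() ** 2)
print(population_var)
```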

Calculate the range (difference between the maximum and minimum values) of a column,

```python
# for Age column
print(df.Age.max() - df.Age.min())
# 18

# for Height column
print(df.Height.max() - df.Height.min())
# 1.2999999999999998  (floating-point rounding; the exact range is 1.3)
```
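Since `max()` and `min()` work column by column on a DataFrame, the range of every column can also be computed in one step. As a sketch, rounding the result hides floating-point noise such as the `1.2999999999999998` seen for `Height`.

```python
import pandas as pd

df = pd.DataFrame({'Age': [25, 30, 20, 35, 38], 'Height': [5.5, 6.2, 5, 4.9, 5.9]})

# Column-wise max minus column-wise min gives the range of each column
value_range = df.max() - df.min()

# Round to 2 decimal places for display
print(value_range.round(2))
```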

### Calculate descriptive statistics for categorical pandas DataFrame

Create a categorical pandas DataFrame,

```python
# load package
import pandas as pd

# create a DataFrame
df = pd.DataFrame({'school': ['A', 'B', 'C', 'D', 'E'],
                   'state': ["TX", "TX", "CA", "CA", "CA"],
                   'temp': ["hot", "hot", "mild", "mild", "mild"]})

# view DataFrame
print(df)
#   school state  temp
# 0      A    TX   hot
# 1      B    TX   hot
# 2      C    CA  mild
# 3      D    CA  mild
# 4      E    CA  mild
```

Calculate descriptive statistics,

```python
print(df.describe())
#        school state  temp
# count       5     5     5
# unique      5     2     2
# top         A    CA  mild
# freq        1     3     3
```

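For categorical columns, `describe()` outputs the count, the number of unique values (`unique`), the most common value (`top`), and the frequency of the most common value (`freq`). When several values are equally common, as in the `school` column, `top` reports only one of them.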
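`describe()` reports only the single most common value. For the full frequency table of a categorical column, `value_counts()` lists every value with its count; the sketch below applies it to the `state` column of the same DataFrame, and `normalize=True` turns the counts into proportions.

```python
import pandas as pd

df = pd.DataFrame({'school': ['A', 'B', 'C', 'D', 'E'],
                   'state': ["TX", "TX", "CA", "CA", "CA"],
                   'temp': ["hot", "hot", "mild", "mild", "mild"]})

# Counts of every distinct value, most common first
state_counts = df['state'].value_counts()
print(state_counts)

# Relative frequencies (proportions) instead of counts
print(df['state'].value_counts(normalize=True))
```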

### Calculate descriptive statistics for mixed pandas DataFrame

If a DataFrame has mixed data types (numerical and categorical), `describe()` by default returns descriptive statistics only for the numerical columns.

Pass the `include='all'` parameter to `describe()` to get descriptive statistics for columns of every data type.

Create a mixed data type pandas DataFrame,

```python
# load package
import pandas as pd

# create a DataFrame
df = pd.DataFrame({'name': ['A', 'B', 'C', 'D', 'E'],
                   'Age': [25, 30, 20, 35, 38],
                   'Height': [5.5, 6.2, 5, 4.9, 5.9]})

# view DataFrame
print(df)
#   name  Age  Height
# 0    A   25     5.5
# 1    B   30     6.2
# 2    C   20     5.0
# 3    D   35     4.9
# 4    E   38     5.9
```

Calculate descriptive statistics for both numerical and categorical variables,

```python
print(df.describe(include="all"))
#        name        Age    Height
# count     5   5.000000  5.000000
# unique    5        NaN       NaN
# top       A        NaN       NaN
# freq      1        NaN       NaN
# mean    NaN  29.600000  5.500000
# std     NaN   7.300685  0.561249
# min     NaN  20.000000  4.900000
# 25%     NaN  25.000000  5.000000
# 50%     NaN  30.000000  5.500000
# 75%     NaN  35.000000  5.900000
# max     NaN  38.000000  6.200000
```
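Besides `include='all'`, the `include` and `exclude` parameters of `describe()` accept lists of dtypes (in the style of `select_dtypes()`), so you can summarize just one kind of column. The sketch below uses the dtype name `'number'` to select or drop the numeric columns, and transposes the result with `.T`, which can be easier to read when a DataFrame has many columns.

```python
import pandas as pd

df = pd.DataFrame({'name': ['A', 'B', 'C', 'D', 'E'],
                   'Age': [25, 30, 20, 35, 38],
                   'Height': [5.5, 6.2, 5, 4.9, 5.9]})

# Only the numeric columns (the same columns describe() uses by default)
numeric_summary = df.describe(include=['number'])

# Only the non-numeric columns, by excluding the numeric ones
text_summary = df.describe(exclude=['number'])

# Transpose so that each original column becomes a row
print(numeric_summary.T)
print(text_summary.T)
```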