What are probability distributions?
- Probability distributions represent the probabilities associated with all outcomes of a random variable.
- Depending on the type of random variable (discrete or continuous), probability distributions are classified as discrete or continuous probability distributions.
Discrete probability distribution
- Discrete probability distributions describe the probabilities associated with each possible outcome of a discrete random variable, i.e. a countable quantity that takes values such as 0, 1, 2, and so on, but not fractions (e.g. the number of apples).
- The probability of each outcome of a discrete random variable lies between 0 and 1, and the probabilities of all outcomes sum to 1.
- The binomial and Poisson distributions are examples of discrete probability distributions.
- For example, a restaurant sells 10 to 20 pizzas during the lunch hour, and Table 1 represents the discrete probability distribution of pizza sales. The random variable (X) takes every integer value from 10 to 20, and p(X=x), or p(x), represents the probability of selling exactly x pizzas.
Table 1: Probability distribution of pizza sales
Graphically, this distribution can be shown as follows:
Figure 1: Probability distribution of pizza sales
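The two defining properties of a discrete distribution (every probability lies between 0 and 1, and all probabilities sum to 1) can be checked numerically. The sketch below is only an illustration using a Poisson distribution. The rate of 4 and the helper name `poisson_pmf` are assumptions made for the example and are not taken from Table 1.

```python
import math


def poisson_pmf(k: int, rate: float) -> float:
    """Probability of observing exactly k events when the mean count is `rate`."""
    return math.exp(-rate) * rate**k / math.factorial(k)


rate = 4.0  # hypothetical mean number of events; not taken from Table 1

# The Poisson distribution has infinitely many outcomes (0, 1, 2, ...),
# so the list is truncated where the remaining tail is negligible.
probabilities = [poisson_pmf(k, rate) for k in range(60)]

# Property 1: every individual probability lies between 0 and 1.
all_in_unit_interval = all(0.0 <= p <= 1.0 for p in probabilities)

# Property 2: the probabilities of all outcomes add up to 1.
total = sum(probabilities)

print(all_in_unit_interval)
print(round(total, 6))
```

The same two checks apply to any table of discrete probabilities, such as a list of values read from a table like Table 1.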
Probability mass function (PMF) and cumulative distribution function (CDF)
- The probability mass function (PMF) gives the probability of each possible value (x) of X. For example, p(X=12) is 0.11, which is the PMF of X evaluated at 12.
- The cumulative distribution function (CDF) gives the probability that X takes a value of at most x. For example, p(X<=12) is 0.27, which is the sum of p(X=10), p(X=11), and p(X=12).
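The relationship between the PMF and the CDF can be sketched in a few lines of code. Because the values of Table 1 are not reproduced here, this example uses a binomial distribution with assumed parameters (10 trials, success probability 0.5). The point is only that the CDF is the running total of the PMF.

```python
import math
from itertools import accumulate

n, p = 10, 0.5  # hypothetical parameters: 10 trials, success probability 0.5

# PMF: probability of exactly k successes, for k = 0..n
pmf = [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

# CDF: running total of the PMF, so cdf[k] = p(X <= k)
cdf = list(accumulate(pmf))

print(f"p(X = 2)  = {pmf[2]:.4f}")   # 0.0439
print(f"p(X <= 2) = {cdf[2]:.4f}")   # 0.0547, i.e. p(X=0) + p(X=1) + p(X=2)
```

The last entry of `cdf` equals 1, since it accumulates the probabilities of all possible outcomes.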
Continuous probability distribution
- Continuous probability distributions describe the probabilities associated with the outcomes of a continuous random variable, i.e. a quantity that can take any of the uncountably many values in a specified range (e.g. time spent reading a blog page).
- The probability that a continuous random variable lies between two values (a and b) is the area under the curve between a and b (see shaded area in Figure 2).
- For a continuous random variable, a probability density function (PDF) is used to calculate the probability that X falls in an interval between two values (a and b). The probability p(a ≤ x ≤ b) is equal to the area under the PDF curve between a and b. The total area under the curve is always equal to one.
- Probabilities are calculated for intervals in continuous probability distributions because the probability that X takes any single exact value is always zero.
- The cumulative distribution function (CDF) gives the probability that X is less than or equal to some value x, i.e. p(X ≤ x).
- The normal, exponential, and uniform distributions are examples of continuous probability distributions.
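The points above can be illustrated with a small numerical sketch. It uses an exponential distribution with an assumed rate of 0.5, and the function names (`pdf`, `cdf`, `area_under_pdf`) are made up for this example. The interval probability is computed twice: once as the area under the PDF (approximated with the midpoint rule) and once as a difference of CDF values.

```python
import math

rate = 0.5  # hypothetical rate parameter of an exponential distribution


def pdf(x: float) -> float:
    """Density of the exponential distribution at x."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0


def cdf(x: float) -> float:
    """p(X <= x) for the exponential distribution."""
    return 1.0 - math.exp(-rate * x) if x >= 0 else 0.0


def area_under_pdf(a: float, b: float, steps: int = 100_000) -> float:
    """Approximate the area under the PDF between a and b (midpoint rule)."""
    width = (b - a) / steps
    return sum(pdf(a + (i + 0.5) * width) for i in range(steps)) * width


a, b = 1.0, 3.0
by_area = area_under_pdf(a, b)       # p(a <= X <= b) as an area
by_cdf = cdf(b) - cdf(a)             # the same probability from the CDF
point_probability = cdf(a) - cdf(a)  # an interval of zero width has zero area

print(f"area under PDF: {by_area:.6f}")
print(f"CDF difference: {by_cdf:.6f}")
print(f"p(X = {a}):     {point_probability}")
```

Both approaches should agree to several decimal places, and the zero-width interval shows why the probability of any single exact value is zero.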
For example, suppose the daily time spent reading a blog page is approximately normally distributed with a mean of 3 minutes and a standard deviation of 0.5 minutes.
The shaded area in Figure 2 represents the probability that the time spent reading a blog page is between 3 and 4 minutes, i.e. p(3 ≤ x ≤ 4).
Figure 2: Normal distribution of time spent reading a blog page
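The shaded probability can also be computed directly. The sketch below uses `NormalDist` from Python's standard `statistics` module (Python 3.8 or later) with the mean and standard deviation given above. It assumes the standard deviation is expressed in minutes, like the mean.

```python
from statistics import NormalDist

# Time spent reading a blog page: mean 3 minutes, standard deviation 0.5
# (the standard deviation is assumed to be in minutes as well).
reading_time = NormalDist(mu=3.0, sigma=0.5)

# p(3 <= x <= 4) is the difference between the CDF at 4 and the CDF at 3.
prob = reading_time.cdf(4.0) - reading_time.cdf(3.0)

print(f"p(3 <= x <= 4) = {prob:.4f}")  # 0.4772
```

Since 4 minutes is two standard deviations above the mean, this is the area between the mean and +2 standard deviations, roughly 47.7% of the total area under the curve.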
This work is licensed under a Creative Commons Attribution 4.0 International License