Topics

Reference

  • Wasserman (2004), Chapters 2, 3, and 5

Sample Space

Sample space \(\Omega\) is the set of possible outcomes.

  • examples: \(\Omega = \{H, T\}\) for a coin toss

  • \(\Omega = \{HH, HT, TH, TT\}\) for two coin tosses

Outcome \(\omega\): an element of sample space \(\Omega\)

  • example: \(\omega=H\) for a coin toss

Event \(A\): subset of sample space \(\Omega\)

  • example: \(A = \{HT, TH\}\) (exactly one head) for two coin tosses

Probability \(P(A)\) of an event \(A\) represents the long-run relative frequency of observing \(A\)

  • Probability distribution \(P\) is a function that satisfies:

    • \(P(A) \ge 0\) for every event \(A\)
    • \(P(\Omega) = 1\)
    • If \(A\) and \(B\) are disjoint, \(P(A\cup B) = P(A) + P(B)\)
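
The three properties above can be checked on the two-coin-toss example. This is only a minimal sketch: the fair-coin probability of 1/4 per outcome and the helper function `P` are assumptions made for illustration.

```r
# Sample space for two coin tosses (assumed fair: each outcome has probability 1/4)
omega = c("HH", "HT", "TH", "TT")
prob = setNames(rep(1/4, 4), omega)

# Hypothetical helper: probability of an event = sum over its outcomes
P = function(event) sum(prob[event])

A = c("HT", "TH")   # event: exactly one head
B = c("HH")         # event: two heads (disjoint from A)

P(A)                # 0.5
P(omega)            # 1
P(union(A, B))      # 0.75
P(A) + P(B)         # 0.75, same as P(A u B) because A and B are disjoint
```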

Random Variable

Random variable \(X\) is a mapping from each outcome \(\omega\in\Omega\) to a real number

  • example: \(X\) is the number of heads in two coin tosses

    • \(X(\omega)=1\) for both \(\omega=HT\) and \(\omega=TH\).

The probability that a random variable takes a value is the probability of the set of outcomes that give that value:

  • \(P(X=1) = P(\{HT,TH\}) = 1/2\)

  • \(P(X\le 1) = P(\{TT,HT,TH\}) = 3/4\)

  • In general:

    • \(P(X=x) = P(\{\omega\in\Omega: X(\omega)=x\})\)
    • \(P(X\in A) = P(\{\omega\in\Omega: X(\omega)\in A\})\)
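
The mapping from outcomes to values can be written out explicitly. The sketch below assumes a fair coin (probability 1/4 per outcome) and represents the random variable as a named vector `X`; both choices are for illustration only.

```r
# Outcomes of two coin tosses (assumed fair)
omega = c("HH", "HT", "TH", "TT")
prob = rep(1/4, 4)

# X maps each outcome to its number of heads
X = sapply(strsplit(omega, ""), function(s) sum(s == "H"))
names(X) = omega
X                            # HH=2, HT=1, TH=1, TT=0

# Probability of X taking a value = probability of the outcomes giving it
P_eq1 = sum(prob[X == 1])    # P(X = 1)  = 0.5
P_le1 = sum(prob[X <= 1])    # P(X <= 1) = 0.75
```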

Probability function (or probability mass function)

  • For a discrete random variable: \(f_X(x)=P(X=x)\)

  • Example: # of heads in two coin tosses

p = 0.3
x = 0:2
f = c((1-p)^2, 2*p*(1-p), p^2)
plot(x, f, type="h", lwd=3, ylim=c(0, 1))  # "h" for histogram-like

Probability density function (PDF)

  • For a continuous random variable \(X\), the density \(f_X\) satisfies:

    • \(f_X(x) \ge 0\) for all \(x\)
    • \(\int_{-\infty}^{\infty} f_X(x)dx = 1\)

    • \(P(a<X<b) = \int_{a}^{b} f_X(x)dx\)

  • Example: Uniform distribution in \([0,1]\)

Probability density function:

x = seq(-0.5, 1.5, 0.01)
f = dunif(x, 0, 1)
plot(x, f, type="l")
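
The defining properties of a PDF can be checked numerically with R's `integrate()`. This is a rough sketch for the Uniform distribution on \([0,1]\); the interval \((0.2, 0.5)\) is an arbitrary choice for illustration.

```r
# The density is zero outside [0, 1], so integrating over [0, 1] gives the total area
total = integrate(dunif, lower = 0, upper = 1)$value
total                          # 1

# P(a < X < b) as the area under the density between a and b
a = 0.2
b = 0.5
p_ab = integrate(dunif, lower = a, upper = b)$value
p_ab                           # 0.3

# The same probability from the CDF
punif(b) - punif(a)            # 0.3
```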

Cumulative distribution function (CDF)

  • \(F_X(x)=P(X\le x)\)

  • CDF is related to PDF as:

    • \(F_X(x) = \int_{-\infty}^{x} f_X(t)dt\)
    • \(f_X(x) = F'_X(x)\) where \(F_X\) is differentiable
  • CDF allows dealing with discrete and continuous random variables in a unified way.
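
The relation between CDF and PDF can be verified numerically. The sketch below uses the standard Normal distribution (`dnorm`, `pnorm`) only as a convenient smooth example; the point `x0` and the step size `h` are arbitrary assumptions.

```r
x0 = 1.3

# CDF as the integral of the PDF up to x0
F_num = integrate(dnorm, lower = -Inf, upper = x0)$value
F_num           # numerical integral
pnorm(x0)       # built-in CDF, should agree closely

# PDF as the derivative of the CDF (central difference)
h = 1e-5
f_num = (pnorm(x0 + h) - pnorm(x0 - h)) / (2 * h)
f_num           # numerical derivative
dnorm(x0)       # built-in PDF, should agree closely
```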

Inverse CDF (or quantile function)

  • \(F^{-1}(q) = \inf\{x: F(x)>q\}\)

  • Example: # of heads in two coin tosses

p = 0.5
f = c((1-p)^2, 2*p*(1-p), p^2)
# CDF
F = rep( c(0, cumsum(f)), each=2)
x = c(-1, rep(0:2, each=2), 3)
par(mfrow=c(1, 2))  # side by side 
plot(x, F, type="l")
# Inverse CDF
plot(F, x, type="l", xlab="q", ylab="x=F^-1(q)")

  • Example: Uniform distribution in \([0,1]\)

x = seq(-0.5, 1.5, 0.01)
# CDF
F = punif(x, 0, 1)
par(mfrow=c(1, 2))  # side by side 
plot(x, F, type="l")
# Inverse CDF
plot(F, x, type="l", xlab="q", ylab="x=F^-1(q)")
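
For a continuous distribution whose CDF is strictly increasing on its support, the quantile function simply undoes the CDF. A small check, assuming a Uniform distribution on \([2,5]\) chosen only for illustration:

```r
a = 2
b = 5
q = c(0.1, 0.5, 0.9)

x = qunif(q, a, b)   # quantiles: a + q*(b - a)
x                    # 2.3 3.5 4.7
punif(x, a, b)       # applying the CDF recovers q: 0.1 0.5 0.9
```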

Discrete Random Variables

\(X \sim F\) means \(X\) has distribution \(F\)

Uniform distribution on integers \(\{1,\ldots,k\}\):

\[f(x) = \frac{1}{k}\]

k = 5
x = 1:k
f = rep(1/k, k)  # equal probability 1/k for each of the k values
plot(x, f, type="h", lwd=3, ylim=c(0,1))

Bernoulli Distribution

\[X \sim \mbox{Bernoulli}(p)\]

  • coin toss with probability of heads \(p\)

\[P(X=1) = p\] \[P(X=0) = 1-p\]

The probability (mass) function can be represented as: \[f(x) = p^x(1-p)^{1-x}\]

p = 0.3
x = 0:1
f = p^x * (1-p)^(1-x)  # = c(1-p, p)
plot(x, f, type="h", lwd=3, ylim=c(0,1))

Binomial Distribution

\[X \sim \mbox{Binomial}(n, p)\]

  • number of heads in \(n\) coin tosses with probability of heads \(p\)

\[f(x) = {n \choose x} p^x(1-p)^{n-x}\] \({n \choose x}=\frac{n!}{x!(n-x)!}\): the number of ways of choosing \(x\) items out of \(n\).

n = 5
p = 0.6
x = 0:n
f = choose(n,x) * p^x * (1-p)^(n-x)
#f = dbinom(x, n, p)
plot(x, f, type="h", lwd=3, ylim=c(0,1))

Probability distributions in R

Most popular distributions are available with the convention:

  • d...(): probability density or mass function
  • p...(): CDF
  • q...(): Quantile function (inverse CDF)
  • r...(): draw samples

# binomial distribution
n = 5
p = 0.3
par(mfcol=c(2,2))  # in 2x2 grid
# mass function
x = 0:n
plot(x, dbinom(x, n, p), type="h", lwd=3, ylim=c(0,1))
# CDF
x = seq(0, n, 0.05)
plot(x, pbinom(x, n, p), type="l")
# Quantile function
q = seq(0, 1, 0.01)
plot(q, qbinom(q, n, p), type="l")
# draw samples
plot(rbinom(100, n, p))

Poisson Distribution

\[X \sim \mbox{Poisson}(\lambda)\]

  • count of events that occur at average rate \(\lambda\)

\[f(x) = e^{-\lambda} \frac{\lambda^x}{x!}\]

lambda = 2
x = 0:10
f = dpois(x, lambda)
plot(x, f, type="h", lwd=3, ylim=c(0,1))
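
As a quick sanity check, the mass function written above can be compared with R's `dpois()`. The truncation of the infinite sum at 100 is an assumption that is harmless for small \(\lambda\).

```r
lambda = 2
x = 0:10

# Mass function written out from the formula
f_formula = exp(-lambda) * lambda^x / factorial(x)
all.equal(f_formula, dpois(x, lambda))   # TRUE

# The probabilities sum to 1 (support truncated at 100 here)
total = sum(dpois(0:100, lambda))
total
```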

Multinomial distribution

\[X \sim \mbox{Multinomial}(n,p)\]

  • For \(k\) possible outcomes with probabilities \(p=(p_1,\ldots,p_k)\), number of each outcome after \(n\) draws \(X=(X_1,\ldots,X_k)\)

\[f(x) = {n \choose x_1...x_k} p_1^{x_1}...p_k^{x_k}\] \({n \choose x_1...x_k} = \frac{n!}{x_1!...x_k!}\)

p = c(0.4, 0.5, 0.1)  # k=3
n = 10
dmn <-function(x1, x2){
  if(x1+x2 > n){
    return(0)  # cannot happen
  }else{
    x = c(x1, x2, n-x1-x2)  # sum up to n
    return(dmultinom(x, prob=p))
  }
}
x1 = x2 = 0:n
f = outer(x1, x2, Vectorize(dmn))
persp(x1, x2, f, theta=60)

Continuous Random Variables

Uniform Distribution

\[X \sim \mbox{Uniform}(a,b)\]

\[f(x) = \left\{\begin{array}{cl} \frac{1}{b-a} & \mbox{for } x \in [a,b]\\ 0 & \mbox{otherwise}\end{array}\right.\]

x = seq(-2, 2, 0.01)
f = dunif(x, -1, 1)
plot(x, f, type="l")

Exponential Distribution

\[X \sim \mbox{Exp}(\lambda)\]

  • Interval between events happening at rate \(\lambda\)

\[f(x) = \lambda e^{-\lambda x}\]

Defined for \(x \ge 0\) and \(\lambda > 0\).

lambda = .5
x = seq(-5, 10, 0.1)
f = dexp(x, lambda)
plot(x, f, type="l")

It is sometimes parameterized by \(\beta = \frac{1}{\lambda}\).

Gamma Distribution

\[X \sim \mbox{Gamma}(a,b)\]

Sum of \(a\) independent samples from Exp(\(b\))

\[f(x) = \frac{b^a}{\Gamma(a)} x^{a-1} e^{-bx}\] where the “Gamma function” is defined as \[\Gamma(a) =\int_0^\infty t^{a-1}e^{-t}dt\] For positive integer values of \(a\), \(\Gamma(a)=(a-1)!\).

a = 1  # same as exp
b = 1
x = seq(-2, 10, 0.01)
f = dgamma(x, a, b)
plot(x, f, type="l")
for (a in 2:6){  # see the change with a
  lines(x, dgamma(x, a, b), col=a)
}
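
The density formula can be compared with `dgamma()`, which also makes explicit that the second parameter \(b\) is passed as a rate (not a scale). The values of `a` and `b` below are arbitrary choices for illustration.

```r
a = 3
b = 2
x = seq(0.1, 10, 0.1)

# Density written out from the formula, using R's gamma() function
f_formula = b^a / gamma(a) * x^(a - 1) * exp(-b * x)

# dgamma(x, a, b) is read as shape = a, rate = b
all.equal(f_formula, dgamma(x, shape = a, rate = b))   # TRUE
```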

For independent random variables \(X_i \sim \mbox{Gamma}(a_i,b)\), \[\sum_{i=1}^n X_i \sim \mbox{Gamma}(\sum_{i=1}^n a_i, b)\]

Normal (Gaussian) Distribution

\[X \sim \mathcal{N}(\mu,\sigma^2)\] mean \(\mu\) and standard deviation \(\sigma\) (variance \(\sigma^2\))

\[f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}\]

x = seq(-5, 5, 0.1)
f = dnorm(x)  # default: mu=0, sigma=1
plot(x, f, type="l")

Expectation

Expectation (or mean, or first moment) of random variable \(X\):

\[E(X) = \int x dF(x)\]

For a discrete random variable: \[E(X) = \sum_x x f(x)\]

For a continuous random variable: \[E(X) = \int_{-\infty}^{\infty} x f(x) dx\]

Expectation is often denoted as \(E(X)=\mu_X=\mu\).
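
Both forms of the expectation can be evaluated numerically. The sketch below uses a Poisson distribution for the discrete case and a Normal distribution for the continuous case; the parameter values and the truncation of the sum at 100 are assumptions for illustration.

```r
# Discrete case: sum of x * f(x), here for Poisson(lambda)
lambda = 2
x = 0:100                                # support truncated at 100
E_pois = sum(x * dpois(x, lambda))
E_pois                                   # close to lambda = 2

# Continuous case: integral of x * f(x), here for Normal with mean mu
mu = 1
sigma = 2
E_norm = integrate(function(x) x * dnorm(x, mu, sigma), -Inf, Inf)$value
E_norm                                   # close to mu = 1
```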

Properties of Expectations

  • Expectation of a function \(Y=r(X)\):

\[E(Y) = E(r(X)) = \int r(x)dF_X(x)\]

  • Linear sum:

\[E(\sum_i a_i X_i) =\sum_i a_i E(X_i)\]

  • Product of independent random variables:

\[E(\prod_i X_i) =\prod_i E(X_i)\]
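
The linearity and product properties can be illustrated by simulation. This is only a sketch: the distributions, coefficients, sample size, and seed are arbitrary assumptions, and sample averages match the expectations only approximately.

```r
set.seed(1)            # for reproducibility
n = 1e5
X = rpois(n, 3)        # E(X) = 3
Y = rnorm(n, 1, 2)     # E(Y) = 1, drawn independently of X

# Linear sum: E(2X + 3Y) = 2 E(X) + 3 E(Y) = 9
m_lin = mean(2 * X + 3 * Y)
m_lin

# Product of independent variables: E(XY) = E(X) E(Y) = 3
m_prod = mean(X * Y)
m_prod
```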

Variance

Variance is a measure of the spread of a distribution.

\[V(X) = E((X-\mu)^2) = \int(x-\mu)^2 dF(x)\]

For independent random variables \(X_i\): \[V(\sum_i a_i X_i) =\sum_i a_i^2 V(X_i)\]
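
The variance of a weighted sum can be checked by simulation with independent samples. The distributions, coefficients, sample size, and seed below are arbitrary assumptions for illustration.

```r
set.seed(2)            # for reproducibility
n = 1e5
X = rnorm(n, 0, 2)     # V(X) = 2^2 = 4
Y = rnorm(n, 0, 1)     # V(Y) = 1, drawn independently of X

# V(2X - 3Y) = 2^2 * 4 + (-3)^2 * 1 = 25
v = var(2 * X - 3 * Y)
v
```

Note that the coefficients enter squared, so the minus sign on \(Y\) does not reduce the variance.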

Limit Theory

The Law of Large Numbers

The sample average \(\bar{X}_n=\frac{1}{n}\sum_{i=1}^n X_i\) of independent, identically distributed \(X_i\) converges in probability to the expectation \(\mu=E(X_i)\).

  • \(X_n\) converges to \(X\) in probability

\[X_n\stackrel{P}{\rightarrow}X\]

For every \(\epsilon>0\), as \(n\rightarrow\infty\), \[P(|X_n-X|>\epsilon)\rightarrow 0\]
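
The convergence of the sample average can be visualized by a running mean. A minimal sketch, assuming Poisson samples with mean \(\lambda=2\); the distribution, sample size, and seed are arbitrary choices.

```r
set.seed(3)                 # for reproducibility
lambda = 2                  # true mean of Poisson(lambda)
n = 10000
x = rpois(n, lambda)

# Running sample average after 1, 2, ..., n samples
xbar = cumsum(x) / (1:n)

plot(1:n, xbar, type="l", log="x", xlab="n", ylab="sample average")
abline(h=lambda, lty=2)     # true mean
xbar[c(10, 100, 1000, 10000)]
```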

The Central Limit Theorem

For any distribution of \(X\) with mean \(\mu\) and finite variance \(\sigma^2\), the distribution of the sample average \(\bar{X}_n\) approaches a Normal distribution \(\mathcal{N}(\mu,\frac{\sigma^2}{n})\) as \(n\) becomes large.

  • Deviation of the sample average from the true mean, scaled by \(\sqrt{n}\) as \(\sqrt{n}(\bar{X}_n-\mu)\), converges in distribution to a Normal distribution \(\mathcal{N}(0,\sigma^2)\).

  • \(X_n\) converges to \(X\) in distribution

\[X_n\leadsto X\]

The cumulative distribution function \(F_n\) of \(X_n\) converges to the cumulative distribution function \(F\) of \(X\) at every point \(x\) where \(F\) is continuous. \[\lim_{n\rightarrow\infty}F_n(x)=F(x)\]
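
The Central Limit Theorem can be illustrated by simulating many sample averages. This sketch assumes Poisson samples, for which both the mean and the variance equal \(\lambda\) (a standard property not derived in these notes); the values of `n`, `m`, and the seed are arbitrary.

```r
set.seed(4)                 # for reproducibility
lambda = 2                  # Poisson: mean = lambda, variance = lambda
n = 50                      # samples per average
m = 5000                    # number of sample averages

xbar = replicate(m, mean(rpois(n, lambda)))
z = sqrt(n) * (xbar - lambda)   # scaled deviation from the true mean

# Histogram of scaled deviations against the N(0, sigma^2) density
hist(z, breaks=40, freq=FALSE)
curve(dnorm(x, 0, sqrt(lambda)), add=TRUE, col=2)

mean(z)                     # close to 0
var(z)                      # close to lambda = 2
```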

Exercise

1. PDF and CDF

  1. For an exponential distribution, plot the PDF, CDF and inverse CDF (quantile function) by dexp, pexp and qexp.
  2. Derive the mathematical form of the CDF of the exponential distribution from its PDF \[f(x) = \lambda e^{-\lambda x}\] and compare with the plot above.

  3. Derive the mathematical form of the inverse CDF (quantile function) of the exponential distribution and compare with the plot above.

2. Relationships between distributions

  1. Make a sample by summing samples from Bernoulli distribution. Plot its histogram and check if that fits with the Binomial distribution given by dbinom().
  2. By taking \(n\) large and scaling \(p\) by \(\frac{1}{n}\) in Binomial distribution, see if the distribution comes close to Poisson distribution.
  3. Draw a sequence of samples from a Bernoulli distribution with small \(p\). Make a histogram of the time intervals between 1s and see what distribution it follows.
  4. Divide the above sequence into time bins of length \(T\) and count 1s in each bin. What distribution does it follow?
  5. By summing up multiple samples from exponential distribution, check whether that follows Gamma distribution.
  6. See in what case Gamma distribution becomes close to the normal distribution.

3. Expectation and Variance

Items marked with * are optional, for those with mathematical background.

  1. Derive the mean and the variance of Bernoulli distribution.

  2. Derive the mean and the variance of Binomial distribution.

  3. Derive the mean and the variance of uniform distribution.

4*) Compute the mean of the exponential distribution from PDF: \[E(X) = \int_0^\infty x f(x) dx\]

5*) Compute the mean of the exponential distribution from CDF: \[E(X) = \int x dF(x) = \int_0^1 F^{-1}(q) dq\]

6*) Derive the variance of the exponential distribution.

  7. Derive the mean and the variance of Gamma distribution as a sum of the samples from exponential distribution.
