Point and Interval Estimation

Statistical Inference

One of the goals of statistical inference is to draw conclusions about a population based on a random sample from that population.

Specifically, we seek to estimate an unknown parameter $\theta$, say, using a single quantity called the point estimate $\overline{\theta}$.

The point estimate is obtained using a statistic, which is simply a function of a random sample. The probability distribution of the statistic is its sampling distribution.

Examples of a statistic include:

  • sample mean and sample median
  • sample variance and sample standard deviation
  • sample quantiles

Estimator Variance and Standard Error

The standard error of a statistic is the standard deviation of its sampling distribution.

For instance, if observations $X_1, \ldots, X_n$ come from a population with unknown mean $\mu$ and known variance $\sigma^2$, then $\mathrm{Var}(\overline{X}) = \sigma^2/n$ and the standard error of $\overline{X}$ is

$$\sigma_{\overline{X}} = \frac{\sigma}{\sqrt{n}}$$

If the variance of the original population is unknown, then it is estimated by the sample variance $S^2$, and the estimated standard error of $\overline{X}$ is

$$\hat{\sigma}_{\overline{X}} = \frac{S}{\sqrt{n}}, \quad \text{where} \quad S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i-\overline{X})^2$$
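As a rough illustration of the estimated standard error, the following minimal sketch computes $S/\sqrt{n}$ with Python's standard library. The function name `estimated_standard_error` and the data values are hypothetical, chosen only for this example.

```python
import math
import statistics

def estimated_standard_error(sample):
    """Return S / sqrt(n), with S the sample standard deviation."""
    n = len(sample)
    s = statistics.stdev(sample)  # stdev uses the n - 1 divisor, like S^2
    return s / math.sqrt(n)

# Hypothetical observations, for illustration only.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
se = estimated_standard_error(data)
print(f"estimated standard error: {se:.4f}")
```

Note that `statistics.stdev` (not `statistics.pstdev`) is the function that matches the $n-1$ divisor in the definition of $S^2$.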

Confidence Interval

For mean when SD is known

Consider a sample $\{x_1, \ldots, x_n\}$ from a normal population with known variance $\sigma^2$ and unknown mean $\mu$. The sample mean is a point estimate of $\mu$.

$$\overline{x} = \frac{x_1 + \cdots + x_n}{n}$$

The 68-95-99.7 Rule

For a normal distribution, approximately 68% of the data lie within 1 standard deviation of the mean, 95% within 2 standard deviations, and 99.7% within 3 standard deviations.
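The three percentages can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library. The helper name `prob_within` is an assumption made for this example.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

def prob_within(k):
    """P(-k < Z < k) for a standard normal Z."""
    return Z.cdf(k) - Z.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {prob_within(k):.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```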

The symmetric confidence interval for $\mu$ is

$$\overline{X} - k\frac{\sigma}{\sqrt{n}} < \mu < \overline{X} + k\frac{\sigma}{\sqrt{n}} \;\Rightarrow\; \overline{X} \pm k\frac{\sigma}{\sqrt{n}}$$

where $k = 1, 2, 3$ corresponds to a confidence level of approximately 68%, 95%, and 99.7%, respectively.

For mean when SD is known (reprise)

Another approach to C.I. building is to specify the proportion of the area of interest under the standard normal density $\phi(z)$, and then to determine the critical values (the endpoints) of the interval.

For a symmetric 95% confidence interval, we need to find $z^* > 0$ such that $\mathrm{P}(-z^* < Z < z^*) \approx 0.95$.

But the LHS can be rewritten as

$$\mathrm{P}(-z^* < Z < z^*) = \Phi(z^*) - \Phi(-z^*) = \Phi(z^*) - (1 - \Phi(z^*)) = 2\Phi(z^*) - 1$$

so that $\Phi(z^*) \approx 0.975$, which gives $z^* \approx 1.96$.
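The identity $2\Phi(z^*) - 1$ can be inverted numerically to find the critical value. The following is a minimal sketch using the standard library's `NormalDist.inv_cdf`. The function name `symmetric_critical_value` is hypothetical.

```python
from statistics import NormalDist

def symmetric_critical_value(level):
    """Return z* > 0 such that P(-z* < Z < z*) = level.

    Solves 2 * Phi(z*) - 1 = level, i.e. Phi(z*) = (1 + level) / 2.
    """
    return NormalDist().inv_cdf((1 + level) / 2)

z_star = symmetric_critical_value(0.95)
print(f"z* for a 95% interval: {z_star:.3f}")  # 1.960
```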

The confidence level 1 - α is usually expressed in terms of a small α, e.g. α = 0.05 ⇒ 1 - α = 0.95 confidence level.

For α = 0.01, 0.02, ..., 0.98, 0.99, the corresponding $z_\alpha$ are called the percentiles of the standard normal distribution. In general,

$P(Z > z_\alpha) = \alpha \Rightarrow z_\alpha$ is the $100(1 - \alpha)$th percentile.

The symmetric 100(1 - α)% confidence interval can generally be written as:

$$\overline{X} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

For a given confidence level 1 - α, shorter confidence intervals are better in relation to estimating the mean:

  • estimates become better when the sample size n increases;
  • estimates become better when σ decreases.
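The interval formula and the two observations above can be sketched as follows. This is an illustrative example only: `z_interval`, the data, and the assumed known values of σ are all hypothetical.

```python
import math
from statistics import NormalDist, mean

def z_interval(sample, sigma, alpha=0.05):
    """Symmetric 100(1 - alpha)% interval for the mean when sigma is known."""
    n = len(sample)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    half_width = z * sigma / math.sqrt(n)
    centre = mean(sample)
    return centre - half_width, centre + half_width

# Hypothetical measurements with an assumed known sigma of 0.2.
data = [4.9, 5.1, 5.0, 5.3, 4.7, 5.2, 4.8, 5.0, 5.1]
low, high = z_interval(data, sigma=0.2)
print(f"95% C.I.: ({low:.3f}, {high:.3f})")

# A smaller sigma gives a shorter interval for the same data.
low_s, high_s = z_interval(data, sigma=0.1)

# A larger n gives a shorter interval; repeating the data is only a
# device to quadruple n here, which halves the interval width.
low_n, high_n = z_interval(data * 4, sigma=0.2)
```

The interval width depends on the data only through $n$, which is why it can be controlled before any observations are collected.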

Choice of Sample Size

The error we commit by estimating $\mu$ via the sample mean $\overline{X}$ is smaller than $z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ with probability 1 - α.

If we want to control the error, the only thing we can really do is control the sample size:

$$E > z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \;\Rightarrow\; n > \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2$$
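The sample-size bound can be turned into a small calculation. The sketch below returns the smallest integer $n$ that strictly exceeds $(z_{\alpha/2}\sigma/E)^2$. The function name `required_sample_size` and the numerical inputs are assumptions made for illustration.

```python
import math
from statistics import NormalDist

def required_sample_size(sigma, max_error, alpha=0.05):
    """Smallest integer n with n > (z_{alpha/2} * sigma / max_error) ** 2."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    bound = (z * sigma / max_error) ** 2
    return math.floor(bound) + 1  # strict inequality: first integer above the bound

# Hypothetical target: error below 0.5 at the 95% level when sigma = 2.
n = required_sample_size(sigma=2.0, max_error=0.5)
print(n)  # 62
```

Halving the target error $E$ roughly quadruples the required sample size, since $n$ grows with $1/E^2$.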

If σ is known, we know from the CLT that $\frac{\overline{X}-\mu}{\sigma/\sqrt{n}}$ is approximately $N(0, 1)$ for large $n$.

If σ is unknown, it can be shown that $\frac{\overline{X}-\mu}{S/\sqrt{n}}$ follows approximately $t(n - 1)$, the Student's $t$-distribution with $n - 1$ degrees of freedom.

Consequently, for a confidence level 1 - α,

$$P\left(-t_{\alpha/2}(n - 1) < \frac{\overline{X}-\mu}{S/\sqrt{n}} < t_{\alpha/2}(n - 1)\right) \approx 1 - \alpha.$$

Equality is reached if the underlying population is normal.

$100(1 - \alpha)\%$ C.I. for $\mu$: $\overline{X} \pm t_{\alpha/2}(n - 1)\, S/\sqrt{n}$.
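A minimal sketch of the $t$-based interval follows. Python's standard library has no $t$ quantile function, so this example assumes the critical value is read from a $t$ table and passed in by hand. The function name `t_interval` and the data are hypothetical.

```python
import math
import statistics

def t_interval(sample, t_crit):
    """Interval mean +/- t_crit * S / sqrt(n).

    t_crit is t_{alpha/2}(n - 1), supplied by the caller (e.g. from a table).
    """
    n = len(sample)
    centre = statistics.mean(sample)
    half_width = t_crit * statistics.stdev(sample) / math.sqrt(n)
    return centre - half_width, centre + half_width

# Hypothetical sample of size n = 10, so n - 1 = 9 degrees of freedom.
data = [9.8, 10.2, 10.4, 9.9, 10.0, 10.1, 9.7, 10.3, 10.0, 9.6]
T_CRIT_95_DF9 = 2.262  # t_{0.025}(9), taken from a standard t table
low, high = t_interval(data, T_CRIT_95_DF9)
print(f"95% C.I.: ({low:.3f}, {high:.3f})")
```

Since $t_{0.025}(9) \approx 2.262$ exceeds $z_{0.025} \approx 1.96$, the interval is wider than the corresponding known-σ interval, reflecting the extra uncertainty from estimating σ by $S$.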

Confidence Interval for a Proportion

If $X \sim B(n, p)$ (number of successes in $n$ trials), then the point estimator for $p$ is $P = X/n$.

Recall that $\mathrm{E}[X] = np$ and $\mathrm{Var}[X] = np(1 - p)$.

We can standardize any random variable: $Z = \frac{X - \mu}{\sigma} = \frac{nP - np}{\sqrt{np(1 - p)}} = \frac{P - p}{\sqrt{p(1-p)/n}}$ is approximately $N(0, 1)$.

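To calculate the confidence interval for a proportion, replace $p$ in the standard error by its observed estimate $\overline{p}$. The approximate $100(1 - \alpha)\%$ C.I. is

$$\overline{p} \pm z_{\alpha/2}\sqrt{\frac{\overline{p}(1-\overline{p})}{n}}$$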
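The proportion interval can be computed in the same way as the interval for a mean. The sketch below is illustrative only: `proportion_interval` and the counts are hypothetical, and the normal approximation is assumed to be reasonable for the chosen $n$.

```python
import math
from statistics import NormalDist

def proportion_interval(successes, n, alpha=0.05):
    """Approximate 100(1 - alpha)% interval for a binomial proportion."""
    p_bar = successes / n                    # observed proportion
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    half_width = z * math.sqrt(p_bar * (1 - p_bar) / n)
    return p_bar - half_width, p_bar + half_width

# Hypothetical outcome: 40 successes in 100 trials.
low, high = proportion_interval(successes=40, n=100)
print(f"95% C.I. for p: ({low:.3f}, {high:.3f})")  # (0.304, 0.496)
```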

Summary

Sample: $\{X_1, \ldots, X_n\}$. Objective: estimate $\mu$ with confidence level $1 - \alpha$.

If population is normal with known variance $\sigma^2$, the exact $100(1 - \alpha)\%$ C.I. is $\overline{X} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$.

If population is non-normal with known variance $\sigma^2$ and $n$ is ‘big’, the approximate $100(1 - \alpha)\%$ C.I. is $\overline{X} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$.

If population is normal with unknown variance, the exact $100(1 - \alpha)\%$ C.I. is $\overline{X} \pm t_{\alpha/2}(n - 1)\frac{S}{\sqrt{n}}$.

If population has unknown variance and $n$ is ‘big’, the approximate $100(1 - \alpha)\%$ C.I. is $\overline{X} \pm z_{\alpha/2}\frac{S}{\sqrt{n}}$.