Central Limit Theorem: Foundation of Statistical Inference

August 31, 2024 3 min read Mathematics Statistics Central Limit Theorem Statistical Inference Normal Distribution Sampling Probability Theory

The Central Limit Theorem (CLT) states that the distribution of sample means approaches a normal distribution as the sample size increases, regardless of the shape of the original distribution, provided the data have finite variance.

The Central Limit Theorem (CLT) is a fundamental principle of statistics. It states that the distribution of the mean of a sufficiently large number of independent, identically distributed random variables with finite variance is approximately normal, regardless of the original distribution of the variables.

Detailed Explanation

Mathematical Definition

Let $X_1, X_2, \ldots, X_n$ be a sequence of independent, identically distributed (i.i.d.) random variables with mean $\mu$ and finite variance $\sigma^2 > 0$. The Central Limit Theorem states that, as $n \to \infty$,

$$\frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} \xrightarrow{d} \mathcal{N}(0, 1)$$

Here, $\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i$ is the sample mean, $\xrightarrow{d}$ denotes convergence in distribution, and $\mathcal{N}(0, 1)$ denotes the standard normal distribution.
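The statement can be checked numerically. The sketch below is a minimal simulation, assuming Exponential(1) data (mean and standard deviation both equal to 1) as a deliberately skewed example; the helper names `standardized_means` and `skewness` are illustrative, not from any library. If the theorem applies, the standardized means should keep a mean near 0 and a standard deviation near 1, while their skewness shrinks toward 0 (for exponential data it equals $2/\sqrt{n}$ in theory).

```python
import math
import random
import statistics

def standardized_means(n, reps=5000, seed=0):
    """Return `reps` values of (xbar - mu) / (sigma / sqrt(n)), where each xbar
    is the mean of n i.i.d. Exponential(rate=1) draws (so mu = sigma = 1)."""
    rng = random.Random(seed)
    mu, sigma = 1.0, 1.0
    out = []
    for _ in range(reps):
        xbar = statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        out.append((xbar - mu) / (sigma / math.sqrt(n)))
    return out

def skewness(values):
    """Sample skewness: the mean cubed z-score (0 for a symmetric distribution)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)

# Skewness should fall from roughly 2 at n = 1 toward 0 as n grows
for n in (1, 5, 30, 100):
    z = standardized_means(n)
    print(f"n={n:3d}  mean={statistics.fmean(z):+.3f}  "
          f"sd={statistics.pstdev(z):.3f}  skewness={skewness(z):+.3f}")
```

The starting distribution is far from normal (skewness 2), which makes the gradual approach to symmetry easy to see.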

Types of Central Limit Theorems

Classical (Lindeberg-Lévy) Central Limit Theorem:
- Applies to independent, identically distributed variables with finite, non-zero variance; this is the version stated above.
Lindeberg-Feller Central Limit Theorem:
- Extends the CLT to independent variables that are not identically distributed, provided they satisfy the Lindeberg condition, which ensures that no single variable dominates the total variance.
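Lyapunov Central Limit Theorem:
- Gives a sufficient condition, based on moments of order $2 + \delta$ for some $\delta > 0$, under which the CLT holds for independent, non-identically distributed variables. Lyapunov's condition implies Lindeberg's and is often easier to check.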
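In symbols, using notation introduced here for illustration: let the independent variables $X_i$ have means $\mu_i$ and variances $\sigma_i^2$, and let $s_n^2 = \sum_{i=1}^{n} \sigma_i^2$. The two conditions are usually stated as follows.

$$\text{(Lyapunov)} \quad \lim_{n \to \infty} \frac{1}{s_n^{2+\delta}} \sum_{i=1}^{n} \mathbb{E}\left[ |X_i - \mu_i|^{2+\delta} \right] = 0 \quad \text{for some } \delta > 0$$

$$\text{(Lindeberg)} \quad \lim_{n \to \infty} \frac{1}{s_n^{2}} \sum_{i=1}^{n} \mathbb{E}\left[ (X_i - \mu_i)^2 \, \mathbf{1}\{ |X_i - \mu_i| > \varepsilon s_n \} \right] = 0 \quad \text{for every } \varepsilon > 0$$

Under either condition,

$$\frac{1}{s_n} \sum_{i=1}^{n} (X_i - \mu_i) \xrightarrow{d} \mathcal{N}(0, 1).$$

In the i.i.d. case, $s_n = \sigma \sqrt{n}$ and this statistic equals $(\bar{X}_n - \mu)/(\sigma / \sqrt{n})$, recovering the classical statement.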

Historical Context

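The Central Limit Theorem was first derived by Abraham de Moivre in 1733, in the context of approximating the binomial distribution with a normal distribution. Later, Pierre-Simon Laplace generalized de Moivre's finding. The theorem was put on a rigorous footing by later mathematicians, notably Pafnuty Chebyshev, Andrey Markov, and Aleksandr Lyapunov, whose 1901 proof covered general independent variables; Jarl Waldemar Lindeberg and William Feller later established the conditions that bear their names.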

Applicability

The theorem serves as a backbone for many statistical methods:

Inference and Estimation: For large samples, sample means are approximately normally distributed, which justifies normal-based confidence intervals and hypothesis tests.
Law of Large Numbers: The CLT complements the law of large numbers. The LLN says the sample mean converges to the population mean; the CLT describes the size and shape of the remaining error, which shrinks in proportion to $\sigma / \sqrt{n}$.
Practical Applications: Used in areas ranging from quality control to finance to approximate the distribution of sums and averages of many random variables.
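As an illustration of the inference point, here is a minimal sketch of a large-sample confidence interval for a mean, $\bar{x} \pm z \, s / \sqrt{n}$, assuming i.i.d. observations and a sample large enough for the normal approximation. The function name `mean_confidence_interval` and the simulated data are hypothetical; the sample standard deviation $s$ stands in for the unknown $\sigma$.

```python
import math
import random
import statistics

def mean_confidence_interval(data, confidence=0.95):
    """Large-sample (CLT-based) interval for the population mean:
    xbar +/- z * s / sqrt(n), with z from the standard normal distribution."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    xbar = statistics.fmean(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1 denominator)
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    half_width = z * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# Hypothetical skewed data: 200 exponential draws whose true mean is 2.0
rng = random.Random(42)
sample = [rng.expovariate(0.5) for _ in range(200)]
low, high = mean_confidence_interval(sample)
print(f"95% CI for the mean: ({low:.3f}, {high:.3f})")
```

Even though the exponential data are skewed, intervals built this way should cover the true mean close to 95% of the time at this sample size, which is the practical content of the CLT for estimation.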

Examples

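Dice Rolling:
- The total of 60 rolls of a fair die is a single number, but if you repeat the 60-roll experiment many times, the totals are approximately normally distributed, with mean $60 \times 3.5 = 210$ and variance $60 \times 35/12 = 175$.
Survey Sampling:
- The average income in a large random sample is approximately normally distributed across repeated samples, even though individual incomes are strongly right-skewed.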
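The dice example can be sketched as a simulation, assuming fair six-sided dice; `simulate_totals` and the constants are illustrative names. Each roll has mean 3.5 and variance 35/12, so the prediction is a normal distribution with mean 210 and standard deviation $\sqrt{175} \approx 13.23$.

```python
import math
import random
import statistics

ROLLS = 60
DIE_MEAN = 3.5      # (1 + 2 + ... + 6) / 6
DIE_VAR = 35 / 12   # E[X^2] - E[X]^2 = 91/6 - 3.5**2

def simulate_totals(trials=10000, rolls=ROLLS, seed=1):
    """Repeat the 60-roll experiment `trials` times; return each run's total."""
    rng = random.Random(seed)
    return [sum(rng.randint(1, 6) for _ in range(rolls)) for _ in range(trials)]

totals = simulate_totals()
clt = statistics.NormalDist(ROLLS * DIE_MEAN, math.sqrt(ROLLS * DIE_VAR))  # N(210, 175)

# Share of totals within one CLT standard deviation of 210 (about 0.683 for a normal)
within = sum(abs(t - clt.mean) <= clt.stdev for t in totals) / len(totals)

print(f"simulated mean {statistics.fmean(totals):.2f} vs CLT {clt.mean:.2f}")
print(f"simulated sd   {statistics.pstdev(totals):.2f} vs CLT {clt.stdev:.2f}")
normal_share = clt.cdf(clt.mean + clt.stdev) - clt.cdf(clt.mean - clt.stdev)
print(f"within 1 sd    {within:.3f} vs normal {normal_share:.3f}")
```

Because the totals are integers, small discrepancies from the continuous normal values are expected even with many trials.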

Special Considerations

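Sample Size: The normal approximation improves as the sample size grows. $n > 30$ is a common rule of thumb, but strongly skewed or heavy-tailed data may need considerably larger samples.
Independence: The classical versions require independent variables; variants exist for some forms of weak dependence.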
Variance: Must be finite and non-zero.
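The finite-variance requirement is easy to see in a small simulation. This sketch assumes the standard Cauchy distribution (sampled through its inverse CDF) as an example with infinite variance, and Exponential(1) as a finite-variance comparison; the helper names are illustrative. The interquartile range (IQR) of exponential sample means shrinks roughly like $1/\sqrt{n}$, whereas the mean of $n$ standard Cauchy draws is itself standard Cauchy, so its IQR stays near 2 no matter how large $n$ gets.

```python
import math
import random
import statistics

def iqr_of_sample_means(draw, n, reps=500, seed=7):
    """IQR of `reps` sample means, each computed from n values of draw(rng)."""
    rng = random.Random(seed)
    means = [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(reps)]
    q1, _, q3 = statistics.quantiles(means, n=4)
    return q3 - q1

def exponential(rng):
    return rng.expovariate(1.0)  # finite variance (sigma^2 = 1)

def cauchy(rng):
    # Standard Cauchy via the inverse CDF: infinite variance, undefined mean
    return math.tan(math.pi * (rng.random() - 0.5))

for n in (10, 100, 1000):
    print(f"n={n:4d}  exponential IQR={iqr_of_sample_means(exponential, n):.3f}  "
          f"cauchy IQR={iqr_of_sample_means(cauchy, n):.3f}")
```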

Related Terms

Law of Large Numbers (LLN): States that the sample mean converges to the population mean $\mu$ as the number of observations grows.
Normal Distribution: A continuous, bell-shaped probability distribution that is symmetric about its mean and fully determined by its mean and variance.
Sampling Distribution: The distribution of a statistic (such as the sample mean) over all possible random samples of a given size from a population.

FAQs

How does the CLT apply to non-normal distributions?

As long as the variables are independent with finite variance, the distribution of the sample mean tends toward a normal distribution as the sample size increases, whatever the shape of the original distribution.

What is the significance of the CLT in hypothesis testing?

It justifies normal-based methods (such as z-tests) even when the population distribution is not normal, provided the sample size is large enough.
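A minimal sketch of a large-sample, two-sided one-sample z-test, assuming i.i.d. observations; `one_sample_z_test` is a hypothetical helper name, and the sample standard deviation stands in for the unknown $\sigma$, which is reasonable only when $n$ is large.

```python
import math
import statistics

def one_sample_z_test(data, mu0):
    """Two-sided large-sample z-test of H0: population mean == mu0.

    The sample standard deviation replaces the unknown sigma; the CLT makes
    the statistic approximately N(0, 1) under H0 when n is large."""
    n = len(data)
    xbar = statistics.fmean(data)
    s = statistics.stdev(data)            # raises StatisticsError if n < 2
    z = (xbar - mu0) / (s / math.sqrt(n))
    p_value = 2 * statistics.NormalDist().cdf(-abs(z))  # two-sided tail area
    return z, p_value

# Hypothetical data: the integers 1..100, tested against mu0 = 45
z, p = one_sample_z_test(list(range(1, 101)), mu0=45)
print(f"z = {z:.3f}, p = {p:.3f}")
```

Computing the tail as `cdf(-abs(z))` rather than `1 - cdf(abs(z))` avoids losing precision when $|z|$ is large.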

Can the CLT be applied in real-world scenarios with smaller sample sizes?

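The CLT is an asymptotic result, but in practice the normal approximation is often adequate for moderately sized samples (n > 30 is a common rule of thumb). Strongly skewed or heavy-tailed data need larger samples, and for small samples from an approximately normal population, t-based methods are used instead.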

Summary

The Central Limit Theorem is a cornerstone of statistical theory: it assures us that, with a sufficiently large sample size and data of finite variance, the distribution of sample means is approximately normal. This theorem underpins many statistical procedures and allows us to make inferences about a population even when the population distribution is unknown or not normal.