Chi-Square (χ²) Statistic: Definition, Examples, and Applications

August 24, 2024

An in-depth look at the chi-square (χ²) statistic, including its definition, practical examples, application methods, and when to use this statistical test.

The chi-square (χ²) statistic measures how far observed frequencies (counts) depart from the frequencies expected under a specific hypothesis. It is commonly used in hypothesis testing, particularly in tests of independence and goodness-of-fit, to judge whether that departure is statistically significant.

Understanding the Chi-Square (χ²) Test

The chi-square test is a non-parametric test that assesses whether the observed frequencies in a set of categories differ from the frequencies expected under a specific hypothesis by more than random sampling variation would explain. It applies to categorical data, where each observation is counted in exactly one category.

Types of Chi-Square Tests

Goodness-of-Fit Test: Used to determine whether sample data are consistent with a hypothesized distribution (for example, equal probabilities for every face of a die).
Test for Independence: Used to assess whether two categorical variables are independent of each other.

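Mathematical Formula

The formula for the chi-square statistic is:

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

where:

$O_i$ = observed frequency in category $i$
$E_i$ = expected frequency in category $i$ under the null hypothesis
$k$ = number of categories (in a test of independence, the sum runs over every cell of the contingency table)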
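A minimal Python sketch of the formula above, assuming the observed and expected frequencies are supplied as equal-length sequences in the same category order. The function name `chi_square_statistic` is illustrative, not taken from any library.

```python
from typing import Sequence


def chi_square_statistic(observed: Sequence[float], expected: Sequence[float]) -> float:
    """Return the sum of (O_i - E_i)^2 / E_i over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected frequencies must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Usage: two categories, 30 observations, 15 expected in each.
print(chi_square_statistic([10, 20], [15, 15]))  # ≈ 3.333
```

Each category contributes its squared deviation scaled by the expected count, so a deviation of 5 counts matters more when only a few observations were expected.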

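Application and Examples

Example 1: Goodness-of-Fit Test

Suppose we want to check whether a six-sided die is fair. We roll the die 60 times and observe the frequencies [8, 12, 10, 10, 14, 6]. If the die is fair, each face is expected to appear 60 / 6 = 10 times, so the statistic is

$$\chi^2 = \frac{(8-10)^2 + (12-10)^2 + (10-10)^2 + (10-10)^2 + (14-10)^2 + (6-10)^2}{10} = \frac{40}{10} = 4.0$$

with 6 − 1 = 5 degrees of freedom. The critical value at α = 0.05 for 5 degrees of freedom is 11.07; since 4.0 < 11.07, we fail to reject the null hypothesis that the die is fair.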
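The die calculation can be reproduced in a few lines of Python. This is a minimal sketch assuming the counts are listed in the order given above and that the null hypothesis (a fair die) has no parameters estimated from the data, so the degrees of freedom are k − 1.

```python
# Observed counts from the 60 rolls in the example.
observed = [8, 12, 10, 10, 14, 6]

# Under H0 (fair die) every face has probability 1/6.
n_rolls = sum(observed)
expected = [n_rolls / len(observed)] * len(observed)  # 10.0 per face

# Chi-square statistic: sum of (O_i - E_i)^2 / E_i.
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Goodness-of-fit with no estimated parameters: df = k - 1.
df = len(observed) - 1

print(f"chi-square = {chi2:.2f}, df = {df}")  # chi-square = 4.00, df = 5
```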

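Example 2: Test for Independence

Imagine we have survey data from 100 people recording their preferred mode of transportation (Car, Bike, Public Transport) and their city of residence (City A, City B). Arranging the responses in a 2 × 3 contingency table, we use the chi-square test to assess whether transportation preference is independent of city. Under independence, the expected count for each cell is (row total × column total) / 100, and the test has (2 − 1)(3 − 1) = 2 degrees of freedom.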
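The expected-count and degrees-of-freedom calculation for a contingency table can be sketched in Python as follows. The survey's actual counts are not given, so the table below is a hypothetical 2 × 3 example chosen only to make the code runnable; `independence_test` is an illustrative name, not a library function.

```python
from typing import List, Tuple


def independence_test(table: List[List[float]]) -> Tuple[float, int, List[List[float]]]:
    """Chi-square test of independence on an r x c table of counts.

    Returns (chi-square statistic, degrees of freedom, expected table).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    # Under H0 (independence): E_ij = row_total_i * col_total_j / N.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

    # Sum (O - E)^2 / E over every cell of the table.
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, df, expected


# HYPOTHETICAL counts for illustration only (not the article's survey data).
# Rows: City A, City B; columns: Car, Bike, Public Transport.
survey = [
    [20, 10, 20],
    [10, 15, 25],
]
chi2, df, expected = independence_test(survey)
print(f"chi-square = {chi2:.3f}, df = {df}")  # chi-square = 4.889, df = 2
```

For these made-up counts the statistic (about 4.89) is below 5.991, the α = 0.05 critical value for 2 degrees of freedom, so independence would not be rejected.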

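When and How to Use the Chi-Square Test

Assumptions and Conditions

The data must be counts (frequencies) of observations in categories, not percentages or means.
As a rule of thumb, the expected frequency in each category (or cell) should be at least 5.
Observations must be independent: each observation is counted in exactly one category (or cell).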
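Of these conditions, only the expected-frequency rule can be checked mechanically; whether the data are raw counts and whether observations are independent depends on how the data were collected. A small hypothetical helper, using the rule-of-thumb threshold of 5, might look like this:

```python
from typing import List, Sequence


def check_expected_counts(expected: Sequence[float], minimum: float = 5.0) -> List[int]:
    """Return the indices of categories whose expected frequency is below `minimum`.

    The default of 5 is the common rule of thumb, not a hard requirement.
    """
    return [i for i, e in enumerate(expected) if e < minimum]


# Example: 20 observations over 6 equally likely categories gives an
# expected count of about 3.33 per category, below the threshold.
expected = [20 / 6] * 6
low = check_expected_counts(expected)
if low:
    print(f"{len(low)} of {len(expected)} categories have expected counts below 5")
```

When the check fails, a common remedy is to collect more data or merge sparse categories before running the test.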

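Steps in Performing a Chi-Square Test

Formulate the null and alternative hypotheses.
Calculate the expected frequencies under the null hypothesis.
Compute the chi-square statistic from the observed and expected frequencies.
Compare the statistic with the critical value from the chi-square distribution for the appropriate degrees of freedom and significance level α.
Draw a conclusion: reject the null hypothesis if the statistic exceeds the critical value; otherwise, fail to reject it.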
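The five steps can be sketched for a goodness-of-fit test as below. This is a minimal illustration assuming α = 0.05, so it uses a short table of upper 5% critical values for 1 to 10 degrees of freedom rather than computing them; `goodness_of_fit` and `CRITICAL_05` are hypothetical names, and the usage data (60 heads in 100 coin tosses) are invented for illustration.

```python
from typing import Sequence

# Upper 5% critical values of the chi-square distribution for df = 1..10,
# as printed in standard chi-square tables (alpha = 0.05 only).
CRITICAL_05 = {
    1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070,
    6: 12.592, 7: 14.067, 8: 15.507, 9: 16.919, 10: 18.307,
}


def goodness_of_fit(observed: Sequence[int], proportions: Sequence[float]) -> dict:
    """Run the five steps of a goodness-of-fit test at alpha = 0.05."""
    # Step 1: H0 says the categories occur with the given proportions.
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("hypothesized proportions must sum to 1")
    # Step 2: expected frequencies are n * p_i.
    n = sum(observed)
    expected = [n * p for p in proportions]
    # Step 3: chi-square statistic.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Step 4: compare with the tabulated critical value for df = k - 1.
    df = len(observed) - 1
    critical = CRITICAL_05[df]  # raises KeyError for df > 10 (table is only a sketch)
    # Step 5: reject H0 if the statistic exceeds the critical value.
    return {"chi2": chi2, "df": df, "critical": critical, "reject_h0": chi2 > critical}


# HYPOTHETICAL data: 60 heads and 40 tails in 100 tosses; H0: the coin is fair.
result = goodness_of_fit([60, 40], [0.5, 0.5])
print(result)  # {'chi2': 4.0, 'df': 1, 'critical': 3.841, 'reject_h0': True}
```

Here χ² = (60 − 50)²/50 + (40 − 50)²/50 = 4.0 exceeds 3.841, so the fair-coin hypothesis would be rejected at the 5% level.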

Historical Context

The chi-square test was introduced by Karl Pearson in 1900 and has since become a cornerstone of statistical hypothesis testing.

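FAQs

What is the critical value in the chi-square test?

The critical value is the point of the chi-square distribution, for the test's degrees of freedom, beyond which the upper-tail probability equals the chosen significance level (α). It can be read from a chi-square distribution table or computed with statistical software.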
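When no table is at hand, the critical value can be computed from the chi-square cumulative distribution function. The dependency-free sketch below assumes the standard identity that the chi-square CDF with df degrees of freedom equals the regularized lower incomplete gamma function P(df/2, x/2), evaluates it with its power series, and inverts it by bisection. The function names are hypothetical; in practice a library routine such as SciPy's `scipy.stats.chi2.ppf` (critical value) or `chi2.sf` (p-value) is more robust.

```python
import math


def chi2_cdf(x: float, df: int) -> float:
    """Chi-square CDF via the regularized lower incomplete gamma P(df/2, x/2)."""
    if x <= 0:
        return 0.0
    s, z = df / 2.0, x / 2.0
    # Series: P(s, z) = z^s e^{-z} / Gamma(s) * sum_{n>=0} z^n / (s (s+1) ... (s+n))
    term = 1.0 / s
    total = term
    for n in range(1, 10_000):
        term *= z / (s + n)
        total += term
        if term < total * 1e-15:  # remaining terms are negligible
            break
    return total * math.exp(-z + s * math.log(z) - math.lgamma(s))


def chi2_critical_value(df: int, alpha: float = 0.05) -> float:
    """Return x such that P(X > x) = alpha, found by bisection on the CDF."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < 1 - alpha:  # widen the bracket until it contains x
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_cdf(mid, df) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(round(chi2_critical_value(5), 2))  # 11.07
print(1 - chi2_cdf(4.0, 5))  # upper-tail p-value for chi-square = 4.0 with df = 5
```

The same CDF also gives the p-value, the upper-tail probability of the observed statistic, which can be compared directly with α instead of looking up a critical value.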

Can chi-square tests be used for continuous data?

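Not directly. Chi-square tests operate on counts in categories, so continuous measurements must first be grouped into intervals (bins), and the result can depend on how those bins are chosen.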

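Degrees of Freedom (df): The number of values in the calculation that are free to vary once constraints such as fixed totals are imposed. For a goodness-of-fit test with k categories (and no parameters estimated from the data), df = k − 1; for a test of independence on an r × c table, df = (r − 1)(c − 1).
Null Hypothesis (H₀): The hypothesis that there is no difference or effect (for example, the die is fair, or the two variables are independent).
Alternative Hypothesis (H₁): The hypothesis that there is a difference or effect (for example, the die is not fair, or the two variables are associated).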

Summary

The chi-square (χ²) statistic is an essential tool for testing hypotheses about categorical data. It indicates whether the deviations of observed counts from expected counts are plausibly due to random chance or point to a real effect. By checking the assumptions and following the steps above, researchers can draw sound conclusions from frequency data.