Information criteria (ICs) are statistical tools for evaluating and comparing models. By balancing goodness of fit against model complexity, they help select the best model from a set of candidates.
Historical Context
The concept of the information criterion emerged in the late 20th century, with significant contributions from Hirotugu Akaike, who introduced the Akaike Information Criterion (AIC) in 1974, and Gideon Schwarz, who introduced the Bayesian Information Criterion (BIC, also called the Schwarz criterion) in 1978.
Types of Information Criteria
Akaike Information Criterion (AIC)
AIC balances a model's complexity against its goodness of fit. The formula for AIC is:
AIC = 2k - 2 ln(L̂)
where k is the number of estimated parameters and L̂ is the maximized value of the model's likelihood function.
Bayesian Information Criterion (BIC)
BIC introduces a stricter penalty for complexity than AIC. Its formula is:
BIC = k ln(n) - 2 ln(L̂)
where k is the number of estimated parameters, n is the sample size, and L̂ is the maximized likelihood.
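Writing k for the number of estimated parameters, n for the sample size, and L̂ for the maximized likelihood, AIC = 2k - 2 ln(L̂) and BIC = k ln(n) - 2 ln(L̂). For a model with Gaussian errors, ln(L̂) can be computed directly from the residual sum of squares, which makes both criteria easy to evaluate by hand. A minimal sketch (function names are illustrative):

```python
import math

def gaussian_log_likelihood(rss, n):
    """Maximized log-likelihood of a model with Gaussian errors,
    given the residual sum of squares rss and sample size n."""
    return -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1)

def aic(log_lik, k):
    # AIC = 2k - 2 ln(L-hat), where k counts the estimated parameters.
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n):
    # BIC = k ln(n) - 2 ln(L-hat); the penalty grows with sample size.
    return k * math.log(n) - 2 * log_lik
```

For a fixed fit, adding parameters raises both criteria, and once n exceeds e² (about 7.4) BIC's penalty is the larger of the two.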
Key Events and Developments
- 1974: Introduction of AIC by Hirotugu Akaike.
- 1978: Introduction of BIC by Gideon Schwarz.
- Recent Advances: Development of variants and extensions such as the Deviance Information Criterion (DIC) and the corrected AIC (AICc) for small sample sizes.
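The small-sample correction mentioned above has a simple closed form, AICc = AIC + 2k(k+1)/(n - k - 1), which converges to plain AIC as n grows. A sketch (the function name is illustrative):

```python
def aicc(aic_value, k, n):
    # Small-sample correction: AICc = AIC + 2k(k+1) / (n - k - 1).
    # Requires n > k + 1; the extra penalty vanishes as n grows large.
    return aic_value + (2 * k * (k + 1)) / (n - k - 1)
```

A common rule of thumb is to prefer AICc over AIC whenever n/k is small (roughly below 40).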
Importance and Applicability
Model Selection
Information criteria are crucial for selecting models that not only fit data well but also avoid overfitting. They are widely used in:
- Econometrics: Selecting models for economic forecasting.
- Machine Learning: Evaluating models’ performance and complexity.
- Biostatistics: Determining the most appropriate statistical models for biological data.
Examples and Considerations
Application in Linear Regression
Consider multiple linear regression models fitted to the same dataset. AIC and BIC can help determine which model provides the best trade-off between fit and complexity.
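The comparison above can be sketched with ordinary least-squares polynomial fits, computing each model's AIC from its residuals under a Gaussian-error assumption (the helper name and data are illustrative, assuming NumPy is available):

```python
import numpy as np

def aic_for_polyfit(x, y, degree):
    """Fit a polynomial of the given degree by least squares and return
    its AIC, using the Gaussian log-likelihood implied by the residuals."""
    coeffs = np.polyfit(x, y, degree)
    rss = float(np.sum((np.polyval(coeffs, x) - y) ** 2))
    n = x.size
    k = degree + 2  # polynomial coefficients plus the error variance
    log_lik = -0.5 * n * (np.log(2 * np.pi) + np.log(rss / n) + 1)
    return 2 * k - 2 * log_lik

# Data with a genuinely linear trend plus fixed pseudo-noise.
x = np.linspace(0.0, 1.0, 60)
y = 1.0 + 2.0 * x + 0.1 * np.cos(37.0 * x)

# Compare a simple linear fit against a much more flexible fit;
# the lower AIC marks the preferred trade-off between fit and complexity.
print(aic_for_polyfit(x, y, 1), aic_for_polyfit(x, y, 8))
```

The same residual-based calculation applies to BIC by swapping the penalty term 2k for k ln(n).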
Considerations
- Sample Size: BIC is generally preferred for larger samples due to its stronger penalty on the number of parameters.
- Model Comparisons: Lower values of AIC or BIC indicate better models, but they should only be compared among models fitted to the same dataset.
Related Terms and Comparisons
Related Terms
- Goodness-of-Fit: Measures how well the model’s predictions match observed data.
- Overfitting: When a model is so complex that it fits noise in the data rather than the underlying trend, harming its performance on new data.
Comparison of AIC and BIC
- Penalty Term: BIC's penalty, k ln(n), exceeds AIC's penalty, 2k, for all but the smallest samples (n > e² ≈ 7.4).
- Model Preference: AIC may prefer more complex models, while BIC favors simpler, more parsimonious models.
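The difference between the two penalties is easy to make concrete: AIC charges a flat 2 per estimated parameter, while BIC charges ln(n) per parameter, so BIC becomes the stricter criterion once n exceeds e² (about 7.4). A quick sketch:

```python
import math

def aic_penalty(k):
    # AIC's complexity penalty: a flat 2 per estimated parameter.
    return 2.0 * k

def bic_penalty(k, n):
    # BIC's complexity penalty grows with the sample size n.
    return k * math.log(n)

# For k = 3 parameters, compare the penalties as the sample grows.
for n in (5, 8, 100, 10000):
    print(n, aic_penalty(3), round(bic_penalty(3, n), 2))
```

This is why, with large samples, BIC tends to select more parsimonious models than AIC.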
Interesting Facts
- Widespread Use: ICs like AIC and BIC are not confined to statistics but are extensively utilized in various fields, including economics, medicine, and engineering.
Quotes and Proverbs
- Quote: “All models are wrong, but some are useful.” - George Box, highlighting the necessity of using criteria like AIC and BIC for practical model selection.
FAQs
Q: Can AIC and BIC be used for non-linear models?
A: Yes, AIC and BIC can be applied to a variety of models, including non-linear models.
Q: How do I choose between AIC and BIC?
A: It depends on the context and sample size. BIC is typically preferred for larger samples due to its stronger penalization of model complexity.
References
- Akaike, H. (1974). "A new look at the statistical model identification." IEEE Transactions on Automatic Control, 19(6), 716–723.
- Schwarz, G. (1978). "Estimating the dimension of a model." The Annals of Statistics, 6(2), 461–464.
Summary
Information criteria such as AIC and BIC are essential tools for model selection, balancing goodness of fit against complexity. Used appropriately, they help ensure that chosen models generalize well to new data, avoiding both overfitting and underfitting.