Residuals are the differences between observed values and predicted values in a statistical model. They play a crucial role in assessing the accuracy and reliability of the model.
Definition
Residuals are defined mathematically as:
where:
- \( y_i \) represents the observed value
- \( \hat{y}_i \) represents the predicted value
Types of Residuals
Raw Residuals
Raw residuals are the simplest form, calculated as the difference between observed and predicted values.
Standardized Residuals
Standardized residuals are raw residuals divided by an estimate of their standard deviation.
Studentized Residuals
Studentized residuals further adjust standardized residuals by taking into account the leverage of individual observations.
Special Considerations
Homoscedasticity
Residuals should exhibit constant variance for the model to be considered reliable.
Independence
Residuals should be independent of each other to ensure the integrity of the model.
Examples
Consider a simple linear regression model where you predict the weight of individuals based on their height. The residual for each individual is the difference between the observed weight and the predicted weight based on the regression line.
Numerical Example
Given:
- Observed weight ( \( y_i \) ): 150 lbs
- Predicted weight ( \( \hat{y}_i \) ): 145 lbs
Residual ( \( e_i \) ) = 150 - 145 = 5 lbs
Historical Context
The concept of residuals has been integral to regression analysis since its inception, tracing back to the work of Sir Francis Galton in the 19th century.
Applicability
Residuals are widely used in:
- Economics: For evaluating the fit of economic models
- Finance: For assessing the accuracy of predictive models in financial forecasting
- Quality Control: For monitoring process performance and stability
Comparisons
Residuals vs. Errors
While often used interchangeably, residuals specifically refer to the discrepancies between observed and predicted values in a sample, while errors refer to the actual discrepancies in the population.
Related Terms
- Regression Analysis: A set of statistical processes for estimating relationships among variables, where residuals are key indicators of model fit.
- Leverage: A measure of the influence of an individual data point on the fit of the regression model.
FAQs
What is the purpose of analyzing residuals?
How do residuals help in diagnosing a model?
References
- Galton, F. (1886). “Regression towards mediocrity in hereditary stature”. The Journal of the Anthropological Institute of Great Britain and Ireland.
- Draper, N., & Smith, H. (1998). Applied Regression Analysis. Wiley.
Summary
Residuals are fundamental in evaluating and diagnosing statistical models. By understanding the differences between observed and predicted values, statisticians and analysts can improve model accuracy and reliability across various fields.