The median is a statistical measure that identifies the middle value in a data set where half the numbers are above and half are below. Unlike the mean, the median does not get affected by extreme values or outliers, making it a reliable measure of central tendency.
Types of Data Sets
Odd Number of Observations
When the data set contains an odd number of observations, the median is simply the middle number:
Even Number of Observations
For an even number of observations, the median is the average of the two middle numbers:
Special Considerations
Robustness to Outliers
The median is less sensitive to outliers compared to the mean. This makes it particularly useful in skewed distributions or when there are anomalous values.
Applicability
The median is widely used in various fields, such as economics, finance, and medicine, to provide a measure of central tendency without the influence of long tails or outliers.
Examples
Example with Odd Number of Observations
Consider the data set: \([3, 1, 4, 1, 5, 9, 2]\)
- Ordered Data Set: \([1, 1, 2, 3, 4, 5, 9]\)
- Median:
$$ 3 $$
Example with Even Number of Observations
Consider the data set: \([3, 1, 4, 1, 5, 9, 2, 6]\)
- Ordered Data Set: \([1, 1, 2, 3, 4, 5, 6, 9]\)
- Median:
$$ \frac{3 + 4}{2} = 3.5 $$
Historical Context
The concept of the median as a measure of central tendency dates back to the 13th century but was formalized in the mathematical sense in the 19th century by Francis Galton.
Comparisons with Related Terms
Mean
The mean is the arithmetic average of a data set, which can be skewed by extreme values:
Mode
The mode is the most frequently occurring value in a data set, useful in categorical data analysis.
FAQs
Why is the median preferred over the mean in certain situations?
Can you find the median in categorical data?
References
- Galton, Francis. “Typical Laws of Heredity.” Nature (1875).
- Hald, A. “Statistical Theory of Sampling.” International Statistical Review (1981).
Summary
The median is a vital statistical tool for representing the central location of a data set. Its robustness against outliers makes it an invaluable measure in descriptive statistics. Understanding both its calculation and application is essential for accurate data analysis.