A scatter plot (also known as a scattergram, scatter graph, or scatter chart) is a type of data visualization that displays the values of (typically) two variables for each observation in a data set. The data are shown as a collection of points: the value of one variable determines each point's position on the horizontal axis, and the value of the other determines its position on the vertical axis.
Definition
A scatter plot is a graphical representation used to observe and show the relationship between two quantitative variables. The pattern formed by the plotted points can indicate the direction, form, and strength of any correlation, and can suggest what kind of regression model might fit the data.
Characteristics
- Axes:
  - X-axis: Conventionally represents the independent (explanatory) variable.
  - Y-axis: Conventionally represents the dependent (response) variable.
- Points: Each point represents an individual data observation.
- Trend Lines: Optional lines that may be drawn to represent trends within the data.
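A trend line is often, though not always, a straight line fitted by ordinary least squares. The sketch below assumes that choice; the function name `fit_trend_line` and the sample values are made up for illustration.

```python
def fit_trend_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x; zero means every x value is the same.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    # Sum of cross-deviations of x and y.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up observations: x is the independent variable, y the dependent one.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
slope, intercept = fit_trend_line(xs, ys)
print(f"trend line: y = {slope:.2f} * x + {intercept:.2f}")
```

A straight line is only one possible trend; curved relationships call for a different model.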
Understanding Scatter Plots
Interpreting Relationships
- Positive Correlation: The values of the two variables move in the same direction. As one variable increases, the other also tends to increase.
- Negative Correlation: The values of the two variables move in opposite directions. As one variable increases, the other tends to decrease.
- No Correlation: There is no discernible pattern in the data points.
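The three cases above can be checked numerically with the Pearson correlation coefficient. The following is a minimal sketch: the helper names are hypothetical, and the 0.3 cutoff for "no correlation" is an arbitrary illustrative assumption, not a standard.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


def describe_correlation(r, cutoff=0.3):
    """Label r as 'positive', 'negative', or 'none' using an assumed cutoff."""
    if r >= cutoff:
        return "positive"
    if r <= -cutoff:
        return "negative"
    return "none"


# Made-up data in which y falls as x rises.
r = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])
print(describe_correlation(r))
```

A value of r near zero only rules out a linear relationship. A curved pattern can still be clearly visible in the plot, which is one reason to look at the points as well as the number.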
Special Considerations
- Outliers: Scatter plots can help identify outliers that do not fit the overall pattern of the data.
- Cluster Identification: Multiple clusters of points can indicate subgroups within the data.
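One way to make "does not fit the overall pattern" concrete for outliers is to measure how far each point lies from a fitted trend line. The sketch below assumes a straight-line pattern and a cutoff of 2 residual spreads; both are illustrative choices, and `flag_outliers` is a hypothetical helper. It does not address cluster identification.

```python
import math


def flag_outliers(xs, ys, z_cutoff=2.0):
    """Return indices of points whose residual from the least-squares line is unusually large."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    # Vertical distance of each point from the fitted line.
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    # Least-squares residuals average to zero, so their spread is the root mean square.
    spread = math.sqrt(sum(r * r for r in residuals) / n)
    if spread == 0:
        return []  # every point lies exactly on the line
    return [i for i, r in enumerate(residuals) if abs(r) / spread > z_cutoff]


# Made-up data: y = 2x everywhere except the point at index 4.
xs = list(range(1, 10))
ys = [2, 4, 6, 8, 30, 12, 14, 16, 18]
print(flag_outliers(xs, ys))
```

Because the outlier itself pulls on the fitted line and inflates the spread, this simple rule can miss outliers in small or heavily contaminated samples; robust fitting methods handle those cases better.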
Examples
Positive Correlation Example
If we plot the heights (X-axis) and weights (Y-axis) of a group of individuals, taller individuals typically weigh more, so the points trend upward from left to right, showing a positive correlation.
Negative Correlation Example
If we plot hours studied per week by students (X-axis) against the number of errors made in exams (Y-axis), more hours of study are generally associated with fewer errors, so the points trend downward from left to right, showing a negative correlation.
Historical Context
The use of scatter plots in data analysis dates back to the 19th century. Sir Francis Galton used them prominently in the 1880s in his studies of the relationships between different human characteristics, such as the heights of parents and their children.
Applicability
Scatter plots are widely used in:
- Economics: To examine relationships between economic variables, such as price and quantity demanded.
- Finance: To observe the relationship between different financial indicators.
- Real Estate: To plot property prices against various attributes like square footage.
- Science and Technology: For a multitude of experimental and observational data.
Related Terms
- Correlation Coefficient: A measure that quantifies the strength and direction of the association between two variables, ranging from -1 to +1.
- Regression Analysis: A statistical method for estimating the relationships among variables.
- Dot Plot: A related visualization that displays individual data points along a single axis, typically to show the distribution of one variable.
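For reference, the most common correlation coefficient (Pearson's r) and the least-squares regression line are closely linked. The notation below is an assumption introduced here: n paired observations (x_i, y_i) with means \bar{x}, \bar{y} and standard deviations s_x, s_y.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad
\hat{y} = a + b x, \quad b = r\,\frac{s_y}{s_x}, \quad a = \bar{y} - b\,\bar{x}
```

The slope b always has the same sign as r, so an upward-sloping fitted line corresponds to a positive correlation and a downward-sloping one to a negative correlation.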
FAQs
1. How do you create a scatter plot?
- Select two variables, then plot each observation as a point whose horizontal position is given by one variable (X-axis) and whose vertical position is given by the other (Y-axis).
2. What is the difference between a scatter plot and a line graph?
- A scatter plot displays individual data points and is used to show relationships, while a line graph connects data points with a line, often to show trends over time.
3. Can scatter plots include more than two variables?
- Typically, scatter plots represent two variables. However, additional dimensions can be visually represented using color, size, or shapes of points.
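To make the first and third answers concrete, here is a small text-only sketch that places each observation on a character grid and uses the marker character (a stand-in for point shape) to encode a third, categorical variable. The function `ascii_scatter` and the sample data are hypothetical; in practice a plotting library would normally be used instead.

```python
def ascii_scatter(points, width=20, height=10):
    """Render (x, y, marker) triples on a character grid.

    x sets the column, y sets the row, and the one-character marker
    encodes a third, categorical variable.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Avoid dividing by zero when all values on an axis are identical.
    x_span = (x_max - x_min) or 1
    y_span = (y_max - y_min) or 1

    grid = [[" "] * width for _ in range(height)]
    for x, y, marker in points:
        col = round((x - x_min) / x_span * (width - 1))
        # Row 0 is the top of the output, so a larger y means a smaller row index.
        row = (height - 1) - round((y - y_min) / y_span * (height - 1))
        grid[row][col] = marker
    return "\n".join("".join(line) for line in grid)


# Made-up observations: (x, y, group marker).
sample = [(1, 2, "o"), (2, 3, "o"), (3, 5, "x"), (4, 4, "x"), (5, 8, "o")]
print(ascii_scatter(sample))
```

Points that land on the same grid cell overwrite one another here; graphical scatter plots face the same overplotting issue and often handle it with transparency or jitter.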
References
- Galton, Francis. “Regression towards mediocrity in hereditary stature.” (1886)
- Cleveland, William S. “The Elements of Graphing Data.” (1985)
- Tufte, Edward R. “The Visual Display of Quantitative Information.” (1983)
Summary
A scatter plot is an essential tool in statistics and data visualization that helps in understanding the relationship between two variables by representing them as a collection of points. It is useful in identifying correlations, trends, and outliers in a dataset, making it invaluable across various fields such as economics, finance, science, and real estate.