

Course 2, Unit 4: Regression and Correlation
Overview
In the Regression and Correlation unit, students learn to describe
the association between two variables by interpreting a scatterplot,
to interpret a correlation coefficient, and to understand the limitations
of correlation coefficients. They learn when it is appropriate
to make predictions from the regression line, and to understand the effects
of outliers and influential points on the correlation coefficient and
on the regression line. Students compute and interpret Pearson's correlation
and come to understand that a strong correlation does not imply
that one variable causes the other.
Key Ideas from Course 2, Unit 4

Least squares regression line: a line that fits a set of
data with slope and intercept chosen to minimize the sum of the squared
errors (SSE), also called the sum of squared residuals. This line
contains the point whose x-coordinate is the mean of the x values
of the data (xbar) and whose y-coordinate is the mean of the y values
of the data (ybar): (xbar, ybar). (See
pp. 280–285.)
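The standard formulas for the least squares slope and intercept can be sketched in a few lines of Python. The data set below is made up for illustration; it is not from the text.

```python
def least_squares(xs, ys):
    """Return (slope, intercept) minimizing the sum of squared errors."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # slope = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
m, b = least_squares(xs, ys)

# The fitted line always passes through the point of means (xbar, ybar):
x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
print(abs(m * x_bar + b - y_bar) < 1e-9)  # True
```

Note that choosing the intercept as ybar − slope · xbar is exactly what forces the line through (xbar, ybar).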

Residual: The difference between the observed value (the
value used in calculating the regression equation) and the predicted
value is called the residual. These differences are visualized as
the vertical gaps between given data points and the regression line. (See
p. 283.)
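Residuals can be computed directly from the definition: observed value minus predicted value. The data and the fitted slope/intercept below are illustrative assumptions, not values from the text.

```python
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept = 1.99, 0.05  # assumed least squares fit for this data

# residual = observed y - predicted y (the vertical gap to the line)
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
print([round(r, 2) for r in residuals])

# For a least squares fit, the residuals sum to (approximately) zero.
print(abs(sum(residuals)) < 1e-9)  # True
```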

Pearson's correlation r: a single number that gives
a measure of the strength of the linear association between variables.
Perfectly linear data will have an r value of 1 (for
a positive association) or an r value of -1 (for
a negative association). A set of data that is quite linear,
and for which the y values increase as the x values
increase, is said to have a strong positive association. If the points
appear to fit a linear trend but the y values decrease
as x increases, the association is negative. A set of data
that is quite random, not at all linear, will have an r value
close to zero. (See pp. 258–268 and 291–298.)
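A minimal sketch of Pearson's r from its definition, checked against the two extreme cases described above (the tiny data sets are illustrative, not from the text):

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation: covariance scaled by both standard deviations."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Perfectly linear, increasing data: r = 1
print(round(pearson_r([1, 2, 3], [2, 4, 6]), 6))   # 1.0
# Perfectly linear, decreasing data: r = -1
print(round(pearson_r([1, 2, 3], [6, 4, 2]), 6))   # -1.0
```

Scaling by both standard deviations is what keeps r between -1 and 1 regardless of the units of the two variables.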

Association and causation: Two variables may have a strong
correlation without there being any reason to suspect that when one
variable changes, it "causes" a change in the other variable. The
example used in the text to show this idea is data on ice cream consumption
and recorded crimes in 12 countries. (See pp. 300–303.)
