
Regression Effect and Regression Fallacy


\begin{displaymath}\begin{array}{lcrcc}
\mbox{observed} & = & \mbox{true }& + & ...
...
140 & = & 135 & + & 5\\
140 & = & 145 & - & 5\\
\end{array}\end{displaymath}

Why aren't these two cases equally likely, if the chance error is as likely to be positive as negative?

Because the subject's true IQ is taken to be normally distributed with an average of 100 and a standard deviation of 15, and 135 is closer to the average than 145. A true IQ of 135 is therefore more likely than a true IQ of 145.
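The comparison can be checked numerically. The sketch below is only an illustration under the assumptions stated in the text (true IQ normal with average 100 and SD 15). It also assumes that chance errors of $+5$ and $-5$ are equally likely, so the two explanations differ only in how common the two true scores are. It uses `NormalDist` from Python's standard library, and the variable names are ours.

```python
from statistics import NormalDist

# Assumption from the text: true IQ ~ Normal(average 100, SD 15).
iq = NormalDist(mu=100, sigma=15)

# Height of the normal curve at each candidate true score.
density_135 = iq.pdf(135)
density_145 = iq.pdf(145)

# How many times more common a true IQ near 135 is than one near 145.
ratio = density_135 / density_145

print(f"density at 135: {density_135:.5f}")
print(f"density at 145: {density_145:.5f}")
print(f"ratio: {ratio:.2f}")
```

Under these assumptions a true score near 135 is roughly six times as common as one near 145, so an observed 140 is more often a 135 with good luck than a 145 with bad luck.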

The REGRESSION FALLACY consists in thinking that the regression effect must be due to something important, not just the spread around the line. "On test retest situations the reason the retest scores tend to regress towards the mean is due to chance error. If someone scores very high on test 1 it is assumed that it was partly due to luck. On the retest they may not be so lucky so their scores drop on average. The converse is true on very low scores on the first test."

So chance error could be good luck, bad luck, faulty scoring, faulty measuring, mistakes in writing results, mistakes in reading results, anything.
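A small simulation can make the test-retest argument concrete. This is a sketch under assumptions that are not in the text: each observed score is the true score plus an independent normal chance error with SD 5 (the constant `ERROR_SD` is a made-up value), and the cutoffs 135 and 65 are chosen only to pick out very high and very low first scores.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the run is repeatable

N = 100_000
ERROR_SD = 5  # assumed size of the chance error; not given in the text

first, retest = [], []
for _ in range(N):
    true_iq = random.gauss(100, 15)                      # true score
    first.append(true_iq + random.gauss(0, ERROR_SD))    # test 1 = true + chance error
    retest.append(true_iq + random.gauss(0, ERROR_SD))   # retest = true + new chance error

# People selected ONLY on their first score.
high = [(a, b) for a, b in zip(first, retest) if a >= 135]
low = [(a, b) for a, b in zip(first, retest) if a <= 65]

high_first = mean(a for a, _ in high)
high_retest = mean(b for _, b in high)
low_first = mean(a for a, _ in low)
low_retest = mean(b for _, b in low)

print(f"high scorers: first {high_first:.1f}, retest {high_retest:.1f}")
print(f"low scorers:  first {low_first:.1f}, retest {low_retest:.1f}")
```

Nothing happens to anyone between the two tests in this simulation, yet the high group's retest average falls and the low group's rises. Reading that shift as a real change is the regression fallacy.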

When the explanatory variable $x$ is 1 SD above its average, the response variable $y$ will, on average, be about $r$ standard deviations above the average of the response variable.

In the example of the height and weight of adult men:
Weight is the response variable $y$; its average is 162 lb and $SD_y$ is 30 lb.
Height is the explanatory variable $x$; its average is 70 in and $SD_x$ is 3 in.

We also need to know the correlation coefficient $r$ between the two variables.
(So we need 5 numbers to work with.)
Suppose we are told $r = 0.47$.

Without any other information, if we had to guess someone's weight, the best we could do would be to use the overall average $\bar{y} = 162$ lb.

Suppose we are told that a certain man is 73 in tall. This tells us that he is above average height, so he is more likely to be above average weight.

By how much?

\begin{displaymath}73 = 70 + 3 = \bar{x} + 1 \times SD_x\end{displaymath}

The best prediction for $y$ is $(r \times 1)$ $SD_y$ above average:

\begin{displaymath}162 + (1 \times r) \, SD_y = 162 + 0.47 \times 30 = 162 + 14.1 = 176.1 \mbox{ lb}\end{displaymath}
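The same arithmetic works for any height, not just one that is exactly 1 SD above average: convert $x$ to standard units, multiply by $r$, and convert back to the units of $y$. The sketch below is one way to write this out. The function name `predict_response` and its parameter names are ours, introduced only for illustration.

```python
def predict_response(x, mean_x, sd_x, mean_y, sd_y, r):
    """Regression prediction of y from x, using the five summary numbers."""
    z_x = (x - mean_x) / sd_x        # how many SDs x is above its average
    return mean_y + r * z_x * sd_y   # y is predicted to be r * z_x SDs above its average


# The man who is 73 in tall, with the numbers from the example.
predicted = predict_response(73, mean_x=70, sd_x=3, mean_y=162, sd_y=30, r=0.47)
print(round(predicted, 1))  # 176.1

# A man of exactly average height is predicted to have average weight.
print(round(predict_response(70, mean_x=70, sd_x=3, mean_y=162, sd_y=30, r=0.47), 1))  # 162.0
```

Because $r$ is less than 1, the predicted weight is always fewer SDs from average than the height is. This is the regression effect.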


Susan Holmes
2000-11-28