In the simple experiment, you randomly assign participants to two groups. Nothing could make your two groups identical, but random assignment makes sure that the groups do not differ in any systematic way. The differences between the groups at the end of the experiment should be due to only two things:
1. the one systematic difference between the groups (the treatment, or independent variable), and
2. chance (random error).
We can use statistics to factor out the effects of chance because chance conforms to certain rules, most of the time. By taking advantage of what statisticians have learned about those rules, we can determine whether the difference between the groups is too large to be due to chance alone.
The logic statistics uses is not unlike what you would use to decide whether a coin was biased. If it came up "heads" 10 out of 10 times, you would say the coin was biased. Similarly, if it came up "heads" 600 out of 1,000 times, you would say it was biased. However, you would not say it was biased if it came up "heads" 6 out of 10 times. In other words, to determine whether the treatment had an effect, statistics looks at two things: how big the difference between the groups was (the bigger the difference, the less likely it is to be due to chance alone) and how many participants were in each group (if there are only a few participants, then even a fairly large difference between the groups could easily be due to chance).
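The coin example can be checked with a little arithmetic. The sketch below is only an illustration, not a procedure from the text: the function name `two_sided_p` is made up for this example, and it assumes a fair coin and counts every outcome at least as far from 50% heads as the one observed.

```python
from math import comb

def two_sided_p(heads: int, flips: int) -> float:
    """Chance that a fair coin lands at least this far from 50% heads."""
    expected = flips / 2
    observed_gap = abs(heads - expected)
    # Count the ways to get a result as extreme as, or more extreme than, the observed one.
    extreme_ways = sum(
        comb(flips, k)
        for k in range(flips + 1)
        if abs(k - expected) >= observed_gap
    )
    return extreme_ways / 2 ** flips

for heads, flips in [(10, 10), (6, 10), (600, 1000)]:
    print(f"{heads}/{flips} heads: p = {two_sided_p(heads, flips):.3g}")
```

Under these assumptions, 10 heads in 10 flips would happen by chance only about 2 times in 1,000, and 600 heads in 1,000 flips far less often than that. Six heads in 10 flips, in contrast, is the kind of result a fair coin produces about three times out of four, so it gives no reason to call the coin biased.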
Unfortunately, the decision from a statistical test may be wrong. One possible mistake is a Type 1 error: deciding that a difference between our groups is due to the treatment when the difference is due entirely to chance. Fortunately, we can decide what risk of a Type 1 error we are going to take. If we choose a .05 significance level, then there is only a 5% risk of making a Type 1 error when the treatment really has no effect. If we choose a .01 significance level, that risk is only 1%.
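One way to see what a significance level controls is to simulate many experiments in which the treatment does nothing. The sketch below rests on assumptions that are not from the text: scores are normally distributed with a mean of 100 and a standard deviation of 15, there are 50 participants per group, and the cutoff comes from the normal curve as a large-sample stand-in for the t table. The function name `false_alarm_rate` is also made up for this example.

```python
import random
from statistics import NormalDist, mean, variance

def false_alarm_rate(alpha: float, n_per_group: int = 50,
                     n_experiments: int = 1000, seed: int = 1) -> float:
    """Share of no-effect experiments that still come out 'significant'."""
    rng = random.Random(seed)
    # Large-sample approximation to the two-tailed cutoff (assumption).
    cutoff = NormalDist().inv_cdf(1 - alpha / 2)
    false_alarms = 0
    for _ in range(n_experiments):
        # Both groups come from the same population: any difference is chance.
        control = [rng.gauss(100, 15) for _ in range(n_per_group)]
        treated = [rng.gauss(100, 15) for _ in range(n_per_group)]
        se_diff = ((variance(control) + variance(treated)) / n_per_group) ** 0.5
        if abs(mean(treated) - mean(control)) / se_diff > cutoff:
            false_alarms += 1  # a Type 1 error
    return false_alarms / n_experiments

for alpha in (0.05, 0.01):
    print(f"alpha = {alpha}: false alarm rate = {false_alarm_rate(alpha):.3f}")
```

If the assumptions hold, the false alarm rate should land near the chosen significance level: roughly 5 in 100 no-effect experiments at the .05 level and roughly 1 in 100 at the .01 level.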
You might wonder what is really happening when we are choosing a smaller risk of making a Type 1 error. What we're doing is requiring a bigger difference between our groups before we declare it to be "statistically significant." Thus, a difference between the experimental group and control group that would have been big enough to be statistically significant at the .05 level might not be big enough to be significant at the .01 level.
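To make "requiring a bigger difference" concrete, the sketch below computes how large the difference must be, measured in standard errors, at several significance levels. It uses the normal curve as a large-sample approximation (an assumption; with small groups the cutoffs from a t table are somewhat larger), and the observed value of 2.2 is a hypothetical result chosen for illustration.

```python
from statistics import NormalDist

def cutoff(alpha: float) -> float:
    """Two-tailed cutoff, in standard errors, under the normal approximation."""
    return NormalDist().inv_cdf(1 - alpha / 2)

observed = 2.2  # hypothetical: the difference is 2.2 standard errors wide

for alpha in (0.05, 0.01, 0.001):
    verdict = "significant" if observed > cutoff(alpha) else "not significant"
    print(f"alpha = {alpha}: need more than {cutoff(alpha):.2f} -> {verdict}")
```

With these numbers, the same hypothetical difference clears the bar at the .05 level (about 1.96 standard errors) but falls short at the .01 level (about 2.58).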
You might wonder why we don't just set our risk of making a Type 1 error at a very low level, such as .001. The problem is that by setting our risk of a Type 1 (false alarm) error very low, we increase our risk of a Type 2 error--failing to find a real treatment effect. Usually, your risk of making a Type 2 error is much greater than the risk of making a Type 1 error. You can reduce your risk of making a Type 2 error by:
You have seen that we have to use statistics to determine whether the difference between our groups is too big to be due to chance. More importantly, you have seen that statistics affects how we design and conduct our study. Because of statistics, we should
But how do inferential statistics actually work? You know that your experimental and control groups may differ for two reasons:
To be statistically significant, the actual difference between the means must be bigger than the standard error of the difference. How much bigger? Usually, the difference between the means must be about twice as big as the standard error of the difference. To get the exact value, you need to know the degrees of freedom (the number of participants minus 2) and then look at the t table on page 538 of your text.
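The comparison described above can be sketched in a few lines. The scores below are made up for illustration, the function name `t_ratio` is hypothetical, and the calculation assumes two independent groups whose variances are pooled, which is one common way to get the standard error of the difference; your text may present the formula differently.

```python
from statistics import mean, variance

def t_ratio(experimental: list[float], control: list[float]) -> tuple[float, int, float]:
    """Return (t, degrees of freedom, standard error of the difference)."""
    n1, n2 = len(experimental), len(control)
    df = n1 + n2 - 2  # number of participants minus 2
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n1 - 1) * variance(experimental) + (n2 - 1) * variance(control)) / df
    se_diff = (pooled_var * (1 / n1 + 1 / n2)) ** 0.5
    t = (mean(experimental) - mean(control)) / se_diff
    return t, df, se_diff

# Hypothetical scores, five participants per group.
experimental = [12, 15, 14, 16, 13]
control = [10, 11, 9, 12, 8]

t, df, se_diff = t_ratio(experimental, control)
print(f"difference between means = {mean(experimental) - mean(control):.1f}")
print(f"standard error of the difference = {se_diff:.2f}")
print(f"t = {t:.2f} with {df} degrees of freedom")
```

For these made-up scores, the difference between the means is 4 times the standard error of the difference, with 8 degrees of freedom. That is well past the "about twice as big" rule of thumb, but the exact cutoff still has to come from the t table.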