Experimental Design and Hypothesis Testing Rick Balkin, Ph.D.
Let’s review hypothesis testing and experimental design
Three types of hypothesis tests are used in experimental research:
z-test
t-test
F-test
Balkin, R. (2008)
z-test
Tests for statistically significant differences between a sample group and a population
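As a rough illustration of the z-test, here is a minimal Python sketch. The sample scores, the population mean of 50, and the population standard deviation of 9 are made-up values for illustration only; they do not come from the slides or from any real instrument. The sketch also assumes the population standard deviation is known, which is what distinguishes a z-test from a t-test.

```python
import math
from statistics import NormalDist, mean


def one_sample_z_test(sample, pop_mean, pop_sd):
    """Return (z, two-sided p) comparing a sample mean to a known population."""
    n = len(sample)
    standard_error = pop_sd / math.sqrt(n)
    z = (mean(sample) - pop_mean) / standard_error
    # Two-sided p-value from the standard normal distribution.
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided


# Hypothetical data: nine sample scores and assumed population parameters.
sample_scores = [52, 55, 49, 58, 61, 54, 57, 50, 56]
z, p = one_sample_z_test(sample_scores, pop_mean=50.0, pop_sd=9.0)
print(f"z = {z:.3f}, two-sided p = {p:.3f}")
```

A p-value below the chosen alpha level (commonly .05) would suggest the sample differs from the population.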
t-test
Tests for statistically significant differences between two sample groups.
Two types:
Independent: tests for differences between groups; used in between-groups designs.
Dependent: tests for changes over time; used in within-group designs.
F-test
Tests for statistically significant differences among three or more groups.
Also known as ANOVA (Analysis of Variance).
When two or more dependent variables exist, it is known as MANOVA (Multivariate Analysis of Variance).
Categories of Experimental Design
Pre-experimental: no random assignment or manipulation of the IV
Quasi-experimental: IV manipulation but no random assignment
Explanatory non-experimental: the IV is an assigned trait rather than a manipulated variable; sometimes referred to as ex post facto
True experimental: random assignment and IV manipulation
Models for experimental design
Models for experimental design may be classified into two types: between groups, in which outcomes are compared between two or more groups, and within groups, in which a single group is measured across time using two or more different treatments.
Between groups design
In a between groups design, the effect of the independent variable on the dependent variable is based upon the examination of group differences.
Between groups design
Four different types of between-groups designs:
Post-test only
Pretest-posttest control group
Solomon four group
Factorial designs
Consider the following scenario:
A counselor wishes to know whether a peer mentoring program would be effective in assisting students who are at risk for academic failure. The counselor utilizes the Youth Outcome Questionnaire (Y-OQ-SR-2.0) as a measure of program effectiveness. The Y-OQ-SR-2.0 is a youth survey designed to be repeatedly administered to adolescents to assess their ongoing progress in counseling. It has 30 items on a 5-point Likert scale. Internal consistency was assessed at .91, and the instrument is noted to have adequate validity (Wells, Burlingame, & Rose, 1999).
Post-test only
Participants are randomly assigned to a treatment group and a comparison/control group. The treatment group receives some type of manipulation or intervention while the control group receives none. A quantitative measure is then used to determine the effect of the intervention. In this case, the quantitative measure is the dependent variable and the presence or absence of the treatment is the independent variable.
Post-test only--Example
Using our example for a post-test only design, the counselor would randomly assign students identified as at risk for academic failure either to receive peer mentoring over the next three months or to receive peer mentoring three months later, after the initial group has received the treatment. The effect of peer mentoring is being evaluated for change in well-being, as evidenced by the score on the Y-OQ-SR-2.0, the dependent variable.
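The random assignment step can be sketched in a few lines of Python. This is only an illustration: the student identifiers are hypothetical, the even split between groups is an assumption, and the fixed seed is there solely so the example is reproducible.

```python
import random


def randomly_assign(participants, seed=None):
    """Shuffle participants and split them into treatment and control groups."""
    rng = random.Random(seed)
    shuffled = list(participants)  # copy so the caller's list is not modified
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}


# Hypothetical roster of 20 students identified as at risk.
students = [f"student_{i:02d}" for i in range(1, 21)]
groups = randomly_assign(students, seed=2008)
print(len(groups["treatment"]), len(groups["control"]))
```

In the scenario above, the control group would be the students who wait three months before receiving peer mentoring.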
Post-test only--Model
The treatment group receives peer mentoring; the control group initially does not, and both groups are evaluated by observing their scores on the Y-OQ-SR-2.0.

Posttest Only (random assignment)
Group       Peer Mentoring   Posttest
Treatment   Yes              Yes
Control     No               Yes
Testing for differences in a Posttest only Model
In this model we have two groups: an experimental group and a control group. The DV is the score on the Y-OQ-SR-2.0. So, we can conduct an independent t-test to determine whether statistically significant differences exist in the scores on the Y-OQ-SR-2.0 between the treatment and control groups.
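For illustration, an independent t-test on posttest scores might be computed as in the sketch below. The scores are invented, and the pooled-variance (Student's) form of the test is an assumption; the slides do not say which variant is intended.

```python
import math
from statistics import mean, variance


def independent_t(group_a, group_b):
    """Return (t, df) for a pooled-variance independent-samples t-test."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean(group_a) - mean(group_b)) / standard_error
    return t, df


# Hypothetical Y-OQ-SR-2.0 posttest scores (not real data).
treatment_post = [40, 45, 38, 50, 42]
control_post = [55, 60, 52, 58, 50]
t_stat, df = independent_t(treatment_post, control_post)
print(f"t({df}) = {t_stat:.2f}")
```

The resulting t value would then be compared against the critical value for the given degrees of freedom (2.306 for df = 8 at a two-tailed alpha of .05).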
Pretest-posttest control group
Participants are randomly assigned to a treatment group and a comparison/control group. A quantitative measure is then used to determine the effect of the intervention. Both the treatment group and the control group receive a pretest. The treatment group receives some type of manipulation or intervention while the control group receives none. After the intervention, a posttest is administered. In this case, the groups are still compared based on posttest scores, but the researcher can be certain of the degree to which the groups were equal at the onset of the study.
Pretest-posttest control group--Example
Using our example for a pretest-posttest control group design, the counselor would randomly assign students identified as at risk for academic failure either to receive peer mentoring over the next three months or to receive peer mentoring three months later, after the initial group has received the treatment.
Pretest-posttest control group--Model
Each group is administered the Y-OQ-SR-2.0 as a pretest in order to ensure equality of groups. The treatment group receives peer mentoring; the control group initially does not, and both groups are evaluated by observing their scores on the Y-OQ-SR-2.0 posttest.

Pretest-Posttest Control Group (random assignment)
Group       Pretest   Peer Mentoring   Posttest
Treatment   Yes       Yes              Yes
Control     Yes       No               Yes
Testing for differences in a Pretest-posttest control group--Model
This model may be tested in two ways. First, if the purpose of the pretest was to ensure group equality, then this model is tested the same way as the previous model. The DV is the score on the Y-OQ-SR-2.0, and we can conduct an independent t-test to determine whether statistically significant differences exist in the scores on the Y-OQ-SR-2.0 between the treatment and control groups.
Testing for differences in a Pretest-posttest control group--Model
Second, the researcher has the option of also looking at changes over time because a pretest was administered. So two dependent t-tests could be conducted, one for the treatment group and one for the control group, to determine whether the change in scores over time was statistically significant.
Solomon four group
Participants are randomly assigned to one of four groups:
(a) a treatment group that receives both a pretest and a posttest
(b) a treatment group that receives a posttest only
(c) a control group that receives both a pretest and a posttest
(d) a control group that receives a posttest only
Thus, only one treatment group and one control group are administered a pretest.
Solomon four group
Both treatment groups receive some type of manipulation or intervention while both control groups receive none. After the intervention, a posttest is administered to all four groups. In this case, the groups are still compared based on posttest scores, but the researcher can be certain of the equivalence of the groups at the onset of the study and can assess the impact of the pretest to ascertain whether a testing effect exists.
Solomon four group--Example
Using our example for a Solomon four group design, the counselor would randomly assign students identified as at risk for academic failure either to receive peer mentoring over the next three months or to receive peer mentoring three months later, after the initial group has received the treatment. Students in the treatment group are then randomly assigned to be administered a pretest or no pretest, and students in the control group are likewise randomly assigned to a pretest or no pretest. One treatment group and one control group are administered the Y-OQ-SR-2.0 as a pretest in order to ensure equality of groups.
Solomon four group--Example
The other treatment and control groups do not receive a pretest. The treatment groups receive peer mentoring; the control groups initially do not. The posttest scores from the two treatment groups and the posttest scores from the two control groups can be compared in order to ascertain whether the pretest contributed to differences in the scores.
Solomon four group--Model
If the posttest scores for the two treatment groups are similar to each other, and the posttest scores for the two control groups are similar to each other, then the administration of the pretest had no effect. Assuming there is no testing effect, the posttest scores of the treatment groups and control groups can be compared to determine the effect of the intervention.

Solomon Four Group (random assignment)
Group       Pretest   Peer Ment.   Posttest
Treatment   Yes       Yes          Yes
Control     Yes       No           Yes
Treatment   No        Yes          Yes
Control     No        No           Yes
Testing for differences in a Solomon four group--Model
This model may be tested in the following ways. First, remember that an advantage of this design is that any changes due to being administered a pretest can be ruled out. Therefore, if there is no testing effect, there should be no significant differences in the DV scores between the two treatment groups, and there should be no significant differences in the DV scores between the two control groups.
To determine this, two independent t-tests are conducted:
A test to compare Y-OQ-SR-2.0 scores for the treatment group that received a pretest and the treatment group that did not receive a pretest
A test to compare Y-OQ-SR-2.0 scores for the control group that received a pretest and the control group that did not receive a pretest
Testing for differences in a Solomon four group--Model
Second, differences between treatment and control groups need to be determined. If there is no testing effect, as evidenced by results in the previous steps, then an independent t-test can be conducted between the treatment and control groups.
Testing for differences in a Solomon four group--Model
Third, changes over time could be assessed by looking at the treatment and control groups that received a pretest. Two dependent t-tests could be conducted, one for the treatment group and one for the control group, to determine whether the change in scores over time was statistically significant.
What if I have more than two groups?
In any of these designs, it is possible to have more than two groups. In our example, we have compared one group who receives a treatment, peer mentoring, and another group who receives no intervention. What if we added a third group, such as participants who participate in a psychoeducational group?
What if I have more than two groups?
Then we could conduct the analyses using an F-test. This statistic is designed to identify statistically significant differences among three or more groups. So, we could simply replace the t-tests in the previous examples with F-tests.
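To make the three-group case concrete, here is a minimal one-way ANOVA sketch that computes the F statistic from between-group and within-group variability. The three sets of scores (peer mentoring, psychoeducational group, control) are invented for illustration.

```python
from statistics import mean


def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way between-groups ANOVA."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    # Variability of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variability of individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical posttest scores for three groups (not real data).
peer_mentoring = [40, 45, 38, 50, 42]
psychoeducational = [48, 52, 45, 50, 55]
no_intervention = [55, 60, 52, 58, 50]
f_stat, df_b, df_w = one_way_anova_f(peer_mentoring, psychoeducational, no_intervention)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A significant F indicates that at least one group differs from the others; it does not by itself say which one.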
Factorial designs
The purpose of a factorial design is to study change in the dependent variable across two or more independent variables. For example, instead of simply examining the effect of a peer mentoring program, the counselor wishes to know whether sex plays a role.
Is the degree of change in well-being different across a peer mentoring intervention for males and females?
Factorial designs
In other words, two analyses will be conducted:
(a) differences in Y-OQ-SR-2.0 scores across the treatment and control groups
(b) differences in Y-OQ-SR-2.0 scores across males and females
Testing for differences in a factorial design
This is generally done with an F-test. An F-test can be used to analyze differences between males and females and between treatment and control groups in a single analysis! Designs like this are complex, but also widely used in social science research.
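One way to picture the single analysis is a two-way ANOVA on a 2 x 2 layout (group by sex). The sketch below assumes a balanced design with the same number of scores in every cell, and the scores are invented. It also reports an interaction F, which the slides do not name explicitly but which is a standard part of a two-way ANOVA.

```python
from statistics import mean


def two_way_anova_balanced(cells):
    """cells maps (level_a, level_b) -> scores; returns F for A, B, and A x B."""
    levels_a = sorted({a for a, _ in cells})
    levels_b = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))
    if any(len(scores) != n for scores in cells.values()):
        raise ValueError("this sketch assumes equal cell sizes")
    grand = mean([x for scores in cells.values() for x in scores])
    mean_a = {a: mean([x for b in levels_b for x in cells[(a, b)]]) for a in levels_a}
    mean_b = {b: mean([x for a in levels_a for x in cells[(a, b)]]) for b in levels_b}
    ss_a = n * len(levels_b) * sum((m - grand) ** 2 for m in mean_a.values())
    ss_b = n * len(levels_a) * sum((m - grand) ** 2 for m in mean_b.values())
    ss_cells = n * sum((mean(scores) - grand) ** 2 for scores in cells.values())
    ss_ab = ss_cells - ss_a - ss_b  # what the cell means add beyond main effects
    ss_within = sum((x - mean(s)) ** 2 for s in cells.values() for x in s)
    df_a, df_b = len(levels_a) - 1, len(levels_b) - 1
    ms_within = ss_within / (len(levels_a) * len(levels_b) * (n - 1))
    return {
        "factor_a": (ss_a / df_a) / ms_within,
        "factor_b": (ss_b / df_b) / ms_within,
        "interaction": (ss_ab / (df_a * df_b)) / ms_within,
    }


# Hypothetical posttest scores, three students per cell (not real data).
scores_by_cell = {
    ("treatment", "female"): [40, 42, 44],
    ("treatment", "male"): [44, 46, 48],
    ("control", "female"): [52, 54, 56],
    ("control", "male"): [54, 56, 58],
}
f_values = two_way_anova_balanced(scores_by_cell)
print(f_values)
```

Here `factor_a` is the treatment versus control comparison and `factor_b` is the male versus female comparison, matching analyses (a) and (b) in the previous slide.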
Within Group Design
A within group design is utilized when a change in the dependent variable in a group is measured across time. In a within group design, the pretest also serves as a baseline against which to compare subsequent tests.
Within Group Design--Example
For example, the Y-OQ-SR-2.0 is administered at the onset of the peer mentoring study to get a baseline measure. Then the Y-OQ-SR-2.0 is administered four additional times on a monthly basis in order to compare progress to the initial administration.
Within Group Design: Observation 1, Observation 2, Observation 3, Observation 4
Testing for differences in a within-group design
You have actually already seen this design at work in the previous examples when changes over time were tested. In the case in which changes over time are evaluated in a pretest and posttest, a dependent t-test is conducted. When there are more than two repeated observations, an F-test is conducted across the repeated DV scores.
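For more than two repeated observations, one common form of the F-test is a one-way repeated measures ANOVA. The sketch below is an illustration under that assumption, with four hypothetical students each measured at four time points; the slides do not specify the exact procedure.

```python
from statistics import mean


def repeated_measures_f(scores):
    """scores[i][j] is participant i at time j; returns (F, df_time, df_error)."""
    n = len(scores)      # participants
    k = len(scores[0])   # repeated observations
    if any(len(row) != k for row in scores):
        raise ValueError("every participant needs a score at every time point")
    grand = mean([x for row in scores for x in row])
    time_means = [mean([row[j] for row in scores]) for j in range(k)]
    subject_means = [mean(row) for row in scores]
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_time = n * sum((m - grand) ** 2 for m in time_means)
    ss_subjects = k * sum((m - grand) ** 2 for m in subject_means)
    # Remove stable differences between participants from the error term.
    ss_error = ss_total - ss_time - ss_subjects
    df_time, df_error = k - 1, (k - 1) * (n - 1)
    f = (ss_time / df_time) / (ss_error / df_error)
    return f, df_time, df_error


# Hypothetical Y-OQ-SR-2.0 scores: rows are students, columns are observations.
observations = [
    [60, 55, 52, 50],
    [58, 56, 54, 52],
    [65, 60, 58, 53],
    [62, 59, 55, 52],
]
f_time, df_time, df_error = repeated_measures_f(observations)
print(f"F({df_time}, {df_error}) = {f_time:.2f}")
```

Because each student serves as their own comparison, differences between students are separated out before the change over time is tested.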