
Contents

Part 1  Nature and Purposes of The Iowa Tests® ..... 1
  The Iowa Tests ..... 1
  Major Purposes of the ITBS Batteries ..... 1
  Validity of the Tests ..... 1
  Description of the ITBS Batteries ..... 2
    Names of the Tests ..... 2
    Description of the Test Batteries ..... 2
    Nature of the Batteries ..... 2
    Nature of the Levels ..... 2
    Grade Levels and Test Levels ..... 3
    Test Lengths and Times ..... 3
    Nature of the Questions ..... 3
    Mode of Responding ..... 3
    Directions ..... 3
  Other Iowa Tests ..... 6
    Iowa Writing Assessment ..... 6
    Listening Assessment for ITBS ..... 6
    Constructed-Response Supplement to The Iowa Tests ..... 6
  Other Manuals ..... 6

Part 2  The National Standardization Program ..... 7
  Planning the National Standardization Program ..... 7
  Procedures for Selecting the Standardization Sample ..... 7
    Public School Sample ..... 7
    Catholic School Sample ..... 8
    Private Non-Catholic School Sample ..... 8
    Summary ..... 8
  Design for Collecting the Standardization Data ..... 8
  Weighting the Samples ..... 8
  Racial-Ethnic Representation ..... 12
  Participation of Students in Special Groups ..... 12
  Empirical Norms Dates ..... 14
  School Systems Included in the 2000 Standardization Samples ..... 16
    New England and Mideast ..... 16
    Southeast ..... 17
    Great Lakes and Plains ..... 19
    West and Far West ..... 22

Part 3  Validity in the Development and Use of The Iowa Tests ..... 25
  Validity in Test Use ..... 25
  Criteria for Evaluating Achievement Tests ..... 25
  Validity of the Tests ..... 25
  Statistical Data to Be Considered ..... 26
  Validity of the Tests in the Local School ..... 26
  Domain Specifications ..... 27
  Content Standards and Development Procedures ..... 28
    Curriculum Review ..... 28
    Preliminary Item Tryout ..... 28
    National Item Tryout ..... 28
    Fairness Review ..... 30
    Development of Individual Tests ..... 30
    Critical Thinking Skills ..... 43
  Other Validity Considerations ..... 44
    Norms Versus Standards ..... 44
    Using Tests to Improve Instruction ..... 44
    Using Tests to Evaluate Instruction ..... 45
    Local Modification of Test Content ..... 45
    Predictive Validity ..... 46
    Readability ..... 48

Part 4  Scaling, Norming, and Equating The Iowa Tests ..... 51
  Frames of Reference for Reporting School Achievement ..... 51
  Comparability of Developmental Scores Across Levels: The Growth Model ..... 51
  The National Standard Score Scale ..... 52
  Development and Monitoring of National Norms for the ITBS ..... 55
  Trends in Achievement Test Performance ..... 55
  Norms for Special School Populations ..... 60
  Equivalence of Forms ..... 60
  Relationships of Forms A and B to Previous Forms ..... 61


Part 5  Reliability of The Iowa Tests ..... 63
  Methods of Determining, Reporting, and Using Reliability Data ..... 63
  Internal-Consistency Reliability Analysis ..... 64
  Equivalent-Forms Reliability Analysis ..... 74
  Sources of Error in Measurement ..... 75
  Standard Errors of Measurement for Selected Score Levels ..... 77
  Effects of Individualized Testing on Reliability ..... 83
  Stability of Scores on the ITBS ..... 83

Part 6  Item and Test Analysis ..... 87
  Difficulty of the Tests ..... 87
  Discrimination ..... 94
  Ceiling and Floor Effects ..... 100
  Completion Rates ..... 100
  Other Test Characteristics ..... 100

Part 7  Group Differences in Item and Test Performance ..... 107
  Standard Errors of Measurement for Groups ..... 107
  Gender Differences in Achievement ..... 107
  Racial-Ethnic Differences in Achievement ..... 114
  Differential Item Functioning ..... 116

Part 8  Relationships in Test Performance ..... 121
  Correlations Among Test Scores for Individuals ..... 121
  Structural Relationships Among Content Domains ..... 121
    Levels 9 through 14 ..... 126
    Levels 7 and 8 ..... 126
    Levels 5 and 6 ..... 126
    Interpretation of Factors ..... 126
  Reliabilities of Differences in Test Performance ..... 127
  Correlations Among Building Averages ..... 127
  Relations Between Achievement and General Cognitive Ability ..... 127
  Predicting Achievement from General Cognitive Ability: Individual Scores ..... 131
    Obtained Versus Expected Achievement ..... 136
  Predicting Achievement from General Cognitive Ability: Group Averages ..... 143


Part 9  Technical Consideration for Other Iowa Tests ..... 149
  Iowa Tests of Basic Skills Survey Battery ..... 149
    Description of the Tests ..... 149
    Other Scores ..... 149
    Test Development ..... 149
    Standardization ..... 149
    Test Score Characteristics ..... 150
  Iowa Early Learning Inventory ..... 150
    Description of the Inventory ..... 150
    Test Development ..... 151
    Standardization ..... 151
  Iowa Writing Assessment ..... 151
    Description of the Test ..... 151
    Test Development ..... 152
    Standardization ..... 152
    Test Score Characteristics ..... 152
  Constructed-Response Supplement to The Iowa Tests ..... 153
    Description of the Tests ..... 153
    Test Development ..... 154
    Joint Scaling with the ITBS ..... 154
    Test Score Characteristics ..... 154
  Listening Assessment for ITBS ..... 155
    Description of the Test ..... 155
    Test Development ..... 155
    Standardization ..... 155
    Test Score Characteristics ..... 157
    Predictive Validity ..... 157
  Integrated Writing Skills Test ..... 157
    Description of the Tests ..... 157
    Test Development ..... 158
    Standardization ..... 158
    Test Score Characteristics ..... 158
  Iowa Algebra Aptitude Test ..... 159
    Description of the Test ..... 159
    Test Development ..... 159
    Standardization ..... 159
    Test Score Characteristics ..... 160

Works Cited ..... 161
Index ..... 167


Tables and Figures

Part 1: Nature and Purposes of The Iowa Tests® ..... 1
  Table 1.1  Test and Grade Level Correspondence ..... 3
  Table 1.2  Number of Items and Test Time Limits ..... 4

Part 2: The National Standardization Program ..... 7
  Table 2.1  Summary of Standardization Schedule ..... 9
  Table 2.2  Sample Size and Percent of Students by Type of School ..... 10
  Table 2.3  Percent of Public School Students by Geographic Region ..... 10
  Table 2.4  Percent of Public School Students by SES Category ..... 10
  Table 2.5  Percent of Public School Students by District Enrollment ..... 11
  Table 2.6  Percent of Catholic Students by Diocese Size and Geographic Region ..... 11
  Table 2.7  Percent of Private Non-Catholic Students by Geographic Region ..... 12
  Table 2.8  Racial-Ethnic Representation ..... 13
  Table 2.9  Test Accommodations—Special Education and 504 Students ..... 15
  Table 2.10 Test Accommodations—English Language Learners ..... 15

Part 3: Validity in the Development and Use of The Iowa Tests ..... 25
  Figure 3.1 Steps in Development of the Iowa Tests of Basic Skills ..... 29
  Table 3.1  Distribution of Skills Objectives for the Iowa Tests of Basic Skills, Forms A and B ..... 31
  Table 3.2  Types of Reading Materials ..... 33
  Table 3.3  Reading Content/Process Standards ..... 34
  Table 3.4  Listening Content/Process Standards ..... 34
  Table 3.5  Comparison of Language Tests by Battery ..... 36
  Table 3.6  Computational Skill Level Required for Math Problem Solving and Data Interpretation ..... 41
  Table 3.7  Summary Data from Predictive Validity Studies ..... 47
  Table 3.8  Readability Indices for Selected Tests ..... 49

Part 4: Scaling, Norming, and Equating The Iowa Tests ..... 51
  Table 4.1  Comparison of Grade-to-Grade Overlap ..... 53
  Table 4.2  Differences Between National Percentile Ranks ..... 56
  Figure 4.1 Trends in National Performance ..... 57
  Table 4.3  Summary of Median Differences ..... 58
  Figure 4.2 Trends in Iowa Performance ..... 59
  Table 4.4  Sample Sizes for Equating Forms A and B ..... 61

Part 5: Reliability of The Iowa Tests ..... 63
  Table 5.1  Test Summary Statistics ..... 65
  Table 5.2  Equivalent-Forms Reliabilities, Levels 5–14 ..... 74
  Table 5.3  Estimates of Equivalent-Forms Reliability ..... 75
  Table 5.4  Mean (Grades 3–8) Reliability Coefficients: Reliability Types Analysis by Tests ..... 76
  Table 5.5  Test-Retest Reliabilities, Levels 5–8 ..... 77
  Table 5.6  Standard Errors of Measurement for Selected Standard Score Levels ..... 78
  Table 5.7  Correlations Between Developmental Standard Scores, Forms A and B ..... 84
  Table 5.8  Correlations Between Developmental Standard Scores, Forms K and L ..... 85


Part 6: Item and Test Analysis ..... 87
  Table 6.1  Word Analysis Content Classifications with Item Norms ..... 88
  Table 6.2  Usage and Expression Content Classifications with Item Norms ..... 89
  Table 6.3  Distribution of Item Difficulties ..... 90
  Table 6.4  Summary of Difficulty (Proportion Correct) and Discrimination (Biserial) Indices ..... 95
  Table 6.5  Ceiling Effects, Floor Effects, and Completion Rates ..... 101

Part 7: Group Differences in Item and Test Performance ..... 107
  Table 7.1  Standard Errors of Measurement in the Standard Score Metric for ITBS
               by Level and Gender ..... 108
               by Level and Group ..... 109
  Table 7.2  Male-Female Effect Sizes for Average Achievement ..... 111
  Table 7.3  Descriptive Statistics by Gender ..... 112
  Table 7.4  Gender Differences in Achievement over Time ..... 113
  Table 7.5  Race Differences in Achievement ..... 114
  Table 7.6  Effect Sizes for Racial-Ethnic Differences in Average Achievement ..... 115
  Table 7.7  Fairness Reviewers ..... 117
  Table 7.8  Number of Items Identified in Category C in National DIF Study ..... 119

Part 8: Relationships in Test Performance ..... 121
  Table 8.1  Correlations Among Developmental Standard Scores ..... 121
  Table 8.2  Reliabilities of Differences Among Scores for Major Test Areas: Developmental Standard Scores ..... 128
  Table 8.3  Reliabilities of Differences Among Tests: Developmental Standard Scores ..... 128
  Table 8.4  Correlations Among School Average Developmental Standard Scores ..... 131
  Table 8.5  Correlations Between Standard Age Scores and Developmental Standard Scores ..... 137
  Table 8.6  Reliabilities of Difference Scores and Standard Deviations of Difference Scores Due to Errors of Measurement ..... 139
  Table 8.7  Correlations, Prediction Constants, and Standard Errors of Estimate for School Averages ..... 145

Part 9: Technical Consideration for Other Iowa Tests ..... 149
  Table 9.1  Test Summary Statistics ..... 150
               Iowa Tests of Basic Skills–Survey Battery, Form A
  Table 9.2  Average Reliability Coefficients, Grades 3–8 ..... 152
               Iowa Writing Assessment
  Table 9.3  Correlations and Reliability of Differences ..... 153
               Iowa Writing Assessment and Iowa Tests of Basic Skills Language Total
  Table 9.4  Internal-Consistency Reliability ..... 155
               Constructed-Response Supplement
  Table 9.5  Correlations and Reliabilities of Differences ..... 155
               Constructed-Response Supplement and Corresponding ITBS Subtests
  Table 9.6  Test Summary Statistics ..... 156
               Listening Assessment for ITBS
  Table 9.7  Correlations Between Listening and ITBS Achievement ..... 156
  Table 9.8  Correlations Between Listening Grade 2 and ITBS Grade 3 ..... 157
  Table 9.9  Test Summary Statistics ..... 158
               Integrated Writing Skills Test, Form M
  Table 9.10 Correlations Between IWST and ITBS Reading and Language Tests ..... 159
  Table 9.11 Test Summary Statistics ..... 160
               Iowa Algebra Aptitude Test–Grade 8
  Table 9.12 Correlations Between IAAT and Algebra Grades and Test Scores ..... 160

PART 1

Nature and Purposes of The Iowa Tests®

The Iowa Tests

The Iowa Tests consist of a variety of educational achievement instruments developed by the faculty and professional staff at Iowa Testing Programs at The University of Iowa. The Iowa Tests of Basic Skills® (ITBS®) measure educational achievement in 15 subject areas for kindergarten through grade 8. The Iowa Tests of Educational Development® (ITED®) measure educational achievement in nine subject areas for grades 9 through 12. These test batteries share a history of development that has been an integral part of the research program in educational measurement at The University of Iowa for the past 70 years. In addition to these achievement batteries, The Iowa Tests include specialized instruments for specific achievement domains.

This Guide to Research and Development is devoted primarily to the ITBS and related assessments. The Guide to Research and Development for the ITED contains technical information about that test battery and related assessments.

Major Purposes of the ITBS Batteries

The purpose of measurement is to provide information that can be used to improve instruction and learning. Assessment of any kind has value to the extent that it results in better decisions for students. In general, these decisions apply to choosing goals for instruction and learning strategies to achieve those goals, designing effective classroom environments, and meeting the diverse needs and characteristics of students. The Iowa Tests of Basic Skills measure growth in fundamental areas of school achievement: vocabulary, reading comprehension, language, mathematics, social studies, science, and sources of information. The achievement standards represented by the tests are crucial in educational development because they can determine the extent to which students will benefit from later instruction. Periodic assessment in these areas is essential to tailor instruction to individuals and groups, to provide educational guidance, and to evaluate the effectiveness of instruction.

Validity of the Tests

The most valid assessment of achievement for a particular school is one that most closely defines that school's education standards and goals for teaching and learning. Ideally, the skills and abilities required for success in assessment should be the same skills and abilities developed through local instruction. Whether this ideal has been attained in the Iowa Tests of Basic Skills is something that must be determined from an item-by-item examination of the test battery early in the decision-making process.

Common practices to validate test content have been used to prepare individual items for The Iowa Tests. The content standards were determined through consideration of typical course coverage, current teaching methods, and recommendations of national curriculum groups. Test content has been carefully selected to represent best curriculum practice, to reflect current performance standards, and to represent diverse populations. The arrangement of items into levels within tests follows a scope and sequence appropriate to a particular level of teaching and cognitive development. Items are selected for content relevance from a larger pool of items tried out with a range of students at each grade level. Throughout the battery, efforts have been made to emphasize the functional value of what students learn in school. Students' abilities to use what they learn to interpret what they read, to analyze language, and to solve problems are tested in situations that approximate—to the extent possible with a paper and pencil test—actual situations in which students may use these skills.

Ultimately, the validity of information about achievement derived from The Iowa Tests depends on how the information is used to improve instruction and learning. Over the years, the audience for assessment information has grown. Today it represents varied constituencies concerned about educational progress at local, state, and national levels. To make assessment information useful, careful attention must be paid to reporting results to students, to parents and teachers, to school administrators and board members, and to the public. Descriptions of the types of score reports provided with The Iowa Tests are included in the Interpretive Guide for Teachers and Counselors and the Interpretive Guide for School Administrators. How to present test results to various audiences is discussed in these guides.

Description of the ITBS Batteries

Names of the Tests

Iowa Tests of Basic Skills® (ITBS®) Form A, Level 5; Form A, Level 6; Forms A and B, Levels 7 and 8; Forms A and B, Levels 9–14.

Description of the Test Batteries

The ITBS includes three batteries that allow for a variety of testing needs:
• The Complete Battery consists of five to fifteen subtests, depending on level, and is available at Levels 5 through 14.
• The Core Battery consists of a subset of tests in the Complete Battery, including all tests that assess reading, language, and math. It is available at Levels 7 through 14.
• The Survey Battery consists of 30-minute tests on reading, language, and math. Items in the Survey Battery come from tests in the Complete Battery. It is available at Levels 7 through 14.

Nature of the Batteries

Levels 5–8
Levels 5 and 6 of Form A are published as a Complete Battery; there is no separate Core Battery or Survey Battery for these levels. Levels 7 and 8 of Forms A and B are published as a Complete Battery (twelve tests), a Core Battery (nine tests), and a Survey Battery (three tests).

Levels 9–14
Levels 9 through 14 of Forms A and B are published in a Complete Battery (thirteen tests) and a Survey Battery (three tests). At Level 9, two additional tests are available, Word Analysis and Listening. For Level 9 only, a machine-scorable Complete Battery, a Core Battery (eleven tests), and a Survey Battery are available. Levels 10 through 14 have no separate Core Battery booklet; all Core tests are part of the Complete Battery booklet.

Nature of the Levels

Levels 5–6 (Grades K.1–1.9)
The achievement tests included in the Complete Battery are listed below. The Composite score for these levels, Core Total, includes only the tests preceded by a solid circle (•). Those included in the Reading Profile Total are followed by an asterisk (*). Abbreviations used in this Guide appear in parentheses.
• Vocabulary* (V)
Word Analysis* (WA)
Listening* (Li)
• Language (L)
• Mathematics (M)
Reading: Words* (Level 6 only) (RW)
Reading: Comprehension* (Level 6 only) (RC)

Levels 7–8 (Grades 1.7–3.2)
The achievement tests included in the Complete Battery and the Core Battery are listed below. Those in the Core Battery are preceded by a solid circle (•). Those included in the Reading Profile Total are followed by an asterisk (*). Test abbreviations are given in parentheses.
• Vocabulary* (V)
• Word Analysis* (WA)
• Reading* (RC)
• Listening* (Li)
• Spelling* (L1)
• Language (L)
• Mathematics Concepts (M1)
• Mathematics Problems (M2)
• Mathematics Computation (M3)
Social Studies (SS)
Science (SC)
Sources of Information (SI)

Levels 9–14 (Grades 3.0–9.9)
The achievement tests in the Complete Battery are listed below. Those in the Core Battery are preceded by a solid circle (•). Those tests included in the Reading Profile Total for Level 9 are followed by an asterisk (*).
• Vocabulary* (V)
• Reading Comprehension* (RC)
• Word Analysis* (Level 9 only) (WA)
• Listening* (Level 9 only) (Li)
• Spelling* (L1)
• Capitalization (L2)
• Punctuation (L3)
• Usage and Expression (L4)
• Math Concepts and Estimation (M1)
• Math Problem Solving and Data Interpretation (M2)
• Math Computation (M3)
Social Studies (SS)
Science (SC)
Maps and Diagrams (S1)
Reference Materials (S2)

Tests in the Survey Battery—Reading, Language, and Mathematics—comprise items from the Complete Battery. Each test is divided into the parts indicated.
Reading (two parts): Vocabulary; Comprehension
Language
Mathematics (three parts): Concepts, Problem Solving and Data Interpretation; Estimation; Computation

Grade Levels and Test Levels

Levels 5 through 14 represent a comprehensive assessment program for kindergarten through grade 9. Each level is numbered to correspond roughly to the age of the student for whom it is best suited. A student should be given the level most compatible with his or her level of academic development. Typically, students in kindergarten and grades 1 and 2 would take only three of the Primary Battery's four levels before taking Level 9 in grade 3. Table 1.1 shows how test level corresponds to a student's level of academic development, expressed as a grade range. Decimals in the last column indicate month of the school year. For example, K.1–1.5 means the first month of kindergarten through the fifth month of grade 1.

Table 1.1
Test and Grade Level Correspondence
Iowa Tests of Basic Skills, Forms A and B

Test Level   Age   Grade Level
5            5     K.1 – 1.5
6            6     K.7 – 1.9
7            7     1.7 – 2.3
8            8     2.3 – 3.2
9            9     3.0 – 3.9
10           10    4.0 – 4.9
11           11    5.0 – 5.9
12           12    6.0 – 6.9
13           13    7.0 – 7.9
14           14    8.0 – 9.9

Test Lengths and Times

For Levels 5 through 8, the number of questions and approximate working time for each test are given in Table 1.2. Tests at these levels are untimed; the actual time required for a test varies somewhat with the skill level of the students. (The administration times in the table are based on average rates reported by teachers in tryout sessions.) The Level 6 Reading test is administered in two sessions. For Levels 9 through 14, all tests are timed; the administration times include time to read directions as well as to take the tests.

Nature of the Questions

For Levels 5 through 8, questions are read aloud, except for parts of the Reading test at Level 6 and, at Levels 7 and 8, the Reading test and parts of the Vocabulary and Math Computation tests. Questions are multiple choice with three or four response options. Responses are presented in pictures, letters, numerals, or words, depending on the test and level. All questions in Levels 9 through 14 are multiple choice, have four or five options, and are read by the student.

Mode of Responding

Students who take Levels 5 through 8 mark answers in machine-scorable booklets by filling in a circle. Those who take Levels 9 through 14 mark answers on a separate answer folder (Complete Battery) or answer sheet (Survey Battery). For the machine-scorable booklets at Level 9, students mark answers in the test booklets.

Directions

A separate Directions for Administration manual is provided for each Complete Battery (Levels 5 through 8) and Core Battery (Levels 7 and 8) level and form. The Survey Battery (Levels 7 and 8) has separate Directions for Administration manuals for each level and form. At Levels 9 through 14, there is one Directions for Administration manual for Forms A and B of the Complete Battery. At these levels, the Survey Battery has a single Directions for Administration manual. The machine-scorable booklets of Level 9 have separate Directions for Administration manuals.


Table 1.2
Number of Items and Test Time Limits
Iowa Tests of Basic Skills, Forms A and B

Level 5: Complete Battery
Test                      Approximate Working Time (Minutes)   Number of Items
• Vocabulary              20                                    29
Word Analysis             20                                    30
Listening                 30                                    29
• Language                25                                    29
• Mathematics             25                                    29
• Core Tests              1 hr., 10 min.                        87
Complete Battery          2 hrs.                                146

Level 6: Complete Battery
Test                      Approximate Working Time (Minutes)   Number of Items
• Vocabulary              20                                    31
Word Analysis             20                                    35
Listening                 30                                    31
• Language                25                                    31
• Mathematics             25                                    35
Reading: Words            23                                    29
Reading: Comprehension    20                                    19
• Core Tests              1 hr., 10 min.                        97
Complete Battery          2 hrs., 43 min.                       211

Level 7: Complete and Core Battery
Test                      Approximate Working Time (Minutes)   Number of Items
• Vocabulary              15                                    30
• Word Analysis           15                                    35
• Reading                 35                                    34
• Listening               25                                    31
• Spelling                15                                    23
• Language                15                                    23
• Math Concepts           20                                    29
• Math Problems           25                                    28
• Math Computation        20                                    27
Social Studies            25                                    31
Science                   25                                    31
Sources of Information    25                                    22
• Core Battery            3 hrs., 5 min.                        260
Complete Battery          4 hrs., 20 min.                       344

Level 8: Complete and Core Battery
Test                      Approximate Working Time (Minutes)   Number of Items
• Vocabulary              15                                    32
• Word Analysis           15                                    38
• Reading                 35                                    38
• Listening               25                                    31
• Spelling                15                                    23
• Language                15                                    31
• Math Concepts           20                                    31
• Math Problems           25                                    30
• Math Computation        20                                    30
Social Studies            25                                    31
Science                   25                                    31
Sources of Information    30                                    28
• Core Battery            3 hrs., 5 min.                        284
Complete Battery          4 hrs., 25 min.                       374

Level 7: Survey Battery
Test                      Approximate Working Time (Minutes)   Number of Items
Reading                   30                                    40
Language                  25                                    34
Mathematics               22                                    27
Mathematics Computation   8                                     13
Survey Battery            1 hr., 25 min.                        114

Level 8: Survey Battery
Test                      Approximate Working Time (Minutes)   Number of Items
Reading                   30                                    44
Language                  25                                    42
Mathematics               22                                    33
Mathematics Computation   8                                     17
Survey Battery            1 hr., 25 min.                        136


Table 1.2 (continued)
Number of Items and Test Time Limits
Iowa Tests of Basic Skills, Forms A and B

Levels 9–14: Number of Items, Complete and Core Battery
                                                        Working Time              Number of Items by Level
Test                                                    (Minutes)                 9      10     11     12     13     14
• Vocabulary                                            15                        29     34     37     39     41     42
• Reading Comprehension¹                                25 + 30                   37     41     43     45     48     52
• Spelling                                              12                        28     32     36     38     40     42
• Capitalization                                        12                        24     26     28     30     32     34
• Punctuation                                           12                        24     26     28     30     32     34
• Usage and Expression                                  30                        30     33     35     38     40     43
• Mathematics Concepts and Estimation¹                  25 + 5                    31     36     40     43     46     49
• Mathematics Problem Solving and Data Interpretation   30                        22     24     26     28     30     32
• Mathematics Computation                               15                        25     27     29     30     31     32
Social Studies                                          30                        30     34     37     39     41     43
Science                                                 30                        30     34     37     39     41     43
Maps and Diagrams                                       30                        24     25     26     28     30     31
Reference Materials                                     25                        28     30     32     34     36     38
• Word Analysis²                                        20                        35     –      –      –      –      –
• Listening²                                            25                        31     –      –      –      –      –
• Core Battery³                                         211 (3 hrs., 31 min.)     250    279    302    321    340    360
Complete Battery⁴                                       326 (5 hrs., 26 min.)     362    402    434    461    488    515

¹ This test is administered in two parts.
² This test is untimed. The time given is approximate.
³ With Word Analysis and Listening at Level 9, testing time is 256 min. (4 h., 16 m.) and the number of items is 316.
⁴ With Word Analysis and Listening at Level 9, testing time is 371 min. (6 h., 11 m.) and the number of items is 428.

Levels 9–14: Number of Items, Survey Battery
                                      Working Time           Number of Items by Level
Test                                  (Minutes)              9      10     11     12     13     14
Reading                               30                     27     30     32     34     36     37
  Part 1: Vocabulary                  5                      10     11     12     13     14     14
  Part 2: Comprehension               25                     17     19     20     21     22     23
Language                              30                     43     47     51     54     57     59
Mathematics                           30                     31     34     37     40     43     46
  Part 1: Concepts and Problems       22                     19     21     23     25     27     29
  Part 2: Estimation                  3                      4      5      5      6      6      6
  Part 3: Computation                 5                      8      9      9      10     10     11
Survey Battery                        90 (1 hr., 30 min.)    101    111    120    128    136    142


Other Iowa Tests

Iowa Writing Assessment

The Iowa Writing Assessment measures a student's ability to generate, organize, and express ideas in writing. This assessment includes four prompts that require students to compose an essay in one of four modes: narrative, descriptive, persuasive, or expository. With norm-referenced evaluation of a student's writing about a specific topic, the Iowa Writing Assessment adds to the information obtained from other language tests and from the writing students do in the classroom.

Listening Assessment for ITBS

Content specifications for Levels 9 through 14 of the Listening tests are based on current literature in the teaching and assessment of listening comprehension. The main purposes of the Listening Assessment are: (a) to measure strengths and weaknesses in listening so effective instruction can be planned to meet individual and group needs; (b) to monitor listening instruction; and (c) to help make teachers and students aware of the importance of good listening strategies.

Constructed-Response Supplement to The Iowa Tests

These tests may be used with the Complete Battery and Survey Battery of the ITBS. The Constructed-Response Supplement measures achievement in reading, language, and math in an open-ended format. Students write answers in the test booklet, and teachers use the scoring guidelines to rate the responses. The results can be used to provide information about achievement to satisfy requirements for multiple measures.

Other Manuals

In addition to this Guide to Research and Development, several other manuals provide information for test users. Each Directions for Administration manual includes a section on preparing for test administration as well as the script needed to administer the tests. The Test Coordinator Guide offers suggestions about policies and procedures associated with testing, advice about planning for and administering the testing program, ideas about preparing students and parents, and details about how to prepare answer documents for the scoring service. The Interpretive Guide for Teachers and Counselors describes test content, score reports, use of test results for instructional purposes, and communication of results to students and parents. The Interpretive Guide for School Administrators offers additional information, including guidance on designing a districtwide assessment program and reporting test results. The Norms and Score Conversions booklets contain directions for hand scoring and norms tables for converting raw scores to derived scores such as standard scores and percentile ranks.

PART 2

The National Standardization Program

Normative data collected at the time of standardization is what distinguishes norm-referenced tests from other assessments. It is through the standardization process that scores, scales, and norms are developed. The procedures used in the standardization of The Iowa Tests are designed to make the norming sample reflect the national population as closely as possible, ensuring proportional representation of ethnic and socioeconomic groups.

The standardization of the Iowa Tests of Basic Skills (ITBS) Complete Battery and Survey Battery was a cooperative venture. It was planned by the ITBS authors, the publisher, and the authors of the Iowa Tests of Educational Development (ITED) and the Cognitive Abilities Test™ (CogAT®). Many public and non-public schools cooperated in national item tryouts and standardization activities, which included the 2000 spring and fall test administrations, scaling, and equating studies.

Planning the National Standardization Program

The standardization of the ITBS, ITED, and CogAT was carried out as a single enterprise. After reviewing previous national standardization programs, the basic principles and conditions of those programs were adapted to the following current needs:
• The sample should be selected to represent the national population with respect to ability and achievement. It should be large enough to represent the diverse characteristics of the population, but a carefully selected sample of reasonable size would be preferred over a larger but less carefully selected sample.
• Sampling units should be chosen primarily on the basis of school district size, region of country, and socioeconomic characteristics. A balance between public and non-public schools should be obtained.
• The sample of attendance centers should be sufficiently large and selected to provide dependable norms for building averages.
• Attendance centers in each part of the sample should represent the central tendency and variability of the population.
• To ensure comparability of norms from grade to grade, all grades in a selected attendance center (or a designated fraction thereof) should be tested.
• To ensure comparability of norms for ability and achievement tests, both the ITBS and the CogAT should be administered to the same students at the appropriate grade level.
• To ensure comparability of norms for Complete and Survey Batteries, alternate forms of both batteries should be administered at the appropriate grade level to the same students or to equivalent samples of students.
• To ensure applicability of norms to all students, testing accommodations for students who require them should be a regular part of the standardization design.

Procedures for Selecting the Standardization Sample

Public School Sample

Three stratifying variables were used to classify public school districts across the nation: geographic region, district enrollment, and socioeconomic status (SES) of the school district. Within each geographic region (New England and Mideast, Southeast, Great Lakes and Plains, and West and Far West), school districts were stratified into nine enrollment categories. School district SES was determined with data from the National Education Database™ (Quality Education Data, 2002). The socioeconomic index is the percent of students in a district falling below the federal government poverty guideline, similar to the Orshansky index used in sampling for the National Assessment of Educational Progress (NAEP). This index was used in each of the four regions to break the nine district-size categories into five strata.
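To make the stratification concrete, here is a minimal sketch of how a district could be assigned to a sampling cell. It is an illustration only, not the publisher's procedure: the district records and field layout are hypothetical, the enrollment boundaries are borrowed from the categories shown later in Table 2.5, and the within-region quintile cut is just one simple way to form five SES strata.

```python
from collections import defaultdict

# Hypothetical district records: (name, region, K-12 enrollment, % of students below the poverty guideline)
districts = [
    ("District A", "Southeast", 12_400, 22.5),
    ("District B", "Southeast", 800, 8.1),
    ("District C", "West and Far West", 130_000, 31.0),
    # ... one record per public school district
]

# Nine enrollment categories, using the boundaries listed in Table 2.5.
ENROLLMENT_BOUNDS = [600, 1_200, 2_500, 5_000, 10_000, 25_000, 50_000, 100_000]

def enrollment_category(enrollment):
    """Return an enrollment category 0-8 (0 = fewer than 600 students)."""
    return sum(enrollment >= bound for bound in ENROLLMENT_BOUNDS)

def ses_stratum(pct_poverty, poverty_values):
    """Place a district into one of five SES strata (0 = lowest poverty)
    by ranking its poverty index against the other districts in its region."""
    rank = sum(v <= pct_poverty for v in poverty_values) / len(poverty_values)
    return min(4, int(rank * 5))

# Group districts into region x enrollment x SES cells; districts are then
# drawn at random from each cell, as described in the text.
cells = defaultdict(list)
for name, region, enrollment, pct_poverty in districts:
    region_poverty = [d[3] for d in districts if d[1] == region]
    key = (region, enrollment_category(enrollment), ses_stratum(pct_poverty, region_poverty))
    cells[key].append(name)
```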


In each SES category, districts were selected at random and designated as first, second, or third choices. Administrators in the selected districts were contacted by the publisher and invited to participate. If a district declined, the next choice was contacted.

Catholic School Sample

The primary source for selecting and weighting the Catholic sample was NCEA/Ganley's Catholic Schools in America (NCEA, 2000). Within each geographic region of the public sample, schools were stratified into five categories on the basis of diocesan enrollment. A two-stage random sampling procedure was used to select the sample (a sketch of this two-stage selection follows the summary below). In the first stage, dioceses were randomly selected from each of five enrollment categories. Different sampling fractions were used, ranging from 1.0 for dioceses with total student enrollment above 100,000 (all four were selected) to .07 for dioceses with fewer than 10,000 students (seven of 102 were selected). In the second stage, schools were randomly chosen from each diocese selected in the first stage. In all but the smallest enrollment dioceses—where only one school was selected—two schools were randomly chosen. If the selected school declined to participate, the alternate school was contacted. If neither school agreed to participate, additional schools randomly selected from the diocese were contacted.

Private Non-Catholic School Sample

The sample of private non-Catholic schools was obtained from the QED data file. The schools in each geographic region of the public and Catholic samples were stratified into two types: church-related and nonsectarian. Schools were randomly sampled in eight categories (region by type of school) until the target number of students was reached. For each school selected, an alternate school was chosen to be contacted if the selected school declined to participate.

Summary

These sampling procedures produced (1) a national probability sample representative of students nationwide; (2) a nationwide sample of schools for school building norms; (3) data for Catholic/private and other special norms; and (4) empirical norms for the Complete Battery and the Survey Battery.
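As a companion to the description of the Catholic sample, the following sketch shows the shape of a two-stage selection: dioceses are drawn first, with a sampling fraction that depends on diocesan enrollment, and schools are then drawn within each selected diocese. The diocese and school lists are invented, and only the two sampling fractions quoted above (1.0 above 100,000 students, .07 below 10,000) come from the text; the middle fraction and the handling of alternates are simplifying assumptions.

```python
import random

# Invented diocese records (name, total enrollment) and school lists.
dioceses = [("Diocese 1", 140_000), ("Diocese 2", 36_000), ("Diocese 3", 7_500)]
schools = {
    "Diocese 1": [f"School {i}" for i in range(1, 301)],
    "Diocese 2": [f"School {i}" for i in range(1, 91)],
    "Diocese 3": [f"School {i}" for i in range(1, 26)],
}

def diocese_sampling_fraction(enrollment):
    """Sampling fraction by diocesan enrollment; only the end points come from the text."""
    if enrollment >= 100_000:
        return 1.0   # all four of the largest dioceses were selected
    if enrollment < 10_000:
        return 0.07  # seven of 102 small dioceses were selected
    return 0.25      # placeholder for the intermediate categories

rng = random.Random(2000)

# Stage 1: draw dioceses with probability equal to their category's sampling fraction.
selected_dioceses = [name for name, enrollment in dioceses
                     if rng.random() < diocese_sampling_fraction(enrollment)]

# Stage 2: draw schools within each selected diocese -- one in the smallest
# dioceses, two elsewhere -- plus one extra draw kept as an alternate in case
# a chosen school declines (alternate handling is simplified here).
plan = {}
for name in selected_dioceses:
    n_schools = 1 if dict(dioceses)[name] < 10_000 else 2
    draws = rng.sample(schools[name], k=n_schools + 1)
    plan[name] = {"selected": draws[:n_schools], "alternate": draws[n_schools:]}
```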


The authors and publisher of the ITBS are grateful to many people for assistance in preparing test materials and administering tests in item tryouts and special research projects. Particular thanks are due to the administrators, teachers, and students in the schools that took part in the national standardization. These schools are listed at the end of this part of the Guide to Research and Development. Schools marked with an asterisk participated in both spring and fall standardizations.

Design for Collecting the Standardization Data

A timetable for administration of the ITBS and the CogAT is given in Table 2.1. This illustrates how the national standardization study was designed. During the spring standardization, students took the appropriate level of the Complete Battery of the ITBS, Form A. These same students took Form 6 of the CogAT.

The design of the fall standardization was more complex. Every student in grades 2 through 8 participated in two units of testing. The order of the two testing units was counterbalanced. In the first testing unit, the student took the Complete Battery of either Form A or Form B of the ITBS. In grades 2 and 3, Forms A and B of the ITBS machine-scorable booklets were used in alternate classrooms. In approximately half of the grade 3 classrooms, alternate forms of the ITBS Level 8 were administered; in the remaining grade 3 classrooms, Forms A and B of Level 9 were administered to every other student. In grades 4 through 8, Forms A and B were administered to every other student in all classrooms. In the second testing unit of the fall standardization, students took Form A or Form B of the Survey Battery. (Students who had taken Form A of the Complete Battery took Form B of the Survey Battery and vice versa.)
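The alternating-form rule for grades 4 through 8 in the fall design can be written out directly. The sketch below is illustrative only (the roster and function name are made up); it assigns Form A or Form B of the Complete Battery to every other student and gives each student the opposite form of the Survey Battery in the second unit, as the text describes.

```python
def assign_fall_forms(roster):
    """Counterbalanced form assignment for one classroom in grades 4-8:
    alternate students take Complete Battery Form A or Form B in the first
    unit, and each student takes the opposite form of the Survey Battery
    in the second unit. (The order of the two units was itself
    counterbalanced across classrooms; that step is omitted here.)"""
    plan = []
    for i, student in enumerate(roster):
        complete_form = "A" if i % 2 == 0 else "B"
        survey_form = "B" if complete_form == "A" else "A"
        plan.append((student,
                     f"Unit 1: Complete Battery, Form {complete_form}",
                     f"Unit 2: Survey Battery, Form {survey_form}"))
    return plan

# Example with a hypothetical six-student roster.
for row in assign_fall_forms(["S1", "S2", "S3", "S4", "S5", "S6"]):
    print(row)
```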

Weighting the Samples

After materials from the spring standardization had been received by the Riverside Scoring Service®, the number and percents of students in each sample (public, Catholic, and private non-Catholic) and stratification category were determined. The percents were adjusted by weighting to compensate for missing categories and to adjust for schools that tested more or fewer students than required.


Table 2.1
Summary of Standardization Schedule

Spring 2000
  First Unit:   ITBS, Form A Complete Battery (Levels 5–8, Grades K–2); ITBS, Form A Complete Battery (Levels 9–14, Grades 3–8)
  Second Unit:  CogAT, Form 6 (Levels 1–2, Grades K–3); CogAT, Form 6 (Levels A–F, Grades 3–8)

Fall 2000
  First Unit:   ITBS, Form A/B Complete Battery (Levels 7–8, Grades 2–3); ITBS, Form A/B Complete Battery (Levels 9–14, Grades 3–8)
  Second Unit:  ITBS, Form B/A Survey Battery (Levels 9–14, Grades 3–8)

The number of students in the 2000 spring national standardization of the ITBS is given in Table 2.2 for the public, Catholic, and private non-Catholic samples. Table 2.2 also shows the unweighted and weighted sample percents and the population percents for each cohort.

Tables 2.3 through 2.7 summarize the unweighted and weighted sample characteristics for the spring 2000 standardization of the ITBS based on the principal stratification variables of the public school sample and other key characteristics of the nonpublic sample.

Optimal weights for these samples were determined by comparing the proportion of students nationally in each cohort to the corresponding sample proportion. Once the optimal weight for each sample was obtained, the stratification variables were simultaneously considered to assign final weights. These weights (integer values 0 through 9, with 3 denoting perfect proportional representation) were assigned to synthesize the characteristics of a missing unit or adjust the frequencies in other units. As a result, the weighted distributions in the three standardization samples closely approximate those of the total student population.
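A small sketch may help fix the idea of the integer weights. This is not the actual weighting algorithm, only an illustration under stated assumptions: a stratum's weight is taken as the ratio of its population share to its sample share, scaled so that 3 means perfect proportional representation and clipped to the 0–9 range. The example uses the SES percents from Table 2.4.

```python
def integer_weight(population_pct, sample_pct, scale=3, max_weight=9):
    """Illustrative stratum weight: 3 = represented in exact population
    proportion, below 3 = over-represented, above 3 = under-represented.
    (Sketch only; the actual procedure considered all stratification
    variables jointly and synthesized missing cells.)"""
    if sample_pct == 0:
        return max_weight
    return max(0, min(max_weight, round(scale * population_pct / sample_pct)))

# Population % vs. unweighted sample % for the SES strata in Table 2.4.
ses_strata = {"High": (15.2, 12.2), "High Average": (19.1, 23.5),
              "Average": (31.5, 36.8), "Low Average": (19.1, 21.2),
              "Low": (15.1, 6.3)}
for stratum, (population_pct, sample_pct) in ses_strata.items():
    print(stratum, integer_weight(population_pct, sample_pct))
# The under-sampled Low SES stratum gets a large weight (about 7), while the
# over-sampled Average stratum stays near 3.
```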

In addition to the regular norms established in the 2000 national standardization, separate norms were established for special populations. These norms and the procedures used to derive them are discussed in Part 4.


Table 2.2
Sample Size and Percent of Students by Type of School
Spring 2000 National Standardization Sample, ITBS, Grades K–8

                          Public School Sample   Catholic School Sample   Private Non-Catholic Sample   Total
Unweighted Sample Size    149,831                10,797                   9,589                         170,217
Unweighted Sample %       88.0                   6.3                      5.6                           100.0
Weighted Sample %         90.1                   4.9                      5.0                           100.0
Population %              90.1                   4.9                      5.0                           100.0

Table 2.3
Percent of Public School Students by Geographic Region
Spring 2000 National Standardization Sample, ITBS, Grades K–8

Geographic Region         % of Students in Sample   % of Students in Weighted Sample   % of Students in Population
New England and Mideast   14.7                      22.2                               21.7
Southeast                 26.9                      23.8                               23.6
Great Lakes and Plains    25.0                      22.3                               21.9
West and Far West         33.4                      31.7                               32.7

Table 2.4
Percent of Public School Students by SES Category
Spring 2000 National Standardization Sample, ITBS, Grades K–8

SES Category   % of Students in Sample   % of Students in Weighted Sample   % of Students in Population
High           12.2                      15.3                               15.2
High Average   23.5                      19.2                               19.1
Average        36.8                      31.3                               31.5
Low Average    21.2                      19.1                               19.1
Low            6.3                       15.2                               15.1


Table 2.5
Percent of Public School Students by District Enrollment
Spring 2000 National Standardization Sample, ITBS, Grades K–8

District K–12 Enrollment   % of Students in Sample   % of Students in Weighted Sample   % of Students in Population
100,000 +                  3.9                       9.7                                15.6
50,000 – 99,999            5.6                       9.6                                8.5
25,000 – 49,999            11.4                      17.9                               11.4
10,000 – 24,999            23.3                      20.1                               17.7
5,000 – 9,999              15.8                      10.2                               14.7
2,500 – 4,999              17.6                      13.6                               14.7
1,200 – 2,499              12.5                      9.2                                10.2
600 – 1,199                7.4                       6.9                                4.5
Less than 600              2.5                       2.8                                2.7

Table 2.6
Percent of Catholic Students by Diocese Size and Geographic Region
Spring 2000 National Standardization Sample, ITBS, Grades K–8

Diocese Size              % of Students in Sample   % of Students in Weighted Sample   % of Students in Population
100,000 +                 7.7                       17.5                               17.5
50,000 – 99,999           7.4                       17.9                               18.0
20,000 – 49,999           35.0                      21.8                               21.7
10,000 – 19,999           19.4                      24.9                               25.0
Less than 10,000          30.5                      17.9                               17.8

Geographic Region
New England and Mideast   23.4                      35.0                               34.9
Southeast                 17.5                      13.7                               13.6
Great Lakes and Plains    44.2                      33.7                               33.9
West and Far West         14.9                      17.6                               17.6


Table 2.7
Percent of Private Non-Catholic Students by Geographic Region
Spring 2000 National Standardization Sample, ITBS, Grades K–8

Geographic Region         % of Students in Sample   % of Students in Weighted Sample   % of Students in Population
New England and Mideast   9.7                       24.0                               23.8
Southeast                 19.5                      29.3                               29.4
Great Lakes and Plains    34.0                      19.8                               19.7
West and Far West         36.8                      26.9                               27.1

Racial-Ethnic Representation

Although not a direct part of a typical sampling plan, the racial-ethnic composition of a national standardization sample should represent that of the school population. The racial-ethnic composition of the 2000 ITBS spring standardization sample was estimated from responses to demographic questions on answer documents. In all grades, the racial-ethnic group(s) to which a student belonged were requested. In kindergarten through grade 3, teachers furnished this information. In the remaining grades, students furnished it. The results reported in Table 2.8 include students in Catholic and other private schools. The table also shows estimates of population percents in public schools for each category, according to the National Center for Education Statistics.

The response rate for racial-ethnic information was high; 98 percent of the standardization participants indicated membership in one of the groups listed. Although the percents of students in each group fluctuate from grade to grade, differences between sample and population percents were generally within chance error. This was true for all groups except Hispanics or Latinos, who were slightly underrepresented. However, some of this underrepresentation can be attributed to school districts exempting from testing students whose first language is not English. These students are not as likely to be represented in the test-taking population as they are in the school population. Collectively, the results in Table 2.8 provide evidence of the overall quality of the national standardization sample and its representativeness of the racial and ethnic makeup of the U.S. student population.

Participation of Students in Special Groups

In the spring 2000 national standardization, schools were given detailed instructions for the testing of students with disabilities and English Language Learners. Schools were asked to decide whether students so identified should be tested, and, if so, what modifications in testing procedures were needed.

Among students with disabilities, nearly all were identified as eligible for special education services and had an Individualized Education Program (IEP), an Individualized Accommodation Plan (IAP), or a Section 504 Plan. Schools were asked to examine the IEP or other plan for these students, decide whether the student should receive accommodations, and determine the nature of those accommodations. Schools were told an accommodation refers to a change in the procedures for administering the test and that an accommodation is intended to neutralize, as much as possible, the effect of the student's disability on the assessment process. Accommodations should not change the kind of achievement being measured, but change how achievement is measured. When accommodations were used, the test administrator recorded the type of accommodation on each student's answer document. The accommodations most frequently used by students with IEPs or Section 504 Plans were listed on the student answer document. Space for indicating other accommodations was also included.


Table 2.8
Racial-Ethnic Representation
Iowa Tests of Basic Skills — Complete Battery, Form A
Spring 2000 National Standardization

White (62.1%)*
Grade   Number    Percent   Weighted Number   Percent
K       11,137    67.8      33,201            57.0
1       12,648    70.2      29,729            58.7
2       13,529    71.9      30,560            59.1
3       13,308    72.3      33,713            65.6
4       13,437    71.6      33,866            64.6
5       14,516    72.3      35,010            67.0
6       14,776    73.1      35,509            70.0
7       14,346    75.0      36,519            73.1
8       12,146    71.8      37,154            71.4
Total   119,843   71.9      371,460           66.0

Black or African American (17.2%)*
Grade   Number    Percent   Weighted Number   Percent
K       2,735     16.7      11,343            19.5
1       2,629     14.6      7,773             15.3
2       2,323     12.3      6,780             13.1
3       2,229     12.1      7,336             14.3
4       2,024     10.8      7,028             13.4
5       2,146     10.7      7,231             13.8
6       2,223     11.0      7,404             14.6
7       1,731     9.1       6,160             12.3
8       1,877     11.1      8,503             16.3
Total   19,917    11.9      62,371            14.6

Hispanic or Latino (15.6%)*
Grade   Number    Percent   Weighted Number   Percent
K       1,839     11.2      6,399             11.0
1       1,975     11.0      5,732             11.3
2       2,084     11.1      5,785             11.2
3       1,941     10.5      6,894             13.4
4       2,080     11.1      7,127             13.6
5       2,031     10.1      7,499             14.3
6       1,745     8.6       4,643             9.1
7       1,647     8.6       4,466             8.9
8       1,490     8.8       4,379             8.4
Total   16,832    10.1      62,371            11.1

Asian/Pacific Islander (4.0%)*
Grade   Number    Percent   Weighted Number   Percent
K       429       2.6       1,493             2.6
1       460       2.6       1,192             2.4
2       537       2.9       1,291             2.5
3       497       2.7       1,406             2.7
4       553       2.9       1,468             2.8
5       591       2.9       1,487             2.8
6       614       3.0       1,602             3.2
7       477       2.5       1,328             2.7
8       485       2.9       1,697             3.3
Total   4,643     2.9       15,584            2.8

American Indian/Alaskan Native (1.2%)*
Grade   Number    Percent   Weighted Number   Percent
K       207       1.3       878               1.5
1       225       1.2       885               1.7
2       250       1.3       946               1.8
3       364       2.0       1,279             2.5
4       500       2.7       1,806             3.4
5       673       3.4       2,128             4.1
6       749       3.7       2,244             4.4
7       789       4.1       2,476             5.0
8       656       3.9       2,148             4.1
Total   4,413     2.6       17,782            3.2

Native Hawaiian (NA)
Grade   Number    Percent   Weighted Number   Percent
K       70        0.4       194               0.3
1       69        0.5       157               0.3
2       95        0.5       148               0.3
3       80        0.4       142               0.3
4       181       1.0       498               1.0
5       111       0.6       247               0.5
6       109       0.5       251               0.5
7       136       0.7       379               0.8
8       274       1.6       969               1.9
Total   1,125     0.7       4,060             0.7

*Population percent (Source: Digest of Education Statistics 2000, 1999–2000 public school enrollment)



For students whose native language was not English and who had been in an English-only classroom for a limited time, two decisions had to be made prior to testing: first, whether the student's English was sufficiently developed to warrant testing, and second, whether an accommodation should be used. In all instances, the district's instructional guidelines were used in decisions about individual accommodations.

The test administrators were told that the use of testing accommodations with English Language Learners is intended to allow the measurement of skills and knowledge in the curriculum without significant interference from a limited opportunity to learn English. Those just beginning instruction in English were not likely to be able to answer many questions no matter what types of accommodations were used. For those in the second or third year of instruction in an English as a Second Language (ESL) program, accommodations might be warranted to reduce the effect of limited English proficiency on test performance. The types of accommodations sometimes used with such students were listed on the student answer document for coding.

Table 2.9 summarizes the use of accommodations with students with disabilities during the standardization. While the percents vary somewhat across grades, an average of about 7 percent of the students were identified as special education students or as having a 504 Plan. Of these students, roughly 50 percent received at least one accommodation. The last column in the table shows that in the final distribution of scores from which the national norms were obtained, an average of 3 percent to 4 percent of the students received an accommodation. Table 2.10 reports similar information for English Language Learners.

Empirical Norms Dates

To provide more information for schools with alternative school calendars, data were collected from districts on their opening and closing dates. Procedures to analyze these data were altered from those used in the 1976–77 standardization—when the Title I program first required empirical norms dates—to determine weighted opening and closing dates. The procedures used and the advice given to school districts that do not have a standard 180-day, September-to-May school year are noted below.

Test administration for the 2000 spring standardization of the ITBS, Form A, took place between March 23 and May 29; fall standardization testing took place between September 21 and November 11. The spring norming group was a national probability sample of approximately 170,000 students in kindergarten through grade 8; the fall sample was approximately 76,000 students.

After answer documents were checked and scored and sampling weights had been assigned to schools, weighted opening and closing dates were determined. These are reference points for the empirical norms dates. The median empirical norms date for spring testing is April 30; for fall testing it is October 22.

Regular fall, midyear, and spring norms can be used by school districts that operate on a twelve-month schedule. To do so, testing should be scheduled so the number of instructional days prior to testing corresponds to the median number of instructional days for schools in the national standardization. For example, the fall norms for the 2000 national standardization were established with a median testing date of October 22, on average 40 instructional days from the median start date of schools in the national standardization. If a school year begins on July 15, testing should be scheduled between September 1 and September 21. Doing so places the median testing date at September 10, about 40 instructional days from the July 15 start date. By testing during this period, instructional opportunity is comparable to the norms group and the use of fall norms is therefore appropriate. Testing dates for twelve-month schools can be calculated in a similar way so midyear and spring norms can be used.
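A minimal sketch of that scheduling arithmetic appears below. It simply counts weekdays from the first day of school until the target number of instructional days is reached; real school calendars would subtract holidays and breaks, so the result is approximate. The function name is invented for the example, and the 40-day target and July 15 start date come from the paragraph above.

    from datetime import date, timedelta

    def comparable_test_date(first_day, instructional_days):
        """Walk forward from the first day of school, counting weekdays as
        instructional days (holidays are ignored in this sketch), and return
        the date on which the target count is reached."""
        current, counted = first_day, 0
        while counted < instructional_days:
            if current.weekday() < 5:          # Monday through Friday
                counted += 1
            current += timedelta(days=1)
        return current - timedelta(days=1)     # the last day counted

    # A district opening on July 15 and matching the roughly 40 instructional
    # days that preceded the October 22 fall norms date lands near September 10.
    print(comparable_test_date(date(2000, 7, 15), 40))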


Table 2.9
Test Accommodations — Special Education and 504 Students
Iowa Tests of Basic Skills — Complete Battery, Form A
Spring 2000 National Standardization (Weighted Sample)

Grade   Standardization   Identified Students                           Accommodated Students
        Sample N          N        % of Standardization Sample          N        % of Identified Students   % of Standardization Sample
K       58,216            2,121    3.6                                  262      12.4                       0.5
1       50,687            2,397    4.7                                  905      37.8                       1.8
2       51,725            3,076    5.9                                  1,322    43.0                       2.6
3       51,414            3,485    6.8                                  1,615    46.3                       3.1
4       52,392            4,101    7.8                                  2,184    53.3                       4.2
5       52,277            4,286    8.2                                  2,241    52.3                       4.3
6       50,753            3,652    7.2                                  1,662    45.5                       3.3
7       49,925            3,478    7.0                                  2,146    61.7                       4.3
8       52,072            3,489    6.7                                  2,109    60.4                       4.1

Note: Accommodations included Braille, large print, tested off level, answers recorded, extended time, communication assistance, transferred answers, individual/small group administration, repeated directions, tests read aloud (except for Vocabulary and Reading Comprehension), plus selected others.

Table 2.10
Test Accommodations — English Language Learners
Iowa Tests of Basic Skills — Complete Battery, Form A
Spring 2000 National Standardization (Weighted Sample)

Grade   Standardization   Identified Students                           Accommodated Students
        Sample N          N        % of Standardization Sample          N        % of Identified Students   % of Standardization Sample
K       58,216            3,780    6.5                                  1,122    29.7                       1.9
1       50,687            2,853    5.6                                  382      13.4                       0.8
2       51,725            3,352    6.5                                  244      7.3                        0.5
3       51,414            2,460    4.8                                  358      14.6                       0.7
4       52,392            3,604    6.9                                  565      15.7                       1.1
5       52,277            3,060    5.9                                  315      10.3                       0.6
6       50,753            973      1.9                                  163      16.8                       0.3
7       49,925            739      1.5                                  216      29.2                       0.4
8       52,072            662      1.3                                  156      23.6                       0.3

Note: Accommodations included tested off level, extended time, individual/small group administration, repeated directions, provision of English/native language word-to-word dictionary, test administered by ESL teacher or individual providing language services.


School Systems Included in the 2000 Standardization Samples

New England and Mideast

Connecticut
Orange: New Haven Hebrew Day School Thomaston, Thomaston School District: Thomaston Center Intermediate School Waterbury, Archdiocese of Hartford: St. Joseph School

Delaware Newark, Christina School District: Summit School Wilmington, Brandywine School District: Talley Middle School

District of Columbia Washington: Nannie H. Borroughs School

Maine Bangor, Hermon School District*: Hermon Middle School Bowdoinham, School Admin. District 75: Bowdoinham Community Elementary School Calais, Union 106 Calais: Calais Elementary School Danforth, School Admin. District 14: East Grand School Hancock, Union 92 Hancock: Hancock Elementary School Jonesport, Union 103 Jonesport: Jonesport Elementary School Limestone, Caswell School District: Dawn F. Barnes Elementary School Monmouth, Monmouth Public School District: Henry Cottrell Elementary School North Berwick, School Admin. District 60: Berwick Elementary School, Hanson School, Noble Junior High School, North Berwick Elementary School, Vivian E. Hussey Primary School Portland, Diocese of Portland: Catherine McAuley High School, St. Joseph’s School Robbinston, Union 106 Robbinston*: Robbinston Grade School Turner*: Calvary Christian Academy Vanceboro, Union 108 Vanceboro*: Vanceboro Elementary School

Maryland Baltimore, Baltimore City-Dir. Inst. Area 9*: Roland Park Elementary/Middle School 233 Baltimore, Baltimore City Public School District*: Edgecombe Circle Elementary School 62, Samuel F. B. Morse Elementary School 98 Hagerstown: Heritage Academy Hagerstown, Archdiocese of Baltimore: St. Maria Goretti High School, St. Mary School Stevensville, Queen Annes County School District: Kent Island Elementary School


Massachusetts Adams, Adams Cheshire Regional School District*: C. T. Plunkett Elementary School Boston, Archdiocese of Boston: Holy Trinity School, Immaculate Conception School, St. Bridget School Bridgewater S., Bridgewater-Raynham Regional School District: Burnell Laboratory School Danvers, Danvers School District: Highlands Elementary School, Willis Thorpe Elementary School Fall River: Antioch School Fall River, Diocese of Fall River: Our Lady of Lourdes School, Our Lady of Mt. Carmel School, Taunton Catholic Middle School Fall River, Diocese of Fall River*: Espirito Santo School, St. Jean Baptiste School Fall River, Fall River School District*: Brayton Avenue Elementary School, Harriet T. Healy Elementary School, Laurel Lake Elementary School, McCarrick Elementary School, Ralph Small Elementary School, Westall Elementary School Fitchburg, Fitchburg School District: Memorial Intermediate School Lowell: Lowell Public School District Peabody, Peabody Public School District: Kiley Brothers Memorial School Phillipston, Narragansett Regional School District: Phillipston Memorial Elementary School South Lancaster*: Browning SDA Elementary School Swansea: Swansea School District Walpole: Walpole Public School District Weymouth, Weymouth School District*: Academy Avenue Primary School, Lawrence Pingree Primary School, Murphy Primary School, Ralph Talbot Primary School, South Intermediate School, Union Street Primary School, William Seach Primary School Worcester, Worcester Public School District: University Park Campus School

New Hampshire Bath, District 23 Bath: Bath Village Elementary School Litchfield, District 27 Litchfield: Griffin Memorial Elementary School Manchester, District 37 Manchester*: Bakersville Elementary School, Gossler Park Elementary School, Hallsville Elementary School, McDonough Elementary School, Southside Middle School, Weston Elementary School North Haverhill, District 23 Haverhill: Haverhill Cooperative Middle School, Woodsville Elementary School, Woodsville High School Rochester, District 54 Rochester: Maple Street Elementary School Salem: Granite Christian School Warren, District 23 Warren*: Warren Village School

Note: Schools marked with an asterisk (*) participated in both spring and fall standardizations.


New Jersey Collingswood*: Collingswood Public School District Elizabeth: Bruriah High School For Girls Jersey City: Jersey City Public School District Salem, Mannington Township School District: Mannington Elementary School

New York Beaver Falls, Beaver River Central School District: Beaver River Central School Briarcliff Manor, Briarcliff Manor Union Free School District: Briarcliff High School Bronx*: Regent School Dobbs Ferry, Archdiocese of New York*: Our Lady of Victory Academy Elmhurst, Diocese of Brooklyn: Cathedral Preparatory Seminary Lowville, Lowville Central School District: Lowville Academy and Central School New York, Archdiocese of New York: Corpus Christi School, Dominican Academy, St. Christopher Parochial School, St. John Villa Academy-Richmond, St. Joseph Hill Academy North Tonawanda: North Tonawanda School District Old Westbury: Whispering Pines SDA School Spring Valley, East Ramapo Central School District*: M. L. Colton Intermediate School Weedsport, Weedsport Central School District: Weedsport Elementary School, Weedsport Junior/Senior High School

Pennsylvania Austin, Austin Area School District: Austin Area School Bloomsburg, Bloomsburg Area School District: Bloomsburg Memorial Elementary School Cheswick, Allegheny Valley School District: Acmetonia Primary School Dubois, Diocese of Erie*: Dubois Central Christian High School Ebensburg, Diocese of Altoona Johnstown: Bishop Carroll High School Erie, Millcreek Township School District*: Chestnut Hill Elementary School Farrell: Farrell Area School District Gettysburg, Gettysburg Area School District: Gettysburg Area High School Hadley, Commodore Perry School District: Commodore Perry School Lebanon, Diocese of Harrisburg: Lebanon Catholic Junior/Senior High School Manheim: Manheim Central School District McKeesport, Diocese of Pittsburgh: Serra Catholic High School McKeesport, South Allegheny School District: Glassport Central Elementary School, Manor Elementary School, Port Vue Elementary School, South Allegheny Middle High School

Middleburg, Midd-West School District: Penns Creek Elementary School, Perry-West Perry Elementary School Philadelphia, Philadelphia School District-Bartram*: Bartram High School Philadelphia, Philadelphia School District-Franklin*: Stoddart-Fleisher Middle School Philadelphia, Philadelphia School District-Gratz*: Thomas M. Peirce Elementary School Philadelphia, Philadelphia School District-Kensington*: Alexander Adaire Elementary School Philadelphia, Philadelphia School District-Olney*: Jay Cooke Middle School Philadelphia, Philadelphia School District-Overbrook*: Lewis Cassidy Elementary School Philadelphia, Philadelphia School District-William Penn*: John F. Hartranft Elementary School Pittsburgh: St. Matthew Lutheran School Pittsburgh, Diocese of Pittsburgh*: St. Mary of the Mount School

Rhode Island Johnston: Trinity Christian Academy Providence: Providence Hebrew Day School Providence, Diocese of Providence: All Saints Academy, St. Xavier Academy

Vermont Chelsea, Chelsea School District: Chelsea School Williston: Brownell Mountain SDA School

Southeast

Alabama
Abbeville, Henry County School District: Abbeville Elementary School, Abbeville High School Columbiana, Shelby County School District: Helena Elementary School, Oak Mountain Middle School Dothan, Dothan City School District*: Beverlye Middle School, East Highland Learning Center, Girard Middle School, Honeysuckle Middle School Eclectic, Elmore County School District: Eclectic Elementary School Elberta, Baldwin County School District: Elberta Middle School Fairhope, Baldwin County School District*: Fairhope Elementary School, Fairhope Intermediate School Jacksonville: Jacksonville Christian Academy Mobile, Mobile County School District: Cora Castlen Elementary School, Dauphin Island Elementary School Mobile, Mobile County School District*: Adelia Williams Elementary School, Florence Howard Elementary School, Mobile County Training School Monroeville, Monroe County School District: Monroe County High School Tuscaloosa, Tuscaloosa County School District: Hillcrest High School


Arkansas
Altus, Altus-Denning School District 31: Altus-Denning Elementary School, Altus-Denning High School Beebe, Beebe School District*: Beebe Elementary School, Beebe Intermediate School Bismarck, Bismarck School District*: Bismarck Elementary School Conway, Conway School District*: Ellen Smith Elementary School, Florence Mattison Elementary School, Ida Burns Elementary School, Marguerite Vann Elementary School Fouke, Fouke School District 15*: Fouke High School, Fouke Middle School Gentry: Ozark Adventist Academy Grady: Grady School District 5 Little Rock*: Heritage Christian School Mountain Home, Mountain Home School District 9: Pinkston Middle School Norman: Caddo Hills School District Springdale, Springdale School District 50*: Parson Hills Elementary School Strawberry: River Valley School District

Kentucky
Benton*: Christian Fellowship School Bowling Green, Warren County School District*: Warren East High School Campbellsville: Campbellsville Independent School District Elizabethtown, Hardin County School District: East Hardin Middle School, Parkway Elementary School, Rineyville Elementary School, Sonora Elementary School, Upton Elementary School, Woodland Elementary School Elizabethtown, Hardin County School District*: Brown Street Alternative Center, G. C. Burkhead Elementary School, Lynnvale Elementary School, New Highland Elementary School Florence: Northern Kentucky Christian School Fordsville, Ohio County School District: Fordsville Elementary School Hardinsburg, Breckinridge County School District: Hardinsburg Primary School Hartford, Ohio County School District*: Ohio County High School, Ohio County Middle School, Wayland Alexander Elementary School Hazard, Perry County School District*: Perry County Central High School Louisville: Eliahu Academy Pineville: Pineville Independent School District Williamstown, Williamstown Independent School District: Williamstown Elementary School

Florida Archer, Alachua County School District: Archer Community School Century, Escambia School District: George W. Carver Middle School Fort Lauderdale, Archdiocese of Miami: St. Helen School Gainesville, Alachua County School District*: Hidden Oak Elementary School, Kimball Wiles Elementary School Jacksonville Beach: Beaches Episcopal School Kissimmee, Osceola School District: Reedy Creek Elementary School Miami, Archdiocese of Miami*: St. Agatha School Ocala, Marion County School District*: Fort King Middle School Orlando, Orange County School District-East: Colonial 9th Grade Center, University High School Palm Bay, Diocese of Orlando: St. Joseph Catholic School Palm Coast, Flagler County School District: Buddy Taylor Middle School, Old Kings Elementary School Pensacola, Escambia School District*: Redirections

Georgia Barnesville: Lamar County School District Crawfordville, Taliaferro County School District: Taliaferro County School Cumming, Forsyth County School District*: South Forsyth Middle School Dalton, Whitfield County School District: Cohutta Elementary School, Northwest High School, Valley Point Middle School, Westside Middle School Shellman*: Randolph Southern School


Louisiana Chalmette, St. Bernard Parish School District: Arabi Elementary School, Beauregard Middle School, Borgnemouth Elementary School, J. F. Gauthier Elementary School, Joseph J. Davies Elementary School, N. P. Trist Middle School, Sebastien Roy Elementary School, St. Bernard High School Chalmette, St. Bernard Parish School District*: Andrew Jackson Fundamental High School, C. F. Rowley Elementary School, Lacoste Elementary School Lafayette, Diocese of Lafayette*: Redemptorist Elementary School, St. Peter School Plain Dealing: Plain Dealing Academy Shreveport, Diocese of Shreveport*: Holy Rosary School, Jesus the Good Shepherd School, St. John Cathedral Grade School West Monroe, Diocese of Shreveport: St. Paschal School

Mississippi Brandon*: University Christian School Gulfport, Gulfport School District: Bayou View Elementary School

North Carolina Greensboro, Guilford County School District: Alamance Elementary School, Montlieu Avenue Elementary School, Shadybrook Elementary School Hillsborough: Abundant Life Christian School Manteo: Dare County School District New Bern, Diocese of Raleigh: St. Paul Education Center


South Carolina
Beaufort, Beaufort County School District: E. C. Montessori and Grade School Camden: Kershaw County School District North Augusta, Diocese of Charleston: Our Lady of Peace School Rock Hill: Westminster Catawba Christian School Salem, Oconee County School District*: Tamassee Salem Middle High School Westminster, Oconee County School District: West-Oak High School

West Virginia
Arnoldsburg, Calhoun County School District*: Arnoldsburg School Elizabeth, Wirt County School District: Wirt County High School Grantsville, Calhoun County School District: Pleasant Hill Elementary School Omar: Beth Haven Christian School Wayne, Wayne County School District: East Lynn Elementary School, Lavalette Elementary School, Wayne Middle School Weirton, Diocese of Wheeling-Charleston: Madonna High School

Tennessee Athens, McMinn County School District: Mountain View Elementary School, Niota Elementary School Athens, McMinn County School District*: E. K. Baker Elementary School, Rogers Creek Elementary School Byrdstown*: Pickett County School District Dyer, Gibson County School District: Medina Elementary School, Rutherford Elementary School Fairview, Williamson County School District: Fairview High School Harriman, Harriman City School District*: Raymond S. Bowers Elementary School, Walnut Hill Elementary School Harrogate: J. Frank White Academy Murfreesboro, Rutherford County School District: Central Middle School, Smyrna West Kindergarten Somerville, Fayette County School District: Jefferson Elementary School, Oakland Elementary School Yorkville, Gibson County School District*: Yorkville Elementary School

Virginia Charlottesville, Albemarle County School District: Monticello High School Chesapeake: Tidewater Adventist Academy Forest, Bedford County School District: Forest Middle School Jonesville, Lee County School District*: Ewing Elementary School, Rose Hill Elementary School Madison Heights, Amherst County School District: Madison Heights Elementary School Marion, Smyth County School District: Atkins Elementary School, Chilhowie Elementary School, Chilhowie Middle School, Marion Intermediate School, Marion Middle School, Marion Primary School, Sugar Grove Combined School Saltville, Smyth County School District*: Northwood Middle School, Rich Valley Elementary School, Saltville Elementary School St. Charles, Lee County School District: St. Charles Elementary School Staunton: Stuart Hall School Suffolk, Suffolk Public School District*: Forest Glen Middle School

Great Lakes and Plains

Illinois
Bartlett, Elgin School District U-46, Area B: Bartlett Elementary School, Bartlett High School Benton, Benton Community Consolidated School District 47: Benton Elementary School Berwyn, Berwyn South School District 100: Heritage Middle School Cambridge, Cambridge Community Unit School District 227*: Cambridge Community Elementary School, Cambridge Community Junior/Senior High School Chicago, Chicago Public School District-Region 1*: Stockton Elementary School Chicago, Chicago Public School District-Region 4*: Brighton Park Elementary School Duquoin, Duquoin Community Unit School District 300: Duquoin Middle School Elgin, Elgin School District U-46, Area A*: Century Oaks Elementary School, Garfield Elementary School, Washington Elementary School Elgin, Elgin School District U-46, Area B*: Elgin High School, Ellis Middle School Glendale Heights, Queen Bee School District 16: Pheasant Ridge Primary School Joliet: Ridgewood Baptist Academy Lake Villa, Lake Villa Community Consolidated School District 41: Joseph J. Pleviak Elementary School Lincoln, Lincoln Elementary School District 27*: Northwest Elementary School, Washington-Monroe Elementary School Mossville, Illinois Valley Central School District 321: Mossville Elementary School Quincy, Diocese of Springfield: St. Francis Solanus School Schaumburg, Schaumburg Community Consolidated School District 54: Douglas MacArthur Elementary School, Everett Dirksen Elementary School Streamwood, Elgin School District U-46, Area C*: Oakhill Elementary School, Ridge Circle Elementary School Villa Park: Islamic Foundation School Wayne City*: Wayne City Community Unit School District 100 Westmont: Westmont Community Unit School District 201


Indiana Hammond, Diocese of Gary*: Bishop Noll Institute, St. Catherine of Siena School, St. John Bosco School Indianapolis, Perry Township School District: Homecroft Elementary School, Mary Bryan Elementary School Logansport, Logansport Community School District*: Lincoln Middle School Spencer, Spencer-Owen Community School District: Gosport Elementary School, Patricksburg Elementary School, Spencer Elementary School Valparaiso, Valparaiso Community School District: Benjamin Franklin Middle School, Thomas Jefferson Middle School Vevay, Switzerland County School District: Switzerland County High School Warsaw: Redeemer Lutheran School Warsaw, Warsaw Community Schools: Eisenhower Elementary School

Iowa Alton, Diocese of Sioux City: Spalding Catholic Elementary School Bellevue, Archdiocese of Dubuque: Marquette High School Davenport, Diocese of Davenport: Cardinal Stritch Junior/Senior High School, Mater Dei Junior/Senior High School, Notre Dame Elementary School, Trinity Elementary School Delhi: Maquoketa Valley Community School District Remsen, Diocese of Sioux City*: St. Mary’s High School Williamsburg: Lutheran Interparish School

Kansas Anthony, Anthony-Harper Unified School District 361: Chaparral High School, Harper Elementary School Columbus, Columbus Unified School District 493: Central School Galena, Columbus Unified School District 493*: Spencer Elementary School Kansas City*: Mission Oaks Christian School Kansas City, Archdiocese of Kansas City: Assumption School, St. Agnes School Osawatomie*: Osawatomie Unified School District 367 Spring Hill, Spring Hill Unified School District 230: Spring Hill High School St. Paul, Erie-St. Paul Consolidated School District 101: St. Paul Elementary School, St. Paul High School Westwood*: Mission Oaks Christian School-Westwood

Michigan Algonac, Algonac Community School District: Algonac Elementary School Auburn*: Zion Lutheran School Berkley, Berkley School District: Pattengill Elementary School Bloomingdale: Bloomingdale Public School District Buckley, Buckley Community School District: Buckley Community School Canton, Plymouth-Canton Community Schools*: Gallimore Elementary School, Hoben Elementary School


Carleton: Airport Community School District Dafter, Sault Ste. Marie Area School District*: Bruce Township Elementary School Gaylord, Gaylord Community School District*: Elmira Elementary School, Gaylord High School, Gaylord Intermediate School, Gaylord Middle School, North Ohio Elementary School, South Maple Elementary School Grand Blanc: Grand Blanc Community School District Macomb: St. Peter Lutheran School Plymouth, Plymouth-Canton Community Schools: Allen Elementary School, Bird Elementary School, Farrand Elementary School, Fiegel Elementary School, Smith Elementary School Redford, Archdiocese of Detroit*: St. Agatha High School Reese: Trinity Lutheran School Rockwood, Gibraltar School District: Chapman Elementary School Royal Oak, Royal Oak Public School District: Dondero High School, Franklin Elementary School St. Joseph, Diocese of Kalamazoo*: Lake Michigan Catholic Elementary School, Lake Michigan Catholic Junior/Senior High School Traverse City: Traverse City Area Public Schools Wayland, Wayland Union School District: Bessie B. Baker Elementary School, Dorr Elementary School Whittemore, Whittemore Prescott Area School District: Whittemore-Prescott Alternative Education Center

Minnesota Barnum, Barnum Independent School District 91: Barnum Elementary School Baudette, Lake of the Woods Independent School District 390: Lake of the Woods School Farmington: Farmington Independent School District 192 Hanska, New Ulm Independent School District 88: Hanska Community School Hastings, Hastings Independent School District 200: Cooper Elementary School, John F. Kennedy Elementary School, Pinecrest Elementary School Isanti, Cambridge-Isanti School District 911: Isanti Middle School Lafayette, Minnesota Department of Education: Lafayette Charter School Menahga, Menahga Independent School District 821: Menahga School Mendota Heights: West St. Paul-Mendota-Eagan School District 197 Newfolden, Newfolden Independent School District 441*: Newfolden Elementary School Rochester: Schaeffer Academy St. Paul: Christ Household of Faith School Stillwater, Stillwater School District 834: Stonebridge Elementary School Watertown, Watertown-Mayer School District 111: Watertown-Mayer Elementary School, WatertownMayer High School, Watertown-Mayer Middle School Winsted, Diocese of New Ulm: Holy Trinity Elementary School, Holy Trinity High School


Missouri Cape Girardeau, Cape Girardeau School District 63: Barbara Blanchard Elementary School Cape Girardeau, Diocese of Springfield/Cape Girardeau: Notre Dame High School Lexington, Lexington School District R5: Lexington High School Liberal: Liberal School District R2 Rogersville: Greene County School District R8 Rueter, Mark Twain Elementary School District R8: Mark Twain R8 Elementary School Sparta: Sparta School District R3 Springfield: New Covenant Academy

Nebraska Burwell, Burwell High School District 100: Burwell Junior/Senior High School Creighton, Archdiocese of Omaha*: St. Ludger Elementary School Lemoyne, Keith County Centers: Keith County District 51 School Omaha, Archdiocese of Omaha: All Saints Catholic School, Guadalupe-Inez School, Holy Name School, Pope John XXIII Central Catholic High School, Roncalli Catholic High School, Sacred Heart School, SS Peter and Paul School, St. James-Seton School, St. Thomas More School Randolph: Randolph School District 45 Seward: St. John’s Lutheran School Spalding, Diocese of Grand Island*: Spalding Academy

North Dakota Belfield, Belfield Public School District 13*: Belfield School Bismarck: Dakota Adventist Academy Fargo: Grace Lutheran School Grand Forks: Grand Forks Christian School Halliday, Twin Buttes School District 37: Twin Buttes Elementary School Minot, Diocese of Bismarck: Bishop Ryan Junior/Senior High School New Town, New Town School District 1*: Edwin Loe Elementary School, New Town Middle School/High School

Ohio Akron, Akron Public Schools*: Academy At Robinson, Erie Island Montessori School, Mason Elementary School Bowling Green: Wood Public Schools Chillicothe, Chillicothe City School District: Tiffin Elementary School Cincinnati, Archdiocese of Cincinnati: Catholic Central High School, St. Brigid School Cleveland: Lutheran High School East

Dalton, Dalton Local School District: Dalton Intermediate School Danville, Danville Local School District: Danville High School, Danville Intermediate School, Danville Primary School East Cleveland, East Cleveland City School District: Caledonia Elementary School, Chambers Elementary School, Mayfair Elementary School, Rozelle Elementary School, Shaw High School, Superior Elementary School Lima, Lima City School District: Lowell Elementary School London, London City School District: London Middle School Ripley: Ripley-Union-Lewis-Huntington Elementary School Sidney, Sidney City School District: Bridgeview Middle School Steubenville, Diocese of Steubenville: Catholic Central High School Toledo, Washington Local School District: Jefferson Junior High School Upper Sandusky, Upper Sandusky Exempted Village School District*: East Elementary School

South Dakota Huron: James Valley Christian School Sioux Falls*: Calvin Christian School

Wisconsin Appleton: Fox Valley Lutheran School, Grace Christian Day School Edgerton: Oaklawn Academy Kenosha, Kenosha Unified School District 1: Bose Elementary School, Bullen Middle School, Grewenow Elementary School, Jeffery Elementary School, Lance Middle School, Lincoln Middle School, McKinley Middle School Milwaukee: Bessie M. Gray Prep. Academy*, Clara Muhammed School, Early View Academy of Excellence, Hickman’s Prep. School*, Milwaukee Multicultural Academy, Mount Olive Lutheran School, The Woodson Academy Milwaukee, Milwaukee Public Schools*: Khamit Institute Oshkosh, Diocese of Green Bay*: St. John Neumann School Plymouth, Plymouth Joint School District: Cascade Elementary School, Fairview Elementary School, Horizon Elementary School, Parkview Elementary School, Parnell Elementary School, Riverview Middle School Stoughton, Stoughton Area School District: Sandhill School, Yahara Elementary School Strum, Eleva-Strum School District: Eleva-Strum Primary School


West and Far West

Alaska
Anchorage: Heritage Christian School Juneau, Juneau School District: Auke Bay Elementary School, Dzantik’i Heeni Middle School, Floyd Dryden Middle School Nikiski, Kenai Peninsula Borough School District: Nikiski Middle High School Palmer: Valley Christian School

Arizona Litchfield Park, Litchfield Elementary School District 79: Litchfield Elementary School Mesa, Diocese of Phoenix: Christ the King School Mesa, Mesa Unified School District 4: Franklin East Elementary School Phoenix, Creighton School District 14*: Loma Linda Elementary School Phoenix, Washington Elementary School District 6: Roadrunner Elementary School Pima, Pima Unified School District 6*: Pima Elementary School Teec Nos Pos: Immanuel-Carrizo Christian Academy Tempe: Grace Community Christian School Tucson: Tucson Hebrew Academy

California Atascadero, Atascadero Unified School District: Carrisa Plains Elementary School, Creston Elementary School Bakersfield, Panama Buena Vista Union School District: Laurelglen Elementary School Cathedral City, Palm Springs Unified School District: Cathedral City Elementary School Cerritos, ABC Unified School District: Faye Ross Middle School, Joe A. Gonsalves Elementary School, Palms Elementary School Fontana, Fontana Unified School District*: North Tamarind Elementary School Fresno, Fresno Unified School District: Malloch Elementary School Lompoc, Lompoc Unified School District: LA Canada Elementary School, Leonora Fillmore Elementary School Los Angeles, Archdiocese of Los Angeles: Alverno High School, St. Lucy School Los Angeles, Los Angeles Unified School District, Local District C: Lanai Road Elementary School Los Angeles, Los Angeles Unified School District, Local District D: LA Center For Enriched Studies Los Angeles, Los Angeles Unified School District, Local District F: Pueblo De Los Angeles High School Los Angeles, Los Angeles Unified School District, Local District G: Fifty-Second Street Elementary School Los Angeles, Los Angeles Unified School District, Local District I: Alain LeRoy Locke Senior High School, David Starr Jordan Senior High School, Youth Opportunities Unlimited


Modesto, Modesto City School District: Everett Elementary School, Franklin Elementary School Norco, Corona-Norco Unified School District: Coronita Elementary School, Highland Elementary School Oakland, Oakland Unified School District: Lockwood Elementary School Oceanside, Oceanside Unified School District: San Luis Rey Elementary School Palm Desert, Desert Sands Unified School District: Palm Desert Middle School Ripon, Ripon Unified School District: Ripon Elementary School Salinas: Winham Street Christian Academy San Clemente, Diocese of Orange: Our Lady of Fatima School San Diego, San Diego Unified School District: Sojourner Truth Learning Academy San Diego, Streetwater Union High School District: Mar Vista Middle School San Francisco: Hebrew Academy-San Francisco Santa Ana, Diocese of Orange*: Our Lady of the Pillar School Santa Ana, Santa Ana Unified School District*: Dr. Martin L. King Elementary School, Madison Elementary School Walnut, Walnut Valley Unified School District*: Walnut Elementary School

Colorado Aurora, Aurora School District 28-J: Aurora Central High School, East Middle School Colorado Springs, Academy School District 20: Explorer Elementary School, Foothills Elementary School, Pine Valley Elementary School Denver: Beth Eden Baptist School Denver, Archdiocese of Denver: Bishop Machebeuf High School Golden, Jefferson County School District R-1: Bear Creek Elementary School, Campbell Elementary School, D’Evelyn Junior/Senior High School, Devinny Elementary School, Jefferson Academy Elementary School, Jefferson Academy Junior High School, Jefferson Hills, Lakewood Senior High School, Lincoln Academy, Moore Middle School, Sierra Elementary School Northglenn, Adams 12 Five Star Schools: Northglenn Middle School Parker, Douglas County School District R-1: Colorado Visionary Academy Penrose, Fremont School District R-2*: Penrose Elementary/Middle School Thornton, Adams 12 Five Star Schools*: Cherry Drive Elementary School, Eagleview Elementary School Windsor, Windsor School District R-4: Mountain View Elementary School, Skyview Elementary School

Hawaii Honolulu: Holy Nativity School, Hongwanji Mission School Lawai: Kahili Adventist School


Idaho Boise: Cole Christian School Burley, Cassia County Joint School District 151: Albion Elementary School, Cassia County Education Center, Declo Elementary School, Oakley Junior/Senior High School, Raft River Elementary School, Raft River High School, White Pine Intermediate School Kimberly*: Kimberly School District 414 Saint Maries: St. Maries Joint School District 41 Twin Falls: Immanuel Lutheran School

Montana Belfry, Belfry School District 3*: Belfry School Billings*: Billings Christian School Kalispell: Flathead Christian School Miles City, Diocese of Great Falls-Billings: Sacred Heart Elementary School Willow Creek, Willow Creek School District 15-17J: Willow Creek School

Nevada Amargosa Valley, Nye County School District: Amargosa Valley Elementary School Henderson*: Black Mountain Christian School Reno: Silver State Adventist School Sandy Valley, Clark County School District-Southwest*: Sandy Valley School Sparks*: Legacy Christian Elementary School

New Mexico Alamogordo, Alamogordo School District 1*: Sacramento Elementary School Albuquerque: Evangelical Christian Academy Espanola, Espanola School District 55: Chimayo Elementary School, Hernandez Elementary School, Velarde Elementary School

Oklahoma Arapaho, Arapaho School District I-5: Arapaho School Bray: Bray-Doyle School District 42 Chickasha: Chickasha School District 1 Duncan: Duncan School District 1 Elk City: Elk City School District 6 Guthrie: Guthrie School District 1 Laverne: Laverne School District Milburn, Milburn School District I-29: Milburn Elementary School, Milburn High School Mill Creek: Mill Creek School District 2 Oklahoma City: Oklahoma City School District I-89 Oklahoma City, Archdiocese of Oklahoma City: St. Charles Borromeo School Purcell: Purcell School District 15 Roland: Roland Independent School District 5 Shidler, Shidler School District 11: Shidler High School Tulsa: Tulsa Adventist Academy

Tulsa, Diocese of Tulsa: Bishop Kelley High School, Holy Family Cathedral School Wellston, Wellston School District 4*: Wellston Public School

Oregon Boring*: Hood View Junior Academy Corvallis, Corvallis School District 509J: Inavale Elementary School, Western View Middle School Eugene, Eugene School District 4J: Buena Vista Spn. Immersion School, Gilham Elementary School, Meadowlark Elementary School, Washington Elementary School Grants Pass: Brighton Academy Jefferson: Jefferson School District 14J Portland*: Portland Christian Schools Portland, Archdiocese of Portland: Fairview Christian School, O’Hara Catholic School

Texas Amarillo, Diocese of Amarillo*: Alamo Catholic High School Baird*: Baird Independent School District Brownsboro, Brownsboro Independent School District*: Brownsboro Elementary School, Brownsboro High School, Brownsboro Intermediate School, Chandler Elementary School Dallas, Dallas Independent School District-Area 1: Gilbert Cuellar Senior Elementary School Deweyville, Deweyville Independent School District: Deweyville Elementary School Dilley, Dilley Independent School District: Dilley High School Driscoll: Driscoll Independent School District Franklin: Franklin Independent School District Fresno, Fort Bend Independent School District*: Walter Moses Burton Elementary School Gladewater: Gladewater Independent School District Houston, Spring Branch Independent School District*: Cornerstone Academy, Spring Shadows Elementary School Imperial: Buena Vista Independent School District Jacksonville, Jacksonville Independent School District: Jacksonville Middle School, Joe Wright Elementary School Laredo*: Laredo Christian Academy Lubbock, Lubbock Independent School District: S. Wilson Junior High School Nederland, Nederland Independent School District*: Wilson Middle School Odessa, Ector County Independent School District: Burleson Elementary School Perryton: Perryton Independent School District Stockdale, Stockdale Independent School District: Stockdale Elementary School, Stockdale High School Sugar Land, Fort Bend Independent School District: Sugar Mill Elementary School Whitesboro, Whitesboro Independent School District: Whitesboro High School


Utah American Fork, Alpine School District: Lehi High School, Lone Peak High School, Manila Elementary School, Meadow Elementary School Brigham City, Box Elder County School District*: Adele C. Young Intermediate School, Box Elder Middle School, Perry Elementary School, Willard Elementary School Cedar City, Iron County School District: Cedar Middle School Eskdale: Shiloah Valley Christian School Layton, Davis County School District: Crestview Elementary School Murray: Deseret Academy Murray, Murray City School District: Liberty Elementary School Ogden*: St. Paul Lutheran School Ogden, Ogden City School District*: Bonneville Elementary School, Carl H. Taylor Elementary School, Gramercy Elementary School, Ogden High School Ogden, Weber School District*: Green Acres Elementary School Orem, Alpine School District*: Canyon View Junior High School Price: Carbon County School District Tremonton, Box Elder County School District: North Park Elementary School


Washington Gig Harbor: Gig Harbor Academy Seattle: North Seattle Christian School Spokane: Spokane Lutheran School Vancouver: Evergreen School District 114

Wyoming Cheyenne: Trinity Lutheran School Torrington: Valley Christian School Yoder, Goshen County School District 1: South East School

PART 3
Validity in the Development and Use of The Iowa Tests

Validity in Test Use

Validity is an attribute of information from tests that, according to the Standards for Educational and Psychological Testing, “refers to the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests” (1999, p. 9). Assessment information is not considered valid or invalid in any absolute sense. Rather, the information is considered valid for a particular use or interpretation and invalid for another. The Standards further state that validation involves the accumulation of evidence to support the proposed score interpretations. This part of the Guide to Research and Development provides an overview of the data collected over the history of The Iowa Tests that pertains to validity. Data and research pertaining to The Iowa Tests consider the five major sources of validity evidence outlined in the Standards: (1) test content, (2) response processes, (3) internal structure, (4) relations to other variables, and (5) consequences of testing. The purposes of this part of the Guide are (1) to present the rationale for the professional judgments that lie behind the content standards and organization of the Iowa Tests of Basic Skills, (2) to describe the process used to translate those judgments into developmentally appropriate test materials, and (3) to characterize a range of appropriate uses of results and methods for reporting information on test performance to various audiences.

Criteria for Evaluating Achievement Tests

Evaluating an elementary school achievement test is much like evaluating other instructional materials. In the latter case, the recommendations of other educators as well as the authors and publishers would be considered. The decision to adopt materials locally, however, would require page-by-page scrutiny of the materials to understand their content and organization.

Important factors in reviewing materials would be alignment with local educational standards and compatibility with instructional methods. The evaluation of an elementary achievement test is much the same. What the authors and publisher can say about how the test was developed, what statistical data indicate about the technical characteristics of the test, and what judgments of quality unbiased experts make in reviewing the test all contribute to the final evaluation. But the decision about the potential validity of the test rests primarily on local review and item-by-item inspection of the test itself. Local analysis of test content—including judgments of its appropriateness for students, teachers, other school personnel, and the community at large—is critical.

Validity of the Tests

Validity must be judged in relation to purpose. Different purposes may call for tests built to different specifications. For example, a test intended to determine whether students have reached a performance standard in a local district is unlikely to have much validity for measuring differences in progress toward individually determined goals. Similarly, a testing program designed primarily to answer “accountability” questions may not be the best program for stimulating differential instruction and creative teaching. Cronbach long ago made the point that validation is the task of the interpreter: “In the end, the responsibility for valid use of a test rests on the person who interprets it. The published research merely provides the interpreter with some facts and concepts. He has to combine these with his other knowledge about the person he tests. . . .” (1971, p. 445). Messick contended that published research should bolster facts and concepts with “some exposition of the critical value contents in which the facts are embedded and with provisional accounting of the potential social consequences of alternative test uses” (1989, p. 88).


Instructional decisions involve the combination of test validity evidence and prior information about the person or group tested. The information that test developers can reasonably be expected to provide about all potential uses of tests in decision-making is limited. Nevertheless, one should explain how tests are developed and provide recommendations for appropriate uses. In addition, guidelines should be established for reporting test results that lead to valid score interpretations so that the consequences of test use at the local level are clear. The procedures used to develop and revise test materials and interpretive information lay the foundation for test validity. Meaningful evidence related to inferences based on test scores, not to mention desirable consequences from those inferences, can only provide test scores with social utility if test development produces meaningful test materials. Content quality is thus the essence of arguments for test validity (Linn, Baker & Dunbar, 1991). The guiding principle for the development of The Iowa Tests is that materials presented to students be of sufficient quality to make the time spent testing instructionally useful. Passages are selected for the Reading tests, for example, not only because they yield good comprehension questions, but because they are interesting to read. Items that measure discrete skills (e.g., capitalization and punctuation) contain factual content that promotes incidental learning during the test. Experimental contexts in science expose students to novel situations through which their understanding of scientific reasoning can be measured. These examples show ways in which developers of The Iowa Tests try to design tests so taking the test can itself be considered an instructional activity. Such efforts represent the cornerstone of test validity.

Statistical Data to Be Considered

The types of statistical data that might be considered as evidence of test validity include reliability coefficients, difficulty indices of individual test items, indices of the discriminating power of the items, indices of differential functioning of the items, and correlations with other measures such as course grades, scores on other tests of the same type, or experimental measures of the same content or skills. All of these types of evidence reflect on the validity of the test, but they do not guarantee its validity. They do not prove that the test measures what it purports to measure. They certainly cannot reveal whether the things being measured are those that ought to be measured. A high reliability coefficient, for example, shows that the test is measuring something consistently but does not indicate what that “something” is. Given two tests with the same title, the one with the higher reliability may actually be the less valid for a particular purpose (Feldt, 1997). For example, one can build a highly reliable mathematics test by including only simple computation items, but this would not be a valid test of problem-solving skills. Similarly, a poor test may show the same distribution of item difficulties as a good test, or it may show a higher average index of discrimination than a more valid test. Correlations of test scores with other measures are evidence of the validity of a test only if the other measures are better than the test that is being evaluated. Suppose, for example, that three language tests, A, B, and C, show high correlations among themselves. These correlations may be due simply to the three tests exhibiting the same defects—such as overemphasis on memorization of rules. If Test D, on the other hand, is a superior measure of the student’s ability to apply those rules, it is unlikely to correlate highly with the other three tests. In this case, its lack of correlation with Tests A, B, and C is evidence that Test D is the more valid test. This is not meant to imply that well-designed validation studies are of no value; published tests should be supported by a continuous program of research. Rational judgment also plays a key part in evaluating the validity of achievement tests against content and process standards and in interpreting statistical evidence from validity studies.
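For readers who want to see how one of these indices is obtained, the sketch below computes a common internal-consistency reliability estimate, KR-20, from a small matrix of scored responses. It illustrates the statistic in general, not any procedure specific to the ITBS, and the tiny 0/1 response matrix is invented for the example.

    from statistics import pvariance

    def kr20(responses):
        """KR-20 internal-consistency reliability for dichotomously scored items.
        `responses` is a list of examinee rows, each a list of 0/1 item scores."""
        n_items = len(responses[0])
        totals = [sum(row) for row in responses]
        sum_pq = 0.0
        for i in range(n_items):
            p = sum(row[i] for row in responses) / len(responses)  # item difficulty
            sum_pq += p * (1 - p)                                   # item variance
        return n_items / (n_items - 1) * (1 - sum_pq / pvariance(totals))

    # Invented scores for five examinees on four items.
    scores = [[1, 1, 1, 0], [1, 0, 1, 0], [0, 0, 1, 0], [1, 1, 1, 1], [0, 0, 0, 0]]
    print(round(kr20(scores), 2))   # 0.8 for this illustrative matrix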

Validity of the Tests in the Local School

Standardized tests such as the Iowa Tests of Basic Skills are constructed to correspond to widely accepted goals of instruction in schools across the nation. No standardized test, no matter how carefully planned and constructed, can ever be equally suited for use in all schools. Local differences in curricular standards, grade placement, and instructional emphasis, as well as differences in the nature and characteristics of the student population, should be taken into account in evaluating the validity of a test. The two most important questions in the selection and evaluation of achievement tests at the local level should be:

1. Are the skills and abilities required for successful test performance those that are appropriate for the students in our school?

2. Are our standards of content and instructional practices represented in the test questions?


To answer these questions, those making the determination should take the test or at least answer a sample of representative questions. In taking the test, they should try to decide by which cognitive processes the student is likely to reach the correct answer. They should then ask:

• Are all the cognitive processes considered important in the school represented in the test?

• Are any desirable cognitive processes omitted?

• Are any specific skills or abilities required for successful test performance unrelated to the goals of instruction?

Evaluating an achievement test battery in this manner is time-consuming. It is, however, the only way to discern the most important differences among tests and their relationships to local curriculum standards. Considering the importance of the inferences that will later be drawn from test results and the influence the test may exert on instruction and guidance in the school, this type of careful review is important.

Domain Specifications

The content and process specifications for The Iowa Tests have undergone constant revision for more than 60 years. They have involved the experience, research, and expertise of professionals from a variety of educational specialties. In particular, research in curriculum practices, test design, technical measurement procedures, and test interpretation and utilization has been a continuing feature of test development. Criteria for the design of assessments, the selection and placement of items, and the distribution of emphasis in a test include:

1. Placement and emphasis in current instructional materials, including textbooks and other forms of published materials for teaching and learning.

2. Recommendations of the education community in the form of subject-matter standards developed by national organizations, state and national curriculum frameworks, and expert opinion in instructional methods and the psychology of learning.

3. Continuous interaction with users, including discussions of needs and priorities, reviews, and suggestions for changes. Feedback from students, teachers, and administrators has resulted in improvements of many kinds (e.g., Frisbie & Andrews, 1990).

4. Frequency of need or occurrence and social utility studies in various curriculum areas.

5. Studies of frequency of misunderstanding, particularly in reading, language, and mathematics, as determined from research studies and data from item tryout.

6. Importance or cruciality, a judgment criterion that may involve frequency, seriousness of error or seriousness of the social consequences of error, expert judgment, instructional trends, public opinion, etc.

7. Independent reviews by professionals from diverse cultural groups for fairness and appropriateness of content for students of different backgrounds based on geography, race/ethnicity, gender, urban/suburban/rural environment, etc.

8. Empirical studies of differential item functioning (e.g., Qualls, 1980; Becker & Forsyth, 1994; Lewis, 1994; Lee, 1995; Lu & Dunbar, 1996; Witt, Ankenmann & Dunbar, 1996; Huang, 1998; Ankenmann, Witt & Dunbar, 1999; Dunbar, Ordman & Mengeling, 2002; Snetzler & Qualls, 2002).

9. Technical characteristics of items relating to content validity; results of studies of characteristics of item formats; studies of commonality and uniqueness of tests, etc. (e.g., Schoen, Blume & Hoover, 1990; Gerig, Nibbelink & Hoover, 1992; Nibbelink & Hoover, 1992; Nibbelink, Gerig & Hoover, 1993; Witt, 1993; Bray & Dunbar, 1994; Lewis, 1994; Frisbie & Cantor, 1995; Perkhounkova, Hoover & Ankenmann, 1997; Bishop & Frisbie, 1999; Perkhounkova & Dunbar, 1999; Lee, Dunbar & Frisbie, 2001).

The importance of each of these criteria differs from grade to grade, from test to test, from level to level, and even from skill to skill within tests. For example, the correspondence between test content and textbook or instructional treatment varies considerably. In the upper grades, the Vocabulary, Reading Comprehension, and Math Problem Solving tests are relatively independent of the method used to teach these skills. On the other hand, there is a close correspondence between the first part of the Math Concepts and Estimation test and the vocabulary, scope, sequence, and methodology of leading textbooks, as well as between the second part of that test and The National Council of Teachers of Mathematics (NCTM) Standards for estimation skills.


Content Standards and Development Procedures

New forms of The Iowa Tests are the result of an extended, iterative process during which “experimental” test materials are developed and administered to national and state samples to evaluate their measurement quality and appropriateness. The flow chart in Figure 3.1 shows the steps involved in test development.

Curriculum Review

Review of local, state, and national guidelines for curriculum in the subjects included in The Iowa Tests is an ongoing activity of the faculty and staff of the Iowa Testing Programs. How well The Iowa Tests reflect current trends in school curricula is monitored through contact with school administrators, curriculum coordinators, and classroom teachers across the United States. New editions of the tests are developed to be consistent with lasting shifts in curriculum and instructional practice when such changes can be accommodated by changes in test content and item format. Supplementary measures of achievement, such as the Iowa Writing Assessment and the Constructed-Response Supplement, are developed when the need arises for a new approach to measurement.

Preliminary Item Tryout

Developing The Iowa Tests involves research in the areas of curriculum, instructional practice, materials design, and psychometric methods. This work contributes to the materials that undergo preliminary tryout as part of the Iowa Basic Skills Testing Program. During this phase of development, final content standards for new forms of the tests are determined. Preliminary tryouts involve multiple revisions to help ensure high quality in the materials that become part of the item bank used to develop final test forms. Materials that do not meet the necessary standards for content and technical quality are revised or discarded. The preliminary tryout of items is a regular part of the Iowa Basic Skills Testing Program. This state testing program is a cooperative effort maintained and supported by the College of Education of the University of Iowa. Over 350 school systems, testing over 250,000 students annually, administer the ITBS under uniform conditions. To participate, each school agrees to schedule a twenty-minute testing period for tryout materials for new editions.

National Item Tryout

After the results of the preliminary tryout are analyzed, items from all tests are administered to selected national samples. The national item tryout for Forms A and B of the ITBS was conducted in the fall of 1998 and the spring and fall of 1999. Approximately 100,000 students were tested (approximately 11,000 students per grade in kindergarten through grade 8). A total of 10,370 items were included in the national item tryouts for Forms A and B. The major purpose of the national item tryouts is to obtain item difficulty and discrimination data on a national sample of diverse curricular and demographic characteristics and objective data on possible ethnic and gender differences for the analysis of differential item functioning (DIF). Difficulty and discrimination indices, DIF analyses, and other external criteria were used for final item selection. Results of DIF analyses by gender and race/ethnicity are described in Part 7.
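Part 7 documents the DIF procedures actually used for Forms A and B; they are not specified in this section. Purely as a general illustration of how a score-matched DIF index can be computed, the sketch below applies the widely used Mantel-Haenszel approach to hypothetical counts of correct and incorrect responses for a reference and a focal group matched on total score. The data and the choice of statistic are assumptions of the example, not a description of the operational analyses.

```python
# Illustrative sketch only: a Mantel-Haenszel DIF index computed from hypothetical
# counts of correct/incorrect responses, stratified by matched total score.
import math

def mantel_haenszel_ddif(strata):
    """strata: list of (ref_correct, ref_wrong, focal_correct, focal_wrong) per score level."""
    num = 0.0  # sum of A*D/N across strata
    den = 0.0  # sum of B*C/N across strata
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue
        num += a * d / n
        den += b * c / n
    alpha = num / den                  # common odds ratio across strata
    return -2.35 * math.log(alpha)     # delta-scale D-DIF; negative values favor the reference group

# Hypothetical counts for five score strata:
# (reference correct, reference incorrect, focal correct, focal incorrect)
example = [(40, 60, 35, 65), (55, 45, 50, 50), (70, 30, 64, 36),
           (82, 18, 78, 22), (90, 10, 88, 12)]
print(round(mantel_haenszel_ddif(example), 2))  # small absolute values suggest negligible DIF
```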


Figure 3.1
Steps in Development of the Iowa Tests of Basic Skills

Educational Community (national curriculum organizations, state curriculum guides, content standards, textbooks and instructional materials)
→ Iowa Testing Programs (ITP) Test Specifications
→ Item Writers
→ ITP Editing and Content Review
→ Iowa Tryout
→ Analysis of Iowa Data (items judged not yet acceptable go to ITP Revisions and return to the Iowa Tryout)
→ Iowa Item Bank
→ RPC Editing and Content Review
→ National Tryout
→ External Fairness/Content Review
→ Analysis of National Tryout Data and Reviewer Comments
→ Test Item Bank
→ Preliminary Forms of the Tests
→ Special Content Review
→ Final Forms of the Tests
→ Standardization


Fairness Review

Content analysis of test materials is a critical aspect of test development. To ensure that items represent ethnic groups accurately and show the range of interests of both genders, expert panels review all materials in the national item tryout. Such panels serve a variety of functions, but their role in test development is to ensure that test specifications cover what is intended given the definition of each content domain. They also ensure that item formats in each test make questions readily accessible to all students and that sources of construct-irrelevant variance are minimized. Members of these panels come from national education communities with diverse social, cultural, and geographic perspectives. A description of the fairness review procedures used for Forms A and B of the tests appears in Part 7.

Development of Individual Tests

The distribution of skills in Levels 5 through 14 of the Iowa Tests of Basic Skills appears in Table 3.1. The table indicates major categories in the content specifications for each test during item development. For some tests (e.g., Vocabulary), more categories are used during item development than may be used on score reports because their presence ensures variety in the materials developed and chosen for final forms. For most tests, however, the major content categories are broken down further in diagnostic score reports. The current edition of the ITBS reflects a tendency among educators to focus on core standards and goals, particularly in reading and language arts, so in some parts of the battery there are fewer specific skill categories than in earlier editions.

The following descriptions of major content areas of the Complete Battery provide information about the conceptual definition of each domain. They address general issues related to measuring achievement in each domain and give the rationale for approaches to item development. A unique feature of tests such as the ITBS is that the continuum of achievement they measure spans ages 5 to 14, when cognitive and social development proceed at a rapid rate.

The definition of each domain describes school achievement over a wide range. Thus, each test in the battery is actually conceived as a broad measure of educational development in school-related subjects for students ages 5 through 14.

Vocabulary

Understanding the meanings of words is essential to all communication and learning. Schools can contribute to vocabulary power through planned, systematic instruction; informal instruction whenever the opportunity arises; reading of a variety of materials; and activities and experiences, such as field trips and assemblies. One of a teacher’s most important responsibilities is to provide students with an understanding of the specialized vocabulary and concepts of each subject area. The Vocabulary test involves reading and word meaning as well as concept development. Linguistic/structural distinctions among words are monitored during test development so each form and level includes nouns, verbs, and modifiers. At each test level, vocabulary words come from a range of subjects and represent diverse experiences. Because the purpose of the Vocabulary test is to provide a global measure of word knowledge, specific skills are not reported.

Monitoring the parts of speech in the Vocabulary test is important because vocabulary is closely tied to concept development. The classification of words is based on functional distinctions; i.e., the part that nouns, verbs, modifiers, and connectives play in language. Representing the domain of word knowledge in this way is especially useful for language programs that emphasize writing. Although words are categorized by part of speech (nouns, verbs, and modifiers) for test development, skill scores are not reported by category.

In Levels 5 and 6, the Vocabulary test is read aloud by the teacher. The student is required to identify the picture that goes with a stimulus word. In Levels 7 and 8, the Vocabulary test is read silently by the student and requires decoding skills. In the first part, the student selects the word that describes a stimulus picture. In the second part, the student must understand the meaning of words in the context of a sentence. In Levels 9 through 14, items consist of a word in context followed by four possible definitions. Stimulus words were chosen from The Living Word Vocabulary (Dale & O’Rourke, 1981), as were words constituting the definitions.


Table 3.1
Distribution of Skills Objectives for the Iowa Tests of Basic Skills, Forms A and B
(Each cell shows the number of major categories / the number of skills objectives.)

Test                                            Levels 5 and 6   Levels 7 and 8   Levels 9–14

Reading
  Vocabulary                                        1 / 3            1 / 3            1 / 3
  Reading Comprehension                             2 / 5            2 / 4            3 / 9
  Reading Total                                     3 / 8            3 / 7            4 / 12

Language
  Spelling                                          —                4 / 4            3 / 5
  Capitalization                                    —                —                7 / 19
  Punctuation                                       —                —                4 / 21
  Usage and Expression                              —                —                5 / 22
  Language Total                                    7 / 7            4 / 13          19 / 67

Mathematics
  Math Concepts and Estimation                      —                4 / 13           6 / 20
  Math Problem Solving and Data Interpretation      —                6 / 12           6 / 13
  Math Computation                                  —                2 / 6           12 / 24
  Math Total                                        4 / 11          12 / 31          24 / 57

Social Studies                                      —                4 / 12           4 / 14
Science                                             —                4 / 10           4 / 11

Sources of Information
  Maps and Diagrams                                 —                —                3 / 10
  Reference Materials                               —                —                3 / 10
  Sources Total                                     —                4 / 9            6 / 20

Listening                                           2 / 8            2 / 8            2* / 8*
Word Analysis                                       2 / 6            2 / 8            2* / 8*

Total                                              18 / 40          35 / 98          65 / 197

*Listening and Word Analysis are supplementary tests at Level 9.

Emphasis is on grade-level appropriate vocabulary that children are likely to encounter and use in daily activities, both in and out of school, rather than on specialized or esoteric words, jargon, or colloquialisms. Nouns, verbs, and modifiers are given approximately equal representation. Target words are presented in a short context to narrow the range of meaning and, in the upper grades, to allow testing for knowledge of uncommon meanings of common vocabulary words. Word selection is carefully monitored to prevent the use of extremely common words and cognates as distractors and to ensure that the same target words do not appear in parallel forms of current and previous editions of the tests.

Few words in the English language have exactly the same meaning. An effective writer or speaker is one who can select words that express ideas precisely. It is not the purpose of an item in the Vocabulary test to determine whether the student knows the meaning of a single word (the stimulus word). Nor is it necessary that the response words be easier or more frequently used than the stimulus word, although they tend to be.


Rather, the immediate purpose of each item is to determine whether the student is able to discriminate among the shades of meaning of all words used in the item. Thus, because each item presents a stimulus word together with four response words, a forty-item Vocabulary test may sample as many as 200 words from a student’s general vocabulary.

Word Analysis

The purpose of the Word Analysis test is to provide diagnostic information about a student’s ability to identify and analyze distinct sounds and symbols of spoken and written language. At all levels the test emphasizes the student’s ability to transfer phonological representations of language to their graphemic counterparts. Transfer is from sounds to symbols in all items, which is consistent with developmental patterns in language learning.

Word analysis skills are tested in Levels 5 through 9. Skills involving sound-letter association, phonemic awareness, and word structure are represented. Stimuli consist of pictures, spoken language, written language, and novel words. In Levels 5 and 6, the skills involve letter recognition, letter-sound correspondence, initial sounds, final sounds, and rhyming sounds. In Levels 7 through 9, more complex phonological structures are introduced: medial sounds; silent letters; initial, medial, and final substitutions; long and short vowel sounds; affixes and inflections; and compound words. Items in the Word Analysis test measure decoding skills that require knowledge of grapheme-phoneme relationships. Information from such items may be useful in diagnosing difficulties of students with low scores in reading comprehension.

Reading

The Reading tests in the Complete Battery of the ITBS have emphasized comprehension throughout the 15 editions of the battery. This emphasis continues in the test specifications for all levels. The Reading tests are concerned with a student’s ability to derive meaning; skills related to the so-called building blocks of reading comprehension (grapheme-phoneme connections, word attack, sentence comprehension) are tested in other parts of the battery in the primary grades. When students reach an age when independent reading is a regular part of daily classroom activity, the emphasis shifts to questions that measure how students derive meaning from what they read.

The Reading test in Levels 6 through 8 of the ITBS accommodates a wide range of achievement in early reading. Level 6 consists of Reading Words and Reading Comprehension.

Reading Words includes three types of items to measure how well the student can identify and decode letters and words in context. Auditory and picture cues are used to measure word recognition. Complete sentences accompanied by picture cues are used to measure word attack. Reading Comprehension includes three types of items to assess how well the student can understand sentences, picture stories, and paragraphs. Separate scores are reported for Reading Words and Reading Comprehension so it is possible to obtain a Reading score for students who are not reading beyond the word level.

In Levels 7 and 8, the Reading test measures sentence and story comprehension. Sentence comprehension is a cloze task that requires the student to select the word that completes a sentence appropriately. Story comprehension is assessed with pictures or text as stimuli. Pictures that tell a story are followed by questions that require students to identify the story line and understand connections between characters and plot. Fiction and nonfiction topics are used to measure how well the student can read and comprehend paragraphs. Story comprehension skills require students to understand factual details as well as make inferences and draw generalizations from what they have read.

Levels 5 through 9 of the Complete Battery of Forms A and B combine the tests related to early reading in the Primary Reading Profile. The reading profile can be used to help explain the reasons for low scores on the Reading test at these levels. It is described in the Interpretive Guide for Teachers and Counselors for Levels 5–8.

New to Forms A and B is a two-part structure for the Reading Comprehension test at Levels 9–14. Prior to standardization, a preliminary version of the Form A Reading Comprehension test was administered in two separately timed sections. Item analysis statistics and completion rates indicated that the two-part structure was preferable to a single, timed administration.

In Levels 9 through 14, the Reading Comprehension test consists of passages that vary in length, generally becoming longer and more complex in the progression from Levels 9 to 14. The range of content in Form A of the Complete Battery is shown in Table 3.2. The passages represent various types of material students read in and out of school. They come from previously published material and information sources, and they include narrative, poetry, and topics in science and social studies. In addition, some passages explain how to make or do something, express an opinion or point of view, or describe a person or topic of general interest. When needed, an introductory note is included to give background information. Passages are chosen to satisfy high standards for writing quality and appeal. Good literature and high-quality nonfiction offer opportunities for questions that tap complex cognitive processes and that sustain student interest.


Table 3.2
Types of Reading Materials
Iowa Tests of Basic Skills — Complete Battery, Form A
(Each passage is followed by 4 to 8 items.)

Level 9:  Literature (Fiction); Literature (Fable); Social Studies (Urban Wildlife); Literature (Fiction)
Level 10: Nonfiction (Biography); Social Studies (U.S. History); Literature (Fiction); Literature (Poetry); Science (Field Observation); Nonfiction (Newspaper Editorial); Social Studies (Anthropology); Science (Earth Science)
Level 11: Nonfiction (Transportation); Literature (Fiction); Nonfiction (Food and Culture); Science (Insects)
Level 12: Nonfiction (Biography); Science (Animal Behavior); Literature (Folktale); Literature (Poetry)
Level 13: Nonfiction (Social Roles); Science (Human Anatomy); Social Studies (Preservation); Social Studies (Government Agency)
Level 14: Nonfiction (Personal Essay); Literature (Poetry); Literature (Fiction); Social Studies (Culture and Traditions)

Reading requires critical thinking on a number of levels. The ability to decode words and understand literal meaning is, of course, important. Yet active, strategic reading has many other components. An effective reader must draw on experience and background knowledge to make inferences and generalizations that go beyond the words on the page. Active readers continually evaluate what they read to comprehend it fully. They make judgments about the central ideas of the selection, the author’s point of view or purpose, and the organizational scheme and stylistic qualities used. This is true at all developmental levels. Children do not suddenly learn to read with such comprehension at any particular age or grade. Thoughtful reading is the result of a period of growth in comprehension that begins in kindergarten or first grade; no amount of concentrated instruction in the upper elementary grades can make up for a lack of attention to reading for meaning in the middle or lower grades. Measurement of these aspects of the reading process is required if test results are used to support inferences about reading comprehension. The ITBS Reading tests are based on content standards that reflect reading as a dynamic cognitive process.

Table 3.3 presents the content specifications for the Reading Comprehension test. It includes a sample of items from Levels 10 and 14 that corresponds to each process skill in reading. More detailed information about other levels of the Reading Comprehension test is provided in the Interpretive Guide for Teachers and Counselors.

Listening

Listening comprehension is measured in Levels 5 through 9 of the ITBS Complete Battery and in Levels 9 through 14 of the Listening Assessment for ITBS. The Listening Assessment for ITBS is described in Part 9. Listening is often referred to as a “neglected area” of the school curriculum. Children become good listeners through a combination of direct instruction and incidental learning; however, children with good listening strategies use them throughout their school experiences. Good listening strategies developed in the early elementary years contribute to effective learning later.


Table 3.3
Reading Content/Process Standards
Iowa Tests of Basic Skills — Complete Battery, Form A
(Illustrative items by level)

Content/Process Standards                                Level 10        Level 14

Factual Understanding
  Understand stated information                          6, 15, 28       26, 35, 42
  Understand words in context                            19, 25          5, 28, 38

Inference and Interpretation
  Draw conclusions                                       4, 18, 29       19, 27, 41
  Infer traits, feelings, or motives of characters       17, 27, 30      2, 24
  Interpret information in new contexts                  7, 26           44, 51
  Interpret nonliteral language                          12, 14          21, 37

Analysis and Generalization
  Determine main ideas                                   11, 33          17, 25, 47
  Identify author’s purpose or viewpoint                 37              39, 52
  Analyze style or structure of a passage                13, 20          11, 22, 45

Levels 5 through 9 of the Listening test measure general comprehension of spoken language. Table 3.4 shows the content/process standards for these tests. They emphasize understanding meaning at all levels. Many comprehension skills measured by the Listening tests in the early grades reflect aspects of cognition measured by the Reading Comprehension tests in the later grades, when a student’s ability to construct meaning from written text has advanced. Such items would be much too difficult for the Reading tests in the Primary Battery because of the complex reading material needed to tap these skills. It is possible, however, to measure such skills through spoken language, so at the early levels the Listening tests are important indicators of the cognitive processes that influence reading.

Language

Language arts programs comprise the four communication skills that prepare students for effective interaction with others: reading, writing, listening, and speaking. These aspects of language are assessed in several ways in The Iowa Tests. Reading, because of its importance in the elementary grades, is assessed by separate tests in Levels 6 through 14 of the ITBS. Writing, the process of generating, organizing, and expressing ideas in written form, is the focus of the Iowa Writing Assessment in Levels 9 through 14.

Table 3.4
Listening Content/Process Standards
Iowa Tests of Basic Skills — Complete Battery (Levels 5–9; Levels 9–14*)

Process Skills:
• Literal Meaning
• Following Directions
• Visual Relationships
• Sustained Listening
• Inferential Meaning
• Concept Development
• Predicting Outcomes
• Sequential Relationships
• Numerical / Spatial / Temporal Relationships
• Speaker’s Purpose, Point of View, or Style

*Listening is a supplementary test at these levels.


Listening, the process of paying attention to and understanding spoken language, is measured in the ITBS Complete Battery at Levels 5 through 9 and in the Listening Assessment for ITBS in Levels 9 through 14. Selecting or developing revisions of written text is measured by the Constructed-Response Supplement to The Iowa Tests: Thinking About Language in Levels 9 through 14.

The domain specifications for the Complete Battery and Survey Battery of the ITBS identify aspects of spoken and written language important to clarity of thought and expression. The complexity of tasks presented and the transition from spoken to written language both progress from Level 5 through Level 14. The Language tests in Levels 5 and 6 measure the student’s comprehension of linguistic relationships common to spoken and written language. The tests focus on ways language is used to express ideas and to understand relationships. The student is asked to select a picture that best represents the idea expressed in what the teacher reads aloud. The subdomains of the Language tests in Levels 5 and 6 include:

Operational Language: understanding relationships among subject, verb, object, and modifier
Verb Tense: discriminating past, present, and future
Classification: recognizing common characteristics or functions
Prepositions: understanding relationships such as below, behind, between, etc.
Singular-Plural: differentiating singular and plural referents
Comparative-Superlative: understanding adjectives that denote comparison
Spatial-Directional: translating verbal descriptions into pictures

The Language tests in Levels 7 through 14 of the Complete Battery and Survey Battery were developed from domain specifications with primary emphasis on linguistic conventions common to standard written English.¹

Although writing is taught in a variety of ways, the approaches share a common goal: a written product that expresses the writer’s meaning as precisely as possible. An important quality of good writing is a command of the conventions of written English that allows the writer to communicate effectively with the intended audience. The proofreading, editing, and revising stages of the writing process involve these skills, and the proofreading format used for the Language tests is an efficient way to measure knowledge of these conventions.

Although linguistic conventions change constantly (O’Conner, 1996), the basic skills in written communication have changed little over the years. The importance of precision in written communication is greater than ever for an increasing proportion of the adult population, whether because of the Internet or because of greater demand for information. To develop tests of language skills, authors must strike a balance between precision on the one hand and fluctuating standards of appropriateness on the other. In general, skills for the Language tests sample aspects of spelling, capitalization, punctuation, and usage pertaining to standard written English, according to language specialists and writing guides (e.g., The Chicago Manual of Style, Webster’s Guide to English Usage, The American Heritage Dictionary). Content standards for usage and written expression continue to evolve, reflecting the strong emphasis on effective writing in language arts programs.

Levels 7 and 8 of the ITBS provide a smooth transition from the emphasis on spoken language found in Levels 5 and 6 to the emphasis on written language found in Levels 9 through 14. The entire test at Levels 7 and 8 is read aloud by the teacher. In sections involving written language, students read along with the teacher as they take the test. Spelling is measured in a separate test; the teacher reads a sentence that contains three keywords, and the student identifies which word is spelled incorrectly. Capitalization, punctuation, and usage/expression are measured in context, using items similar in format to those in Levels 9 through 14.

The content of Levels 9 through 14 includes skills in spelling, capitalization, punctuation, and usage/expression.

¹ The Language tests measure skills in the conventions of “standard” written English. Students with multicultural backgrounds or, particularly, second-language backgrounds may have special difficulty with certain types of situations presented in the Language tests. It is important to remember that the tests measure proficiency in standard written English, which may differ from the background language of the home. Such differences should be taken into consideration in interpreting the scores from the Language tests, and in any follow-up instruction that may be based, in part, on test results.


Separate tests in each area are used in the Complete Battery; a single test is used in the Survey Battery. Table 3.5 shows the distribution of skills for language tests in the Complete Battery and Survey Battery of Level 10, Form A. Writing effectively requires command of linguistic conventions in all of these areas at once, but greater diagnostic information about strengths and weaknesses in writing is obtained using separate tests in each area.

Table 3.5
Comparison of Language Tests by Battery
Iowa Tests of Basic Skills — Level 10, Form A

Content Area                        Complete Battery   Survey Battery

Spelling                                   32                11
  Root Words                               22
  Words with Affixes                        6
  Correct Spelling                          4

Capitalization                             26                10
  Names and Titles                          3
  Dates and Holidays                        3
  Place Names                               6
  Organizations and Groups                  3
  Writing Conventions                       6
  Overcapitalization                        2
  Correct Capitalization                    3

Punctuation                                26                10
  End Punctuation                          12
  Comma                                     7
  Other Punctuation Marks                   4
  Correct Punctuation                       3

Usage and Expression                       33                16
  Nouns, Pronouns, and Modifiers            9
  Verbs                                     8
  Conciseness and Clarity                   4
  Organization of Ideas                     5
  Appropriate Use                           7

Total                                     117                47

The development of separate tests of language skills offers several advantages. First, content that offers opportunities to measure one skill (e.g., salutations in business letters on a punctuation test) may not offer opportunities to measure another. It is extremely difficult to construct a single test that will provide as comprehensive a definition of each domain—and hence as valid a test—as a separate test in each domain.

Second, a single language test covering all skills would need many items to yield a reliable, norm-referenced score in each area. (Note that national percentile ranks are not provided with the Survey Battery in spelling, capitalization, punctuation, and usage/expression.) Third, the directions to students can be complicated when items covering all four skills are included. Finally, it is easier to maintain uniformly high-quality test items if they focus on specific skills. In a unitary test, to retain good items it is sometimes necessary to include a less than ideal item associated with the same stimulus material.

A comprehensive school curriculum in the language arts is likely to be considerably broader in coverage than the content standards of the ITBS Language tests. For example, many language arts programs teach the general research skills students need for lifelong learning. Because these abilities are used in many school subjects, they are tested in the Reference Materials test rather than in the Language tests. This permits more thorough coverage of reference skills and underscores their importance to all teachers. Language arts programs also develop competence in writing by having students read each other’s writing. Although such skills are measured in the second part of the Usage and Expression test, they are also covered in the Reading Comprehension test in standards related to inference and generalization.

Each Language test in the Complete Battery is developed through delineation of the relevant domain. The content specifications are adjusted for each edition to reflect changing patterns in the conventions of standard written English. The trend toward so-called open punctuation, for example, has led to combining certain skills for that test in the current edition. Other details about content specifications appear in the Interpretive Guide for Teachers and Counselors. Domain descriptions for the Language tests are as follows:

Spelling. The Spelling test for Levels 9 through 14 directly measures a student’s ability to identify misspelled words. The items consist of four words, one of which may be misspelled. The student is asked to identify the incorrectly spelled word. A fifth response, “No mistakes,” is the correct response for approximately one-eighth of the items. The main advantage of the item format for Spelling is that it tests four spelling words in each item. Another advantage is that the words tested better represent spelling words taught at target grade levels. With this item format, one can obtain suitable reliability and validity without using more advanced or less familiar spelling words.


Careful selection of words is the crucial aspect of content validity in spelling tests, regardless of item format. The spelling words chosen for each form of the test come from the speaking vocabulary of students at a given grade level. Errors are patterned after misspellings observed in student writing. In addition, misspellings are checked so that: (1) the keyword can be identified by the student despite its misspelling, (2) a misspelled English word is not a correctly spelled word in a common second language, and (3) target words do not appear in parallel forms of the current or previous edition of the battery. Spelling words are also selected to avoid overlap with target words and distractors in the Vocabulary test at the same level.

The type of spelling item used on The Iowa Tests has been demonstrated to be superior to the type that presents four possible spellings of the same word. The latter type has several weaknesses. Many frequently misspelled words have only one common misspelling. Other spelling variations included as response options are seldom selected, limiting what the item measures. In addition, the difficulty of many spelling items in this form does not reflect the frequency with which the word is misspelled. This inconsistency raises doubt about the validity of a test composed of such items. Educators often question the validity of multiple-choice spelling tests versus list-dictation tests. However, a strong relationship between dictation tests and certain multiple-choice tests has repeatedly been found (Frisbie & Cantor, 1995).

Capitalization and Punctuation. Capitalization and punctuation skills function in writing rather than in speaking or reading. Therefore, a valid test of these skills should include language that might have been drawn from the written language of a student. The phrasing of items should also be on a level of sophistication commensurate with the age and developmental level of the student. Efforts have been made to include materials that might have come from letters, reports, stories, and other writing from a student’s classroom experience. The item formats in the Capitalization and Punctuation tests are similar. They include one or two sentences spread over three lines of about equal length. The student identifies the line with an error or selects a fourth response to indicate no error.

This item type, uncommon in language testing, was the subject of extensive empirical study before being used in the ITBS. Large-scale tryout of experimental tests composed of such items indicated the reliability per unit of testing time was at least as high as that of more familiar item types. Items of this type also have certain logical advantages. An item in uninterrupted discourse is more likely to differentiate between students who routinely use correct language and those who lack procedural knowledge of the writing conventions measured. In the traditional multiple-choice item, the error situations are identified. For example, the student might only be required to decide whether “Canada” should be capitalized or whether a comma is required in a given place in a sentence. In the find-the-error item type used on the ITBS, however, error situations are not identified. The student must be sensitive to errors wherever they occur. Such items more realistically reflect the situations students encounter in peer editing or in revising their own writing.

Another reason for using the find-the-error format concerns the frequency of various types of errors. Some errors occur infrequently but are serious nonetheless. With a find-the-error item, all of the student’s language practices, good and bad, have an opportunity to surface during testing. Such items can be made easy or difficult as test specifications demand without resorting to artificial language or esoteric conventions.

Usage and Expression. Knowledge and use of word forms and grammatical constructions are revealed in both spoken and written language. Spoken language varies with the social context. Effective language users change the way they express themselves depending on their audience and their intended meaning. Written language in school is a specific aspect of language use, and tests of written expression typically reflect writing conventions associated with “standard” English. Mastery of this aspect of written expression is a common goal of language arts programs in the elementary and middle grades; other important goals tend to vary from school to school, district to district, and state to state, and they tend to be elusive to measure. This is why broad-range achievement test batteries such as the ITBS focus exclusively on standard English. The mode of expression, spoken or written, influences how a person chooses words to express meaning. A student’s speech patterns may contain constructions that would be considered inappropriate in formal writing situations.


Similarly, some forms of written expression would be awkward if spoken because complex constructions do not typically occur in conversation. The ITBS Usage and Expression test includes situations that could arise in both written and spoken language.

The first part of the Usage and Expression test uses the same item format found in Capitalization and Punctuation. Items contain one to three sentences, one of which may have a usage error. Students identify the line with the error or select “No mistakes” if they think there is no error. Some of the items in this part form a connected story. The validity, reliability, and functional characteristics of this item type are important considerations in its use. In a study of why students selected various distractors, students were found to make various usage errors—many more than could be sampled in a test of reasonable length. Satisfactory reliability could be achieved with fewer items if the items contained distractors in which other types of usage errors are commonly made. Comparisons of find-the-error items and other item formats indicated better item discrimination for the former. The difficulty of the find-the-error items also reflected more closely the frequency with which various errors occur in student speech and writing.

The second part of the Usage and Expression test assesses broader aspects of written language, such as concise expression, paragraph development, and appropriateness to audience and purpose. This part includes language situations more directly associated with editing and revising paragraphs and stories. Students are required to discriminate between more and less desirable ways to express the same idea based on content standards in the areas of:

Conciseness and Clarity: being clear and using as few words as possible
Appropriateness: recognizing the word, phrase, sentence, or paragraph that is most appropriate for a given purpose
Organization: understanding the structure of sentences and paragraphs

The item types in this part of the test were determined by the complexity and linguistic level of the skill to be assessed. In some cases, the student is asked to select the best word or phrase for a given situation; in others, the choice is between sentences or paragraphs. In all cases, the student must evaluate the effectiveness or appropriateness of alternate expressions of the same idea.

What constitutes “good” or “correct” usage varies with situations, audiences, and cultural influences. Effective language teaching includes appreciation of the influence of context and culture on usage, but no attempt is made to assess this type of linguistic awareness. A single test embracing these objectives would involve more than “basic” skills and would have to sample the language domain beyond what is common to most school curricula.

Mathematics

In general, changes occur slowly in the nature of basic skills and objectives of instruction. The field of mathematics, however, has been a noticeable exception. Elementary school mathematics has been in the process of continual change over the past 45 to 50 years. In grades 3 through 8, the math programs of recent years have modified method, placement, sequence, and emphasis, but quantitative reasoning and problem solving remain important. The National Council of Teachers of Mathematics (NCTM) Principles and Standards for School Mathematics (2000) describes the content of the math curriculum and the process by which it is assessed. Changes in content and emphasis of the ITBS Math tests reflect these new ideas about school math curricula.

Forms 1 and 2 of the ITBS were constructed when textbook series exhibited remarkable uniformity in the grade placement of math concepts. Forms 3 and 4 were developed during the transition from “traditional” to “modern” math. In the three years following publication of Forms 3 and 4, math programs changed so dramatically that a special interim edition, The Modern Mathematics Supplement, was published to update the tests. During the late 1960s and early 1970s, the math curriculum was relatively stable. Forms 5 and 6 (standardized in 1970–71) increasingly emphasized concept development, whereas Forms 7 and 8 (standardized in 1977–78) shifted the emphasis to computation processes and problem-solving strategies. Greater attention was paid to estimation skills and mental arithmetic in Forms G and H (standardized in 1984–85). The 1989 NCTM Standards led to significant changes in two of the ITBS Math tests (Concepts and Estimation and Problem Solving and Data Interpretation) in Forms K, L, and M. The 2000 revision of the NCTM Standards is reflected in slight modifications of these tests in Forms A and B.

Concepts and Estimation. The curriculum and teaching methods in mathematics show great diversity. Newer programs reflect the NCTM Standards more directly, yet some programs have endorsed new standards for math education without significantly changing what is taught.


In part, diverse approaches to math teaching belie the high degree of similarity in the purposes and objectives of math education. As with any content area, the method used to teach math is probably less important than a teacher’s skill with the method. When new content standards and methods are introduced, teachers need time to apply them; teachers must learn what works, what does not work, and what to emphasize. During times of curriculum transition, an important part of a teacher’s experience is adjusting to changes in assessment based on new standards.

The Iowa Tests have always emphasized understanding, discovery, and quantitative thinking. Math teachers know students need more time to understand a fact or a process when meaning is stressed than when math is taught simply by drill and practice. In the long run, children taught by methods that focus on understanding will develop greater competence, even though they may not master facts as quickly in the early stages of learning.

Even with a test aimed at a single grade level, compromises are necessary in placement of test content. A test with many items on concepts taught late in the school year may be inappropriate to use early in the school year. Conversely, if concepts taught late in the school year are not covered on the test, this diminishes the validity of the test if administered at the end of the year. In the Concepts and Estimation test, a student in a given grade should be familiar with 80 to 85 percent of the items for that grade at the beginning of the school year. By midyear, a student should be familiar with 90 to 95 percent of the items for that grade. The remaining items require understanding usually gained in the second half of the year. Assigning levels for the Concepts and Estimation test should be done carefully, because shifts in local curriculum can affect which test levels are appropriate for which grades.

Beginning with Forms K and L, Levels 9 through 14 (grades 3 through 8), the Math Concepts test became a two-part test that included estimation. The name of the test was changed to Concepts and Estimation because of the separately timed estimation section. Part 1 is similar to the Math Concepts test in previous editions of the ITBS. In Forms A and B, this part continues to focus on numeration, properties of number systems, and number sequences; fundamental algebraic concepts; and basic measurement and geometric concepts. More emphasis is placed on probability and statistics. As in past editions, computational requirements of Part 1 are minimal. Students may use a calculator on Part 1.

Part 2 of the Concepts and Estimation test is separately timed to measure computational estimation. Early editions included a few estimation items in the Concepts test (about 5 percent). However, the changing role of computation in the math curriculum, created by the growing use of calculators, together with the continued need for estimation skills in everyday life, requires a more prominent role for estimation. Both the 1989 and 2000 NCTM Standards documented the importance of estimation. Studies indicate that, with proper directions and time limits, students will use estimation strategies and will rarely resort to exact computation (Schoen, Blume & Hoover, 1990). Several aspects of estimation are represented in Part 2 of the Concepts and Estimation test, including: (a) standard rounding—rounding to the closest power of 10 or, in the case of mixed numbers, to the closest whole numbers; (b) order of magnitude involving powers of ten; and (c) number sense, including compatible numbers and situations that require compensation.
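As a small illustration of these three strategies, the sketch below works through hypothetical numbers; these are not items from the test, and the helper function is invented for the example. Each strategy replaces an exact computation with a simpler one that is close enough for the purpose at hand.

```python
# Illustrative only: hypothetical numbers showing the three estimation strategies.
def round_to_leading_digit(x):
    """Standard rounding: keep one significant digit (e.g., 38 -> 40, 52 -> 50)."""
    magnitude = 10 ** (len(str(abs(int(x)))) - 1)
    return round(x / magnitude) * magnitude

# (a) Standard rounding: 38 x 52 is estimated as 40 x 50.
print(round_to_leading_digit(38) * round_to_leading_digit(52))  # 2000 (exact: 1976)

# (b) Order of magnitude: 4876 x 21 is about (5 x 10**3) x (2 x 10**1), roughly 10**5.
print(5 * 10**3 * 2 * 10**1)  # 100000 (exact: 102396)

# (c) Compatible numbers: 358 / 6 is estimated as 360 / 6, a division that works out evenly.
print(360 // 6)  # 60 (exact: about 59.7)
```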

Besides varying the estimation strategy, some items place estimation tasks in context and others use symbolic form. Student performance on estimation items can differ dramatically depending on whether a context is provided. Because estimation strategies in connection with the use of a calculator or computer rarely involve context, some items are presented without a context. At lower levels of the test, about two-thirds of the items are in context. This proportion decreases to roughly one-half at the upper levels. In Forms A and B, the estimation section was shortened by about 50 percent from what it had been in Forms K, L, and M. Experiences of users indicated that sufficient content coverage and reliability would be obtained with fewer items. Using a calculator is not permitted on this part of the test.

Problem Solving and Data Interpretation. Another change in the ITBS Math tests occurred in the Problem Solving and Data Interpretation test. The 2000 NCTM Standards called for continued attention to problem solving in the math curriculum with added emphasis on interpretation and analysis of data. Forms K, L, and M, which were designed with this emphasis in mind, contained two parts: one focusing on problem-solving skills and the other on interpreting data in graphs and tables. Part 1 of Problem Solving and Data Interpretation was similar in format to earlier editions of the ITBS Problem Solving test. Part 2 included materials that had been in one of the tests on work-study skills.


In Forms A and B, problem solving and data interpretation are integrated. The problem situations, graphs, and tables for this test are based on real data and emphasize connections to other areas of the curriculum.

The content of the Math Problem Solving tests in the ITBS has been strongly influenced by historical changes in the design of the battery. The addition of a test of computational speed and accuracy beginning with Forms 7 and 8 in the late 1970s marked a fundamental change in the definition of problem solving in the ITBS. What had been a domain that included computational skills in an applied setting became one of problems that require fundamental (often multiple) operations and concepts in a meaningful context. Problem Solving and Data Interpretation still requires computation. But the operations and concepts, in most cases, have been introduced at least a year before the grade level for which the test is intended. Most of the operations are basic facts or facts that do not require renaming. The total number of operations at the upper levels is substantially greater than the number of items. This reflects the large proportion of items at these levels that require multiple steps to solve. The content standards emphasize problem contexts with solution strategies beyond simple application of computational algorithms memorized through a drill and practice curriculum. Table 3.6 outlines the computational skills required for Form A of the Problem Solving and Data Interpretation test.

In mathematics, an ideal problem is one that is novel for the individual solving it. Many problems in instructional materials, however, might be better described as “exercises.” Often they are identical or similar to others explained in textbooks or by the teacher. In such examples, the student is not called on to do anything new; modeling, imitation, and recall are the primary behaviors required. This is not to suggest that repetition—such as working exercises and practicing basic facts—is not useful; indeed, it can be important. However, opportunities should also be provided for students to solve realistic problems in novel situations they might experience in daily life.

Problem Solving and Data Interpretation includes items that measure “problem-solving process” or “strategy.” These item types were adapted from Polya’s four-step problem-solving model (Polya, 1957). As part of the Iowa Problem Solving Project (Schoen, Blume & Hoover, 1990), items were developed to measure the steps of (1) getting to know the problem, (2) choosing what to do, (3) doing it, and (4) looking back.

Information gathered as part of this project was used to integrate items of this type into The Iowa Tests.

Including questions that require the interpretation of data underscores a long-standing belief in the importance of graphical displays of quantitative information. The ITBS was the first achievement battery to assess a student’s ability to interpret data in graphs and tables. Such items have been included since the first edition in 1935 and appeared in a separate test until 1978. In more recent editions, these items were part of a test called Visual Materials. Formal instruction in the interpretation and analysis of graphs, tables, and charts has become part of the mathematics curriculum as a result of the 1989 and 2000 NCTM Standards. This trend reflects increased emphasis on statistics at the elementary level. The interpretation of numerical information presented in graphs and tables provides the foundation for basic descriptive statistics.

The data interpretation skills assessed in this test are reading amounts, comparing quantities, and interpreting relationships and trends in graphs and tables. Stimulus materials include pictographs, circle graphs, bar graphs, and line graphs. Tables, charts, and other visual displays from magazines, newspapers, television, and computers are also presented. The information is authentic, and the graphical or tabular form used is logical for the data. Items in this part of the test essentially require no computation. Students may use a calculator on the Problem Solving and Data Interpretation test.

Computation. The early editions of the ITBS measured computation skills with word problems in the Mathematics Problem Solving test just described. Computational load is now considered a confounding factor in problem-solving items. As problem solving itself became the focus of that test, an independent measure of speed and accuracy in computation was needed. Beginning with Forms 7 and 8, a separate test of computational skill was added.

Instruction in computation takes place with whole numbers, fractions, and decimals. Each of these areas includes addition, subtraction, multiplication, and division. Although much computation in “real world” settings involves currency, percents, and ratio or proportion, these applications are nothing more than special cases of basic operations with whole numbers, fractions, and decimals.


“Real world” problems that require performing these specialized computation skills are still part of the Problem Solving and Data Interpretation test. The logic of the computation process and the student’s understanding of algorithms are measured in the first part of the Concepts and Estimation test.

The grade placement of content in a computation test is more crucial than in other areas of mathematics. For example, whole number division may be introduced toward the end of third grade in many textbooks. Including items that measure this skill in the Level 9 test would be inappropriate if students have not yet been taught the process. In placing items that measure specific computational skills, only skills taught during the school year preceding the year when the test level is typically used are included.

The Computation test for Levels 9 through 14 of Forms A and B differs only in length from recent editions. At all levels, testing time and number of items were each reduced by about 25 percent. The small decrease in the proportion of fraction items and the small increase in the proportion of decimal items made with Forms K, L, and M was maintained in Forms A and B. These modifications reflect increased emphasis on calculators, computers, and metric applications in the math curriculum.

This test remains a direct measure of computation that requires a single operation—addition, subtraction, multiplication, or division on whole numbers, fractions, or decimals at appropriate grade levels. Unlike some tests of computation, the ITBS Math Computation test does not confound concept and estimation skills with computational skill. Computation is included in the Math Total score unless a special composite, Math Total without Computation, is requested.

Social Studies

The content domain for the Social Studies test was designed to broaden the scope of the basic skills and to assess progress toward additional concepts and understandings in this area. Although the Social Studies test requires knowledge, it deals with more than memorization of facts or the outcomes of a particular textbook series or course of study. It is concerned with generalizations and applications of principles learned in the social studies curriculum. Many questions on the Social Studies test involve materials commonly found in social studies instruction—timelines, maps, graphs, and other visual stimuli.

Table 3.6
Computational Skill Level Required for Math Problem Solving and Data Interpretation
Iowa Tests of Basic Skills — Complete Battery, Form A
(Number of operations and percent of items per level)

Operation / Level                                     7         8         9        10        11        12        13        14

Whole Numbers and Currency (Totals)              20 (100)  24 (100)  15 (100)  19 (95)   18 (90)   15 (75)   18 (72)   23 (85)
  Basic facts (addition, subtraction,
  multiplication, and division)                  19 (95)   19 (79)    8 (53)    7 (35)    5 (25)    7 (35)    6 (24)    4 (15)
  Other sums, differences, products, and
  quotients: No renaming (no remainder)           1 (5)     5 (21)    6 (40)   12 (60)   10 (50)    3 (15)    6 (24)   12 (44)
  Other sums, differences, products, and
  quotients: Renaming (remainder)                   —         —       1 (7)      —        3 (15)    5 (25)    6 (24)    7 (26)

Fractions and Decimals                              —         —         —       1 (5)     2 (10)    5 (25)    7 (28)    4 (15)

Total Number of Operations                         20        24        15        20        20        20        25        27
Number of Items Requiring Computation              17        16        12        11        13        14        15        15
Number of Items Requiring No Computation           11        14        10        13        13        14        15        17


The content areas in the Social Studies test are history, geography, economics, and government and society (including social structures, ethics, citizenship, and points of view). These areas are interrelated, and many tasks involve more than one content area. The history content requires students to view events from non-European as well as European perspectives. The test focuses on historical events and experiences in the lives of ordinary citizens. In geography, students apply principles rather than identify facts. Countries in the eastern and western hemispheres are represented. In economics, students are expected to recognize the impact of technology, to understand the interdependence of national economies, and to use basic terms and concepts. Questions in government and society measure a student’s understanding of responsible citizenship and knowledge of democratic government. They also assess needs common to many cultures and the functions of social institutions. In addition, a student’s knowledge of the unique character of cultural groups is measured.

The Social Studies test attempts to represent the curriculum standards for social studies of several national organizations. National panels have developed standards in history, geography, economics, and civics. The National Council for the Social Studies (NCSS) adopted Curriculum Standards for the Social Studies (NCSS, 1994), which specifies ten content strands for the social studies curriculum. The domain specifications for the Social Studies test parallel the areas in which national standards have been developed. The NCSS curriculum strands map onto many content standards of the ITBS. For example, Strand II of the NCSS Standards (Time, Continuity, and Change) is represented in history: change and chronology in the ITBS. Similarly, Strand III (People, Places, and Environments) matches two ITBS standards in geography (Earth’s features and people and the environment). Similar connections exist for other ITBS content skills. Some process skills taught in social studies are assessed in other tests of the ITBS Complete Battery—in particular, the Maps and Diagrams test and the Problem Solving and Data Interpretation test.

Science

Content specifications for Forms A and B of the Science test were influenced by national standards of science education organizations. The National Science Education Standards (NRC, 1996), prepared under the direction of the National Research Council, were given significant attention. In addition, earlier work on science standards was consulted. Science for All Americans (Rutherford and Ahlgren, 1990) and The Content Core: A Guide for Curriculum Designers (Pearsall, 1993) were resources for developing Science test specifications. The main impact of this work was to elevate the role of process skills and science inquiry in test development. Depending on test level, one-third to nearly one-half of the questions concern the general nature of scientific inquiry and the processes of science investigation.

For test development, questions were classified by content and process. The content classification outlined four major domains:

Scientific Inquiry: understanding methods of scientific inquiry and process skills used in scientific investigations
Life Science: characteristics of life processes in plants and animals; body processes, disease, and nutrition; continuity of life; and environmental interactions and adaptations
Earth and Space Science: the Earth’s surfaces, forces of nature, conservation and renewable resources, atmosphere and weather, and the universe
Physical Science: basic understanding of mechanics, forces, and motion; forms of energy; electricity and magnetism; properties and changes of matter

These content standards were used with a process classification developed by AAAS: classifying, hypothesizing, inferring, measuring, and explaining. This classification helped ensure a range of thinking skills would be required to answer questions in each content area of Science.

As in social studies, skills associated with the science curriculum are measured in other tests in the Complete Battery of the ITBS. Some passages in the Reading test use science content to measure comprehension in reading. Skills measured in Problem Solving and Data Interpretation are essential to scientific work. Some skills in the Sources of Information tests are necessary for information gathering in science. Skill scores from these tests may be related to a student’s progress in science or a school’s science curriculum.


Sources of Information

The Iowa Tests recognize that basic skills extend beyond the school curriculum. Only a small part of the information needed by an educated adult is acquired in school. An educated student or adult knows how to locate and interpret information from available resources. In all curriculum areas, students must be able to locate, interpret, and use information. For this reason, the ITBS Complete Battery includes separate tests on using information sources.

Teaching and learning about information sources differ from other content areas in the ITBS because "sources of information" is not typically a subject taught in school. Skills in using information are developed through many activities in the elementary school curriculum. Further, the developmental sequence of these skills is not well-established. As a result, before the specifications for these tests could be written, the treatment and placement of visual and reference materials in instruction were examined. Authorities in related disciplines were also consulted. The most widely used textbooks in five subject areas were reviewed to identify grade placement of information sources and visual displays. Also considered were the extent to which textbook series agreed on placement and emphasis, the contribution of subject areas to skills development, and whether information sources were accompanied by instruction on their use.

The original skills selected for these tests were classified as the knowledge and use of (1) map materials, (2) graphic and tabular materials, and (3) reference materials. Graphs and tables by and large have become part of the school math curriculum, so that category of information sources was shifted to the Math tests beginning with Form K. In its place, a category on presentation of information through diagrams and related visual materials was added. Such materials have become a regular part of various publications—from periodicals to textbooks and research documents—and represent an important source of information across the curriculum. General descriptions of the tests in Sources of Information follow.

Maps and Diagrams. The Maps and Diagrams test measures general cognitive functions for the processing of information as well as specific skills in reading maps and diagrams. Developing the domain specifications for this test involved formal and informal surveys of instructional materials that use visual stimuli. The specifications for map reading were based on a detailed classification by type, function, and complexity of maps appearing in textbooks. The geographical concepts used in map reading were organized by grade level. These concepts were classified as: the mechanics of map reading (e.g., determining distance and direction, locating and describing places), the interpretation of data (geographic, sociological, or economic conditions), and inferring behavior and living conditions (routes of travel, seasonal variations, and land patterns). The specifications for the diagrams part of the test came from analyzing materials in textbooks and other print material related to functional reading. Items require locating information, explaining relationships, inferring processes or products, and comparing and contrasting features depicted in diagrams.

Reference Materials. Although students in most schools have access to reference materials across the curriculum, few curriculum guides pay explicit attention to the foundation skills a student needs to take full advantage of available sources. The content standards for the Reference Materials test include aspects of general cognitive development as well as specific information-gathering skills. The focus is on skills needed to use a variety of references and search strategies in classroom research and writing activities. The items tap a variety of cognitive skills. In the section on using a dictionary, items relate to spelling, pronunciation, syllabification, plural forms, and parts of speech. In the section on general references, items concern the parts of a book (glossary, index, etc.), encyclopedias, periodicals, and other special references. General skills required to access information—such as alphabetizing and selecting guidewords or keywords—are also measured. To answer questions on the Reference Materials test, students must select the best source, judge the quality of sources, and understand search strategies. The upper levels of Forms A and B also include items that measure note-taking skills.

Critical Thinking Skills

The items in the ITBS batteries for Levels 9–14 were evaluated for the critical thinking demands they require of most students. Questions were classified by multiple reviewers, and a consensus approach was used for final decisions. The test specifications for the final test forms did not contain specific quotas for critical thinking skills. Cognitive processing demands were considered in developing, revising, and selecting items, but the definition of critical thinking was not incorporated directly in any of those decisions.


Classifying items as requiring critical thinking depends on judgments that draw upon (a) knowledge of the appropriate content, (b) an understanding of how students interact with the content of learning and the remembering of it, and (c) a consistent use of the meaning of the term “critical thinking” in the content area in question. The ITBS classifications represent the consensus of the authors about the critical thinking required of most students who correctly answer such items. Further information about item classifications for critical thinking is provided in the Interpretive Guide for Teachers and Counselors.

Other Validity Considerations

Norms Versus Standards

In using test results to evaluate teaching and learning, it should be recognized that a norm is merely a description of average performance. Norm-referenced interpretations of test scores use the distribution of test scores in a norm group as a frame of reference to describe the performance of individuals or groups. Norms for achievement should not be confused with standards for performance (i.e., indicators of what constitutes "satisfactory," "proficient," or "exemplary" performance).

The distributions of building averages on the ITBS show substantial variability in average achievement from one content area to another in the same school system. For example, schools in a given locale may spend substantially more time on mathematics than on writing, and schools with high averages on the ITBS Language tests may not emphasize writing. A school that scores below the norm in math and above the norm in language may nevertheless need to improve its writing instruction more than its math instruction. Such a judgment requires thorough understanding of patterns of achievement in the district, current teaching emphasis, test content, and expectations of educators and the community. All of these factors contribute to standards-based interpretations of test scores.

Many factors should be considered when evaluating the performance of a school. These include the general cognitive ability of the students, learning opportunities outside the school, the emphasis placed on basic skills in the curriculum, and the grade placement and sequencing of the content taught. Large differences in achievement between schools in the same system can be explained by such factors. These factors also can influence how a school or district ranks compared to general norms. Quality of instruction is not the only determining factor.

What constitutes satisfactory performance, or what is an acceptable standard, can only be determined by informed judgments about school and individual performance. It is likely to vary from one content area to another, as well as from one locale to another. Ideally, each school must determine what may be reasonably expected of its students. Below-average performance on a test does not necessarily indicate poor teaching or a weak curriculum. Examples of effective teaching are found in many such schools. Similarly, above-average performance does not necessarily mean there is no room for improvement.

Interpreting test scores based on performance standards reflects a collective judgment about the quality of achievement. The use of such judgments to improve instruction and learning is the ultimate obligation of a standards-based reporting system. Some schools may wish to formalize the judgments needed to set performance standards on the Iowa Tests of Basic Skills. National performance standards were developed in 1996 in a workshop organized by the publisher. Details about national performance standards and the method used to develop them are given in The Iowa Tests: Special Report on Riverside's National Performance Standards (Riverside Publishing, 1998).

Using Tests to Improve Instruction

Using tests to improve instruction and learning is the most important purpose of any form of assessment. It is the main reason for establishing national norms and developmental score scales. Valid national norms provide the frame of reference to determine individual strengths and weaknesses. Sound developmental scales create the frame of reference to interpret academic growth—whether instructional practices have had the desired effect on achievement. These two frames of reference constitute the essential contribution of a standardized achievement test in a local school.

Most teachers provide for individual and group differences in one way or another. It would be virtually impossible to structure learning for all students in exactly the same way even if one wanted to. The characteristics, needs, and desires of students require a teacher to allocate attention and assistance differentially. Test results help a teacher to tailor instruction to meet individual needs. Test results are most useful when they reveal discrepancies in performance—between test areas, from year to year, between achievement and ability tests, and between expectations and performance.


Many score reports contain unexpected findings. These findings represent information that should not be ignored but instead examined further. Suggestions about using test results to individualize instruction are given in the Interpretive Guide for Teachers and Counselors, along with a classification of content for each test.

Any classification system is somewhat arbitrary. The content of the ITBS is represented in the skills a student is required to demonstrate. The skills taxonomy is re-evaluated periodically because of changes in curriculum and teaching methods. For some tests (e.g., Capitalization), the categories are highly specific; for others (e.g., Reading Comprehension), they are more general. The criteria for defining a test's content classification system are meaningfulness and usefulness to the teacher. The Interpretive Guide for Teachers and Counselors and the Interpretive Guide for School Administrators present the skills classification system for each test. A detailed list of skills measured by every item in each form is included. In addition, suggestions are made for improving achievement in each area. These are intended as follow-up activities. Comprehensive procedures to improve instruction may be found in the literature associated with each curriculum area in the elementary and middle grades.

Using Tests to Evaluate Instruction

To address the issue of evaluating curriculum and instruction is to confront one of the most difficult problems in assessment. School testing programs do not exist in a vacuum; there are many audiences for assessment data and many stakeholders in the results. Districts and states are likely to consider using standardized tests as part of their evaluation of instruction. Like any assessment information, standardized test results provide only a partial view of the effectiveness of instruction. The word "partial" deserves emphasis because the validity of tests used to evaluate instruction hinges on it.

First, standardized tests are concerned with basic skills and abilities. They are not intended to measure total achievement in a given subject or grade. Although these skills and abilities are essential to nearly all types of academic achievement, they do not include all desired outcomes of instruction. Therefore, results obtained from these tests do not by themselves constitute an adequate basis for, and should not be overemphasized in, the evaluation of instruction. It is possible, although unlikely, that some schools or classes may do well on these tests yet be relatively deficient in other areas of instruction—for example, music, literature, health, or career education. Other schools or classes with below-average test results may provide a healthy educational environment in other respects.

Achievement tests are concerned with areas of instruction that can be measured under standard conditions. The content standards represented in the Iowa Tests of Basic Skills are important. Effective use of the tests requires recognition of their limits, however. Schools should treat as objectively as possible those aspects of instruction that can be measured in this way. Other less tangible, yet still important, outcomes of education should not be neglected.

Second, local performance is influenced by many factors. The effectiveness of the teaching staff is only one factor. Among the others are the cognitive ability of students, the school environment, the students' educational history, the quality of the instructional materials, student motivation, and the physical equipment of the school.

At all times, a test must be considered a means to an end and not an end in itself. The principal value of these tests is to focus the attention of the teaching staff and the students on specific aspects of educational development in need of individual attention. Test results should also facilitate individualized instruction, identify which aspects of the instructional program need greater emphasis or attention, and provide the basis for better educational guidance. Properly used results should motivate teachers and students to improve instruction and learning. Used with other information, test results can help evaluate the total program of instruction. Unless test results are used in this way, however, they may do serious injustice to teachers or to well-designed instructional programs.

Local Modification of Test Content

The Iowa Tests are based on a thorough analysis of curriculum materials from many sources. Every effort is made to ensure content standards reflect a national consensus about what is important to teach and assess. Thus, many questions about content representativeness of the tests are addressed during test development.

Adapting the content of nationally standardized tests to match more closely local or state standards is a trend in large-scale assessment. Sometimes these efforts involve augmenting standardized tests and reporting criterion-referenced information along with national norms from the intact test. In other cases, local districts modify a test by selecting some items and omitting others based on the match with the local curriculum. Studies of the latter type of modified test (e.g., Forsyth, Ansley & Twing, 1992), also known as customized tests, have shown that national norms after modification can differ markedly from norms on the test as originally standardized. Some of this distortion may result from selecting items so tailored to the curriculum that students perform better than they would on the original version. Other distortions are caused by context effects on items. When items are removed, those remaining may not have the same psychometric properties (Brennan, 1992), which affects national norms. When this occurs, other normative score interpretations are also affected. The evaluation of strengths and weaknesses, the assessment of growth, and the status relative to national standards can be distorted if local modifications do not retain the same balance of content as in the original test.

Content standards at the local and state level can change dramatically over a short time. Performance standards can be as influenced by politics as by advances in understanding how students learn. For these reasons, The Iowa Tests should be administered under standard conditions to ensure the validity of norm-referenced interpretations.

Predictive Validity

The Iowa Tests of Basic Skills were not designed as tests of academic aptitude or as predictors of future academic success. However, the importance of basic skills to high school and college success has been demonstrated repeatedly. Evidence of the predictive "power" of tests is difficult to obtain because selection eliminates from research samples students whose performance fails to qualify them for later education. Many college students complete high school and enter college in part because of high proficiency in the basic skills. Students who lack proficiency in the basic skills are either not admitted to college or seek employment. Therefore, coefficients of predictive validity are obtained for a select population. Estimates of correlations for an unselected population can be obtained (e.g., Linn & Dunbar, 1982), but the assumptions underlying the computations are not always satisfied.
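A standard adjustment for this kind of range restriction (often called Thorndike's Case 2 correction for direct selection) can be sketched as follows. The Guide does not specify the exact formula used in the studies cited below; the observed correlation in the example is hypothetical, and only the two standard deviations echo values reported later for the Rosemier (1962) sample.

```
# Illustrative sketch: correcting a correlation for direct range restriction
# (Thorndike's Case 2 formula). Not necessarily the adjustment used in the
# studies summarized in this Guide.

import math

def correct_for_range_restriction(r_restricted, sd_restricted, sd_unrestricted):
    """Estimate the correlation in an unrestricted population from the
    correlation observed in a selected (range-restricted) sample."""
    k = sd_unrestricted / sd_restricted          # ratio of SDs on the predictor
    num = r_restricted * k
    den = math.sqrt(1.0 - r_restricted**2 + (r_restricted**2) * k**2)
    return num / den

# Hypothetical observed correlation of 0.50; the standard deviations are the
# values reported for the selected sample (7.52) and the national grade 8
# distribution (14.91) in the Rosemier study.
print(round(correct_for_range_restriction(0.50, 7.52, 14.91), 2))
```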


Five studies of predictive validity are summarized in Table 3.7 with correlation coefficients between the ITBS Complete Composite and several criterion measures. In a study by Scannell (1958), ITBS scores at grades 4, 6, and 8 of students entering one of the Iowa state universities were correlated with three criterion measures. These criteria were (a) grade 12 Composite scores on the Iowa Tests of Educational Development, (b) high school grade-point average (GPA), and (c) first-year college GPA. Considerable restriction in the range of the ITBS scores was present. The observed correlations should be regarded as lower-bound estimates of the actual correlation in an unselected population. When adjustments for restriction in range were made, the estimated correlations with the ITED grade 12 Composite were .77, .82, and .81 for grades 4, 6, and 8, respectively.

In Rosemier's (1962) study of freshmen entering The University of Iowa in the fall of 1962, test scores were obtained for the ITBS in grade 8 and for the ITED in grades 10–12. Scores on the American College Tests (ACT), high school GPA, and first-year college GPA were also obtained. The standard deviation of the ITBS Composite scores for the sample was 7.52, compared to 14.91 for the total grade 8 student distribution that year. Differences between the obtained and adjusted correlations show the effect of range restriction on estimated validity coefficients.

Loyd, Forsyth, and Hoover (1980) conducted a study of the relation between ITBS, ITED, and ACT scores and high school and first-year college GPA of 1,997 graduates of Iowa high schools who entered The University of Iowa in the fall of 1977. As in the Rosemier study, variability in the college-bound population was much smaller than that in the general high school population. Ansley and Forsyth (1983) obtained final college GPAs of the students in the Loyd et al. study. They found the ITBS predicted final college GPA as well as it predicted first-year college GPA.

Qualls and Ansley (1995) replicated the Loyd et al. study with data from freshmen entering The University of Iowa in the fall of 1991. They found ITBS and ITED scores still showed substantial predictive validity, but the correlations between test scores and grades were somewhat lower than those in the earlier study. To investigate these differences, a study is under way with data from the freshman classes of 1996 and 1997. Preliminary results indicate the correlations between test scores and GPA are smaller than those reported in the 1950s and 1960s, but the relationship is stronger than the Qualls and Ansley research suggests.

Table 3.7
Summary Data from Predictive Validity Studies: Correlations with ITBS Complete Composite
(ITBS grades 4, 6, and 8 correlated with the ITED Composite in grades 10, 11, and 12; high school GPA; ACT Composite; and freshman and final college GPA, from Scannell, 1958; Rosemier, 1962; Loyd et al., 1980; Ansley & Forsyth, 1983; and Qualls & Ansley, 1995)

Three predictive validity studies that examine the relation between achievement test scores in eighth grade and subsequent course grades in ninth grade are summarized below. Dunn (1990) found that correlation coefficients between the two measures of performance, test scores and course grades, were relatively consistent for composites. The average correlation between the ITBS Complete Composite and grades across 13 high school courses—including language arts, U.S. history, general math, algebra, etc.—was .62. The smallest correlations were observed in courses for which selection criteria narrowed the range of overall achievement considerably (e.g., algebra). As part of this investigation, a variety of regression analyses were performed to examine the joint role of test scores and course grades in predicting later performance in school. These analyses showed that achievement test scores added significantly to the prediction of course grades in high school after performance in middle school courses was taken into account. Course grades in the middle school years tended to be better predictors of high school performance than test scores, suggesting unique factors influence grades.

Similar results were obtained in a study by Barron, Ansley, and Hoover (1991) that looked specifically at predicting achievement in Algebra I. As in the Dunn study, multiple regression analyses showed that ITBS scores added significantly to the prediction of ninth-grade algebra achievement even after previous grades were taken into account.

A more recent analysis of the relation between ITBS Core Total scores in grade 8 and ACT composite scores was conducted by Iowa Testing Programs (1998). Predictive validity coefficients in this study were .76, .81, and .78 for fall, midyear, and spring administrations of the ITBS in grade 8. Predictive validity coefficients of this magnitude compare favorably with those of achievement tests given in the high school years.
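The incremental-prediction analyses described above for the Dunn and Barron studies can be illustrated with a small least-squares sketch: fit a baseline model from prior grades, add the test score, and compare R-squared values. All variable names and data below are simulated for illustration and are not taken from those studies.

```
# Illustrative sketch of incremental prediction: does adding an eighth-grade
# test score improve the prediction of a ninth-grade course grade beyond
# prior (middle school) grades? All data are simulated for illustration.

import numpy as np

rng = np.random.default_rng(7)
n = 500
prior_gpa = rng.normal(3.0, 0.5, n)                     # hypothetical middle school GPA
test_score = 0.6 * prior_gpa + rng.normal(0, 0.4, n)    # hypothetical grade 8 test score
grade9 = 0.5 * prior_gpa + 0.3 * test_score + rng.normal(0, 0.4, n)

def r_squared(X, y):
    """R-squared from an ordinary least squares fit with an intercept."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - resid.var() / y.var()

r2_base = r_squared(prior_gpa.reshape(-1, 1), grade9)
r2_full = r_squared(np.column_stack([prior_gpa, test_score]), grade9)
print(round(r2_base, 3), round(r2_full, 3), "increase:", round(r2_full - r2_base, 3))
```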


Tests such as the ITBS have been used in many ways to support judgments about how well students are prepared for future instruction, that is, as general measures of readiness. This aspect of test use has become somewhat controversial in recent years because of situations where tests are used to control access to educational opportunities.

Readability

The best way to determine the difficulty of a standardized test is to examine its norms tables and distribution of item difficulty. The difficulty data for items, skills, and tests in the ITBS are reported in Content Classifications with Item Norms. Of the various factors that influence difficulty, readability is the focus of much attention.

The readability of written materials is measured in several ways. An expert may judge the grade level of a reading passage based on perception of its complexity. The most common method of quantifying these judgments is to use a readability formula. Readability formulas are often used by curriculum specialists, classroom teachers, and school librarians to match textbooks and trade books to the reading abilities of students. The virtue of readability formulas is objectivity. Typically, they use vocabulary difficulty (e.g., word familiarity or length) and syntactic complexity (e.g., sentence length) to predict the difficulty of a passage. The shortcoming of readability formulas is failure to account for qualitative factors that influence how easily a reader comprehends written material. Such factors include organization and cohesiveness of a selection, complexity of the concepts presented, amount of knowledge the reader is expected to bring to the passage, clarity of new information, and interest level of the material to its audience.

Readability formulas were originally developed to assess the difficulty of written prose and sometimes have been used as a basis for modifying written material. Using a readability formula in this way does not automatically result in more readable text. Short, choppy sentences and familiar but imprecise words can actually increase the difficulty of a selection even though they lower its index of readability (Davison & Kantor, 1982).

Readability formulas use word lists that become dated over time. For instance, the 1958 Dale-Chall (Dale & Chall, 1948; Powers, Sumner & Kearl, 1958) and the Bormuth (1969) formulas use the Dale List of 3,000 words, which reflects the reading vocabulary of students of the early 1940s. This list results in some curiosities: "Ma" and "Pa," which today would appear primarily in regional literature, are considered familiar; "Mom" is unfamiliar. Similarly, "bicycle" is an easy word, but "bike" is hard. "Radio" is familiar; "TV" is not. The 1995 Dale-Chall revision addresses some of these concerns, but what is truly familiar and what is not will always be time dependent.

A similar problem exists in predicting the vocabulary difficulty of subject-area tests. Words that are probably familiar to students in a particular curriculum (e.g., "cost," "share," and "subtract" in math problem solving; "area," "map," and "crop" in social studies; "body," "bone," and "heat" in science; and even the days of the week in a capitalization test) are treated as unfamiliar words by the Spache formula.

Readability concerns are often raised on tests such as math problem solving. It is generally believed that a student's performance in math should not be influenced by reading ability. Readability data are frequently requested for passages in reading tests. Here, readability indices document the range of difficulty included so the test can discriminate the reading achievement of all students. Since reading achievement in the middle grades can span seven or eight grade levels, the readability indices of passages should vary substantially.

Three readability indices for Forms A and B are reported in the accompanying table for Reading Comprehension, Language Usage and Expression, Math Problem Solving, Social Studies, and Science. The Spache (1974) index, reported in grade-level values, measures two factors: mean sentence length and proportion of "unfamiliar" or "difficult" words. The Spache formula uses a list of 1,041 words and is appropriate with materials for students in grades 1 through 3. The Bormuth formula (Bormuth, 1969, 1971) reflects three factors: average sentence length, average word length, and percent of familiar words (based on the old Dale List). Bormuth's formula predicts the cloze mean (CM), the average of percent correct for a set of cloze passages (the higher the value, the easier the passage). The value reported in the table is an inverted Bormuth index. The inverted index was multiplied by 100 to remove the decimal.
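As a concrete illustration of the two-factor approach these formulas share, the sketch below combines average sentence length with the percent of words missing from a familiar-word list. The word list and the weights are placeholders for illustration; they are not the published Spache, Dale-Chall, or Bormuth coefficients.

```
# Illustrative two-factor readability index: average sentence length plus
# percent of "unfamiliar" words. The word list and weights are placeholders,
# not the published Spache, Dale-Chall, or Bormuth values.

import re

FAMILIAR_WORDS = {"the", "a", "and", "boy", "dog", "ran", "to", "school"}  # toy list

def readability_index(text, w_sentence=0.05, w_unfamiliar=0.16, intercept=0.0):
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())
    avg_sentence_length = len(words) / max(len(sentences), 1)
    pct_unfamiliar = 100.0 * sum(w not in FAMILIAR_WORDS for w in words) / max(len(words), 1)
    return intercept + w_sentence * avg_sentence_length + w_unfamiliar * pct_unfamiliar

print(round(readability_index("The boy and the dog ran to school."), 2))
```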

Table 3.8
Readability Indices for Selected Tests
(Ranges and means of the Spache, 1995 Dale-Chall, and Inverted Bormuth indices, by form and level, for Reading Comprehension (passages only, items only, and passages plus items), Usage and Expression, Mathematics Problem Solving and Data Interpretation, Social Studies, and Science, Forms A and B.)


Like the Bormuth formula, the 1995 Dale-Chall index estimates the cloze mean but from only two factors, percent of unfamiliar words and average sentence length. Easy passages tend to have relatively few unfamiliar words and shorter sentences, whereas difficult passages tend to have more unfamiliar words and longer sentences.

Readability indices for Reading Comprehension are reported separately by form and level for passages, items, and passages plus items. Indices for passages in each level vary greatly, which helps to discriminate over the range of reading achievement. The average, however, is typically at or below grade level. For example, the 1995 Dale-Chall index for passages in Form A, Level 9, ranges from 42 to 53; the average is 48. The readability index for an item usually indicates the item is easier to read than the corresponding passage. In Form A, Levels 10–14, for example, the average 1995 Dale-Chall index is 43 for the passages, 48 for the items, and 45 for the passages plus items.

The formulas tend to treat many words common to specific subjects as unfamiliar. This could have an effect on the readability indices for the Social Studies, Science, and Math Problem Solving and Data Interpretation tests, especially in the lower grades. The values given in the accompanying table, however, are usually below the grade level in which tests are typically given.


PART 4

Scaling, Norming, and Equating The Iowa Tests

Frames of Reference for Reporting School Achievement

Defining the frame of reference to describe and report educational development is the fundamental challenge in educational measurement. Some educators are interested in determining the developmental level of students, and in describing achievement as a point on a continuum that spans the years of schooling. Others are concerned with understanding student strengths and weaknesses across the curriculum, setting goals, and designing instructional programs. Still others want to know whether students satisfy standards of performance in various school subjects. Each of these educators may share a common purpose for assessment but would require different frames of reference for reports of results.

This part of the Guide to Research and Development describes procedures used for scaling, norming, and equating The Iowa Tests. Scaling methods define longitudinal score scales for measuring growth in achievement. Norming methods estimate national performance and long-term trends in achievement and provide a basis for measuring strengths and weaknesses of individuals and groups. Equating methods establish comparability of scores on equivalent test forms. Together these techniques produce reliable scores that satisfy the demands of users and meet professional test standards.

Comparability of Developmental Scores Across Levels: The Growth Model

The foundation of any developmental scale of educational achievement is the definition of grade-to-grade overlap. Students vary considerably within any given grade in the kinds of cognitive tasks they can perform. For example, some students in third grade can solve problems in mathematics that are difficult for the average student in sixth grade. Conversely, some students in sixth grade read no better than the average student in third grade. There is even more overlap in the cognitive skills of students in adjacent grades—enough that some communities have devised multi-age or multi-grade classrooms to accommodate it. Grade-to-grade overlap in the distributions of cognitive skills is basic to any developmental scale that measures growth in achievement over time. Such overlap is sometimes described by the ratio of variability within grade to variability between grades. As this ratio increases, the amount of grade-to-grade overlap in achievement increases.

The problems of longitudinal comparability of tests and vertical scaling and equating of test scores have existed since the first use of achievement test batteries to measure educational progress. The equivalence of scores from various levels is of special concern in using tests "out-of-level" or in individualized testing applications. For example, a standard score of 185 earned on Level 10 should be comparable to the 185 earned on any other level; a grade equivalent score of 4.8 earned on Level 10 should be comparable to a grade equivalent of 4.8 earned on another level.

Each test in the ITBS battery from Levels 9 through 14 is a single continuous test representing a range of educational development from low grade 3 through superior grade 9. Each test is organized as six overlapping levels. During the 1970s, the tests were extended downward to kindergarten by the addition of Levels 5 through 8 of the Primary Battery. Beginning in 1992, the Iowa Tests of Educational Development, Levels 15–17/18, were jointly standardized with the ITBS. A common developmental scale was needed to relate the scores from each level to the other levels. The scaling requirement consisted of establishing the overlap among the raw score scales for the levels and relating the raw score scales to a common developmental scale. The scaling test method used to build the developmental scale for the ITBS and ITED, Hieronymus scaling, is described in Petersen, Kolen & Hoover (1989). Scaling procedures that are specific to current forms of The Iowa Tests are discussed in this part of the Guide to Research and Development.

The developmental scales for the previous editions of the ITBS steadily evolved over the years of their use. The growth models and procedures used to derive the developmental scales for the Multilevel Battery (Forms 1 through 6) using Hieronymus scaling are described on pages 75–78 of the 1974 Manual for Administrators, Supervisors, and Counselors. The downward extension of the growth model to include Levels 7 and 8 is outlined in the Manual for Administrators for the Primary Battery, 1975, pages 43–45. The further downward extension to Levels 5 and 6 in 1978 is described on page 118 of the 1982 Manual for School Administrators. Over the history of these editions of the tests, the scale was adjusted periodically. This was done to accommodate new levels of the battery or changes in the ratio of within- to between-grade variability observed in national standardization studies and large-scale testing programs that used The Iowa Tests. In the 1963 and 1970 national standardization programs, minor adjustments were made in the model at the upper and lower extremes of the grade distributions, mainly as a result of changes in extrapolation procedures.

During the 1970s it became apparent that differential changes in achievement were taking place from grade to grade and from test to test. Achievement by students in the lower grades was at the same level or slightly higher during the seven-year period. In the upper grades, however, achievement levels declined markedly in language and mathematics over the same period. Differential changes in absolute level of performance increased the amount of grade-to-grade overlap in performance and necessitated major changes in the grade-equivalent to percentile-rank relationships. Scaling studies involving the vertical equating of levels were based on 1970–1977 achievement test scores. The procedures and the resulting changes in the growth models are described in the 1982 Manual for School Administrators, pages 117–118.

Between 1977 and 1984, data from state testing programs and school systems across the country suggested that differential changes in achievement across grades had continued. Most of the available evidence, however, indicated that these changes differed from changes of the previous seven-year period. In all grades and test areas, achievement appeared to be increasing. Changes in median achievement by grade for 1977–1981 and 1981–84 are documented in the 1986 Manual for School Administrators (Hieronymus & Hoover, 1986). Changes in median achievement after 1984 are described in the 1990 Manual for School Administrators, Supplement (Hoover & Hieronymus, 1990), and later in Part 4 of this Guide.

Patterns of achievement on the tests during the 1970s and 1980s provided convincing evidence that another scaling study was needed to ascertain the grade-to-grade overlap for future editions of the tests. Not only had test performance changed significantly, so had school curriculum in the achievement areas measured by the tests. In addition, in 1992 the ITED was to be jointly standardized and scaled with the ITBS for the first time, so developmental links between the two batteries were needed.

The National Standard Score Scale

Students in the 1992 spring national standardization participated in special test administrations for scaling the ITBS and ITED. The scaling tests were wide-range achievement tests designed to represent each content domain in the Complete Battery of the ITBS or ITED. Scaling tests were developed for three groups: kindergarten through grade 3, grades 3 through 9, and grades 8 through 12. These tests were designed to establish links among the three sets of tests from the data collected. During the standardization, scaling tests in each content area were spiraled within classrooms to obtain nationally representative and comparable data for each subtest.

The scaling tests provide essential information about achievement differences and similarities between groups of students in successive grades. For example, the scores show the variability among fourth graders in science achievement and the proportion of fourth graders that score higher in science than the typical fifth grader. The study of such relations is essential to building developmental score scales. These score scales monitor year-to-year growth and estimate students' developmental levels in areas such as reading, language, and math. To describe the developmental continuum in one subject area, students in several different grades must answer the same questions. Because of the range of item difficulty in the scaling tests, special Directions for Administration were prepared.

The score distributions on the scaling tests defined the grade-to-grade overlap needed to establish the common developmental achievement scale in each test area. An estimated distribution of true scores was obtained for every content area using the appropriate adjustment for unreliability (Feldt & Brennan, 1989). The percentage of students in a given grade who scored higher than the median of other grades on that scaling test was determined from the estimated distribution of true scores. This procedure provided estimates of the ratios of within- to between-grade variability free of chance errors of measurement and defined the amount of grade-to-grade overlap in each achievement domain.
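A minimal sketch of the overlap computation is given below. It works directly from observed scaling-test scores for two adjacent grades; the operational procedure applied the same idea to the estimated true-score distributions described above, and the score arrays here are hypothetical.

```
# Minimal sketch: percent of students in one grade whose scaling-test scores
# exceed the median of an adjacent grade. The operational procedure used
# estimated true-score distributions (adjusted for unreliability); hypothetical
# observed scores are used here for simplicity.

import numpy as np

rng = np.random.default_rng(0)
grade4_scores = rng.normal(loc=200, scale=28, size=2000)   # hypothetical grade 4 scores
grade5_scores = rng.normal(loc=214, scale=30, size=2000)   # hypothetical grade 5 scores

def percent_exceeding_median(scores, reference_scores):
    """Percent of `scores` that exceed the median of `reference_scores`."""
    return 100.0 * np.mean(scores > np.median(reference_scores))

# Grade-to-grade overlap: grade 4 students above the grade 5 median,
# and grade 5 students above the grade 4 median.
print(round(percent_exceeding_median(grade4_scores, grade5_scores), 1))
print(round(percent_exceeding_median(grade5_scores, grade4_scores), 1))
```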

The table summarizes the relations among grade medians for Language Usage and Expression for Forms G and H in 1984 and for Forms K and L in 1992. Each row of Table 4.1 reports the percent of students in that grade who exceeded the median of the grade in each column. The entries for 1992 also describe the scale used for Forms A and B after the 2000 national standardization.

The relation of standard scores to percentile ranks for each grade was obtained from the results of the scaling test. Given the percentages of students in the national standardization in one grade above or below the medians of other grades, within-grade percentiles on the developmental scale were determined. These percentiles were plotted and smoothed. This produced a cumulative distribution of standard scores for each test and grade, which represents the growth model for that test. The relations between raw scores and standard scores were obtained from the percentile ranks on each scale.
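The raw-score-to-standard-score step can be illustrated as a simple equipercentile mapping: each raw score receives the standard score with the same percentile rank. The distributions in the sketch below are hypothetical, and the operational conversion was based on plotted and smoothed percentile curves rather than this direct calculation.

```
# Illustrative equipercentile mapping from raw scores to standard scores:
# a raw score is assigned the standard score with the same percentile rank.
# Both distributions are hypothetical; the operational percentile curves
# were plotted and smoothed before the conversion was read off.

import numpy as np

rng = np.random.default_rng(1)
raw_scores = rng.binomial(n=40, p=0.6, size=5000)          # hypothetical raw-score norm group
standard_scores = rng.normal(loc=200, scale=30, size=5000) # hypothetical developmental SS distribution

def percentile_rank(value, scores):
    """Percent of the norm group scoring at or below `value`."""
    return 100.0 * np.mean(scores <= value)

def raw_to_standard(raw):
    pr = percentile_rank(raw, raw_scores)
    return float(np.percentile(standard_scores, pr))

for raw in (15, 24, 32):
    print(raw, round(raw_to_standard(raw), 1))
```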

Two factors created the differences between the 1984 and 1992 distributions. First, the ratio of within- to between-grade variability in student performance increased. Second, before 1992, the parts of the growth model below grade 3 and above grade 8 were extrapolated from the available data on grades 3–8. In the 1992 standardization, scaling test data were collected in the primary and high school grades, which allowed the growth model to be empirically determined below grade 3 and above grade 8.

Table 4.1 illustrates the changes in grade-to-grade overlap that led to the decision to rescale the tests.

Table 4.1
Comparison of Grade-to-Grade Overlap
Iowa Tests of Basic Skills, Language Usage and Expression—Forms K/L vs. Forms G/H
National Standardization Data, 1992 and 1984
Percent of GEs in Each Grade Exceeding Grade Median (Fall) Determined from the 1992 and 1984 Scaling Studies

Grade medians (fall GE in parentheses): K 123 (K.2), 1 140 (1.2), 2 157 (2.2), 3 174 (3.2), 4 190 (4.2), 5 205 (5.2), 6 219 (6.2), 7 230 (7.2), 8 241 (8.2)

Grade  Year     K    1    2    3    4    5    6    7    8
  8    1992         99   97   91   84   76   66   58   50
       1984         99   99   99   96   88   76   64   50
  7    1992         99   95   90   81   70   59   50   42
       1984         99   99   98   91   79   65   50   35
  6    1992         99   95   88   77   63   50   41   33
       1984         99   99   93   82   67   50   33   17
  5    1992         99   93   83   67   50   35   27   20
       1984         99   97   85   68   50   31   16    6
  4    1992    99   98   89   74   50   32   21   14    9
       1984    99   99   88   70   50   30   15    5    1
  3    1992    99   95   79   50   26   13    6    3    1
       1984    99   93   73   50   28   13    3    1    1
  2    1992    99   87   50   19    7    2    1
       1984    99   78   50   27   11    2    1
  1    1992    94   50   10    1    1
       1984    86   50   24    9    1
  K    1992    50    2    1    1
       1984    50   22    7    1


Table 4.1 indicates that the amount of grade-to-grade overlap in the 1992 and 2000 developmental standard score scale tends to increase steadily from kindergarten to eighth grade. This pattern is consistent with a model for growth in achievement in which median growth decreases across grades at the same time as variability in performance increases within grades.

The type of data illustrated in Table 4.1 provides empirical evidence of grade-to-grade overlap that must be incorporated into the definition of growth reflected in the final developmental scale. But such data do not resolve the scaling problem. Units for the description of growth from grade to grade must be defined so that comparability can be achieved between descriptions of growth in different content areas. To define these units, achievement data were examined from several sources in which the focus of measurement was on growth in key curriculum areas at a national level. The data included results of scaling studies using not only the Hieronymus method, but also Thurstone and item-response theory methods (Mittman, 1958; Loyd & Hoover, 1980; Harris & Hoover, 1987; Becker & Forsyth, 1992; Andrews, 1995). Although the properties of developmental scales vary with the methods used to create them, all data sources showed that growth in achievement is rapid in the early stages of development and more gradual in the later stages. Theories of cognitive development also support these general findings (Snow & Lohman, 1989).

The growth model for the current edition of The Iowa Tests was determined so that it was consistent with the patterns of growth over the history of The Iowa Tests and with the experience of educators in measuring student growth and development. The developmental scale used for reporting ITBS results was established by assigning a score of 200 to the median performance of students in the spring of grade 4 and 250 to the median performance of students in the spring of grade 8. The table below shows the developmental standard scores that correspond to typical performance of grade groups on each ITBS test in the spring of the year. The scale illustrates that average annual growth decreases as students move through the grades. For example, the growth from grade 1 to grade 2 averages 18 standard-score points, but from grade 7 to grade 8 it averages only 11 points.

Grade:   K    1    2    3    4    5    6    7    8    9    10    11    12
SS:     130  150  168  185  200  214  227  239  250  260  268   275   280
GE:     K.8  1.8  2.8  3.8  4.8  5.8  6.8  7.8  8.8  9.8  10.8  11.8  12.8

The grade-equivalent (GE) scale for The Iowa Tests is a monotonic transformation of the standard score scale. As with previous test forms, the GE scale measures growth based on the typical change observed during the school year. As such, it represents a different growth model than does the standard score scale (Hoover, 1984). With GEs, the average student "grows" one unit on the scale each year, by definition. As noted by Hoover, GEs are a readily interpretable scale for many elementary school teachers because they describe growth in terms familiar to them. GEs become less useful during high school, when school curriculum becomes more varied and the scale tends to exaggerate growth.
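Because the GE scale is a monotonic transformation of the standard score scale, a GE can be read from corresponding values such as those in the table above. The sketch below uses piecewise linear interpolation between the tabled spring medians; it is illustrative only and is not the operational conversion, which rests on the full smoothed norms.

```
# Illustrative sketch: reading a grade equivalent from a developmental standard
# score by interpolating between the tabled spring medians (GE expressed
# numerically, K.8 -> 0.8). Not the operational conversion procedure.

import numpy as np

ss_points = [130, 150, 168, 185, 200, 214, 227, 239, 250, 260, 268, 275, 280]
ge_points = [0.8, 1.8, 2.8, 3.8, 4.8, 5.8, 6.8, 7.8, 8.8, 9.8, 10.8, 11.8, 12.8]

def standard_score_to_ge(ss):
    """Interpolate a GE for a developmental standard score (monotone mapping)."""
    return float(np.interp(ss, ss_points, ge_points))

print(standard_score_to_ge(200))            # 4.8 by construction
print(round(standard_score_to_ge(207), 2))  # between the grade 4 and grade 5 medians
```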

Before 1992, the principal developmental score scale of the ITBS was defined with grade equivalents (GEs) using the Hieronymus method. Other scales for reporting results on those editions of the tests, notably developmental standard scores, were obtained independently using the Thurstone method. For reasons related to the non-normality of achievement test score distributions (Flanagan, 1951), the Thurstone method was not used for the current editions of The Iowa Tests. Beginning with Forms K/L/M, both developmental standard scores and grade equivalents were derived with the Hieronymus method. As a result of the development of new scales, neither GEs nor standard scores for Forms A and B are directly comparable to those reported before Forms K/L/M.

The purpose of a developmental scale in achievement testing is to permit score comparisons between different levels of a test. Such comparisons are dependable under standard conditions of test administration. In some situations, however, developmental scores (developmental standard scores and grade equivalents) obtained across levels may not seem comparable. Equivalence of scores across levels in the scaling study was obtained under optimal conditions of motivation. Differences in attitude and motivation, however, may affect comparisons of results from "on-level" and "out-of-level" testing of students who differ markedly in developmental level. If students take their tests seriously, scores from different levels will be similar (except for errors of measurement). If students are frustrated or unmotivated because a test is too difficult, they will probably obtain scores in the "chance" range. But if students are challenged and motivated taking a lower level, their achievement will be measured more accurately. Greater measurement error is expected if students are assigned an inappropriate level of the test (too easy or too difficult). This results in a higher standard score or grade equivalent on higher levels of the test than lower levels, because standard scores and grade equivalents that correspond to "chance" increase from level to level. The same is true for perfect or near-perfect scores. These considerations show the importance of motivation, attitude, and assignment of test level in accurately measuring a student's developmental level.

For more discussion of issues concerning developmental score scales, see "Scaling, Norming, and Equating" in the third edition of Educational Measurement (Petersen, Kolen & Hoover, 1989). Characteristics of developmental score scales, particularly as they relate to statistical procedures and assumptions used in scaling and equating, have been addressed by the continuous research program at The University of Iowa (Mittman, 1958; Beggs & Hieronymus, 1968; Plake, 1979; Loyd, 1980; Loyd & Hoover, 1980; Kolen, 1981; Hoover, 1984; Harris & Hoover, 1987; Becker & Forsyth, 1992; Andrews, 1995).

Development and Monitoring of National Norms for the ITBS

The procedures used to develop norms for the ITBS were described in Part 2. Similar procedures have been used to develop national norms since the first forms of the ITBS were published in 1956. These procedures form the basis for normative information available in score reports for The Iowa Tests: student norms, building norms, skill norms, item norms, and norms for special school populations. Over the years, changes in performance have been monitored to inform users of each new edition about the normative differences they might expect with new test forms.

The 2000 national standardization of The Iowa Tests formed the basis for the norms of Forms A and B of the Complete Battery and Survey Battery. Data from the standardization established benchmark performance for nationally representative samples of students in the fall and spring of the school year and were used to estimate midyear performance through interpolation. The differences between 1992 and 2000 performance, expressed in percentile ranks for the main test scores, are shown in Table 4.2. The achievement levels in the first column are expressed in terms of 1992 national percentile ranks. The entries in the table show the corresponding 2000 percentile ranks. For example, a score on the Reading test that would have a percentile rank of 50 in grade 5 according to 1992 norms would convert to a percentile rank of 58 on the 2000 norms.

Trends in Achievement Test Performance

In general, true changes in educational achievement take place slowly. Despite public debate about school reform and education standards, the underlying educational goals of schools are relatively stable. Lasting changes in educational methods and materials tend to be evolutionary rather than revolutionary, and student motivation and public support of education change slowly. Data from the national standardizations provide important information about trends in achievement over time.

Nationally standardized tests of ability and achievement are typically restandardized every seven to ten years when new test forms are published. The advantage of using the same norms over a period of time is that scores from year to year can be based on the same metric. Gains or losses are "real"; i.e., no part of the measured gains or losses can be attributed to changes in the norms. The disadvantage, of course, is the norms become dated. How serious this is depends on how much performance has changed over the period.

Differences in performance between editions, which are represented by changes in norms, were relatively minor for early editions of the ITBS. This situation changed dramatically in the late 1960s and the 1970s. Shortly after 1965, achievement declined—first in mathematics, then in language skills, and later in other curriculum areas. This downward trend in achievement in the late 1960s and early 1970s was reflected in the test norms during that period, which were "softer" than norms before and after that time. Beginning in the mid-1970s, achievement improved slowly but consistently in all curriculum areas until the early 1990s, when it reached an all-time high.


Table 4.2
Differences Between National Percentile Ranks
Iowa Tests of Basic Skills — Forms K/L vs. A/B
National Standardization Data, 1992 and 2000
(For each of Reading, Language, Mathematics, Social Studies, Science, and Sources of Information, the table lists 1992 achievement levels at national percentile ranks of 99, 96, 90, 80, 70, 60, 50, 40, 30, 20, 10, 4, and 1 and the corresponding percentile ranks on the 2000 national norms, by grade, for grades 1 through 8.)


Figure 4.1
Trends in National Performance
Iowa Tests of Basic Skills — Complete Composite, Grades 3–8
National Standardization Data, 1955–2000
(Grade-equivalent score plotted against year of national standardization, 1955 through 2000, with separate curves for grades 3 through 8.)

Since the early 1990s, no dominant trend in achievement test scores has appeared. Scores have increased slightly in some areas and grades and decreased slightly in others. In the context of achievement trends since the mid-1950s, achievement in the 1990s has been extremely stable. National trends in achievement measured by the ITBS Complete Composite and monitored across standardizations are shown in Figure 4.1. Differences in median performance for each standardization period from 1955 to 2000 are summarized in Table 4.3 using 1955 grade equivalents as the base unit. Between 1955 and 1963, achievement improved consistently for most test areas, grade levels, and achievement levels. The average change for the composite over all grades represented an improvement of 2.3 months.

From 1963 to 1970, differences in median composite scores were negligible, averaging a loss of two-tenths of a month. Small but consistent qualitative differences occurred in some achievement areas and grades, however. In general, these changes were positive in the lower grades and tended to balance out in the upper grades. Gains were fairly consistent in Vocabulary and Sources of Information, but losses occurred in Reading and some Language skills. Math achievement improved in the lower grades, but sizable losses in concepts and problem solving occurred in the upper grades.

Between 1970 and 1977, test performance generally declined, especially in the upper grades. The average median loss over grades 3 through 8 for the Composite score was 2.2 months. Differences varied from grade to grade and test to test. Differences in grade 3 were generally positive, especially at the median and above. From grades 4 to 8, performance declined more markedly. In general, the greatest declines appeared in Capitalization, Usage and Expression, and Math Concepts.


Table 4.3 Summary of Median Differences
Iowa Tests of Basic Skills National Standardization Data, 1955–2000
(1955 National Grade Equivalents in “Months”)

[Table 4.3 reports, for grades 1 through 8 and for the average of grades 3–8, the change in median grade-equivalent performance, expressed in months, between successive standardizations (1963–55, 1970–63, 1977–70, 1984–77, 1992–84, and 2000–92) for each test: Vocabulary (RV), Reading Comprehension (RC), Spelling (L1), Capitalization (L2), Punctuation (L3), Usage and Expression (L4), Language Total (LT), Concepts and Estimation (M1), Problems and Data Interpretation (M2), Computation (M3), Math Total (MT), Social Studies (SS), Science (SC), Maps and Diagrams (S1), Reference Materials (S2), Sources of Information Total (ST), Complete Composite (CC), Word Analysis (WA), and Listening (Li).]


[Figure 4.2 Trends in Iowa Performance: Iowa Tests of Basic Skills — Complete Composite, Grades 3–8, Iowa State Testing Program Data, 1955–2001 (1965 Iowa Grade Equivalents). Grade-equivalent scores for grades 3 through 8 are plotted by year from 1955 to 2000.]

These trends are consistent with other national data on student performance for the same period (Koretz, 1986).

Between 1977 and 1984, the improvement in ITBS test performance more than made up for previous losses in most test areas. Achievement in 1984 was at an all-time high in nearly all test areas. This upward trend continued through the 1980s and is reflected in the norms developed for Forms K and L in 1992.

During February 2000, Forms K and A were jointly administered to a national sample of students in kindergarten through grade 8. This sample was selected to represent the norming population in terms of variability in achievement, the main requirement of an equating sample (Kolen & Brennan, 1995). A single-group, counterbalanced design was used. In each grade, students took Form K and Form A of the ITBS Complete Battery.

For each subtest and level, matched student records of Form K and Form A were created. Frequency distributions were obtained, and raw scores were linked by the equipercentile method. The resulting equating functions were then smoothed with cubic splines. This procedure defined the raw-score to raw-score relationship between Form A and Form K for each test. Standard scores on Form A could then be determined for norming dates before 2000 by linear interpolation. In this way, trend lines could be updated, and expected relations between new and old test norms could be determined.

Trends in the national standardization data for the ITBS are reflected in greater detail in the trends for the state of Iowa. Virtually all schools in Iowa participate on a voluntary basis in the Iowa Basic Skills Testing Program. Trend lines for student performance have been monitored as part of the Iowa state testing program since the 1950s.
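The equipercentile linking and spline smoothing used in the Form K–Form A equating described above can be illustrated with a small sketch. This is a minimal illustration, not the operational procedure; the function name, the midpoint percentile-rank convention, and the use of SciPy's UnivariateSpline for smoothing are assumptions made for the example.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

def equipercentile_link(scores_k, scores_a, smooth=1.0):
    """Link Form K raw scores to Form A raw scores so that equal percentile
    ranks correspond, then smooth the raw-to-raw conversion with a spline."""
    scores_k = np.asarray(scores_k, dtype=float)
    scores_a = np.asarray(scores_a, dtype=float)

    # Percentile rank of each possible Form K raw score (midpoint convention).
    ks = np.arange(scores_k.min(), scores_k.max() + 1)
    pr_k = np.array([
        (np.mean(scores_k < x) + np.mean(scores_k <= x)) / 2.0 for x in ks
    ])

    # Form A raw score with the same percentile rank, read from the inverse
    # empirical distribution of Form A scores.
    equated = np.quantile(scores_a, np.clip(pr_k, 0.0, 1.0))

    # Smooth the resulting raw-score to raw-score conversion with a cubic spline.
    spline = UnivariateSpline(ks, equated, k=3, s=smooth)
    return ks, spline(ks)
```

Any monotone smoother could play the same role as the spline here; the sketch simply shows the two steps named in the text, the equipercentile link and the postsmoothing of the conversion.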


Trend lines for the state of Iowa (Figure 4.2) show a pattern in achievement test scores that is similar to that in national standardization samples. For any given grade, the peaks and valleys of overall achievement measured by the ITBS occur at about the same time. Further, both Iowa and national trends indicate that test scores in the lower elementary grades, grades 3 and 4, have generally held steady or risen since the first administration of the ITBS Multilevel Battery. An exception to this observation appears in the Iowa data in the years since the 1992 standardization, when declining Composite scores were observed in grades 3 and 4 for the first time. This decline was also evident in the 2000 national standardization and extended to grade 5.

Norms for Special School Populations

As described in Part 2, the 2000 standardization sample included three independent samples: a public school sample, a Catholic school sample, and a private non-Catholic school sample. Schools in the standardization were further stratified by socioeconomic status. Data from these sources were used to develop special norms for The Iowa Tests for students enrolled in Catholic/private schools, as well as norms for other groups. The method used to develop norms was the same for each special school population. Frequency distributions from each grade in the standardization sample were cumulated for the relevant group of students. The cumulative distributions were then plotted and smoothed.
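The norming step just described, cumulating a frequency distribution and reading percentile ranks from it, can be sketched as follows. This is an illustrative outline only; the weighting argument, the midpoint convention, and the function name are assumptions rather than the published procedure.

```python
import numpy as np

def percentile_rank_norms(raw_scores, weights=None):
    """Cumulate a (possibly weighted) frequency distribution of raw scores
    and return the percentile rank of each score point (midpoint rule)."""
    raw_scores = np.asarray(raw_scores)
    if weights is None:
        weights = np.ones(len(raw_scores), dtype=float)
    else:
        weights = np.asarray(weights, dtype=float)

    points = np.arange(raw_scores.min(), raw_scores.max() + 1)
    freq = np.array([weights[raw_scores == x].sum() for x in points])
    total = freq.sum()
    cum_below = np.concatenate(([0.0], np.cumsum(freq)[:-1]))

    # Percentile rank = percent below plus half the percent at the score.
    pr = 100.0 * (cum_below + 0.5 * freq) / total
    # In practice the plotted cumulative distribution is smoothed before
    # percentile ranks are read off; no smoothing is applied in this sketch.
    return points, pr
```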

Equivalence of Forms

The equivalence of alternate forms of The Iowa Tests is established through careful test development and standard methods of test equating. The tests are assembled to match tables of specifications that are sufficiently detailed to allow test developers to create equivalent forms in terms of test content. The tables of skill classifications, included in the Interpretive Guide for Teachers and Counselors, show the parallelism achieved in content for each test and level. Alternate forms of tests should be similar in difficulty as well. Concurrent assembly of test forms provides some control over difficulty, but small differences between forms are typically observed during standardization. Equating methods are used to adjust scores for differences in difficulty not controlled during assembly of the forms. Forms A and B were assembled concurrently to the same content and difficulty specifications from the pool of items included in preliminary and national item tryouts. In the tests consisting of discrete questions (Vocabulary, Spelling, Capitalization, Punctuation, Concepts and Estimation, Computation, and parts of Social Studies, Science, and Reference Materials), items in the same or similar content, skills, and difficulty categories were first assigned, more or less at random, to Form A or B. Then, adjustments were made to avoid too much similarity from item to item and to achieve comparable difficulty distributions across forms. Concurrent assembly of multiple test forms is the best way to ensure comparability of scores and reasonable results from equating. Linking methods rely on comparable content to justify the term “equating” (Linn, 1993; Kolen & Brennan, 1995). The Iowa Tests are designed so truly equated scores on parallel forms can be obtained.

The Iowa Tests of Basic Skills have been restandardized approximately every seven years. Each time new forms are published, they are carefully equated to previous forms. Procedures for equating previous forms to each other have been described in the Manual for School Administrators for those forms. The procedures used in equating Forms A and B of the current edition are described in this part of the Guide to Research and Development.

Forms A and B of the Complete Battery were equated with a comparable-groups design (Petersen, Kolen & Hoover, 1989). In Levels 7 and 8 of the Complete Battery, which are read aloud by classroom teachers, test forms were administered by classroom to create comparable samples. Student records were matched to Form A records from the spring standardization. Frequency distributions in the fall sample were weighted so that the Form A and B cohorts had the same distribution on each subtest in the spring sample. The weighted frequency distributions were used to obtain the equipercentile relationship between Form A and B of each subtest. This relation was smoothed with cubic splines (Kolen, 1984) and standard scores were attached to Form B raw scores by interpolation.

At Levels 9 through 14, Forms A and B were spiraled within classroom to obtain comparable samples. Frequency distributions for the two forms were linked by the equipercentile method and smoothed with cubic splines. Standard scores were attached to each raw score distribution using the equating results. The raw-score to standard-score conversions were then smoothed.
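The weighting step used with the comparable-groups design, reweighting the fall frequency distributions so that the Form A and B cohorts match the spring reference distribution, can be sketched as a simple form of poststratification. The code below is an assumed illustration only; the bin definitions, names, and ratio-weighting rule are not taken from the manual, and the operational weighting was applied to subtest frequency distributions rather than to individual records.

```python
import numpy as np

def poststratify_weights(fall_scores, spring_scores, n_bins=10):
    """Return a weight for each fall examinee so that the weighted fall
    score distribution matches the spring reference distribution."""
    fall_scores = np.asarray(fall_scores, dtype=float)
    spring_scores = np.asarray(spring_scores, dtype=float)

    # Common bins spanning both samples.
    lo = min(fall_scores.min(), spring_scores.min())
    hi = max(fall_scores.max(), spring_scores.max())
    edges = np.linspace(lo, hi, n_bins + 1)

    fall_prop = np.histogram(fall_scores, bins=edges)[0] / len(fall_scores)
    spring_prop = np.histogram(spring_scores, bins=edges)[0] / len(spring_scores)

    # Weight for a fall case = spring proportion / fall proportion in its bin.
    ratio = np.divide(spring_prop, fall_prop,
                      out=np.zeros_like(spring_prop), where=fall_prop > 0)
    bin_index = np.clip(np.digitize(fall_scores, edges) - 1, 0, n_bins - 1)
    return ratio[bin_index]
```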


Table 4.4 reports the sample sizes used in the equating of Levels 7–14 of Forms A and B.

Table 4.4 Sample Sizes for Equating Forms A and B

           Complete Battery (1)        Survey
Level      Form B        Form A        Battery (2)
  7         3030          5797           2767
  8         1697          3703           2006
  9         3098          3045           1437
 10         5966          6038           2918
 11         5548          5620           2696
 12         6189          6533           2857
 13         5445          5681           2463
 14         5834          6091           2633

(1) Comparable-groups design
(2) Single-group design

The raw-score to standard-score conversions for the ITBS Survey Battery of Forms A and B also were developed with data from the 2000 fall standardization sample. At Levels 9 through 14 in the fall standardization, students took one form of the Complete Battery and the alternate form of the Survey Battery in a counterbalanced design. This joint administration defined the equipercentile relation between each Survey Battery subtest and the corresponding test of the Complete Battery via intact administrations of each version. The equating function was smoothed via cubic splines, and the resulting raw-score to raw-score conversion tables were used to attach standard scores to the Survey Battery raw scores.

Forms A and B contain a variety of testing configurations of The Iowa Tests. For normative scores, methods for equating parallel forms used empirical data designed specifically to accomplish the desired linking. These methods do not rely on mathematical models, such as item response theory or strong true-score theory, which entail assumptions about the relationship between individual items and the domain from which they are drawn or about the shape of the distribution of unobservable true scores. Instead, these methods establish direct links between the empirical distributions of raw scores as they were observed in comparable samples of examinees. The equating results accommodate the influence of context or administrative sequence that could affect scores.

Relationships of Forms A and B to Previous Forms

Forms 1 through 6 of the Iowa Tests of Basic Skills Multilevel Battery were equivalent forms in many ways. Pairs of forms (1 and 2, 3 and 4, 5 and 6) were assembled as equivalent forms in the manner described for Forms A and B. Because the objectives, placement, and methodology in basic skills instruction changed slowly when these forms were used, the content specifications of the three pairs of forms did not differ greatly. One exception, Math Concepts, was described previously. The organization of the tests in the battery, the number of items per level, the time limits, and even the number of items per page were identical for the first six forms.

The first significant change in organization of the battery occurred with Forms 7 and 8, published in 1978. Separate tests in Map Reading and Reading Graphs and Tables were replaced by a single Visual Materials test. In mathematics, separate tests in Problem Solving and Computation replaced the test consisting of problems with embedded computation. Other major changes included a reduction in the average number of items per test, shorter testing time, a revision in grade-to-grade item overlap, and major revisions in the taxonomy of skills objectives.

With Forms G and H, published in 1985, the format changed considerably. Sixteen pages were added to the multilevel test booklet. Additional modifications were made in grade-to-grade overlap and in number of items per test. For most purposes, however, Forms G, H, and J were considered equivalent to Forms 7 and 8 in all test areas except Language Usage. As indicated in Part 3, the scope of the Usage test was expanded to include appropriateness and effectiveness of expression as well as correct usage.

Forms K, L, and M continued the gradual evolution of content specifications to adapt to changes in school curriculum. The most notable change was in the flexibility of configurations of The Iowa Tests to meet local assessment needs. The Survey Battery was introduced for schools that wanted general achievement information only in reading, language arts, and mathematics. The Survey Battery is described in Part 9.

Other changes in Forms K, L, and M occurred in how Composite scores were defined. Three core Composite scores were established for these forms: Reading Total, Language Total, and Mathematics Total. The Reading Total was defined as the average of the


standard scores in Vocabulary and Reading Comprehension. The Language Total, identical to previous editions, was the average standard score of the four Language tests: Spelling, Capitalization, Punctuation, and Usage and Expression. The Math Total for Forms K, L, and M was defined in two ways: the average of the first two subtests (Concepts & Estimation and Problem Solving & Data Interpretation) or the average of all three subtests (including Computation). The Social Studies and Science tests, considered supplementary in previous editions, were moved to the Complete Battery and were added to the ITBS Complete Composite score beginning with Forms K, L, and M.

Forms K, L, and M also introduced changes to the makeup of the tests in math and work-study skills (renamed Sources of Information). In math, a separately timed estimation component was added to the Concepts test, resulting in a new test called Concepts and Estimation. The Problem Solving test was modified by adding items on the interpretation of data using graphs and tables, which had been in the Visual Materials test in Forms G, H, and J. This math test was called Problem Solving and Data Interpretation in Forms K, L, and M. A concomitant change in what had been the Visual Materials test involved adding questions on schematic diagrams and other visual stimuli for a new test: Maps and Diagrams.

An additional change in the overall design specifications for the tests in Forms K, L, and M concerned grade-to-grade overlap. Previous test forms had overlapping items that spanned three levels. Overlapping items in the Complete Battery of Forms K, L, and M, Levels 9–14, spanned two levels. The Survey Battery contained no overlapping items.

Forms A and B of the ITBS are equivalent to Forms K, L, and M in most test areas. Minor changes were introduced in time limits, number of items, and content emphasis in Vocabulary, Reading Comprehension, Usage and Expression, Concepts and Estimation, Problem Solving and Data Interpretation, Computation, Science, and Reference Materials. These changes were described in Part 3.


The other fundamental change in the ITBS occurred in the 1970s with the introduction of the Primary Battery (Levels 7 and 8) with Forms 5 and 6 in 1971 and the Early Primary Battery (Levels 5 and 6) in 1977. These levels were developed to assess basic skills in kindergarten through grade 3. Machine-scorable test booklets contain responses with pictures, words, phrases, and sentences designed for the age and developmental level of students in the early grades.

In Levels 5 and 6 of the Early Primary Battery, questions in Listening, Word Analysis, Vocabulary, Language, and Mathematics are read aloud by the teacher. Students look at the responses in the test booklet as they listen. Only the Reading test in Level 6 requires students to read words, phrases, and sentences to answer the questions. In the original design of Levels 7 and 8, questions on tests in Listening, Word Analysis, Spelling, Usage and Expression, Mathematics (Concepts, Problems, and Computation), Visual Materials, Reference Materials, Social Studies, and Science were read aloud by the teacher. In Vocabulary, Reading, Capitalization, and Punctuation, students read the questions on their own.

Because of changes in instructional emphasis, Levels 5 through 8 of the ITBS have been revised more extensively than other levels. Beginning with Forms K, L, and M, the order of the subtests was changed. The four Language tests were combined into a single test with all questions read aloud by the teacher. At the same time, graphs and tables were moved from Visual Materials to Math Problem Solving, and a single test on Sources of Information was created.

Forms A and B are equivalent to Forms K, L, and M in most respects. The number of spelling items in the Language test was increased so that a separate Spelling score could be reported. Slight changes were also made in the number of items in several other subtests and in the number of response options.


PART 5

Reliability of The Iowa Tests

Methods of Determining, Reporting, and Using Reliability Data

A soundly planned, carefully constructed, and comprehensively standardized achievement test battery represents the most accurate and dependable measure of student achievement available to parents, teachers, and school officials. Many subtle, extraneous factors that contribute to unreliability and bias in human judgment have little or no effect on standardized test scores. In addition, other factors that contribute to the apparent inconsistency in student performance can be effectively minimized in the testing situation: temporary changes in student motivation, health, and attentiveness; minor distractions inside and outside the classroom; limitations in number, scope, and comparability of the available samples of student work; and misunderstanding by students of what the teacher expects of them. The greater effectiveness of a well-constructed achievement test in controlling these factors, compared to a teacher’s informal evaluation of the same achievement, is evidenced by the higher reliability of the test.

Test reliability may be quantified by a variety of statistical data, but such data reduce to two basic types of indices. The first of these indices is the reliability coefficient. In numerical value, the reliability coefficient is between .00 and .99, and generally for standardized tests between .60 and .95. The closer the coefficient approaches the upper limit, the greater the freedom of the test scores from the influence of factors that temporarily affect student performance and obscure real differences in achievement. This ready frame of reference for reliability coefficients is deceptive in its simplicity, however. It is impossible to conclude whether a value such as .75 represents a “high” or “low,” “satisfactory” or “unsatisfactory” reliability. Only after a coefficient has been compared to those of equally valid and equally practical alternative tests can such a judgment be made. In practice, there is always a degree of uncertainty regarding the terms “equally valid” and “equally practical,” so the reliability coefficient is rarely free of ambiguity. Nonetheless, comparisons of reliability coefficients for alternative approaches to assessment can be useful in determining the relative stability of the resulting scores.

The second of the statistical indices used to describe test reliability is the standard error of measurement. This index represents a measure of the net effect of all factors leading to inconsistency in student performance and to inconsistency in the interpretation of that performance. The standard error of measurement can be understood by a hypothetical example. Suppose students with the same reading ability were to take the same reading test. Despite their equal ability, they would not all get the same score. Instead, their scores would range across an interval. A few would get much higher scores than they deserve, a few much lower; the majority would get scores fairly close to their actual ability. Such variation in scores would be attributable to differences in motivation, attentiveness, and other factors suggested above. The standard error of measurement is an index of the variability of the scores of students having the same actual ability. It tells the degree of precision in placing a student at a point on the achievement continuum.

There is, of course, no way to know just how much a given student’s achievement may have been under- or overestimated from a single administration of a test. We may, however, make reasonable estimates of the amount by which the abilities of students in a particular reference group have been mismeasured. For about two-thirds of the examinees, the test scores obtained are “correct” within one standard error of measurement; for 95 percent, the scores are incorrect by less than two standard errors; for more than 99 percent, the scores are incorrect by less than three standard error values.

Two methods of estimating reliability were used to obtain the summary statistics provided in the following two sections. The first method employed internal-consistency estimates using Kuder-Richardson Formula 20 (K-R20). Reliability coefficients derived by this technique were based on data from the entire national standardization sample. The coefficients for Form A of the Complete Battery are reported here.
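The interpretation of the standard error of measurement given above amounts to placing a band around an observed score. The helper below is an illustrative sketch only; the function name and the example values are assumptions, not part of the ITBS score-reporting system.

```python
def score_band(observed_score, sem, multiplier=1.0):
    """Return the interval observed_score +/- multiplier * SEM.

    With multiplier 1.0 the band covers roughly two-thirds of repeated
    measurements of the same examinee; 2.0 covers about 95 percent."""
    half_width = multiplier * sem
    return observed_score - half_width, observed_score + half_width

# Example with arbitrary values: a standard score of 205 and an SEM of 3.5.
low, high = score_band(205, 3.5, multiplier=2.0)
print(f"About 95% band: {low:.1f} to {high:.1f}")
```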


Coefficients for Form B of the Complete Battery and Forms A and B of the Survey Battery are available in Norms and Score Conversions for each form and battery.

The second method provided estimates of equivalent-forms reliability for Forms K and A from the spring 2000 equating of those forms, and for Forms A and B from the fall 2000 standardization sample. Prior to the spring standardization, a national sample of students took Forms K and A of the ITBS Complete Battery. Correlations between tests on alternate forms served as one estimate of reliability. During the fall standardization, students were administered the Complete Battery of one form and the Survey Battery of the other form. The observed relationships between scores on the Complete Battery and Survey Battery were used to estimate equivalent-forms reliability of the tests common to the two batteries. These estimates were computed from unweighted distributions of developmental standard scores.


Internal-Consistency Reliability Analysis

The reliability data presented in Table 5.1 are based on Kuder-Richardson Formula 20 (K-R20) procedures. The means, standard deviations, and item proportions used in computing reliability coefficients for Form A are based on the entire spring national standardization sample. Means, standard deviations, and standard errors of measurement are shown for raw scores and developmental standard scores. Some tests in the current edition of the ITBS have fewer items and shorter time limits than previous forms. The reliability coefficients compare favorably with those of previous editions.
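For reference, the K-R20 coefficient named above is computed from the item difficulties and the total-score variance, and the standard error of measurement follows from the score standard deviation and the reliability. The functions below are a minimal sketch under the usual dichotomous-scoring assumption; they are not the scoring programs used for the standardization analyses.

```python
import numpy as np

def kr20(item_matrix):
    """Kuder-Richardson Formula 20 for a persons-by-items matrix of 0/1 scores.

    KR-20 = k/(k-1) * (1 - sum(p*q) / total-score variance), where p is each
    item's proportion correct and q = 1 - p."""
    x = np.asarray(item_matrix, dtype=float)
    n_items = x.shape[1]
    p = x.mean(axis=0)
    q = 1.0 - p
    total_var = x.sum(axis=1).var(ddof=1)
    return (n_items / (n_items - 1.0)) * (1.0 - (p * q).sum() / total_var)

def sem_from_reliability(sd, reliability):
    """Standard error of measurement implied by a score SD and a reliability."""
    return sd * (1.0 - reliability) ** 0.5
```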


Table 5.1 Test Summary Statistics Iowa Tests of Basic Skills — Complete Battery, Form A 2000 National Standardization

Level 5 (Kindergarten)

Tests: Vocabulary (V), Word Analysis (WA), Listening (Li), Language (L), Mathematics (M), Core Total (CT), Reading Profile Total (RPT)

                              V        WA       Li       L        M        CT       RPT
Number of items               29       30       29       29       29

Fall
  Raw scores      Mean       17.3     15.8     16.3     15.5     16.2
                  SD          4.7      5.7      5.1      5.3      4.7
                  SEM         2.3      2.4      2.5      2.4      2.3
  Standard scores Mean      121.7    121.4    122.7    123.3    121.4    121.8    122.0
                  SD         13.1     12.9      9.3      8.6      8.9      9.0      8.9
                  SEM         6.4      5.5      4.5      3.9      4.4      2.9      3.2
  Reliability (K-R20)        .763     .820     .770     .797     .748     .896     .873

Spring
  Raw scores      Mean       20.2     19.8     20.2     19.9     20.9
                  SD          4.0      5.2      5.0      4.9      4.6
                  SEM         2.2      2.3      2.3      2.3      2.1
  Standard scores Mean      131.1    131.5    130.8    131.1    130.7    130.8    130.9
                  SD         15.0     14.3     10.8      9.4      9.8      9.8     11.1
                  SEM         8.2      6.3      4.9      4.4      4.5      3.4      3.8
  Reliability (K-R20)        .699     .806     .793     .788     .793     .877     .882

Level 6 Grade 1 Number of items

Vocabulary

Word Analysis

Listening

Language

Mathematics

Core Total

Reading Words

Reading Comprehension

Reading Total

Reading Profile Total

V

WA

Li

L

M

CT

RW

RC

RT

RPT

31

35

31

31

35

29

19

48

18.4 6.3 2.3

8.5 4.4 1.9

27.5 10.0 3.0

Fall RSs

Mean SD SEM

18.8 4.9 2.4

21.5 5.8 2.5

18.2 5.1 2.5

16.1 5.5 2.5

19.9 5.8 2.6

SSs

Mean SD SEM

138.1 16.0 7.9

138.9 15.9 6.9

138.1 11.9 5.8

138.3 11.0 5.0

138.3 11.3 5.1

138.2 10.9 3.6

139.1 10.2 3.7

139.1 10.2 4.5

139.0 9.1 2.7

138.6 12.3 2.7

.754

.811

.764

.790

.793

.893

.871

.805

.909

.953

Reliability K-R20

Spring RSs

Mean SD SEM

22.2 4.3 2.3

25.7 5.2 2.3

22.7 4.5 2.2

21.6 4.9 2.3

25.1 5.6 2.4

SSs

Mean SD SEM

150.9 18.0 9.4

152.2 18.4 8.2

150.4 13.5 6.6

151.5 13.4 6.2

150.2 13.6 5.7

.725

.800

.758

.786

.821

Reliability K-R20

24.4 5.1 1.7

13.7 4.8 1.6

38.1 9.6 2.4

151.4 12.7 4.2

152.2 14.3 4.8

152.2 14.3 4.8

151.5 13.5 3.4

151.3 12.6 3.1

.890

.886

.889

.937

.938


.916

.896

.910

.886

Note: -Does not include Computation +Includes Computation

K-R20

152.2 14.3 4.3

150.9 18.0 6.1

Mean SD SEM

SSs

Reliability

22.4 7.9 2.4

18.6 6.8 2.3

Mean SD SEM

RSs

Spring — Grade 1

K-R20

158.9 16.3 4.7

157.5 19.0 6.1

Mean SD SEM

SSs

Reliability

25.6 7.4 2.1

20.9 6.6 2.1

34

30 Mean SD SEM

RC

RV

RSs

Comprehension

Vocabulary

.925

151.3 13.5 3.7

.940

158.4 15.8 3.9

RT

Reading Total

.853

152.2 18.4 7.1

23.7 6.5 2.5

.868

159.2 20.4 7.4

25.8 6.3 2.3

35

WA

Word Analysis

.699

150.4 13.5 7.4

20.7 4.4 2.4

.716

156.9 14.8 7.9

22.6 4.2 2.2

31

Li

Listening

.880

150.4 11.2 3.9

16.3 5.3 1.8

.907

157.0 12.9 3.9

18.7 5.0 1.5

23

L1

Spelling

.869

151.5 13.4 4.8

23.2 6.7 2.4

.874

158.1 15.1 5.4

26.0 6.1 2.2

34

L

Language

.776

150.0 13.5 6.4

20.4 4.5 2.1

.806

156.5 14.5 6.4

22.4 4.3 1.9

29

M1

Concepts

.807

150.7 15.9 7.0

17.0 5.1 2.2

.845

157.5 17.3 6.8

19.0 5.2 2.0

28

M2

Problems & Data Interpretation

.878

150.2 13.6 4.7

.900

157.1 14.8 4.7

M3

MT -

.865

150.1 9.3 3.4

18.6 5.6 2.0

.872

154.2 9.9 3.5

20.6 5.2 1.9

27

Computation

Math Total

Mathematics

.910

150.2 11.2 3.4

.932

156.4 12.8 3.3

MT +

Math Total

.959

151.4 12.7 2.6

.962

157.8 13.8 2.7

CT -

Core Total

.964

151.7 12.2 2.3

.966

157.4 13.4 2.5

CT +

Core Total

.750

151.1 15.2 7.6

21.8 4.7 2.3

.755

157.8 16.3 8.1

23.6 4.4 2.2

31

SS

Social Studies

.726

149.8 16.8 8.8

23.8 4.0 2.1

.702

157.4 18.3 10.0

25.3 3.5 1.9

31

SC

Science

.843

150.5 13.1 5.2

14.3 4.9 1.9

.862

157.6 14.8 5.5

16.6 4.6 1.7

22

SI

.966

151.2 12.3 2.3

.966

157.5 13.2 2.4

CC -

Sources Compoof site Information

.965

151.4 11.8 2.2

.966

157.8 13.0 2.4

CC +

Composite

.956

151.3 12.6 2.6

.961

158.5 14.0 2.8

RPT

Reading Profile Total


Fall — Grade 2


Number of items

Level 7

Reading

Table 5.1 (continued) Test Summary Statistics Iowa Tests of Basic Skills — Complete Battery, Form A 2000 National Standardization


.900

.875

.897

.873

Note: -Does not include Computation +Includes Computation

K-R20

170.7 19.6 6.3

168.6 19.8 7.1

Mean SD SEM

SSs

Reliability

27.9 7.2 2.3

20.0 6.8 2.4

Mean SD SEM

RSs

Spring — Grade 2

K-R20

177.5 21.4 6.8

175.4 20.6 7.3

Mean SD SEM

SSs

Reliability

29.9 6.8 2.1

22.3 6.5 2.3

Mean SD SEM

RSs

Fall — Grade 3

38

RC

RV

.939

170.0 19.1 4.7

.939

176.3 20.1 5.0

RT

Reading Total

.847

171.0 23.7 9.3

24.3 6.8 2.7

.862

177.6 25.4 9.4

26.1 6.8 2.5

38

WA

.723

168.2 16.3 8.6

21.5 4.5 2.4

.740

174.6 17.3 8.8

23.1 4.4 2.2

31

Li

Listening

.821

168.5 15.8 6.7

16.9 4.4 1.9

.853

175.4 17.9 6.8

18.5 4.3 1.6

23

L1

Spelling

.875

169.8 17.2 6.1

30.1 7.4 2.6

.891

177.0 19.5 6.4

32.7 7.1 2.4

42

L

Language

.787

168.1 16.5 7.6

21.4 4.8 2.2

.815

174.5 17.7 7.6

23.2 4.7 2.0

31

M1

Concepts

.834

169.1 19.8 8.1

19.5 5.5 2.2

.859

176.1 21.3 8.0

21.2 5.6 2.1

30

M2

Problems & Data Interpretation

.892

168.6 16.9 5.6

.910

175.6 18.4 5.5

M3

MT -

.839

168.3 13.1 5.3

20.0 5.2 2.1

.854

172.3 13.9 5.3

21.4 5.2 2.0

30

Computation

Math Total

Mathematics

.922

168.9 14.7 4.1

.935

174.5 16.0 4.1

MT +

Math Total

.960

169.6 15.9 3.2

.965

175.9 17.4 3.3

CT -

Core Total

.964

169.9 15.3 2.9

.967

175.7 16.7 3.0

CT +

Core Total

.655

169.5 17.8 10.5

20.8 4.1 2.4

.714

176.6 19.4 10.4

22.3 4.2 2.2

31

SS

Social Studies

.726

169.7 21.2 11.1

21.1 4.5 2.4

.736

176.9 22.5 11.6

22.6 4.3 2.2

31

SC

Science

.851

169.1 17.1 6.6

18.5 5.7 2.2

.869

176.5 18.8 6.8

20.8 5.5 2.0

28

SI

.964

169.6 15.2 2.9

.971

176.4 17.1 2.9

CC -

Sources Compoof site Information

.964

169.5 15.0 2.8

.971

175.8 17.1 2.9

CC +

Composite

.955

169.7 16.1 3.4

.957

176.3 17.1 3.5

RPT

Reading Profile Total


32

Comprehension

Vocabulary

Word Analysis


Number of items

Level 8

Reading

Table 5.1 (continued) Test Summary Statistics Iowa Tests of Basic Skills — Complete Battery, Form A 2000 National Standardization


15.4 6.6 2.3

12.5 5.2 2.1

11.8 5.1 2.1

24

L3

15.7 6.9 2.4

30

L4

LT

Usage & Language Expression Total

16.4 5.7 2.5

31

M1

11.7 4.9 2.0

22

M2

Concepts Problems & Data & InterpreEstimation tation

M3

MT -

11.5 5.1 2.1

25

Computation

Math Total

MT +

Math Total

CT -

CT +

Core Total

16.1 6.0 2.4

30

SS

Social Studies

13.9 6.1 2.4

30

SC

Science

13.1 4.4 2.1

24

S1

13.0 6.0 2.3

28

S2

ST

Maps Reference Sources and Materials Total Diagrams

Sources of Information

CC -

CC +

Compo- Composite site

21.8 6.2 2.6

35

WA

Word Analysis

19.3 4.4 2.5

31

Li

Listening

RPT

Reading Profile Total

18.8 6.5 2.1

.882

14.7 5.5 2.0

.838

14.0 5.3 2.1

.833

18.6 7.0 2.3

.878

.953

19.7 5.7 2.4

.813

13.8 5.1 1.9

.833

.900

16.0 5.6 2.0

.823

.927

.972

.976

18.9 6.0 2.3

.840

16.5 6.3 2.4

.845

15.0 4.6 2.0

.763

16.0 6.3 2.3

.844

.892

.976

.977

23.9 6.3 2.5

.822

21.6 4.5 2.3

.679

.953

.896

.912

.946

Note: -Does not include Computation +Includes Computation

Reliability K-R20

.897

.863

.850

.892

.957

.827

.855

.912

.865

.934

.976

.979

.848

.855

.805

.868

.901

.980

.981

.849

.736

.960

Mean 185.0 187.8 186.2 185.8 187.2 188.3 188.7 188.1 185.2 186.6 186.2 185.4 185.8 186.8 186.9 186.8 187.4 187.2 186.3 187.0 187.4 187.3 187.2 184.2 186.2 SD 21.6 24.5 21.7 20.4 29.2 27.4 28.2 22.7 19.2 24.0 20.5 16.7 17.7 19.9 19.1 21.7 25.2 25.0 19.8 21.0 20.0 19.9 28.6 19.2 19.0 7.0 7.3 5.0 6.5 10.8 10.6 9.3 4.7 8.0 9.1 6.1 6.1 4.5 3.1 2.8 8.4 9.6 11.0 7.2 6.6 2.8 2.8 11.1 9.8 3.8 SEM

.940

SSs

21.5 8.7 2.6

.896

Mean SD SEM

18.1 6.8 2.2

.885

RSs

Spring

Reliability K-R20

Mean 175.4 177.5 176.3 175.4 175.1 177.5 176.8 177.0 174.5 176.1 175.6 172.3 174.5 175.9 175.7 176.6 176.9 177.6 176.5 176.5 176.4 175.8 177.6 174.6 176.3 SD 20.6 21.4 20.1 17.9 23.2 23.6 23.9 19.5 17.7 21.3 18.4 13.9 16.0 17.4 16.7 19.4 22.5 21.5 16.8 18.8 17.1 17.1 25.4 17.3 17.1 7.0 6.9 4.9 6.1 9.3 9.6 8.4 4.2 7.6 8.7 5.8 5.8 4.3 2.9 2.6 7.8 8.9 10.4 6.6 6.2 2.6 2.6 10.7 9.8 3.7 SEM

17.9 8.2 2.7

24

L2

Punctuation

Core Total

SSs

15.0 6.8 2.3

28

L1

Spelling

Capitalization

Mathematics

Mean SD SEM

RT

Reading Total

Language

RSs

Fall

37

RC

RV

29

Comprehension

Vocabulary

Reading


Number of items


Grade 3

Level 9

Table 5.1 (continued) Test Summary Statistics Iowa Tests of Basic Skills — Complete Battery, Form A 2000 National Standardization


Mean SD SEM

SSs

Mean SD SEM

SSs .944

.906

.892

202.5 25.3 8.3

201.2 24.4 5.8

199.9 202.6 23.4 28.7 7.2 9.0 .901

20.3 7.0 2.3

24.4 8.8 2.8

20.3 8.0 2.4

.882

.938

.895

.887

192.2 22.2 7.6

191.8 22.8 5.7

191.1 193.8 22.5 25.9 7.3 8.7

32

L1

17.4 7.1 2.4

RT

Spelling

21.8 8.4 2.8

17.3 7.8 2.5

41

Note: -Does not include Computation +Includes Computation

Reliability K-R20

Mean SD SEM

RSs

Spring

Reliability K-R20

Mean SD SEM

RSs

Fall

34

RC

RV

Reading Total

.841

204.0 36.2 14.4

15.8 5.3 2.1

.820

194.0 31.4 13.3

14.4 5.2 2.2

26

L2

Capitalization

.853

204.9 34.4 13.2

14.3 5.8 2.2

.823

195.4 30.0 12.6

12.7 5.4 2.3

26

L3

.902

204.9 34.6 10.8

20.4 7.7 2.4

.893

195.1 30.5 10.0

18.4 7.6 2.5

33

L4

19.2 6.9 2.7

36

M1

22.4 7.2 2.6

.848

.956

.872

203.9 200.4 28.4 22.5 6.0 8.0

.950

.845

202.9 28.3 11.1

14.4 5.1 2.0

.817

193.2 25.4 10.9

12.8 4.9 2.1

24

M2

.918

201.6 24.0 6.9

.905

191.7 21.8 6.7

M3

MT -

.878

200.7 20.5 7.2

17.0 6.1 2.1

.839

188.8 17.4 7.0

13.5 5.6 2.2

27

Computation

Math Total

Mathematics Concepts Problems & Data & InterpreEstimation tation

194.5 190.4 24.9 20.2 5.6 7.9

LT

Punctu- Usage & Language ation Expression Total

Language

.940

201.5 21.1 5.2

.929

190.5 18.9 5.0

MT +

Math Total

CT +

Core Total

.976

.977

.980

202.5 202.5 23.7 22.9 3.6 3.3

.973

192.6 192.5 21.2 20.4 3.5 3.1

CT -

Core Total

16.7 7.3 2.6

34

SC

Science

19.0 7.5 2.6

.872

.856

.881

202.6 203.5 26.4 29.2 10.0 10.1

20.8 6.7 2.5

.842

192.6 193.8 23.3 26.6 9.2 9.5

18.2 6.5 2.6

34

SS

Social Studies

.842

204.1 31.0 12.3

15.1 5.4 2.1

.818

193.2 27.3 11.6

13.2 5.2 2.2

25

S1

.869

203.2 25.2 9.1

18.2 6.4 2.3

.866

192.7 21.7 7.9

15.6 6.5 2.4

30

S2

.913

203.3 26.0 7.7

.905

192.7 22.8 7.0

ST

Maps Reference Sources and Materials Total Diagrams

Sources of Information

.982

203.1 23.9 3.2

.980

192.9 21.3 3.0

CC -

.983

203.2 23.9 3.1

.981

192.7 21.4 3.0

CC +

Compo- Composite site


Number of items

Comprehension

Vocabulary

Reading


Grade 4

Level 10

Table 5.1 (continued) Test Summary Statistics Iowa Tests of Basic Skills — Complete Battery, Form A 2000 National Standardization


Mean SD SEM

SSs

Mean SD SEM

SSs .943

.893

.903

216.9 29.3 9.1

214.6 27.3 6.5

214.0 215.5 25.5 32.2 8.3 10.0 .904

21.9 7.9 2.5

26.2 9.1 2.8

21.7 8.0 2.6

.934

.873

.895

207.7 26.8 8.7

205.8 25.4 6.5

205.1 207.0 24.0 29.9 8.5 9.8 .892

19.4 7.9 2.5

36

L1

Spelling

23.8 8.9 2.9

19.0 7.6 2.7

RT

Reading Total

Note: -Does not include Computation +Includes Computation

Reliability K-R20

Mean SD SEM

RSs

Spring

Reliability K-R20

Mean SD SEM

RSs

Fall

43

RC

RV

37

Comprehension

Vocabulary

Reading

.851

218.5 41.0 15.8

16.8 5.9 2.3

.840

209.0 37.9 15.2

15.5 5.7 2.3

28

L2

Capitalization

.870

219.3 40.0 14.4

16.0 6.4 2.3

.852

210.3 36.6 14.1

14.7 6.1 2.3

28

L3

.892

218.9 40.1 13.2

21.5 7.7 2.5

.881

209.8 36.4 12.6

19.8 7.5 2.6

35

L4

21.0 7.4 2.8

40

M1

23.9 7.7 2.7

.858

.960

.874

218.3 214.6 33.3 25.4 6.7 9.0

.955

.861

216.9 32.2 12.0

15.1 5.8 2.2

.841

208.1 29.7 11.9

13.5 5.6 2.2

26

M2

.927

215.7 27.9 7.5

.915

206.9 25.6 7.4

M3

MT -

.890

215.3 24.7 8.2

18.0 6.6 2.2

.858

204.2 21.4 8.0

15.2 6.2 2.3

29

Computation

Math Total

Mathematics Concepts Problems & Data & InterpreEstimation tation

209.2 205.3 30.5 23.7 6.4 8.9

LT

Punctu- Usage & Language ation Expression Total

Language

.947

215.7 24.8 5.7

.936

205.7 22.3 5.6

MT +

Math Total

CT +

Core Total

.978

.979

.981

216.5 216.3 27.2 26.3 4.0 3.6

.976

207.6 207.3 25.4 24.2 3.9 3.6

CT -

Core Total

18.7 7.9 2.7

37

SC

Science

21.0 8.1 2.7

.881

.865

.891

217.0 217.9 31.2 33.3 11.5 11.0

21.2 7.3 2.7

.847

207.2 208.5 28.2 30.6 11.0 10.6

18.8 7.0 2.7

37

SS

Social Studies

.835

218.1 35.4 14.4

14.6 5.5 2.2

.808

209.1 32.4 14.2

13.4 5.2 2.3

26

S1

.884

217.9 30.0 10.2

19.8 7.1 2.4

.876

208.1 26.8 9.4

17.6 7.1 2.5

32

S2

.917

217.8 30.6 8.8

.906

208.7 27.8 8.5

ST

Maps Reference Sources and Materials Total Diagrams

Sources of Information

.983

217.2 27.7 3.6

.980

208.1 25.1 3.5

CC -

.983

217.1 27.5 3.5

.982

207.9 25.2 3.4

CC +

Compo- Composite site


Number of items


Grade 5

Level 11

Table 5.1 (continued) Test Summary Statistics Iowa Tests of Basic Skills — Complete Battery, Form A 2000 National Standardization


Mean SD SEM

SSs

Mean SD SEM

SSs .944

.892

.901

229.5 32.2 10.1

227.0 29.6 7.0

226.7 227.3 27.5 35.3 9.0 10.7 .908

23.5 8.2 2.6

27.7 9.5 2.9

24.0 8.1 2.7

.896

.938

.878

.899

221.5 30.3 9.8

219.5 28.2 7.0

219.2 220.0 26.3 33.4 9.2 10.6

38

L1

21.4 8.2 2.6

RT

Spelling

25.7 9.3 2.9

21.7 7.9 2.8

45

Note: -Does not include Computation +Includes Computation

Reliability K-R20

Mean SD SEM

RSs

Spring

Reliability K-R20

Mean SD SEM

RSs

Fall

39

RC

RV

Reading Total

.825

231.1 44.6 18.6

17.8 5.7 2.4

.813

223.1 42.1 18.2

16.8 5.6 2.4

30

L2

Capitalization

.882

232.4 45.4 15.6

18.4 6.7 2.3

.862

224.0 41.5 15.4

17.3 6.5 2.4

30

L3

.904

230.8 45.0 13.9

23.8 8.4 2.6

.895

223.3 41.7 13.5

22.6 8.2 2.7

38

L4

23.6 8.6 2.9

43

M1

26.1 8.8 2.8

.886

.959

.899

230.8 227.6 36.8 28.5 7.4 9.0

.956

.860

229.9 36.3 13.6

17.4 5.9 2.2

.841

221.8 33.7 13.4

16.1 5.8 2.3

28

M2

.929

228.9 30.6 8.2

.921

220.6 28.9 8.1

M3

MT -

.858

228.4 29.3 11.1

18.3 6.1 2.3

.814

219.3 25.7 11.1

16.5 5.6 2.4

30

Computation

Math Total

Mathematics Concepts Problems & Data & InterpreEstimation tation

222.5 219.3 34.8 26.7 7.3 9.0

LT

Punctu- Usage & Language ation Expression Total

Language

.944

228.8 27.9 6.6

.936

219.8 25.8 6.5

MT +

Math Total

CT +

Core Total

.979

.979

.980

229.3 229.3 29.8 28.9 4.4 4.1

.976

220.6 220.9 28.2 27.5 4.3 4.0

CT -

Core Total

20.5 8.3 2.8

39

SC

Science

22.7 8.5 2.7

.888

.863

.897

229.6 230.7 35.5 36.8 13.1 11.8

21.5 7.6 2.8

.842

221.5 221.8 32.5 34.6 12.9 11.6

20.0 7.2 2.9

39

SS

Social Studies

.830

230.3 39.2 16.2

16.2 5.6 2.3

.815

223.0 37.0 15.9

15.1 5.4 2.3

28

S1

.873

230.8 34.2 12.2

19.9 7.0 2.5

.861

222.6 31.6 11.8

18.2 6.8 2.5

34

S2

.912

230.5 34.2 10.1

.904

222.4 32.0 9.9

ST

Maps Reference Sources and Materials Total Diagrams

Sources of Information

.983

230.1 30.5 4.0

.981

221.7 28.5 4.0

CC -

.983

229.8 30.4 4.0

.981

221.4 28.5 3.9

CC +

Compo- Composite site


Number of items

Comprehension

Vocabulary

Reading


Grade 6

Level 12

Table 5.1 (continued) Test Summary Statistics Iowa Tests of Basic Skills — Complete Battery, Form A 2000 National Standardization


Number of items

Mean SD SEM

SSs

Mean SD SEM

SSs .948

.886

.910

240.9 34.0 10.2

238.3 32.1 7.3

238.1 238.4 29.0 38.6 9.8 10.8 .922

24.0 8.9 2.7

28.4 10.7 3.0

24.0 8.3 2.8

.941

.872

.906

233.5 32.8 10.1

231.1 30.4 7.4

231.2 231.4 28.2 36.3 10.1 10.8 .912

22.3 8.9 2.7

40

L1

Spelling

26.5 10.3 3.1

22.0 8.0 2.9

RT

Reading Total

Note: -Does not include Computation +Includes Computation

Reliability K-R20

Mean SD SEM

RSs

Spring

Reliability K-R20

Mean SD SEM

RSs

48

RC

RV

41

Comprehension

Vocabulary

Reading

.837

242.5 48.1 19.4

19.2 6.1 2.4

.821

235.4 45.9 19.4

18.2 5.9 2.5

32

L2

Capitalization

.879

243.6 49.2 17.1

18.6 7.0 2.4

.867

236.8 47.0 17.1

17.6 6.8 2.5

32

L3

.905

241.6 48.8 15.1

24.2 8.7 2.7

.898

234.7 46.3 14.8

22.9 8.5 2.7

40

L4

25.6 9.0 3.0

46

M1

28.0 9.3 2.9

.890

.961

.902

241.6 239.5 40.0 31.1 7.9 9.7

.957

.870

241.3 39.5 14.2

18.0 6.5 2.3

.853

234.0 37.3 14.3

17.0 6.2 2.4

30

M2

.935

240.5 33.9 8.6

.925

233.1 31.7 8.7

M3

MT -

.848

240.6 33.5 13.1

16.7 6.2 2.4

.796

231.8 30.1 13.6

15.1 5.5 2.5

31

Computation

Math Total

Mathematics Concepts Problems & Data & InterpreEstimation tation

234.7 231.7 38.2 29.3 7.9 9.7

LT

Punctu- Usage & Language ation Expression Total

Language

.945

240.4 30.9 7.2

.934

232.4 28.5 7.3

MT +

Math Total

CT +

Core Total

.979

.980

.982

240.8 240.9 32.9 31.8 4.6 4.3

.978

233.2 233.0 30.8 29.9 4.6 4.3

CT -

Core Total

21.4 8.6 2.8

41

SC

Science

23.2 8.8 2.8

.890

.877

.900

240.7 241.7 39.0 39.9 13.7 12.6

21.4 8.3 2.9

.857

233.2 233.9 36.4 37.8 13.8 12.5

19.8 7.8 2.9

41

SS

Social Studies

.817

242.1 42.9 18.3

15.8 5.6 2.4

.791

234.5 40.4 18.5

15.0 5.3 2.4

30

S1

.884

242.2 37.8 12.9

20.5 7.7 2.6

.876

235.0 35.6 12.5

19.2 7.6 2.7

36

S2

.911

242.3 37.5 11.2

.901

234.9 35.4 11.2

ST

Maps Reference Sources and Materials Total Diagrams

Sources of Information

.984

241.1 33.6 4.3

.982

233.7 31.8 4.3

CC -

.984

241.1 33.2 4.2

.982

233.8 31.4 4.2

CC +

Compo- Composite site


Fall


Grade 7

Level 13

Table 5.1 (continued) Test Summary Statistics Iowa Tests of Basic Skills — Complete Battery, Form A 2000 National Standardization


Mean SD SEM

SSs

Mean SD SEM

SSs .950

.890

.903

251.2 35.6 11.1

248.8 34.1 7.6

248.7 248.9 30.9 41.4 10.2 11.3 .925

24.5 8.8 2.7

30.9 11.2 3.1

24.8 8.5 2.8

.898

.944

.874

.917

244.4 34.5 11.0

242.3 32.8 7.8

241.9 242.4 29.7 39.5 10.5 11.4

42

L1

22.8 8.8 2.8

RT

Spelling

29.1 10.9 3.1

22.9 8.2 2.9

52

Note: -Does not include Computation +Includes Computation

Reliability K-R20

Mean SD SEM

RSs

Spring

Reliability K-R20

Mean SD SEM

RSs

Fall

42

RC

RV

Reading Total

.843

251.7 50.5 20.0

20.3 6.3 2.5

.836

246.0 49.0 19.9

19.6 6.2 2.5

34

L2

Capitalization

.872

252.4 51.6 18.4

19.0 7.2 2.6

.862

247.2 50.0 18.6

18.3 7.0 2.6

34

L3

.896

251.5 52.6 17.0

24.6 8.8 2.8

.885

245.2 50.2 17.0

23.7 8.5 2.9

43

L4

24.6 9.4 3.1

49

M1

26.7 9.9 3.1

.890

.960

.904

251.6 250.4 42.6 33.5 8.5 10.4

.957

.879

250.9 42.3 14.7

18.7 6.9 2.4

.862

244.9 40.4 15.0

17.8 6.7 2.5

32

M2

.938

250.8 36.1 9.0

.929

244.4 34.5 9.2

M3

MT -

.864

251.3 36.8 13.6

15.9 6.7 2.5

.819

243.9 34.1 14.5

14.6 6.0 2.5

32

Computation

Math Total

Mathematics Concepts Problems & Data & InterpreEstimation tation

245.4 243.5 41.0 32.0 8.5 10.6

LT

Punctu- Usage & Language ation Expression Total

Language

.949

251.0 33.2 7.5

.939

244.1 31.6 7.8

MT +

Math Total

CT +

Core Total

.980

.980

.982

250.9 251.0 34.5 33.6 4.8 4.6

.979

244.3 244.1 33.5 32.6 4.9 4.6

CT -

Core Total

22.3 8.8 2.9

43

SC

Science

23.6 9.0 2.9

.889

.884

.898

250.6 251.5 42.1 42.4 14.3 13.5

23.2 8.7 2.9

.869

244.2 245.0 39.8 40.6 14.4 13.5

22.1 8.3 3.0

43

SS

Social Studies

.837

251.7 45.6 18.4

16.6 6.0 2.4

.823

245.8 44.1 18.5

15.9 5.9 2.5

31

S1

.872

251.9 40.2 14.4

21.8 7.5 2.7

.863

246.0 38.6 14.3

20.8 7.4 2.8

38

S2

.914

251.7 39.8 11.7

.908

245.8 38.5 11.7

ST

Maps Reference Sources and Materials Total Diagrams

Sources of Information

.984

251.9 35.6 4.5

.983

244.8 34.4 4.5

CC -

.984

251.7 35.2 4.4

.983

244.9 33.9 4.5

CC +

Compo- Composite site


Number of items

Comprehension

Vocabulary

Reading


Grade 8

Level 14

Table 5.1 (continued) Test Summary Statistics Iowa Tests of Basic Skills — Complete Battery, Form A 2000 National Standardization


Equivalent-Forms Reliability Analysis

Reliability coefficients obtained by correlating the scores from equivalent forms are considered superior to those derived through internal-consistency procedures because all four major sources of error are taken into account: variations arising within the measurement procedure, changes in the specific sample of tasks, changes in the individual from day to day, and changes in the individual’s speed of work. Internal-consistency procedures take into account only the first two sources of error. For this reason, K-R20 reliability estimates tend to be higher than those obtained through the administration of equivalent forms.

The principal reason that equivalent-forms reliability data are not usually provided with all editions, forms, and levels of achievement batteries is that it is extremely difficult to obtain the cooperation of a truly representative sample of schools for such a demanding project. The reliability coefficients in Table 5.2 are based on data from the equating of Form A to Form K. Prior to the 2000 spring standardization, a national sample of students in kindergarten through grade 8 took both Form K and Form A. Between-test correlations from this administration are direct estimates of the alternate-forms reliability of the ITBS. In general, alternate-forms coefficients tend to be smaller than their internal-consistency counterparts because they are sensitive to more sources of measurement error. The coefficients in Table 5.2 also reflect changes across editions of The Iowa Tests.

Another source of alternate-forms reliability came from the 2000 fall standardization sample. During the fall administration, students took one form of the Complete Battery and a different form of the Survey Battery. The correlations between standard scores on subtests in both Complete and Survey batteries represent indirect estimates of equivalent-forms reliability. To render these correlations consistent with the length and variability of Complete and Survey subtests, the estimates reported in Table 5.3 were adjusted for differences in length of the two batteries as well as for differences in variability typically observed between fall and spring test administrations. These reliability coefficients isolate the presence of form-to-form differences in the sample of tasks included on the tests at each level. Equivalent-forms reliability estimates for Total scores and Composites show a tendency to be lower than the corresponding K-R20 coefficients in Table 5.1; however, their magnitudes are comparable to those of internal-consistency reliabilities reported for the subtests of major achievement batteries.
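The length and variability adjustments mentioned above correspond to two standard corrections: the Spearman-Brown formula for a change in test length and the classical adjustment of a reliability coefficient for a change in group variability. The sketch below illustrates both in generic form; it is not the exact adjustment used to build Table 5.3, and the example numbers are arbitrary.

```python
def spearman_brown(reliability, length_ratio):
    """Reliability of a test lengthened (or shortened) by length_ratio."""
    return (length_ratio * reliability) / (1.0 + (length_ratio - 1.0) * reliability)

def adjust_for_variability(reliability, sd_old, sd_new):
    """Project a reliability from a group with sd_old to one with sd_new,
    holding the standard error of measurement constant."""
    sem_sq = (sd_old ** 2) * (1.0 - reliability)
    return 1.0 - sem_sq / (sd_new ** 2)

# Example with assumed values: a Survey subtest about 0.6 times the length of
# its Complete counterpart, observed r = .78 in a fall sample whose SD was
# 10 percent smaller than the spring sample's.
r_length_adjusted = spearman_brown(0.78, 1.0 / 0.6)
r_fully_adjusted = adjust_for_variability(r_length_adjusted, sd_old=0.9, sd_new=1.0)
```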

Table 5.2 Equivalent-Forms Reliabilities, Levels 5–14 Iowa Tests of Basic Skills — Complete Battery, Forms A and K Spring 2000 Equating Sample Reading

Level (N)

Language

Vocabulary

Comprehension

Reading Total

Spelling

Capitalization

RV

RC

RT

L1

L2

L3

Sources of Information

Mathematics

Concepts Problems Punctu- Usage & Language Computa& Data & ation Expression Total tion InterpreEstimation tation

L4

LT

M1

M2

M3

Math Total

MT

Social Studies

Science

SS

SC

Maps Reference Sources and Materials Total Diagrams

S1

S2

5 (418)

.63

6 (1121)

.72

7 (879)

.81

.86

.81

.77

.77

.78

.69

.64

8 (1111)

.81

.83

.83

.79

.78

.71

.64

.64

9 (684)

.78

.82

.80

.76

.73

.79

.76

.76

.73

.73

.72

.73

.77

10 (596)

.80

.79

.84

.74

.77

.79

.81

.77

.78

.74

.77

.75

.81

11 (919)

.82

.83

.85

.78

.81

.80

.82

.80

.79

.79

.83

.76

.80

12 (824)

.86

.78

.88

.78

.81

.81

.80

.77

.78

.76

.78

.74

.78

13 (939)

.84

.84

.86

.79

.80

.82

.83

.76

.76

.77

.76

.74

.74

14 (857)

.80

.79

.85

.77

.77

.80

.86

.79

.78

.80

.77

.68

.77


.93

ST

Word Analysis

Listening

WA

Li

.74

.74

.78

.75

.76

.82

.78

.70

.74

.80

.67

.76

.82

.67


Table 5.3 Estimates of Equivalent-Forms Reliability
Iowa Tests of Basic Skills — Complete Battery, Forms A and B
2000 National Standardization

Time of           Reading   Language   Math      Math      Core      Core
Year     Level    Total     Total      Total     Total     Total     Total
                  RT        LT         MT-       MT+       CT-       CT+

Fall       9      .854      .863       .817      .839      .920      .923
          10      .852      .882       .811      .836      .915      .919
          11      .855      .888       .866      .879      .925      .927
          12      .870      .890       .842      .865      .926      .929
          13      .866      .911       .849      .854      .927      .928
          14      .858      .893       .859      .874      .927      .929

Spring     9      .877      .902       .856      .870      .939      .942
          10      .872      .911       .848      .870      .933      .936
          11      .876      .907       .889      .903      .935      .939
          12      .883      .902       .861      .885      .934      .936
          13      .881      .920       .869      .877      .936      .937
          14      .869      .901       .872      .886      .931      .933

Note: - Does not include Computation   + Includes Computation

Sources of Error in Measurement

Further investigation of sources of error in measurement for the ITBS was provided in two studies of equivalent-forms reliability. The first (Table 5.4) used data from the spring 2000 equating of Forms K and A. The second (Table 5.5) used data from the fall 1995 equating of Forms K and M of the Primary Battery.

As previously described, Forms K and A of the ITBS were given to a large national sample of schools selected to be representative with respect to variability in achievement. Order of administration of the two forms was counterbalanced by school, and there was a seven- to ten-day lag between test administrations. The design of this study made possible an analysis of relative contributions of various sources of measurement error across tests, grades, and schools. In addition to equivalent-forms reliability coefficients, three other “within-forms” reliability coefficients were computed for each school, for each test, for each form, and in each sequence:

• K-R20 reliability coefficients were calculated from the item-response records.

• Split-halves, odds-evens (SHOE) reliability coefficients were computed by correlating the raw scores from odd-numbered versus even-numbered items. Full-test reliabilities were estimated using the Spearman-Brown formula.

• Split-half, equivalent-halves (SHEH) reliability coefficients were obtained by dividing items within each form into equivalent half-tests in terms of content, difficulty, and test length. For tests composed of discrete items such as Spelling, equivalent halves were assembled by matching pairs of items testing the same skill and having approximately the same difficulty index. One item of the pair then was randomly assigned to the X half of the test and the other was assigned to the Y half. For tests composed of testlets (sets of items associated with a common stimulus) such as Reading, the stimulus and the items dependent on it were treated as a testing unit and assigned intact to the X half or Y half. Small adjustments in the composition of equivalent halves were made for testlet-based tests to balance content, difficulty, and number of items. After equivalent halves were assembled, a correlation coefficient between the X half and the Y half was computed. The full-test equivalent-halves reliabilities were estimated using the Spearman-Brown formula.
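As an illustration of the split-half estimates just described, the sketch below correlates two half-test scores and steps the result up to full length with the Spearman-Brown formula. The index lists stand in for either the odd/even split (SHOE) or the content- and difficulty-matched assembly (SHEH); the function name and interface are assumptions for the example.

```python
import numpy as np

def split_half_reliability(item_matrix, half_a_items, half_b_items):
    """Correlate two half-test scores and apply the Spearman-Brown step-up.

    item_matrix: persons-by-items array of 0/1 scores.
    half_a_items, half_b_items: column indices defining the two halves
    (odd/even items for SHOE, matched pairs of items for SHEH)."""
    x = np.asarray(item_matrix, dtype=float)
    half_a = x[:, half_a_items].sum(axis=1)
    half_b = x[:, half_b_items].sum(axis=1)
    r_half = np.corrcoef(half_a, half_b)[0, 1]
    # Spearman-Brown correction from half-test to full-test length.
    return 2.0 * r_half / (1.0 + r_half)
```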


Differences between equivalent-halves estimates obtained in the same testing session and equivalent-forms estimates obtained a week or two apart constitute the best evidence on the effects of changes in pupil motivation and behavior across several days. The means of the grade reliabilities by test are reported in Table 5.4. Overall, the estimates of the three same-day reliability coefficients are similar. The same-day reliabilities varied considerably among the individual tests, however. For the Reading and the Maps and Diagrams tests, the equivalent-halves reliabilities are nearer to the equivalent-forms reliabilities. These lower reliability estimates are due to the manner in which the equivalent halves were established for these two tests.

Another study of sources of error in measurement was completed during the 1995 equating of Form M to Form K. With the introduction of Form M of the ITBS, Levels 9 through 14, a newly formatted edition of Form K, Levels 5 through 8, was developed and designated Form K/M. This edition of the Primary Battery contained exactly the same items as its predecessor, Form K, but was designed to allow more space on each page for item locator art and other decorative features. To ensure that the formatting changes had no effect on student performance, a subsample of the 1995 Form M equating sample was administered with both Forms K and K/M of the Primary Battery in counterbalanced order.

Table 5.5 contains correlations between scores from the two administrations of Levels 5 through 8 during the 1995 equating study. These values represent direct evidence of the contribution of between-days sources of error to unreliability. These sources of error are thought to be especially important to the interpretation of scores on achievement tests for students in the primary grades. Although some variation exists, the estimates of test-retest reliability are generally consistent with the internal-consistency estimates for these levels reported in Table 5.1. The correlations in Table 5.5 suggest a substantial degree of stability in the performance of students in the early elementary grades over a short time interval (Mengeling & Dunbar, 1999).

Table 5.4 Mean (Grades 3–8) Reliability Coefficients: Reliability Types Analysis by Tests
Iowa Tests of Basic Skills — Complete Battery, Forms K and A

                                                    Form K                     Form A
Test                                         K-R20   SHOE   SHEH       K-R20   SHOE   SHEH      EFKA

Vocabulary                                   .871    .876   .877       .873    .880   .875      .816
Reading Comprehension                        .890    .894   .865       .904    .907   .886      .809
Spelling                                     .889    .888   .893       .894    .894   .895      .846
Capitalization                               .839    .842   .842       .828    .823   .831      .773
Punctuation                                  .837    .838   .844       .850    .840   .859      .780
Usage and Expression                         .869    .876   .867       .887    .889   .886      .804
Math Concepts and Estimation                 .869    .877   .871       .866    .875   .866      .814
Math Problem Solving and Data Interpretation .840    .849   .840       .845    .858   .837      .774
Mathematics Computation                      .896    .903   .910       .854    .872   .873      .771
Social Studies                               .836    .838   .836       .851    .854   .846      .764
Science                                      .851    .859   .848       .873    .881   .869      .771
Maps and Diagrams                            .832    .837   .809       .816    .828   .781      .733
Reference Materials                          .879    .893   .866       .856    .865   .844      .777

Mean                                         .861    .867   .859       .861    .867   .858      .787


Table 5.5
Test-Retest Reliabilities, Levels 5–8
Iowa Tests of Basic Skills — Complete Battery, Form K
Fall 1995 Equating Sample

Level (N)       RV    RC    RT    Li    L     M1    M2    M3    MT-   SS    SC    SI    WA
5 (N > 826)    .74    –     –    .71   .80    –     –     –    .81    –     –     –    .80
6 (N > 767)    .88   .83   .90   .78   .82    –    .81    –    .85    –     –     –    .86
7 (N > 445)    .90   .93    –    .76   .90   .83   .83   .77    –    .82   .80   .84   .87
8 (N > 207)    .91   .93    –    .83   .88   .84   .85   .72    –    .82   .83   .85   .91

RV = Reading Vocabulary; RC = Reading Comprehension; RT = Reading Total; Li = Listening;
L = Language; M1 = Math Concepts; M2 = Math Problems & Data Interpretation;
M3 = Math Computation; MT- = Math Total (does not include Computation);
SS = Social Studies; SC = Science; SI = Sources of Information; WA = Word Analysis.
A dash indicates that no coefficient was reported for that score at that level.

The most important result of these analyses is the quantification of between-days sources of measurement error and their contribution to unreliability. Reliability coefficients based on internal-consistency analyses are not sensitive to this source of error.
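One way to see the size of that between-days component is to compare a same-day coefficient with a different-day coefficient for the same test and group: the drop in reliability, multiplied by the score variance, approximates the added error variance. The sketch below illustrates this arithmetic under the assumption of additive, independent error sources; the standard-score standard deviation of 20 used in the example is hypothetical, and the function name is illustrative only.

    import math

    def between_days_error_sd(r_same_day, r_between_days, score_sd):
        # Extra error variance implied by the drop from a same-day coefficient
        # (e.g., equivalent halves) to a different-day coefficient (equivalent
        # forms), assuming independent, additive error components.
        extra_error_variance = (r_same_day - r_between_days) * score_sd ** 2
        return math.sqrt(max(extra_error_variance, 0.0))

    # Illustration with the Form K means from Table 5.4 (SHEH = .859, EFKA = .787)
    # and a hypothetical standard-score standard deviation of 20:
    print(round(between_days_error_sd(0.859, 0.787, 20.0), 1))   # about 5.4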

Standard Errors of Measurement for Selected Score Levels

A study of examinee-level standard errors of measurement based on a single test administration was conducted by Qualls-Payne (1992). The single-administration procedures investigated were those originated by Mollenkopf (1949), Thorndike (1951), Keats (1957), Feldt (1984), and Jarjoura (1986), and a modified three-parameter latent trait model. The accuracy and reliability of estimates varied across tests, grades, and criteria. The procedure recommended for its agreement with equivalent-forms estimates was Feldt's modification of Lord's binomial error model, with partitioning based on a content classification system. Application of this procedure provides more accurate estimates of individual standard errors of measurement than had previously been available from a single test administration.

For early editions of the ITBS, score-level standard errors of measurement were estimated using data from special studies in which students were administered two parallel forms of the tests. Since that time, additional research has produced methods for estimating the standard error of measurement at specific score levels that do not require multiple test administrations. These conditional SEMs were estimated from the 2000 spring national standardization of Form A of the ITBS. Additional tables with conditional SEMs for Form B are available from the Iowa Testing Programs; the form-to-form differences in these values are minor.

The results in Table 5.6 were obtained using a method developed by Brennan and Lee (1997) for smoothing a plot of conditional standard errors for scaled scores based on the binomial error model. In addition to this method, an approach developed by Feldt and Qualls (1998) and another based on bootstrap techniques were used at selected test levels. Because the results of all three methods agreed closely and generally matched the patterns of varying SEMs by score level found with previous editions of the tests, only the results of the Brennan and Lee method are provided.
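For readers who want the underlying arithmetic, the sketch below shows the conditional raw-score SEM implied by Lord's binomial error model and a crude conversion to the scale-score metric through the local slope of a raw-to-scale conversion table. It is a simplified illustration only; it does not reproduce Feldt's content-based partitioning or the Brennan and Lee smoothing used for Table 5.6, and the raw_to_scale table is a hypothetical input.

    import numpy as np

    def lord_conditional_sem(raw_score, n_items):
        # Conditional SEM of a number-correct score x on an n-item test under
        # Lord's binomial error model: sqrt(x * (n - x) / (n - 1)).
        x = np.asarray(raw_score, dtype=float)
        return np.sqrt(x * (n_items - x) / (n_items - 1))

    def scale_score_sem(raw_score, n_items, raw_to_scale):
        # Propagate the raw-score SEM to the scale-score metric through the
        # local slope of a raw-to-scale conversion table (length n_items + 1).
        table = np.asarray(raw_to_scale, dtype=float)
        slope = np.gradient(table)            # approximate d(scale) / d(raw)
        x = np.asarray(raw_score, dtype=int)
        return lord_conditional_sem(x, n_items) * slope[x]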


Table 5.6 Standard Errors of Measurement for Selected Standard Score Levels Iowa Tests of Basic Skills—Complete Battery, Form A 2000 National Standardization

Test Level

5

6

Score Level

Word Analysis

Listening

Language

Mathematics

Reading Words

Reading Comprehension

Reading Total

V

WA

Li

L

M

RW

RC

RT

3.56

5.74

4.45

90–99

2.32

2.91

3.42

3.85

2.94

100–109

4.05

4.66

4.86

4.99

4.89

110–119

7.45

5.76

4.84

4.29

5.59

120–129

8.64

6.18

4.66

4.17

4.80

130–139

9.92

7.24

5.87

4.97

5.22

140–149

11.33

8.14

6.58

5.74

5.83

150–159

11.90

8.59

5.55

4.73

5.36

160–169

11.08

7.43

170–179

8.67

2.37

3.09

2.39

3.66

5.35

90–99

2.39

100–109

4.10

2.41

110–119

5.37

6.50

3.65

4.79

5.94

5.10

6.47

6.29

120–129

7.41

7.17

3.45

6.22

5.57

4.20

5.28

5.21

130–139

9.50

5.31

2.59

7.60

6.16

2.86

4.90

5.61

140–149

10.32

3.70

2.71

8.51

7.07

4.59

6.28

5.94

150–159

10.85

5.30

4.45

9.41

8.02

7.43

8.24

6.86

160–169

11.08

6.66

4.80

10.34

8.81

9.09

7.96

170–179

10.50

10.49

8.24

8.47

7.81

180–189

9.08

9.57

6.71

6.31

6.13

190–199

7.14

7.69

200–209


Vocabulary


Table 5.6 (continued) Standard Errors of Measurement for Selected Standard Score Levels Iowa Tests of Basic Skills—Complete Battery, Form A 2000 National Standardization

Reading

Mathematics Word Analysis

Test Level

Score Level

100–109

7

Vocabulary

Comprehension

V

RC

4.06

Spelling

Language Concepts

WA

Li

L1

2.91

3.44

5.73

L

4.69

Problems & Data Computation Interpretation

M1

M2

2.99

3.73

M3

Social Studies

Science

Sources of Information

SS

SC

SI

2.98

2.19

4.67

110–119

6.02

3.83

4.46

5.65

7.12

5.82

5.37

6.94

6.62

4.96

4.38

6.61

120–129

7.29

4.88

6.59

5.77

7.51

4.96

6.88

8.63

6.90

6.09

5.95

7.48

130–139

7.28

5.25

7.14

6.97

6.71

4.54

7.05

8.67

4.35

6.58

8.63

6.61

140–149

5.81

3.92

6.45

7.93

4.03

4.38

6.99

7.91

3.24

7.78

10.71

4.75

150–159

5.61

3.76

7.13

8.57

4.01

5.39

7.47

7.48

4.51

9.22

11.92

4.69

160–169

7.15

7.44

9.17

9.70

5.31

5.13

170–179

9.64

10.12

10.08

9.68

180–189

10.65

9.56

8.80

7.89

7.37

190–199

8

Listening

6.65

8.70

7.79

10.11

12.86

8.36

6.68

9.04

8.40

10.30

13.10

8.00

8.09

7.11

9.58

11.98

7.90

10.76

100–109

4.00

2.43

3.00

2.94

2.95

2.94

2.40

5.30

110–119

6.12

4.26

4.70

5.27

5.70

4.66

5.47

5.67

6.08

5.47

3.99

7.38

120–129

8.05

5.52

6.80

6.83

7.24

5.05

7.02

8.26

7.48

6.92

6.38

7.69

130–139

8.89

5.14

8.35

7.10

7.49

5.55

7.72

9.73

7.32

7.33

7.43

7.62

140–149

8.34

4.43

9.08

7.98

7.82

5.35

8.17

10.05

5.63

8.69

8.66

6.82

150–159

6.80

4.39

9.19

8.71

7.11

5.39

8.68

9.49

5.32

10.41

10.44

5.74

160–169

6.09

6.93

9.36

8.89

6.20

5.62

8.76

8.61

6.19

11.61

11.94

6.51

170–179

8.28

9.88

10.05

9.95

6.87

6.94

8.64

8.44

6.94

12.40

13.12

10.41

180–189

11.26

11.80

10.71

10.93

7.35

9.03

8.75

9.15

8.36

12.07

14.31

10.55

190–199

12.19

12.74

10.91

11.12

9.91

8.90

9.41

9.19

11.09

14.74

8.75

200–209

11.61

12.44

10.53

10.49

8.98

7.47

8.90

8.20

9.59

14.42

210–219

8.67

10.37

9.61

8.88

7.28

12.66

220–229

7.38

7.48

9.14


Table 5.6 (continued) Standard Errors of Measurement for Selected Standard Score Levels Iowa Tests of Basic Skills—Complete Battery, Form A 2000 National Standardization

Reading

Test Level

Score Level

Vocabulary

Comprehension

RV

RC

Language

Sources of Information

Mathematics

Spelling

Capitalization

Punctuation

Usage & Expression

Concepts & Estimation

L1

L2

L3

L4

M1

Problems Computa& Data tion Interpretation

M2

M3

Social Studies

Science

SS

SC

110–119

9

Maps Reference and Materials Diagrams

S1

S2

5.38

120–129

4.63

5.08

5.92

4.24

6.56

5.08

4.13

5.62

130–139

8.08

7.22

7.76

7.04

8.34

7.86

7.26

8.60

140–149

10.24

8.68

8.61

10.06

9.64

9.76

8.45

10.16

7.97

150–159

10.43

8.99

7.91

10.53 10.07

9.92

8.20

10.40

7.31

160–169

8.93

8.11

6.18

9.63

9.73

8.76

8.20

9.97

7.27

170–179

6.85

6.61

5.63

9.28

9.11

7.16

8.42

9.06

6.51

180–189

6.61

6.52

6.50

11.29

9.99

8.49

8.34

9.08

6.02

190–199

7.60

8.21

8.75

14.56 13.62 12.41

8.96

200–209

9.74

9.63 12.04

17.24 16.73 14.96

10.81

210–219

11.55

11.09 13.88

18.58 18.53 16.61

12.46

12.86

8.25

220–229

10.80 12.00

19.00 19.48 16.60

12.27

10.45

230–239

11.98

17.81 18.92 15.81

9.99

240–249

10.64

15.92 16.34 15.02

250–259

14.03 11.47

10

4.82

4.00

130–139

7.35

6.36

7.07

140–149

10.14

7.82

150–159

11.86

160–169 170–179

7.75

7.60

8.03

11.59

9.03

9.98

9.76

7.54

8.68 12.12

8.81

10.45

9.84

7.77

8.98

11.54

7.69

10.62

10.78

8.11

8.98 10.35

6.44

11.20

10.93

9.11

9.31 10.70

7.20

11.43

9.97

10.94

7.52 10.23

11.12 13.76

9.51

11.99

10.72

12.49

9.16

11.05 13.18 16.72

11.70

13.19

12.71

7.49

13.65

13.67

13.86

12.24

8.84 13.94 16.42 12.52

11.98

9.50

11.59 13.87 10.05

9.00

8.15

11.32

8.00

6.64

6.18

9.95

6.91

8.73

10.12 10.53 10.28

8.71

9.99

9.47

7.24

8.32 10.95

9.53

9.16

8.54

11.98 12.17

11.80

9.76

11.41

9.23

7.16

9.95

12.06

9.82

7.94

12.54 12.82

11.76

9.99

12.10

8.74

9.01

11.12 12.62

9.20

10.89

9.44

7.42

12.65 12.48 10.21

9.75

12.24

9.07

9.71 10.99 12.21

7.77

180–189

8.00

8.55

7.07

13.05 11.69

8.12

8.91

11.85

8.16

9.26

9.74

11.23

7.10

190–199

6.31

7.96

7.68

14.69 11.86

8.63

7.37

11.48

6.99

9.37

8.27

11.16

8.30

200–209

6.97

9.15

9.62

18.20 14.16 12.24

7.66

12.46

7.50

11.08

9.75 13.47

11.54

210–219

8.25

11.46 12.27

21.39 16.31 15.38

9.69

14.17

8.62 12.81 12.45 15.68 14.61

8.07

11.95 10.02

220–229

9.71 12.76 14.33

23.02 18.12 17.65

11.59

15.11

9.74 14.14 14.76 17.44 16.59

230–239

11.09 13.00 15.43

23.32 18.99 19.10

12.80

15.04

9.09 14.43 15.79 17.92 17.22

240–249

11.27 12.58 15.67

23.62 18.61 18.99

12.84

14.16

13.62 15.43 17.43 16.94

11.05 14.40

22.75 17.41 17.75

10.04

12.26

11.87 13.82 16.94 15.54

260–269

9.16 10.89

20.30 15.43 16.47

8.99

8.95 12.14 14.79 13.03

270–279

7.14

16.60 12.81 15.18

290–299

11.86

6.73 9.04

11.39 14.73 18.31 12.92

11.68

9.39 10.99

7.06 8.58

10.98 14.77 18.55 13.21

7.36

14.23

4.06

9.97

4.00

280–289

3.87

5.32

6.65

9.71

Li

7.03

5.11 7.62

WA

4.39

7.76

250–259


3.95 6.67

Listening

6.56

7.27

110–119 120–129

Word Analysis

8.98

11.19

9.45

961464_ITBS_GuidetoRD.qxp

10/29/10

3:15 PM

Page 81

Table 5.6 (continued) Standard Errors of Measurement for Selected Standard Score Levels Iowa Tests of Basic Skills—Complete Battery, Form A 2000 National Standardization

Reading

Test Level

Score Level

Vocabulary

Comprehension

RV

RC

Language

Spelling

Capitalization

Punctuation

L1

L2

L3

120–129 130–139

11

Usage & Expression

Concepts & Estimation

L4

M1

Problems Computa& Data tion Interpretation

M2

M3

Social Studies

Science

SS

SC

4.00 6.07

5.20

8.00

5.46

6.60

6.75

Maps Reference and Materials Diagrams

S1

S2

7.93 5.74

7.55

6.34

6.14

10.64

140–149

8.56

6.84

9.57

8.16

9.65

9.82

7.82

9.35

8.72

7.91

8.15

12.22

8.69

150–159

11.08

9.00

9.48

11.10

12.69

12.42

9.08

11.20

10.60

8.86

9.34

13.69

10.33

160–169

12.63

10.84

8.49

13.18

14.30

13.79

10.00

13.00

11.06

10.18

10.23

14.80

9.98

170–179

12.39

11.04

8.44

14.17

14.83

13.27

10.52

13.56

11.25

11.61

11.07

15.38

8.94

180–189

10.81

10.42

8.47

14.59

14.60

11.30

10.62

13.36

10.77

12.20

11.10

15.37

8.42

190–199

9.36

9.89

8.16

14.81

13.90

9.96

10.14

12.66

9.10

11.65

10.13

14.80

8.05

200–209

8.50

9.67

8.54

15.53

13.26

11.16

9.43

12.14

7.76

10.79

9.53

14.13

8.56

210–219

8.14

9.97

9.77

17.60

14.57

14.06

9.08

12.55

8.10

11.09

10.37

14.48

11.22

220–229

8.36

10.64

10.99

19.86

17.06

17.20

9.09

14.06

10.23

12.72

12.17

15.85

14.26

230–239

8.70

11.18

12.40

21.38

18.82

19.49

9.93

15.01

11.80

14.00

14.06

17.03

16.07

240–249

9.65

11.96

13.48

22.54

19.43

20.66

11.39

15.29

12.05

14.76

15.50

17.63

16.95

250–259

10.52

13.07

13.81

23.00

19.40

21.32

11.97

14.68

10.78

15.12

15.98

17.78

17.29

260–269

10.25

13.27

13.42

22.39

18.69

21.28

11.28

12.84

8.31

15.04

15.66

16.68

16.25

270–279

8.42

12.65

12.32

20.53

17.46

20.34

9.97

10.88

14.01

13.97

14.86

13.59

280–289

11.12

9.88

17.46

15.91

18.47

7.76

8.21

11.90

11.40

12.18

10.54

290–299

8.36

13.56

14.09

14.24

8.40

8.57

8.64

9.16

11.30

9.23

4.74

5.37

5.83

300–309

120–129

12

Sources of Information

Mathematics

3.98

130–139

5.32

4.79

5.75

6.72

6.30

6.13

8.30

140–149

7.33

7.28

8.22

6.65

7.62

8.63

8.12

8.37

7.48

8.35

7.97

11.23

7.85

150–159

9.86

8.44

10.31

9.65

10.53

12.96

9.67

10.03

9.29

9.08

9.01

12.26

9.53

160–169

11.90

9.68

9.31

13.13

12.91

14.47

10.22

12.07

10.38

9.91

10.11

12.88

9.81

170–179

12.47

10.83

7.88

14.68

13.99

14.65

10.40

13.32

11.22

11.57

11.85

14.11

9.98

180–189

12.19

11.32

8.94

15.71

14.34

13.54

10.43

13.52

11.79

12.86

12.84

14.94

10.46

190–199

11.37

11.43

9.63

17.00

14.03

11.63

10.06

13.25

12.04

13.49

12.57

15.34

10.71

200–209

10.41

11.39

9.80

18.31

13.96

10.86

9.51

12.78

12.03

13.41

11.44

15.73

10.54

210–219

9.62

11.29

9.97

20.19

14.73

12.01

9.22

13.33

11.95

12.88

10.37

16.71

11.16

220–229

9.24

11.10

10.26

22.41

16.66

14.02

9.06

15.03

12.13

12.97

10.66

17.95

13.64

230–239

9.14

11.04

10.99

23.91

18.66

16.41

9.19

16.67

12.73

14.06

12.35

19.37

15.72

240–249

8.72

11.41

12.08

25.16

20.57

19.37

9.71

17.95

13.24

15.20

14.37

20.56

16.97

250–259

8.81

12.03

13.37

25.92

21.77

21.54

11.16

18.39

13.30

16.05

15.97

20.97

17.68

260–269

10.59

12.79

14.56

25.93

21.79

22.55

12.86

17.89

12.75

16.38

16.76

20.91

17.40

270–279

12.08

13.05

14.66

25.06

21.11

22.88

13.23

16.72

11.64

16.15

16.82

20.26

16.26

280–289

11.63

12.62

13.89

23.26

20.40

22.25

12.64

14.88

10.14

15.62

15.94

18.97

15.06

290–299

11.32

12.25

20.69

19.69

20.55

9.74

12.30

7.54

14.21

14.11

17.04

12.95

300–309

9.29

9.31

17.59

17.54

17.86

11.44

11.53

14.59

10.54

310–319

14.30

14.65

14.48

8.63

8.64

8.14

10.11

8.14

320–329

11.04

10.83

10.41

6.06

330–339

7.65


Table 5.6 (continued) Standard Errors of Measurement for Selected Standard Score Levels Iowa Tests of Basic Skills—Complete Battery, Form A 2000 National Standardization Reading Vocabulary

Test Level


Score Level

13

130–139 140–149 150–159 160–169 170–179 180–189 190–199 200–209 210–219 220–229 230–239 240–249 250–259 260–269 270–279 280–289 290–299 300–309 310–319 320–329 330–339 340–349 350–359

14

130–139 140–149 150–159 160–169 170–179 180–189 190–199 200–209 210–219 220–229 230–239 240–249 250–259 260–269 270–279 280–289 290–299 300–309 310–319 320–329 330–339 340–349 350–359 360–369

Language

Comprehension

Spelling

RV

RC

L1

5.69 7.19 8.87 11.76 14.04 15.03 15.11 14.28 12.65 11.22 10.44 9.56 8.65 8.71 10.17 11.56 11.81 10.16

4.77 6.94 8.50 11.09 13.29 14.12 14.07 13.22 11.92 10.89 10.32 10.17 10.38 11.08 12.26 13.16 13.36 12.70 11.13 8.89

6.38 8.48 10.85 13.39 15.26 15.93 15.84 14.97 13.36 11.69 10.67 9.98 9.09 8.59 9.92 11.60 12.31 10.99

3.93 5.87 7.68 9.84 12.18 14.28 15.21 15.07 14.13 13.00 12.02 11.36 10.92 11.11 11.61 12.20 12.69 12.76 12.08 11.27 9.68

8.45 10.16 10.65 10.37 10.19 10.06 9.81 9.85 10.12 10.29 10.72 11.86 13.05 13.72 13.85 13.55 11.93 8.36

9.72 11.10 11.00 10.65 10.61 10.97 11.08 11.01 11.39 12.02 12.75 13.48 13.88 13.92 13.52 12.61 11.54 9.97 7.54

Capitalization

L2

Punctuation

Sources of Information

Mathematics Usage & Expression

Concepts & Estimation

Problems Computa& Data tion Interpretation

M3

Science

SS

SC

S1

8.30 10.74 12.47 13.76 15.38 17.17 18.66 19.69 20.06 20.42 20.85 21.54 22.26 22.53 22.46 21.91 20.79 19.11 15.79 11.14 8.17 5.95

L3

L4

M1

4.39 6.28 8.71 11.59 14.00 15.90 17.08 18.34 19.80 21.57 23.50 24.76 25.79 26.40 26.37 25.56 23.87 21.39 18.34 13.69 9.84 7.33

4.77 7.19 10.73 14.04 15.70 16.45 16.63 16.46 16.38 17.23 19.23 21.01 22.01 22.65 22.83 22.47 21.58 20.17 17.31 14.09 11.78 8.75

5.26 8.44 12.25 15.03 16.13 16.08 14.97 13.28 12.47 13.16 15.31 17.93 20.64 22.26 22.89 23.04 22.66 21.71 20.19 18.10 15.48 12.36 8.65

5.75 7.52 8.85 9.80 10.53 11.14 11.37 11.20 10.81 10.34 9.87 9.85 10.33 10.86 11.26 12.02 12.88 12.34 9.98

7.46 10.81 12.65 13.94 14.67 14.79 14.68 14.76 15.24 16.17 16.97 17.38 17.16 16.16 14.73 13.64 11.80 8.91

7.48 9.02 10.59 13.30 15.26 16.51 17.14 17.14 16.59 15.96 15.25 14.31 13.32 12.08 10.19 8.92 7.88

7.74 9.91 10.64 12.04 13.86 14.95 14.94 14.37 13.91 14.10 14.61 15.11 15.30 15.03 14.45 13.87 13.22 12.01 10.44 7.61

6.93 8.93 10.09 11.95 13.69 14.17 13.89 12.88 11.85 11.67 13.10 14.86 16.06 16.64 16.68 16.20 15.06 13.39 10.92 7.38

5.44 8.31 10.88 13.20 14.89 16.23 17.76 20.27 22.52 24.13 25.65 26.92 27.77 28.02 27.59 26.46 24.70 22.44 19.87 15.91 12.20 8.42

6.05 10.71 14.57 16.76 17.66 18.22 18.23 18.05 18.26 18.98 20.23 21.66 22.83 23.38 23.16 22.44 21.28 19.77 18.04 15.35 12.81 11.20 8.91

7.07 9.22 10.59 11.51 12.23 13.00 13.20 12.83 12.24 11.67 11.20 11.09 11.08 10.81 10.27 9.88 9.74 9.49 9.30 8.19

6.78 9.52 12.24 13.77 14.65 15.11 15.41 15.92 16.66 17.45 17.89 17.95 17.61 16.76 15.58 14.22 12.74 11.95 11.16 9.29

8.74 11.56 13.28 14.58 16.26 17.73 18.42 18.86 18.71 18.05 17.19 16.25 14.96 13.63 12.37 10.72 9.08 8.37 7.15

7.06 8.83 9.09 10.16 12.81 14.76 15.75 15.91 15.61 15.19 14.90 14.93 15.08 15.34 15.72 15.93 15.64 14.65 12.60 10.29 8.22 5.98

6.13 7.99 9.64 12.27 14.55 15.58 15.80 15.32 14.17 13.08 12.65 13.44 15.00 16.38 17.35 17.55 17.02 15.88 14.08 11.92 9.46 6.38

4.70 7.02 10.83 14.92 16.52 17.02 16.66 15.98 15.77 16.25 17.27 18.61 20.01 20.96 21.68 22.11 22.11 21.82 20.65 18.68 16.90 13.90 11.07 8.50

M2

Social Studies

Maps Reference and Materials Diagrams

8.67 11.16 12.39 13.76 15.35 17.04 18.45 19.12 19.74 20.10 20.35 20.68 20.98 21.50 21.90 21.95 21.69 21.04 19.92 17.37 14.18 10.64 6.63

S2

8.43 10.35 10.78 10.55 11.01 11.87 12.15 11.65 11.57 12.96 14.87 16.29 17.04 17.24 16.89 16.27 14.79 12.84 11.17 8.70

9.00 10.49 10.29 10.20 11.03 11.70 12.38 13.50 14.85 16.48 18.15 19.13 19.53 19.33 18.80 17.58 16.05 14.26 12.22 10.55 8.14


Effects of Individualized Testing on Reliability

The extensive reliability data reported in the preceding sections are based on group rather than individualized testing and on results from schools representing a wide variety of educational practices. The reliability coefficients obtained in individual schools may be improved considerably by optimizing the conditions of test administration. This can be done by attending to conditions of student motivation and attitude toward the tests and by assigning appropriate test levels to individual students. One of the important potential values of individualized testing is improvement of the accuracy of measurement. The degree to which this is realized, of course, depends on how carefully test levels are assigned and how motivated students are in the testing situation.

The reliability coefficients reported apply to the consistency of placing students in grade groups. The reliability with which tests place students along the total developmental continuum has been investigated by Loyd (1980). She examined the effects of individualized, or functional-level, testing on reliability in a sample of fifth- and sixth-grade students. Two ITBS tests, Language Usage and Math Concepts, were selected for study because they represent different degrees of curriculum dependence. In Loyd's study, each student was administered an in-level Language Usage test and one of four levels of the parallel form, ranging from two levels below to one level above grade level. Similarly, each student was administered an in-level Math Concepts test and one of four levels of the parallel form, ranging from two levels below to one level above grade level.

To address the issue of which test level provided the more reliable assessment of developmental placement, an independent criterion of developmental placement was needed. This was provided by administering an independently scaled, broad-range "scaling test" to comparable samples in each of grades 3 through 8. Estimates of reliability and expected squared error in grade-equivalent scores obtained at each level were analyzed for each reading achievement grade level. For students at or below grade level, administering an easier test produced less error. This suggests that testing such students with a lower test level may result in more reliable measurement. For both Language Usage and Math Concepts, testing above-average students with lower test levels introduced significantly more error into derived scores.

The results of this study provide support for the validity of individualized, or out-of-level, testing. Individualized testing is most often used to test students who are atypical when compared with their peer group. For students lagging in development, the results suggest that a lower level test produces comparable derived scores, and these derived scores may be more reliable. For students advanced in development, the findings indicate that testing with a higher test level results in less error and therefore more reliable derived scores.

Stability of Scores on the ITBS

The evidence of stability of scores over a long period of time and across test levels has a special meaning for achievement tests. Achievement may change markedly during the course of the school year, or from the spring of one school year to the fall of the next. In fact, one goal of good teaching is to alter patterns of growth that do not satisfy the standards of progress expected for individual students and for groups. If the correlations between achievement test scores for successive years in school are exceedingly high, this could mean that little was done to adapt instruction to individual differences that were revealed by the tests.

In addition to changes in achievement, there are also changes in test content across levels because of the way curriculum in any achievement domain changes across grades. Differences in test content, while subtle, tend to lower the correlations between scores on adjacent test levels. Despite these influences on scores over time, when equivalent forms are used in two test administrations, the correlations may be regarded as lower-bound estimates of equivalent-forms reliability. In reporting stability coefficients for such purposes, it is important to remember that they are attenuated not only by errors of measurement but also by differences associated with changes in true status and in test content.

The stability coefficients reported in Table 5.7 are based on data from the 2000 national standardization. In the fall, subsamples of students who had taken Form A the previous spring were administered the next level of either Form A or Form B. The correlations in Table 5.7 are based on the developmental standard scores from the spring and fall administrations.
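Stability coefficients of this kind are ordinary product-moment correlations between matched scores from the two occasions. The sketch below illustrates the computation, together with the classical correction for attenuation sometimes used when interpreting such coefficients. It is an illustrative example only, assuming matched spring and fall developmental standard scores with missing records coded as NaN; it is not the procedure used to produce Table 5.7.

    import numpy as np

    def stability_coefficient(spring, fall):
        # Pearson correlation between matched spring and fall developmental
        # standard scores; unmatched records (NaN) are dropped pairwise.
        spring = np.asarray(spring, dtype=float)
        fall = np.asarray(fall, dtype=float)
        matched = ~(np.isnan(spring) | np.isnan(fall))
        return np.corrcoef(spring[matched], fall[matched])[0, 1]

    def disattenuated(r_observed, reliability_1, reliability_2):
        # Classical correction for attenuation: the correlation the two scores
        # would show if both were measured without error.
        return r_observed / np.sqrt(reliability_1 * reliability_2)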

84 467 1071 1028 911 890 834 779 942 907 902 840

6(1)A

7(2)A

7(2)B

7(2)A

7(2)B

8(3)A

8(3)B

9(3)A

9(3)B

6(K)A

6(1)A

6(1)A

7(1)A

7(1)A

8(2)A

8(2)A

8(2)A

8(2)A

9(3)A 10(4)A

9(3)A 10(4)B

10(4)A 11(5)A

10(4)A 11(5)B

11(5)A 12(6)A

11(5)A 12(6)B

12(6)A 13(7)A

12(6)A 13(7)B

13(7)A 14(8)A

13(7)A 14(8)B

.77

.77

.79

.82

.78

.78

.76

.81

.74

.77

.74

.78

.77

.81

.77

.77

.34

.52

.67

.43

.79

.74

.76

.82

.76

.79

.79

.79

.75

.78

.70

.72

.70

.81

.73

.73

.71

.74

RC

Note: -Does not include Computation +Includes Computation

541

627

602

306

335

458

554

155

RV

Comprehension

.84

.81

.83

.88

.84

.85

.85

.86

.82

.84

.80

.84

.83

.87

.83

.81

.77

.79

RT

Reading Total

.83

.81

.84

.84

.80

.83

.81

.80

.75

.73

.70

.71

.71

.72

.72

.72

L1

Spelling

.72

.72

.74

.73

.71

.71

.66

.70

.63

.62

L2

Capitalization

.76

.73

.77

.77

.74

.77

.71

.74

.68

.68

L3

.75

.72

.75

.77

.72

.73

.73

.76

.71

.71

L4

.87

.83

.88

.88

.86

.86

.85

.87

.82

.81

.74

.74

.74

.77

.75

.72

.49

.57

.74

.56

LT

Punctu- Usage & Language ation Expression Total

Language

.79

.77

.77

.78

.76

.75

.76

.77

.69

.69

.65

.64

.68

.71

.55

.71

M1

.73

.68

.72

.72

.73

.72

.71

.72

.70

.68

.62

.61

.75

.75

.64

.64

M2

Concepts Problems & Data & InterpreEstimation tation

.82

.78

.81

.81

.82

.80

.82

.81

.79

.76

.74

.71

.81

.81

.70

.76

.71

.70

.75

.66

MT -

Math Total

.68

.66

.57

.64

.64

.62

.58

.62

.58

.52

.57

.54

.61

.60

.53

.58

M3

Computation

Mathematics

.84

.79

.80

.82

.82

.81

.82

.83

.80

.77

.76

.75

.81

.81

.73

.77

MT +

Math Total

.90

.84

.90

.90

.89

.89

.90

.91

.88

.86

.86

.86

.88

.90

.85

.86

.64

.72

.85

.67

CT -

Core Total

.90

.84

.90

.90

.89

.89

.91

.91

.87

.86

.87

.86

.88

.90

.86

.85

CT +

Core Total

.72

.70

.73

.78

.72

.73

.72

.76

.68

.70

.55

.54

.57

.71

.57

.63

SS

Social Studies

.69

.69

.71

.76

.70

.73

.74

.75

.69

.67

.50

.49

.60

.76

.50

.63

SC

Science

.67

.67

.64

.69

.64

.67

.67

.70

.61

.61

S1

.69

.66

.73

.73

.69

.66

.67

.66

.69

.63

S2

.76

.73

.77

.79

.75

.74

.76

.77

.74

.70

.65

.65

.63

.71

.64

.68

ST

Maps Reference Sources and Materials Total Diagrams

Sources of Information

.89

.85

.89

.91

.88

.89

.90

.91

.87

.86

.84

.85

.88

.91

.87

.87

CC -

.88

.84

.88

.90

.87

.88

.90

.90

.86

.85

.84

.84

.87

.91

.87

.87

CC +

Compo- Composite site

.73

.81

.66

.74

.59

.67

.61

.57

WA

Word Analysis

.57

.66

.53

.62

.47

.62

.67

.60

Li

Listening

.87

.91

.86

.85

.77

.82

RPT

Reading Profile Total


503

6(1)A

5(K)A

N

Fall

Vocabulary

Reading


Spring

Level (Grade) Form

Table 5.7 Correlations Between Developmental Standard Scores Iowa Tests of Basic Skills — Complete Battery, Forms A and B Spring and Fall 2000 National Standardization


The top row in the table shows correlations for 503 students who took Form A, Level 5 in the spring of kindergarten and Form A, Level 6 in the fall of grade 1. Row 2 shows within-level correlations for 155 students who took Form A, Level 6 in both the spring of kindergarten and the fall of grade 1. Beginning in row 4 and continuing on alternate rows are correlations between scores on alternate forms.

Additional evidence of the stability of ITBS scores is based on longitudinal data from the Iowa Basic Skills Testing Program. Mengeling (2002) identified school districts that had participated in the program and had tested fourth-grade students in school years 1993–1994, 1994–1995, and 1995–1996. Each district had also tested at the same time of year in grades 5, 6, 7, and 8 in successive years. Matched records were created for 40,499 students who had been tested at least once during the years and grades included in the study. Approximately 50 percent of the records had data for every grade, although all available data were used as appropriate. The correlations in Table 5.8 provide evidence regarding the stability of ITBS scores over the upper-elementary and middle-school years.

The relatively high stability coefficients reported in Tables 5.7 and 5.8 support the reliability of the tests. Many of the year-to-year correlations are nearly as high as the equivalent-forms reliability estimates reported earlier. The high correlations also indicate that achievement in the basic skills measured by the tests was about as consistent across years as it was in research conducted a decade or more earlier (Martin, 1985). As discussed previously, these results might suggest that schools are not making the most effective use of test results. On the other hand, stability rates are associated with level of performance. That is, there is a tendency for above-average students to obtain above-average gains in performance and for below-average students to achieve more modest gains.

Table 5.8
Correlations Between Developmental Standard Scores
Iowa Tests of Basic Skills — Complete Battery, Forms K and L
Iowa Basic Skills Testing Program

                     4th to 5th       4th to 6th       4th to 7th       4th to 8th
Test                 Fall   Spring    Fall   Spring    Fall   Spring    Fall   Spring
Reading Total        .86    .85       .85    .84       .83    .84       .81    .83
Language Total       .87    .85       .84    .84       .82    .81       .80    .79
Math Total           .81    .80       .81    .79       .79    .79       .78    .77
Core Total           .92    .90       .90    .89       .88    .88       .87    .86


PART 6
Item and Test Analysis

Difficulty of the Tests

Elementary school teachers, particularly those in the primary grades, often criticize standardized tests for being too difficult. This probably stems from the fact that no single test can be perfectly suited in difficulty for all students in a heterogeneous grade group. The use of individualized testing should help to avoid the frustrations that result when students take tests that are inappropriate in difficulty. It also is important for teachers to understand the nature of a reliable measuring instrument; they should especially realize that little diagnostic information is gained from a test on which all students correctly answer almost all of the items.

Characteristics of the "ideal" difficulty distribution of items in a test have been the subject of considerable controversy. Difficulty specifications differ for types of tests: survey versus diagnostic, norm-referenced versus criterion-referenced, mastery tests versus tests intended to maximize individual differences, minimum-competency tests versus tests designed to measure high standards of excellence, and so forth. Developments in the area of individualized testing and adaptive testing also have shed new light on test difficulty. As noted in the discussion of reliability, the problem of placing students along a developmental continuum may differ from that of determining their ranks in grade.

To maximize the reliability of a ranking within a group, an achievement test must utilize nearly the entire range of possible scores; the raw scores on the test should range from near zero to the highest possible score. The best way to ensure such a continuum is to conduct one or more preliminary tryouts of items to determine objectively the difficulty and discriminating power of the items. A few items included in the final test should be so easy that at least 80 percent of students answer them correctly. These items help identify the least able students. Similarly, a few very difficult items should be included to challenge the most able students. Most items, however, should be of medium difficulty and should discriminate well at all levels of ability. In other words, the typical student will succeed on only a little more than half of the test items, while the least able students may succeed on only a few. A test constructed in this manner results in the widest possible range of scores and yields the highest reliability per unit of testing time.

The ten levels of the Iowa Tests of Basic Skills were constructed to discriminate in this manner among students in kindergarten through grade 8. Item difficulty indices for three times of the year (October 15, January 15, and April 15) are reported in the Content Classifications with Item Norms booklets. In Tables 6.1 and 6.2, examples of item norms are shown for Word Analysis on Level 6 and for Language Usage and Expression on Level 12 of Form A. Content classifications and item descriptors are shown in the first column. The content descriptors are cross-referenced to the Interpretive Guide for Teachers and Counselors and to various criterion-referenced reports. The entries in the tables are the average percents correct for the total test, for each major skill grouping, and for each item.

For the Level 6 Word Analysis test, there are 35 items. The mean item percents correct are 53% for kindergarten, spring; 61% for grade 1, fall; 68% for grade 1, midyear; and 73% for grade 1, spring. The items measuring letter recognition (Printed letters) are very easy; all percent-correct values are 90 or above. Items measuring other skills are quite variable in difficulty.

In Levels 9 through 14 of the battery, most items appear in two consecutive grades. In Language Usage and Expression, item 23, for example, appears in Levels 11 and 12, and item norms are provided for grades 5 and 6. In grade 5, the percents answering this item correctly are 51, 56, and 60 for fall, midyear, and spring, respectively. In grade 6, the percents are 63, 65, and 67 (Table 6.2). The consistent increases in percent correct, from 51% to 67%, show that this item measures skill development across the two grades.
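The item difficulty index referred to throughout this part is simply the proportion of examinees answering an item correctly. A minimal sketch, assuming a scored examinees-by-items matrix of 0/1 responses, is shown below; the cutoffs used to label items easy, medium, or hard follow the rough guidance in the preceding paragraphs and are illustrative rather than official classifications.

    import numpy as np

    def item_difficulties(responses):
        # Proportion correct (p-value) for each item;
        # rows = examinees, columns = items, entries 0/1.
        return np.asarray(responses, dtype=float).mean(axis=0)

    def difficulty_band(p):
        # Rough screening into the easy / medium / hard ranges discussed above.
        if p >= 0.80:
            return "easy"
        if p < 0.30:
            return "hard"
        return "medium"

    # Example: bands = [difficulty_band(p) for p in item_difficulties(scored)]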


Table 6.1
Word Analysis Content Classifications with Item Norms
Iowa Tests of Basic Skills — Complete Battery, Form A
2000 National Standardization

Level 6 Word Analysis                                      Average Percent Correct
                                                 Item     Kindergarten   Grade 1   Grade 1   Grade 1
Content Classification                           Number   Spring         Fall      Midyear   Spring
WORD ANALYSIS (35 items)                                       53           61        68        73
Phonological Awareness and Decoding (20 items)                 56           64        70        76
  Initial sounds: pictures                           8         64           68        71        74
  Initial sounds: pictures                           9         52           57        61        65
  Initial sounds: pictures                          10         79           87        90        93
  Initial sounds: pictures                          11         69           76        80        84
  Initial sounds: words                             12         42           44        53        61
  Initial sounds: words                             13         24           30        40        49
  Initial sounds: words                             14         30           36        46        55
  Initial sounds: words                             15         65           67        68        68
  Initial sounds: words                             16         49           61        71        81
  Initial sounds: words                             17         37           50        61        71
  Initial sounds: words                             18         52           59        67        74
  Initial sounds: words                             19         42           49        59        68
  Letter-sound correspondences                       4         78           90        93        95
  Letter-sound correspondences                       5         70           84        90        95
  Letter-sound correspondences                       6         59           69        78        87
  Letter-sound correspondences                       7         52           65        75        84
  Rhyming sounds                                    20         66           73        78        82
  Rhyming sounds                                    21         60           67        70        72
  Rhyming sounds                                    22         62           68        72        75
  Rhyming sounds                                    23         65           75        79        82
Identifying and Analyzing Word Parts (15 items)                50           58        64        70
  Printed letters                                    1         91           98        99        99
  Printed letters                                    2         90           96        97        98
  Printed letters                                    3         91           97        98        98
  Letter substitutions                              24         46           57        62        67
  Letter substitutions                              25         46           53        57        61
  Letter substitutions                              26         33           39        49        58
  Letter substitutions                              27         41           48        55        61
  Letter substitutions                              28         39           49        57        65
  Letter substitutions                              29         23           30        39        47
  Word building                                     30         48           63        73        82
  Word building                                     31         58           67        78        88
  Word building                                     32         31           37        49        61
  Word building                                     33         45           63        72        81
  Word building                                     34         32           33        35        37
  Word building                                     35         35           42        47        51


Table 6.2
Usage and Expression Content Classifications with Item Norms
Iowa Tests of Basic Skills — Complete Battery, Form A
2000 National Standardization

Level 12 Usage and Expression                         Average Percent Correct, Grade 6
                                             Item
Content Classification                       Number   Fall   Midyear   Spring
USAGE AND EXPRESSION (38 items)                         59      61       63
Nouns, Pronouns, and Modifiers (10 items)               58      60       62
  Irregular plurals                              6      58      59       59
  Homonyms                                      12      26      27       30
  Redundancies                                  15      63      65       67
  Pronoun case                                   5      70      72       73
  Nonstandard pronouns                           7      49      50       50
  Comparative adjectives                         4      55      57       58
  Misuse of adjective for adverb                 1      73      74       75
  Misuse of adjective                            9      62      65       68
  Misuse of adverb                              11      65      68       70
  Misuse of adjective for adverb                14      63      65       67
Verbs (6 items)                                         58      61       62
  Subject-verb agreement                        19      60      62       63
  Tense                                          3      58      60       62
  Tense                                         17      57      59       60
  Participles                                    2      59      63       66
  Verb forms                                    10      69      72       74
  Verb forms                                    34      47      48       49
Conciseness and Clarity (5 items)                       53      55       57
  Lack of conciseness                           35      39      40       40
  Combining sentences                           24      36      38       41
  Misplaced modifiers                           27      70      73       75
  Ambiguous references                          23      63      65       67
  Ambiguous references                          32      57      59       61
Organization of Ideas (6 items)                         59      61       62
  Appropriate sentence order                    21      57      59       61
  Appropriate sentence order                    26      52      54       58
  Sentences appropriate to function             25      74      76       77
  Sentences appropriate to function             33      65      67       68
  Sentences appropriate to function             37      43      44       44
  Sentences suitable to purpose                 20      62      63       64
Appropriate Use (11 items)                              64      66       67
  Use of complete sentences                     16      60      62       64
  Use of complete sentences                     28      63      65       66
  Appropriate word order in sentences           13      62      65       67
  Appropriate word order in sentences           22      69      70       71
  Appropriate word order in sentences           30      64      66       67
  Appropriate word order in sentences           36      59      61       63
  Parallel construction                         38      47      48       49
  Conjunctions                                  29      67      69       71
  Conjunctions                                  31      60      62       63
  Correct written language                       8      78      79       79
  Correct written language                      18      74      75       76


The distributions of item norms (proportion correct) are shown in Table 6.3 for all tests and levels of the ITBS. The results are based on an analysis of the weighted sample from the 2000 spring national standardization of Form A. As can be seen from the various sections of Table 6.3, careful test construction led to an average national item difficulty of about .60 for spring testing. At the lower grade levels, tests tend to be slightly easier than this; at the upper grade levels, they tend to be slightly harder. In general, tests with an average item difficulty of about .60 will have nearly optimal internal-consistency reliability coefficients.

These distributions also illustrate the variability in item difficulty needed to discriminate throughout the entire ability range. It is extremely important in test development to include both relatively easy and relatively difficult items at each level. Not only are such items needed for motivational reasons, but they are critical for a test to have enough ceiling for the most capable students and enough floor for the least capable ones. Nearly all tests and all levels have some items with difficulties above .8 as well as some items below .3.
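A distribution such as Table 6.3 can be produced directly from the item p-values by tallying them into the same proportion-correct bands. The sketch below shows one way to do this; it is an illustrative example, not the program used to generate the published table.

    import numpy as np

    def difficulty_distribution(p_values):
        # Tally item p-values into the proportion-correct bands used in
        # Table 6.3 and return the bin counts together with the mean difficulty.
        p = np.asarray(p_values, dtype=float)
        edges = [0.0, 0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90, 1.0]
        labels = ["<.10", ".10-.19", ".20-.29", ".30-.39", ".40-.49",
                  ".50-.59", ".60-.69", ".70-.79", ".80-.89", ">=.90"]
        counts, _ = np.histogram(p, bins=edges)
        return dict(zip(labels, counts.tolist())), float(p.mean())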

Table 6.3 Distribution of Item Difficulties Iowa Tests of Basic Skills — Complete Battery, Form A Spring 2000 National Standardization (Weighted Sample)

Level 5 Grade K

Vocabulary

Word Analysis

Listening

Language

Mathematics

V

WA

Li

L

M

Proportion Correct >=.90

3

4

2

5

.80–.89

11

1

6

5

7

.70–.79

3

8

11

7

7

.60–.69

3

8

7

8

4

.50–.59

2

3

3

4

2

.40–.49

5

4

2

3

3

.30–.39

2

2

0

.20–.29

1

.10–.19 =.90

3

.80–.89

2

0

.70–.79

4

3

2

.60–.69

5

7

6

1

3

2

.50–.59

8

5

8

8

7

6

.40–.49

5

8

6

6

7

8

1

9

.30–.39

4

7

5

11

8

9

9

18

.20–.29

3

2

4

2

6

4

8

12

1

1

.29

.37

.10–.19

4

3

2 6

=.90

7

.80–.89

6

9

9

9

7

14

6

20

.70–.79

5

5

6

10

8

7

5

12

.60–.69

5

9

6

5

9

1

.50–.59

5

3

3

2

5

.40–.49

4

2

3

3

1

1

1

.30–.39

1

.20–.29

5

6

3

3

.72

.79

1

.10–.19 =.90 .80–.89

5

4

2

4

7

7

10

6

6

5

12

3

.70–.79

3

4

13

7

6

9

4

3

8

12

5

4

.60–.69

8

13

11

7

5

10

4

6

5

9

2

9

.50–.59

10

11

6

7

3

7

3

3

4

2

4

4

.40–.49

3

1

1

3

2

1

2

3

2

1

1

2

.30–.39

1

0

0

3

5

1

0

1

1

1

.68

.67

.71

.68

Word Analysis

Listening

Spelling

Language

.20–.29

1

1

.10–.19 =.90 .80–.89

1

.10–.19

4

1

1

=.90 .80–.89

4

2

1

4

2

8

3

.70–.79

4

8

6

5

4

8

6

4

7

12

4

6

6

4

9

.60–.69

9

8

5

6

3

10

6

5

4

5

9

3

7

15

6

.50–.59

7

10

8

5

6

8

7

6

7

5

7

4

7

4

6

.40–.49

4

9

2

2

6

2

5

4

3

4

6

4

4

2

2

.30–.39

0

2

3

2

1

1

0

2

2

3

4

1

1

.20–.29

1

.62

.57

.68

.70

1

2

.10–.19 =.90

1

.20–.29

1

2

.10–.19 =.90

1 4

.20–.29

1

1

1

1

1

3

.10–.19 =.90

1

1

.80–.89

3

2

6

4

1

.70–.79

7

7

6

3

9

.60–.69

9

16

7

7

.50–.59

13

12

9

7

.40–.49

5

6

9

.30–.39

1

1

1

.20–.29

1

1

2

4

6

2

3

3

10

7

4

6

4

6

4

8

6

18

12

5

4

6

14

7

6

7

4

13

7

5

15

10

5

9

6

6

5

6

7

4

11

6

5

2

1

1

1

2

4

1

3

3

5

1

1

.58

.59

2

1

.10–.19 =.90 .80–.89

1

1

4

2

1 2

1

.70–.79

6

7

7

4

8

11

7

5

6

.60–.69

10

14

8

8

4

15

11

7

3

12

7

3

5

13

8

12

.50–.59

14

20

18

8

13

6

14

7

4

.40–.49

7

6

2

5

0

4

7

7

10

12

7

4

10

13

11

5

.30–.39

1

3

3

4

2

2

1

5

7

4

3

4

.20–.29

1

1

2

2

.58

.60

1

4

.10–.19 =.90

1 9

1

2

6

2

2

.10–.19
