
A 21st century approach towards statistical inference – Evaluating the effects of teaching randomization methods on students’ conceptual understanding

Robert delMas*
University of Minnesota, Minneapolis, MN, USA – [email protected]

Abstract

In a landmark paper in 2007, George Cobb argued for a 21st century approach to teaching introductory statistics. He advocated the use of randomization and simulation methods for instruction in statistical inference, rather than the traditional formula-based approach (e.g., using methods such as the t-test and ANOVA). Since then, several efforts have been made to develop curriculums that incorporate randomization methods into the introductory statistics course. Three of these curriculums are discussed: a “blended” curriculum where, for each statistical inference context (e.g., comparing differences between two groups), students are first taught a randomization method followed by the corresponding parametric procedure (e.g., two-sample t-test); a “sequential” curriculum that first introduces randomization methods for all statistical inference contexts, followed by a unit that revisits each context in terms of the corresponding parametric procedures; and a “randomization-only” curriculum that teaches only randomization methods for statistical inference. Results from separate research studies conducted with students who participated in the blended and randomization-only curriculums are examined with respect to evidence of what aspects of students’ conceptual statistical understanding are and are not facilitated by the curriculums, and to identify areas where additional research is needed.

Keywords: randomization methods; statistical thinking; empirical investigations; statistics education.

1. Introduction

George Cobb (2007) argues for a 21st century approach to teaching introductory statistics.
He advocates the use of randomization and simulation methods for instruction in statistical inference, rather than traditional formula-based approaches (e.g., using methods such as the t-test and ANOVA). Others have argued that simulation-based nonparametric methods (e.g., bootstrap and randomization procedures) are easier to learn and conceptually simpler than classical normal theory procedures, align well with student intuition, and build in a logical progression that is quite accessible to the introductory student (Cobb, 2007; Efron, 2000; Simon, Atkinson, & Shevokas, 1976; Tintle, Topliff, Vanderstoep, Holmes, & Swanson, 2012). Research by Gigerenzer and Hoffrage (1995) showed evidence that reasoning built upon intuition for frequencies (e.g., 43 times out of 1000) produces more successful learning outcomes than similar tasks phrased as probabilities (e.g., 0.043). Simulation-based methods connect naturally to frequency concepts (e.g., Ernst, 2004; Rossman, 2008), suggesting a possible advantage over methods based on the normal and t-distributions. The required mathematics is largely combinatorial in nature and often relies on little more than simple arithmetic, counting algorithms, and indicator functions (Coakley, 1996). For more computationally intensive procedures such as permutation tests, Monte Carlo methods can be used to generate a large number of resampled combinatorial outcomes (Efron & Tibshirani, 1993; Ernst, 2004). The robustness of simulation-based inference (SBI) methods is due to relatively simple principles that generalize well to many situations (Cobb, 2007; Efron, 2000; Garfield, delMas, & Zieffler, 2012). Simulation-based procedures rely on fewer distributional assumptions than parametric tests, generally have good statistical power, and can be used with non-random samples when there is randomized allocation to treatments (Garthwaite, 1996).
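The Monte Carlo approach to permutation tests described above can be illustrated in a few lines of code. The following is a minimal sketch, not drawn from any of the curriculums discussed in this paper, and the data are hypothetical; it reshuffles pooled observations many times and reports the p-value in the frequency format the text describes (a count out of the number of resamples).

```python
import random

def permutation_test(group_a, group_b, n_resamples=10000, seed=1):
    """Monte Carlo permutation test for a difference in group means.

    Repeatedly reshuffles the pooled observations into two groups of the
    original sizes and counts how often the reshuffled difference is at
    least as extreme as the observed difference.
    """
    random.seed(seed)
    observed = sum(group_a) / len(group_a) - sum(group_b) / len(group_b)
    pooled = group_a + group_b
    n_a = len(group_a)
    count = 0
    for _ in range(n_resamples):
        random.shuffle(pooled)
        diff = (sum(pooled[:n_a]) / n_a
                - sum(pooled[n_a:]) / (len(pooled) - n_a))
        if abs(diff) >= abs(observed):
            count += 1
    # Frequency format: "count times out of n_resamples"
    return count / n_resamples

# Hypothetical scores for a treatment group and a control group
p = permutation_test([23, 25, 28, 31, 30], [18, 21, 22, 24, 20])
```

Note that the required mathematics is exactly as the text describes: simple arithmetic, counting, and a comparison that acts as an indicator function.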
In addition, simulation-based inference procedures can make concepts like sampling variability and sampling distributions more apparent for the student (Cobb, 2007; Rossman, 2008; Tintle, Vanderstoep, Holmes, Quisenberry, & Swanson, 2011). Rather than approximate the sampling distribution of some statistic for all possible samples with an abstract mathematical model, students approximate the sampling distribution directly by simulating a large number of possible samples. The approximation to the sampling distribution is used to support a frequency-based interpretation of the estimated p-value (Rossman, 2008).

2. Randomization-Based Introductory Statistics Curriculums

Three randomization-based introductory statistics curriculums developed and published by statistics educators in the United States are reviewed in this paper. Each curriculum consists of a textbook and a website that includes accompanying materials such as lesson plans, activities, data sets, and analysis tools. Nathan Tintle and colleagues (Tintle et al., 2016) have published the Introduction to Statistical Investigations (ISI) textbook with accompanying materials posted at http://math.hope.edu/isi/. Robin Lock and his family have published the Statistics: Unlocking the Power of Data (Lock, Lock, Morgan, Lock, & Lock, 2017) textbook with accompanying material posted at http://www.lock5stat.com/. The course is commonly referred to as the Lock-5 course. The third curriculum, commonly referred to as the CATALST (Change Agents for Teaching and Learning Statistics) course, was developed by Andrew Zieffler and colleagues at the University of Minnesota. The CATALST textbook, Statistical Thinking: A Simulation Approach to Uncertainty (Zieffler, 2017), and course materials are posted at https://github.com/zief0002/Statistical-Thinking.

All three courses use SBI methods to introduce statistical thinking, statistical inference, and procedures for conducting hypothesis testing and interval estimation. The three courses differ in several aspects such as the topics covered, the sequencing of topics, and the tools used to carry out randomization-based tests and estimation methods (see Table 1).

Table 1. Comparison of Three U.S. Simulation-Based Inference (SBI) Curriculums

FEATURE                            ISI                           Lock-5                          CATALST
Textbook and Website               Yes                           Yes                             Yes
Covers Descriptive Statistics      No                            Yes                             Yes
Teach Traditional Normal Methods   Yes                           Yes                             No
Course Sequence                    SBI & Normal for Each Topic   Descriptive, SBI, then Normal   Modeling & Problem-Based
Simulation tool                    Applets                       StatKey                         TinkerPlots™
Simulation built by                Tool                          Tool                            Student

The ISI course by Tintle et al. (2016) assumes that students have learned basic concepts of descriptive statistics in elementary and secondary schools, and therefore the ISI textbook does not provide a chapter on descriptive statistics. The course consists of three units that cover one-sample inferences, two-sample inferences, and extensions to multiple-group comparison and regression contexts, respectively. Each unit consists of several topics, and the concepts and methods for each topic are introduced using SBI methods, followed by coverage of corresponding parametric, normal-based methods. In Unit 1 on one-sample inferences, the four chapters cover what the ISI group refers to as the four pillars of inference: strength of evidence, generalization, estimation, and association versus causation. The four chapters in Unit 1 cover methods for one-sample inference for a proportion and one-sample inference for the mean of a quantitative variable, followed by confidence interval estimation methods for both types of population parameters. The second unit covers methods for two-group comparisons, emphasizing the four pillars of inference for comparisons of groups on a categorical response variable, a numerical response variable, and paired-sample situations, respectively, across the three chapters in the unit. The first two chapters in the third unit cover SBI and theory-based methods for statistical inference when there are more than two groups for categorical data and numeric data, respectively. The final chapter covers SBI and theory-based inferential methods for correlations and regression coefficients when there are two numerical variables.

The Lock-5 course by Lock et al. (2017), divided into four units, takes a different approach from the ISI course in that it introduces all concepts and methods for statistical inference through SBI methods in the second unit and follows up with theory-based methods for the same types of analyses in the third unit. A second contrast with the ISI course is that the two chapters in the first unit of the Lock-5 course cover collecting data and descriptive statistics, respectively. A third difference is that the Lock-5 course first introduces confidence intervals, with a focus on sampling variability and estimating standard errors through SBI methods, and then follows with SBI methods for hypothesis testing. Similar to the ISI curriculum, the fourth unit of Lock-5 covers the additional topics of analysis of variance and bivariate regression, but additionally covers chi-square tests for categorical data and multiple regression.

The third curriculum, the CATALST course, is based on a modeling paradigm described by George Cobb (2007) as randomize-repeat-reject and transferred into the CATALST course as model-simulate (randomize and repeat)-evaluate. Similar to the other two courses, the CATALST course has an emphasis on statistical thinking and a focus on modeling through SBI methods. In contrast to the other two courses, the CATALST course does not teach theory-based methods. The course is divided into five units. The first unit develops students’ understanding of randomness and chance by engaging students in activities where they build Monte Carlo simulations based on probability models to answer statistical inference questions. The second unit builds on the first to focus on one-sample hypothesis testing. Descriptive statistics are reviewed, the null hypothesis is introduced as a just-by-chance probability model, the model-simulate-evaluate paradigm is practiced through several one-sample analyses, and the foundation for understanding p-values is developed.
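The model-simulate-evaluate paradigm for a just-by-chance null model can be sketched in code. This is a hypothetical one-sample example, not an actual CATALST activity: model a fair chance process, simulate the 20-trial experiment many times, and evaluate where an observed result of 15 successes falls.

```python
import random

def simulate_null(n_trials=20, prob=0.5, n_reps=10000, seed=2):
    """Model: success happens just by chance with probability `prob`.
    Simulate: repeat the n_trials experiment many times, collecting
    the number of successes from each repetition."""
    random.seed(seed)
    return [sum(random.random() < prob for _ in range(n_trials))
            for _ in range(n_reps)]

observed = 15  # hypothetical: 15 successes observed in 20 trials
null_counts = simulate_null()
# Evaluate: proportion of simulated results at least as extreme
p_value = sum(c >= observed for c in null_counts) / len(null_counts)
```

The simulated distribution plays the role of the just-by-chance model; the p-value is read off it as a frequency, exactly as in the frequency-format reasoning discussed in the introduction.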
The third unit extends students’ statistical thinking to comparing two groups by having students conduct randomization tests to analyse data from studies with random assignment to treatments. The null hypothesis is represented by a no-effect probability model, re-randomization is introduced as a method for estimating variability due to random assignment to treatment, methods for estimating the p-value are formalized, and ideas of statistical and practical significance are discussed. The fourth unit focuses on topics related to study design (e.g., random assignment, confounding variables, random selection, observational studies, and causation versus association). The final unit teaches students bootstrap methods for estimating confidence intervals. Students learn that the standard deviation of a sample statistic estimates the standard error, which is extended to the concept of margin of error, and engage in several activities to learn about the relationship between sample size and standard error.

While all three curriculums teach SBI methods, they differ with respect to the tools used to conduct SBI. All of the tools used in the respective curriculums allow the students to enter and edit data, and to adjust some of the input and output features of the simulations. The ISI course uses a collection of online applets. Essentially, a different applet is used for each type of statistical hypothesis test. While some features are the same or similar across the applets, there are differences in the layout of fields and buttons and in the display of output that students have to learn. With respect to SBI methods, only applets for randomization tests are provided (i.e., none of the applets perform bootstrap sampling for confidence interval estimation).
The interfaces allow the user to paste or enter observed data from which the applet generates reshuffled assignments; however, only generic variable labels are provided (e.g., Group A, Group B) that cannot be customized. Separate applets are provided with preloaded data sets for eight different analyses covered in the ISI course. The ISI applets provide some animation to represent reshuffling of random assignment to treatments and allow the student to customize input (e.g., specify the number of reshuffles) and output features (e.g., display results in tabular or graphical form). While the distribution of the sample statistic (e.g., difference in sample proportions) can be visually built up in any specified increment for the number of shuffles, only results from the last reshuffled sample are available for review. The mean and standard deviation of the distribution of the sample statistic are displayed. The user can enter a reference value into a field (e.g., the observed difference in the sample) and click a button to estimate the corresponding p-value from the empirical distribution of the sample statistic.

The Lock-5 course uses an online tool called StatKey. Similar to the ISI applets, StatKey is a collection of JavaScript applets for producing bootstrap confidence intervals and randomization hypothesis tests. Separate sets of applets are provided for randomization tests and for bootstrap sampling for confidence interval estimation. The applets share a common interface, which may facilitate use of all StatKey applets once students learn to use one of them. Similar to the ISI applets, observed data can be pasted or entered through an interface (although only generic variable labels are provided (e.g., Group 1, Group 2) that cannot be customized), the distribution of a sample statistic can be built up in increments, the mean and standard deviation of the distribution are displayed, and a p-value can be estimated by entering a reference value for randomization tests. There are also several dozen data sets from the course that can be selected and automatically loaded into StatKey. StatKey does not provide any animation to represent reshuffling or resampling.

TinkerPlots™, a stand-alone software program (Konold & Miller, 2015), is used in the CATALST course. TinkerPlots™ does not include ready-made applets. Instead, it provides a set of tools that students use to build simulations to carry out randomization-based methods. Students learn to select from an array of chance devices (e.g., spinners, mixers, stacks, counters) to create samplers that appropriately model the null hypothesis, use the samplers to generate samples, collect a statistic from each sample, display a distribution of the collected statistics, and interact with the collections of statistics and distributions to estimate p-values, standard errors, and confidence intervals (see Garfield et al., 2012). Building a sampler involves several decisions: selecting one or more random devices, populating the devices (values can be entered, imported, or pasted in), and deciding whether random selection or assignment is with or without replacement in order to appropriately match a study design.
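The sampler workflow just described (generate samples from a chance device, collect a statistic from each, then read standard errors and intervals off the collection) has a direct programmatic analogue. The following minimal bootstrap sketch uses hypothetical data and assumes the common "two standard errors" rule of thumb for a rough 95% interval; it is an illustration of the general idea, not of any TinkerPlots™ lesson.

```python
import random
import statistics

def bootstrap_means(sample, n_resamples=5000, seed=3):
    """Draw resamples with replacement (the chance device) and collect
    the mean of each resample (the statistic)."""
    random.seed(seed)
    n = len(sample)
    return [statistics.mean(random.choices(sample, k=n))
            for _ in range(n_resamples)]

data = [12, 15, 14, 10, 18, 16, 13, 17, 11, 14]  # hypothetical sample
boot = bootstrap_means(data)
se = statistics.stdev(boot)  # SD of the collected statistics estimates the SE
margin = 2 * se              # rough 95% margin of error
lo, hi = statistics.mean(data) - margin, statistics.mean(data) + margin
```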
Animation is a key feature of TinkerPlots™ objects, so students can see values randomly selected from chance devices and moved one at a time into the collection of values for the sample. The TinkerPlots™ environment is highly customizable and interactive, allowing students to “tinker” with graphic displays, as well as see links between data collections and visual displays through dynamic highlighting and color shading. While students arguably encounter a steeper learning curve with TinkerPlots™ than might occur with ready-made applets, the decision process required by the active construction of the simulations may develop a deeper understanding of simulation-based methods.

3. Research on Randomization-Based Curriculums

Several recent studies indicate that teaching with randomization-based methods can have positive effects on undergraduate students’ learning of statistical inference. Maurer and Lock (2016) conducted a complex experimental study with 101 undergraduates at the University of Iowa to compare a traditional approach to an SBI curriculum based on the Lock-5 course. Each student was enrolled in one of four course sections. All students received the same instruction on descriptive statistics, correlation and linear regression, and probability during the first seven weeks of the course. Instruction during the last seven weeks of the course differed, with two of the sections randomly assigned to receive instruction in simulation-based methods followed by instruction in normal theory methods, and the other two sections receiving instruction in only normal theory methods. Instruction on inference in both courses covered methods related to one-sample proportions and means, and testing differences between groups for proportions and means. The same two instructors team-taught both courses, but instructor effects were counterbalanced by having each instructor teach during alternating weeks throughout the semester.
A multivariate analysis of covariance (MANCOVA) was conducted. Scores on assignments and a midterm administered prior to the last seven weeks of the course were entered as covariates, and curriculum was entered as the grouping variable. The two dependent variables were scores on the Confidence Interval and Tests of Significance assessments from the Assessment Resource Tools for Improving Statistical Thinking (ARTIST) project (Garfield & delMas, 2010). After controlling for pre-treatment measures, curriculum was found to be only marginally statistically significant (p = 0.099). Post hoc analyses indicated a positive effect of the randomization-based curriculum for understanding confidence intervals, with students in the SBI course scoring 7.1 percentage points higher than those in the non-SBI course, but not for understanding tests of significance.

Garfield et al. (2012) provide evidence that students have positive attitudes toward taking the CATALST course and performed well on items that assessed statistical literacy and reasoning, especially with respect to understanding modeling and simulation. Beckman, delMas and Garfield (in press) looked at the effects of the CATALST course on students’ ability to answer near and far transfer assessment items. The study was based on responses of 729 undergraduates from eight different institutions in the United States. The sample consisted of three different groups of students across 19 different class sections: 138 (4 class sections) who completed the CATALST course at the University of Minnesota, 151 (5 class sections) who completed the CATALST course at one of five other institutions, and 440 (10 class sections) who completed a traditional, non-CATALST course at one of the latter five institutions or at one of two other institutions. The assessment consisted of 21 forced-choice items, 16 adapted from the Comprehensive Assessment of Outcomes in a first course in Statistics (CAOS) test (delMas, Garfield, Ooms & Chance, 2007) and five new items that assessed understanding of simulation-based methods and statistical significance. Near transfer items assessed content taught directly in the CATALST course, whereas far transfer items assessed content that did not receive direct instruction but that experience in the course could lead students to understand. A linear mixed effects model with fixed effects for curriculum group and type of transfer, and random effects for class section, was fit to the assessment data. The fixed effects for curriculum group and type of transfer were statistically significant, as was the interaction between the two fixed effects (p < 0.01 for all effects). Results indicated that the non-CATALST group had the lowest mean performance, with comparable performance on the near and far transfer items (53% correct), whereas both of the CATALST groups performed about 7 percentage points higher on the near transfer items than the far transfer items.
Across all items, the CATALST students at the University of Minnesota performed about 9 percentage points higher than the CATALST students at other institutions, and the latter group performed about 7.5 percentage points higher than the non-CATALST students.

Using the CAOS test (delMas et al., 2007), Tintle and colleagues provide evidence that the ISI curriculum is associated with improvements in student understanding of tests of significance, simulation, and the purpose of randomization (Tintle et al., 2011), and with higher levels of post-course retention, especially with respect to understanding data collection, study design, and tests of significance (Tintle et al., 2012), when compared to students enrolled in non-SBI courses. More recently, Chance, Wong and Tintle (2016) report on a large-scale study of students enrolled in the ISI course taught by 37 instructors across a variety of institutions in the United States. The purpose of the observational study was to measure the effect of different levels of experience with teaching the ISI curriculum, as well as institutional and student characteristics, on learning outcomes. The outcome measure was a 30-item forced-choice assessment based on items from the CAOS test and the Goals and Outcomes Associated with Learning Statistics instrument (GOALS; Garfield et al., 2012). The assessment, which covered the areas of descriptive statistics, data collection, confidence intervals, tests of significance, and sampling variability, was administered as a pretest and a posttest. A complex linear mixed effects model that included pretest scores, student characteristics, and instructor characteristics was fit using pretest-to-posttest change scores as the dependent variable. Some of the findings were not surprising: positive gain scores were associated with higher confidence in learning statistics and higher academic preparation.
Somewhat surprisingly, there was not strong evidence that students of instructors with more experience teaching ISI had higher gains, indicating that the ISI curriculum was effective even for first-time instructors. Similar to previous results, the largest pretest-to-posttest gains were in understanding sampling variability (6 to 11 percentage point increase), confidence intervals (9 to 14 percentage point increase), and tests of significance (10 to 12 percentage point increase).

4. Discussion

The studies reviewed above provide evidence that teaching with SBI instructional methods is feasible and that these methods can improve student conceptual understanding of sampling variability, confidence intervals, and hypothesis testing. However, most of the evidence is based on observational studies. There is a need for larger-scale studies that use experimental designs with random assignment to type of curriculum (e.g., non-SBI versus SBI curriculums; comparison of different SBI curriculums). Such studies will require significant funding and detailed design to manage implementations across numerous institutions, instructors, and classrooms.

References

Beckman, M. D., delMas, R. C., & Garfield, J. (in press). Cognitive transfer for a simulation-based introductory statistics curriculum. Statistics Education Research Journal.
Cobb, G. W. (2007). The introductory statistics course: A Ptolemaic curriculum. Technology Innovations in Statistics Education, 1(1), 1-15. http://escholarship.org/uc/item/6hb3k0nz#page-1
Coakley, C. W. (1996). Suggestions for your nonparametric statistics course. Journal of Statistics Education, 4(2). Retrieved from http://www.amstat.org/publications/jse/v4n2/coakley.html
delMas, R., Garfield, J., Ooms, A., & Chance, B. (2007). Assessing students’ conceptual understanding after a first course in statistics. Statistics Education Research Journal, 6(2), 28-58. http://iase-web.org/documents/SERJ/SERJ6(2)_delMas.pdf
Efron, B. (2000). The bootstrap and modern statistics. Journal of the American Statistical Association, 95(452), 1293-1296.
Efron, B., & Tibshirani, R. (1993). An introduction to the bootstrap. New York: Chapman and Hall.
Ernst, M. D. (2004). Permutation methods: A basis for exact inference. Statistical Science, 19(4), 676-685.
Garfield, J., & delMas, R. (2010). A website that provides resources for assessing students’ statistical literacy, reasoning, and thinking. Teaching Statistics, 32(1), 2-7.
Garfield, J., delMas, R., & Zieffler, A. (2012). Developing statistical modelers and thinkers in an introductory, tertiary-level statistics course. ZDM: The International Journal on Mathematics Education, 44(7), 883-898. http://dx.doi.org/10.1007/s11858-012-0447-5
Garthwaite, P. H. (1996). Confidence intervals from randomization tests. Biometrics, 52, 1387-1393.
Giesbrecht, N., Sell, Y., Scialfa, C., Sandals, L., & Ehlers, P. (1997). Essential topics in introductory statistics and methodology courses. Teaching of Psychology, 24(4), 242-246.
Gigerenzer, G., & Hoffrage, U. (1995). How to improve Bayesian reasoning without instructions: Frequency formats. Psychological Review, 102, 684-704.
Konold, C., & Miller, C. (2015). TinkerPlots™ Version 2.3 [computer software]. Learn Troop. http://www.tinkerplots.com/
Lock, R. H., Lock, P. F., Morgan, K. L., Lock, E. F., & Lock, D. F. (2017). Statistics: Unlocking the power of data (2nd ed.). Hoboken, NJ: Wiley.
Maurer, K., & Lock, D. (2016). Comparison of learning outcomes for simulation-based and traditional inference curricula in a designed educational experiment. Technology Innovations in Statistics Education, 9(1). http://escholarship.org/uc/item/0wm523b0
Rossman, A. J. (2008). Reasoning about informal statistical inference: One statistician’s view. Statistics Education Research Journal, 7(2), 5-19. Retrieved from http://iase-web.org/documents/SERJ/SERJ7(2)_Rossman.pdf
Simon, J. L., Atkinson, D. T., & Shevokas, C. (1976). Probability and statistics: Experimental results of a radically different teaching method. American Mathematical Monthly, 83(9), 733-739.
Tintle, N., Chance, B. L., Cobb, G. W., Rossman, A. J., Roy, S., Swanson, T., & VanderStoep, J. (2016). Introduction to statistical investigations. Hoboken, NJ: Wiley.
Tintle, N., Topliff, K., Vanderstoep, J., Holmes, V., & Swanson, T. (2012). Retention of statistical concepts in a preliminary randomization-based introductory statistics curriculum. Statistics Education Research Journal, 11(1), 21-40. Retrieved from http://iase-web.org/documents/SERJ/SERJ11(1)_Tintle.pdf
Tintle, N., Vanderstoep, J., Holmes, V., Quisenberry, B., & Swanson, T. (2011). Development and assessment of a preliminary randomization-based introductory statistics curriculum. Journal of Statistics Education, 19(1). Retrieved from http://www.amstat.org/publications/jse/v19n1/tintle.pdf
Zieffler, A., & Catalysts for Change. (2017). Statistical thinking: A simulation approach to uncertainty (4th ed.). Minneapolis, MN: Catalyst Press.
