Ch 10. Hypothesis Tests / SWT - BrownMath.com [PDF]




Stats without Tears

10. Hypothesis Tests

Updated 1 Jan 2016. Copyright © 2002–2017 by Stan Brown


Summary:

You want to know if something is going on (if there’s some effect). You assume nothing is going on (null hypothesis), and you take a sample. You find the probability of getting your sample if nothing is going on (p-value). If that’s too unlikely, you conclude that something is going on (reject the null hypothesis). If it’s not that unlikely, you can’t reach a conclusion (fail to reject the null).

Contents:

10A. Testing a Proportion (Binomial Data)
  10A1. Example 1: Swain v. Alabama
    · Step 1: Hypotheses
    · Step 2: Significance Level
    · Step RC: Requirements Check
    · Steps 3/4: Test Statistic and p-Value
    · Step 5: Decision Rule
    · Step 6: Conclusion (in English)
  10A2. Example 2: Cancer Screening
  10A3. Example 3: Small Samples
10B. Sharp Points
  10B1. Type I and Type II Errors
  10B2. One-Tailed or Two-Tailed?
    · Pick the Right Hypotheses
    · p < α in Two-Tailed Test: What Does it Tell You?
  10B3. What Does the p-Value Mean?
  10B4. Practical and Statistical Significance
  10B5. Conclusions: Write ’em Right!
    · When p < α, you reject H0 and accept H1.
    · When p > α, you fail to reject H0.
10C. Testing a Mean (Numeric Data)
  10C1. Example 12: Bank Deposits
  10C2. Example 13: Smokers and Retirement
10D. Confidence Interval and Hypothesis Test
10E. Testing a Non-Random Sample
What Have You Learned?
Exercises for Chapter 10
  Problem Set 1
  Problem Set 2

10A. Testing a Proportion (Binomial Data)

Remember the Swain v. Alabama [URL: https://BrownMath.com/swt/chap08.htm#c08_Swain] example? In a county that was 26% African American, Mr. Swain’s jury pool of 100 men had only eight African Americans. In that example, you assumed that selection was not racially biased, and on that basis you computed the probability of getting such a low proportion. You found that it was very unlikely. This disconnect between the data and the claim led you to reject the claim.

You didn’t know it, but you were doing a hypothesis test. This is the standard way to test a claim in statistics: assume nothing is going on, compute the probability of getting your sample, and then draw a conclusion based on that probability. In this chapter, you’ll learn some formal methods for doing that.

BTW: The basic procedure of a hypothesis test or significance test is due to Jerzy Neyman (1894–1981), a Polish American, and Egon Pearson (1895–1980), an Englishman. They published the relevant paper in 1933.

We’re going to take a seven-step approach to hypothesis tests. The first examples will be for binomial data, testing a claim about a population proportion. Later in this chapter you’ll use a similar approach with numeric data to test a claim about a population mean. In later chapters you’ll learn to test other kinds of claims, but all of them will just be variations on this theme.

10A1. Example 1: Swain v. Alabama

Step 1: Hypotheses

Your first task is to turn the claim into algebra. The claim may be that nothing is going on, or that something is going on. You always have two statements, called the null and alternative hypotheses.

Definition:

The null hypothesis, symbol H0, is the statement that nothing is going on, that there is no effect, “nothin’ to see here. Move along, folks!” It is an equation, saying that p, the proportion in the population (which you don’t know), equals some number.

Definition:

The alternative hypothesis, symbol H1, is the statement that something is going on, that there is an effect. It is an inequality, saying that p is different from the number mentioned in H0. (H1 could specify <, >, or just ≠.)

The hypotheses are statements about the population, not about your sample. You never use sample data in your hypotheses. (In real life you can’t make that mistake, since you write your hypotheses before you gather data. But in the textbook and the classroom, you always have sample data up front, so don’t make a rookie mistake.)

You must have the algebra (symbols) in your hypotheses, but it can also be helpful to have some English explaining the ultimate meaning of each hypothesis, or the consequences if each hypothesis is true.

Here you want to know whether there’s racial bias in jury selection in the county. You don’t want to know if the proportion of African Americans in Mr. Swain’s jury pool is less than 26%: obviously it is. You want to know if it’s too different, that is, if the difference is too great to be believable as the result of random chance. Write your hypotheses this way:

(1) H0: p = 0.26, there’s no racial bias in jury selection
    H1: p < 0.26, there is racial bias in jury selection

Obviously those can’t both be true. How will you choose between them? You’ll compute the probability of getting your sample (or a more unexpected one), assuming that the null hypothesis H0 is true, and one of two things will happen. Maybe the probability will be low. In that case you rule out the possibility that random chance is all that’s happening in jury selection, and you conclude that the alternative hypothesis H1 is true. Or maybe the probability won’t be too low, and you’ll conclude that this sample isn’t unusual (unexpected, surprising) for the claimed population.

The number in your null hypothesis H0, with binomial data, is called pₒ because it’s the proportion as given in H0. (You may want to refer to the Statistics Symbol Sheet to help you keep the symbols straight.)

BTW: What exactly is p? Yes, it’s the population proportion being tested, but what’s the population? It can’t be people in the county, or men in the county, or African-American men in the county.

In fact it’s all people serving on Talladega County jury pools past, present and future. If there’s racial bias, then African Americans are less likely to be selected than whites, and — probability of one, proportion of all — therefore the overall population of jury pools has less than 26% African Americans. If there’s no racial bias, then in the long run the overall population of jury pools has the same 26% of African Americans as the county. BTW: Although a hypothesis test is officially about the population, in cases like this one it’s okay to think of it as answering a simpler question: Is the difference between the claim of no racial bias and the reality of this sample significant, or could it be explained away as the result of random chance? The hypotheses are the same either way, the calculations are the same, and the conclusions are the same.

This is why a hypothesis test is also called a significance test or a test of significance.

Step 2: Significance Level

Okay, you’re looking to figure out if this sample is inconsistent with the null hypothesis. In other words, is it too unlikely, if the null hypothesis H0 is true? But what do you mean by “too unlikely”? Back in Chapter 5, we talked about unusual events, with a threshold of 5% or 0.05 for such events. We’ll use that idea in hypothesis testing and call it a significance level.

Definition:

The significance level, symbol α (the Greek letter alpha), is the chance of being wrong that you can live with. By convention, you write it as a decimal, not a percentage.

(2) α = 0.05

A significance level of 0.05 is standard in business and science. If you can’t tolerate a 5% chance of being wrong, because the consequences are particularly serious, use a lower significance level, 0.01 or 0.001 for example. (0.001 is common if there’s a possibility of death or serious disease or injury.) If the consequences of being wrong are especially minor, you might use a higher significance level, such as 0.10, but this is rare in practice. In a classroom setting, you’re usually given a significance level to use.

BTW: Later in this chapter, you’ll see that the significance level is actually concerned with a particular way of being wrong, a Type I error.

Step RC: Requirements Check

Back in Chapter 8, you learned the CLT’s requirements for binomial data [URL: https://BrownMath.com/swt/chap08.htm#c08_SampDistPropShape]: random sample not larger than 10% of population, and at least 10 successes and 10 failures expected if the null hypothesis is true. You compute expected successes as npₒ by using pₒ, which is the number from H0. Expected failures are then sample size minus expected successes, n−npₒ in symbols. Steps 3 and 4 need the sampling distribution of the proportion to be a ND, so you must check the requirements as part of your hypothesis test.

(RC)
· Random sample? Yes, according to the county. ✓
· 10n = 10×100 = 1000. We don’t know the number of adult males in the county, but it must be greater than 1000, surely. (“I know that, and don’t call me Shirley.”) ✓
· Expected successes = npₒ = 100×.26 = 26; expected failures are 100−26 = 74; both are ≥ 10. ✓
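The arithmetic part of this requirements check can be sketched in Python. This is only an illustration, not part of the calculator procedure the text uses: the function name `check_binomial_requirements` and its parameters are hypothetical, and the population size is left as unknown because the text says only that it must exceed 1000. Whether the sample is random cannot be checked by arithmetic, so the sketch leaves that to you.

```python
def check_binomial_requirements(n, p0, population_size=None, min_expected=10):
    """Arithmetic checks for a one-proportion z test (hypothetical helper).

    n: sample size; p0: the proportion stated in H0.
    population_size: N if known; None means the 10n <= N check is a judgment call.
    """
    expected_successes = n * p0              # computed from p0, not from the sample
    expected_failures = n - expected_successes
    return {
        "expected_successes": expected_successes,
        "expected_failures": expected_failures,
        "enough_expected": min(expected_successes, expected_failures) >= min_expected,
        "ten_n": 10 * n,
        "population_large_enough": (
            None if population_size is None else 10 * n <= population_size
        ),
    }

# Swain v. Alabama: n = 100, p0 = 0.26
swain = check_binomial_requirements(n=100, p0=0.26)
print(swain)
```

With n = 100 and pₒ = 0.26 the expected counts are 26 and 74, so the normal model is acceptable under this rule of thumb.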

You might wonder about the first test. “The county may say it’s random, but I don’t believe it. Isn’t that why we’re running this test?” Good question! Answer: Every hypothesis test assumes the null hypothesis is true and computes everything based on that. If you end up deciding that the sample was too unlikely, in effect you’ll be saying “I assumed nothing was going on, but the sample makes that just too hard to believe.”

This same idea (the null hypothesis H0 is innocent till proven guilty) explains why you use 0.26 (pₒ) to figure expected successes and failures, not 0.08 (p̂). Again, the county claims that there’s no racial bias. If that’s true, if there’s no funny business going on, then in the long run 26% of members of jury pools should be African American.

Comment: Usually, if requirements aren’t met you just have to give up. But for one-population binomial data, where the other two requirements are met but expected successes or failures are much under 10, you can use MATH200A part 3 to compute the p-value directly. There’s an example in “Small Samples”, below.

Steps 3/4: Test Statistic and p-Value

This is the heart of a hypothesis test. You assume that the null hypothesis is true, and then use what you know about the sampling distribution [URL: https://BrownMath.com/swt/chap08.htm] to ask: How likely is this sample, given that null hypothesis?

Definition:

A test statistic is a standardized measure of the discrepancy between your null hypothesis H0 and your sample. It is the number of standard errors that the sample lies above or below H0 .

You can think of a test statistic as a measure of unbelievability, of disagreement between H0 and your sample. A sample hardly ever matches your null hypothesis perfectly, but the closer the test statistic is to zero the better the agreement, and the further the test statistic is from 0 the worse the sample and the null hypothesis disagree with each other.

BTW: Because you showed that the sampling distribution is normal and the standard error of the proportion [URL: https://BrownMath.com/swt/chap08.htm#c08_SampDistPropSpread] is implicitly known, this is a z test. The test statistic is $z = (\hat{p} - p_o) / \sigma_{\hat{p}}$ where $\sigma_{\hat{p}} = \sqrt{p_o(1-p_o)/n}$, but as you’ll see your calculator computes everything for you.

Definition:

The p-value is the probability of getting your sample, or a sample even further from H0 , if H0 is true. The smaller the p-value, the stronger the evidence against the null hypothesis.

Inferential Statistics: Basic Cases [URL: https://BrownMath.com/swt/casesbas.htm] tells you that binomial data in one population are Case 2. This is a hypothesis test of population proportion, and you use 1-PropZTest on your calculator. To get to that menu selection, press [STAT] [◄] [5]. Enter pₒ from the null hypothesis H0, followed by the number of successes x, the sample size n, and the alternative hypothesis H1. Write everything down before you select Calculate. When you get to the output screen, check that your alternative hypothesis H1 is shown correctly at the top of the screen, and then write down everything that’s new.



(3/4) 1-PropZTest: pₒ=.26, x=8, n=100, p<pₒ
outputs: z=−4.10, p-value=0.000 020, p̂=0.08
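For readers without the calculator, the same test statistic and p-value can be sketched in Python with only the standard library. This is an illustration under stated assumptions, not the text’s method: the function name `one_prop_z_test` and the `alternative` keywords are hypothetical, and the standard error is computed from pₒ because the test assumes H0 is true.

```python
from math import sqrt
from statistics import NormalDist

def one_prop_z_test(p0, x, n, alternative="less"):
    """z statistic and p-value for H0: p = p0 (hypothetical helper).

    alternative: "less" (H1: p < p0), "greater" (H1: p > p0),
                 or "two-sided" (H1: p != p0).
    """
    p_hat = x / n
    std_error = sqrt(p0 * (1 - p0) / n)   # uses p0, because H0 is assumed true
    z = (p_hat - p0) / std_error
    lower_tail = NormalDist().cdf(z)      # P(Z <= z) under the standard normal
    if alternative == "less":
        p_value = lower_tail
    elif alternative == "greater":
        p_value = 1 - lower_tail
    elif alternative == "two-sided":
        p_value = 2 * min(lower_tail, 1 - lower_tail)
    else:
        raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")
    return z, p_value, p_hat

# Swain v. Alabama: p0 = 0.26, 8 successes in a sample of 100, H1: p < 0.26
z, p_value, p_hat = one_prop_z_test(p0=0.26, x=8, n=100, alternative="less")
print(f"z = {z:.2f}, p-value = {p_value:.6f}, p-hat = {p_hat:.2f}")
```

The p-value that comes out is about 0.000 020, the figure used in Step 5.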


Step 5: Decision Rule

There are two and only two possibilities, and all you have to do is pick the correct one based on your p-value and your α:

p < α. Reject H0 and accept H1.
or
p > α. Fail to reject H0.


Caution! There are lots of p’s in problems involving population proportions (Case 2), so make sure you select the right one. The p-value is the first p on the 1-PropZTest output screen. You can add the numbers, if you like, as in p < α (0.000 020 < 0.05), but the symbols are required.

(5) p < α. Reject H0 and accept H1.

What are you saying here? The p-value was very small, so that means the chance of getting this sample, if there’s no racial bias, was very small. Previously, you set a significance level of 0.05, meaning you would consider this sample too unlikely if its probability was under 5%. Its probability is under 5%, so the sample and the null hypothesis contradict each other. The sample is what it is, so you can’t reject the sample. Therefore you reject H0 and accept H1: you declare that there is racial bias.

Another way to look at it: Any sample will vary from the population because random selection is always operating to produce sampling error [URL: https://BrownMath.com/swt/chap01.htm#c01_ErrorsSampling]. But the difference between this sample and the supposed population proportion is just too great to be produced by random selection alone. Something else must be going on also. That something else is the alternative hypothesis H1.

Definition:

When the p-value is below α, the sample is too unlikely to come from ordinary sample variability alone, and you have a significant result, or your result is statistically significant.

You always select a significance level before you know the p-value. If you could first get the p-value and then specify a significance level, you could get whichever result you wanted, and there would be no point to doing a hypothesis test at all. Choosing up front keeps you honest.

Step 6: Conclusion (in English)

Since you accepted H1 in the previous step, that’s your conclusion. If you have already written it in English as part of the hypotheses, as I did, then most of your work is already done. You do need to add the significance level or the p-value, so your conclusion will look something like one of these:

(6) The 8% proportion of African American men in Mr. Swain’s jury pool is significantly below the expected 26%, and this is evidence at the 0.05 level of significance of racial bias in the selection.

or

(6) The 8% proportion of African American men in Mr. Swain’s jury pool is significantly below the expected 26%, and this is evidence of racial bias in the selection (p = 0.000 020).

If you’re publishing your hypothesis test, you’ll want to write a thorough conclusion that still makes sense if it’s read on its own. But in class exercises you don’t have to write so much. It’s enough to write “At the 0.05 significance level, there is racial bias in jury selection” or “There is racial bias in jury selection (p = 0.000 020)”.

10A2. Example 2: Cancer Screening

The Colorectal Cancer Screening Guidelines (CDC 2014 [see “Sources Used” at end of book]) recommend a colonoscopy every ten years for adults aged 50 to 75. A public-health researcher believes that only a minority are following this recommendation. She interviews a simple random sample of 500 adults aged 50–75 in Metropolis (pop. 6.4 million) and finds that 235 of them have had a colonoscopy in the past ten years. At the 0.05 level of significance, is her belief correct?

Solution: The population is adults aged 50–75 in Metropolis. You want to know whether a minority of them (under 50%) follow the colonoscopy guideline. Each person either does or does not, so you have binomial data, a test of proportion (Case 2 in Inferential Statistics: Basic Cases [URL: https://BrownMath.com/swt/casesbas.htm]).

Try to write out the hypothesis test yourself before you look at mine below. Reminder: Even though you already have the sample data in the problem, when you write the hypotheses, ignore the sample. In principle, you write the hypotheses, then plan the study and gather data. If you use any of the sample data in the hypotheses, something is wrong.

You should have written something pretty close to this:

(1) H0: p = 0.5, half the seniors of Metropolis follow the guideline
    H1: p < 0.5, less than half follow the guideline

(2) α = 0.05

(RC)
· Random sample? Yes.
· 10n ≤ N? Yes, 10n = 10×500 = 5000, surely less than the number of adults aged 50–75 in a population of 6,400,000.
· At least 10 successes and 10 failures expected? Yes, npₒ = 500×.5 = 250, and n−npₒ = 500−250 = 250.

(3/4) 1-PropZTest: pₒ=.5, x=235, n=500, p<pₒ
outputs: z=−1.34, p-value=0.0899, p̂=0.47

(5) p > α. Fail to reject H0.

(6) At the 0.05 level of significance, it’s impossible to say whether less than half of Metropolis seniors aged 50–75 follow the CDC guideline for a colonoscopy every ten years or not. [Or, It’s impossible to say whether less than half of Metropolis seniors aged 50–75 follow the CDC guideline for a colonoscopy every ten years or not (p = 0.0899).]

Important: When p is greater than α, you fail to reach a conclusion. In this situation, you must use neutral language. You mention both possibilities without giving more weight to either one, and you use words like “impossible to say” or “can’t determine”.

This is unsatisfying, frankly. You go through all the trouble of gathering data and then you end up with a non-conclusion. Can anything be salvaged from this mess? Yes, you can do a confidence interval. This at least will let you set bounds on what percent of all seniors follow the guidelines. You’ve already tested requirements as part of the hypothesis test, so go right into your calculations and conclusion. You’re free to pick any confidence level you wish, but 95% is most usual.

1-PropZInt: 235, 500, .95
outputs: (.42625, .51375)

42.6% to 51.4% of Metropolis seniors aged 50–75 follow the CDC guideline on screening for colorectal cancer.

In a classroom setting, or on regular homework, if you’re assigned a hypothesis test do that and don’t feel obligated to do a confidence interval also. But in real life, and on labs and projects for class, you’ll usually want to do both.
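The confidence interval for this example can be sketched in Python as well. The sketch below assumes the usual normal-approximation interval, with the standard error computed from the sample proportion p̂; the function name `one_prop_z_interval` is hypothetical and is not a calculator or library command.

```python
from math import sqrt
from statistics import NormalDist

def one_prop_z_interval(x, n, confidence=0.95):
    """Normal-approximation confidence interval for a proportion (hypothetical helper)."""
    p_hat = x / n
    # Critical z for the chosen confidence level, e.g. about 1.96 for 95%
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_crit * sqrt(p_hat * (1 - p_hat) / n)   # SE uses p-hat, not p0
    return p_hat - margin, p_hat + margin

# Cancer screening example: 235 of 500 adults, 95% confidence
low, high = one_prop_z_interval(235, 500, 0.95)
print(f"({low:.5f}, {high:.5f})")
```

The endpoints should agree with the interval quoted above to the precision shown there.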

10A3. Example 3: Small Samples

What if your sample is so small that expected successes npₒ or expected failures n−npₒ are under 10? You can no longer use 1-PropZTest, which assumes that the sampling distribution of the proportion is ND, but you can compute the binomial probability directly as long as the other two requirements are still met (SRS and 10n≤N). Only the calculation of the p-value changes.

Example: In 2001, 9.6% of Fictional County motorists said that fuel efficiency was the most important factor in their choice of a car. For her statistics project, Amber set out to prove that the percentage has increased since then. She interviewed 80 motorists in a systematic sample of those registering vehicles at the DMV, and 13 of them said that fuel efficiency was the most important factor in their choice of a car. Test her hypothesis, at the 0.05 significance level.

Please write out your hypothesis test before you look at mine.

(1) H0: p = 0.096, percentage has not increased
    H1: p > 0.096, percentage has increased

(2) α = 0.05

(RC)
· SRS? Systematic sample can be analyzed like a random sample. ✓
· 10n≤N? 10×80 = 800, less than number of car owners in any county. ✓
· Expected successes are npₒ = 80×.096 = 7.7, too far below 10 to live with. ✗

The sampling distribution of p̂ doesn’t follow the normal model, so you can’t use 1-PropZTest. But the other two requirements are met, so you can proceed, calculating the binomial probability directly.
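The direct binomial calculation can be sketched in Python for readers without the calculator program. This is an illustrative sketch only: the function name `binomial_upper_tail` is hypothetical. Because H1 contains >, the p-value is the probability of 13 or more successes in 80 trials when the true proportion is 0.096.

```python
from math import comb

def binomial_upper_tail(n, p, x_min):
    """P(X >= x_min) for X ~ Binomial(n, p), summed term by term."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(x_min, n + 1)
    )

# Fuel-efficiency example: n = 80, p0 = 0.096, 13 successes observed, H1: p > 0.096
p_value = binomial_upper_tail(n=80, p=0.096, x_min=13)
print(f"p-value = {p_value:.4f}")
```

Summing the upper tail directly is the same quantity as one minus the cumulative probability of 12 or fewer successes.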

(3/4) MATH200A/Binomial prob: n=80, p=0.096, x=13 to 80; p-value = 0.0410
(If you don’t have the program, use 1−binomcdf(80,0.096,12) = 0.0410.)
[Why 13 to 80? H1 contains >, so you test the probability of getting the sample you got, or a larger one, if H0 is true. If H1 contained […]

[…] The significance level was given in the problem. (Problems will usually give you an α to use.)

(2) α = 0.05

Next is the requirements check. Even though it doesn’t have a number, it’s always necessary. In this case, n = 20, which is less than 30, so you have to test for normality and verify that there are no outliers. Enter your data in any statistics list (I used L5), and check your data entry carefully. Use the MATH200A program “Normality chk” to check for a normal distribution and “Box-whisker” to verify that there are no outliers.

You don’t need to draw the plots, but do write down r and crit and show the comparison, and do check for outliers. (For what to do if you have outliers, see Chapter 3.)

(RC)
· Random sample: given.
· 10n = 10×20 = 200, and the bank had better have more deposits than that or it can’t afford to pay you for your work!
· Normality: yes. From MATH200A part 4, r(0.9864) > crit(0.9503).
· Outliers: none (MATH200A part 2).

Now it’s time to compute the test statistic (t) and the p-value. On the T-Test screen, you have to choose Data or Stats just as you did on the TInterval screen. You have the actual data, so you select Data on the T-Test screen, instead of Stats. Then the sample mean, sample SD, and sample size are shown on the output screen, so you write them down as part of your results. Always write down x̄, s, and n.

(3/4) T-Test: µₒ=200, List=L5, Freq=1, µ≠µₒ
results: t=−2.33, p=0.0311, x̄=189.40, s=20.37, n=20

The decision rule is the same for every single hypothesis test, regardless of data type. In this case:

(5) p < α. Reject H0 and accept H1.

And as usual, you can write your conclusion with the significance level or the p-value:

(6) At the 0.05 level of significance, management is incorrect and the average of all cash deposits is different from $200.00. In fact, the true average is lower than $200.00.

Or,

(6) Management is incorrect, and the average of all cash deposits is different from $200.00 (p = 0.0311). In fact, the true average is lower than $200.00.

Remember what happens when you do a two-tailed test (≠ in H1) and p turns out less than α: After you write your “different from” conclusion, you can go on to interpret the direction of the difference. See p < α in Two-Tailed Test.

In a classroom exercise, if you were asked to do a hypothesis test you would do a hypothesis test and only a hypothesis test. But in real life, and in the big labs for class, it makes sense to answer the obvious question: If the true mean is less than $200.00, what is it? You don’t have to check requirements for the CI, because you already checked them for the HT.

TInterval: L5, 1, .95
outputs: (179.86, 198.93)

With 95% confidence, the average of all cash deposits is between $179.86 and $198.93.

10C2. Example 13: Smokers and Retirement

Here’s an example where you have statistics without the raw data. It’s adapted from Sullivan (2011, 483) [see “Sources Used” at end of book].

According to the Centers for Disease Control, the mean number of cigarettes smoked per day by individuals who are daily smokers is 18.1. Do retired adults who are daily smokers smoke less than the general population of daily smokers? To answer this question, Sascha obtains a random sample of 40 retired adults who are current daily smokers and records the number of cigarettes smoked on a randomly selected day. The data result in a sample mean of 16.8 cigarettes and a SD of 4.7 cigarettes. Is there sufficient evidence at the α = 0.01 level of significance to conclude that retired adults who are daily smokers smoke less than the general population of daily smokers?

Solution: Start with the hypotheses. You’re comparing the unknown mean µ for retired smokers to the fixed number 18.1, the known mean for smokers in general. Since the data type is numeric (number of cigarettes smoked), and there’s one population, and you don’t know the SD of the population, this is Case 1, test of population mean, from Inferential Statistics: Basic Cases [URL: https://BrownMath.com/swt/casesbas.htm].

(1) H0: µ = 18.1, retired smokers smoke the same amount as smokers in general
    H1: µ < 18.1, retired smokers smoke less than smokers in general

Comment: The claim is a population mean of 18.1, so you use 18.1 in your hypotheses. Using the sample mean of 16.8 in Step 1 is a rookie mistake, one of the Top 10 Mistakes of Hypothesis Tests [URL: https://BrownMath.com/stat/topten.htm]. Never use sample data in your hypotheses.

Comment: Why does H1 have < instead of ≠? The short answer is: that’s what the problem says to do. In the real world, you would do a two-tailed test (≠) unless there’s a specific reason to do a one-tailed test (< or >); see One-Tailed or Two-Tailed? (earlier in this document). Presumably there’s some reason why they are interested only in the case “retired smokers smoke less” and not in the case “retired smokers smoke more”.

(2) α = 0.01

(RC)
· Random sample (given).
· n > 30.
· 10n = 10×40 = 400, less than the total number of retired smokers.
Therefore the sampling distribution is normal.

(3/4) T-Test: µₒ=18.1, x̄=16.8, s=4.7, n=40, µ<µₒ
outputs: t=−1.75, p=0.0440

(5) p > α. Fail to reject H0.

(6)

At the 0.01 level of significance, we can’t determine whether the average number of cigarettes smoked per day by retired adults who are current smokers is less than the average for all daily smokers or not. Or, We can’t tell whether the average number of cigarettes smoked per day by retired adults who are current smokers is less than the average for all daily smokers or not (p = 0.0440).
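A t test from summary statistics can also be sketched in Python. The standard library has no Student’s t distribution, so this sketch integrates the t density numerically with Simpson’s rule; that integration approach, and the function names `t_pdf`, `t_cdf`, and `t_test_from_stats`, are assumptions made for illustration, not the calculator’s algorithm.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(t * t / df))

def t_cdf(t, df, steps=2000):
    """P(T <= t): Simpson's rule on [0, |t|], then use symmetry about 0."""
    b = abs(t)
    h = b / steps                        # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3                 # P(0 <= T <= |t|)
    return 0.5 + area if t >= 0 else 0.5 - area

def t_test_from_stats(mu0, x_bar, s, n, alternative="less"):
    """t statistic and p-value for H0: mu = mu0, given summary statistics."""
    t = (x_bar - mu0) / (s / math.sqrt(n))   # standard error uses the sample SD
    lower = t_cdf(t, n - 1)
    p_value = {"less": lower,
               "greater": 1 - lower,
               "two-sided": 2 * min(lower, 1 - lower)}[alternative]
    return t, p_value

# Smokers and retirement: mu0 = 18.1, x-bar = 16.8, s = 4.7, n = 40, H1: mu < 18.1
t, p_value = t_test_from_stats(18.1, 16.8, 4.7, 40, alternative="less")
print(f"t = {t:.2f}, p = {p_value:.4f}")
```

The p-value comes out above 0.01, matching the decision to fail to reject H0. The same function with `alternative="two-sided"` can be applied to the bank-deposit summary statistics (µₒ = 200, x̄ = 189.40, s = 20.37, n = 20), where small differences from the calculator output are possible because those summary values are rounded.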

When you fail to reject H0, you cannot reach any conclusion. You must use neutral language in your non-conclusions. Please review When p > α, you fail to reject H0 earlier in this chapter.

10D. Confidence Interval and Hypothesis Test

Summary:

You can use a confidence interval to conclude whether results are statistically significant. A hypothesis test (HT) and confidence interval (CI) are two ways of looking at the same thing: what possibilities for the population mean or proportion are consistent with my sample? A 95% CI is the flip side of a 0.05 two-tailed HT. More generally, a 1−α CI is the complement of an α two-tailed HT.

Example 14: The baseline rate for heart attacks in diabetes patients is 20.2% in seven years. You have a new diabetes drug, Effluvium, that is effective in treating diabetes. Clinical trials on 89 patients found that 27 (30.3%) had heart attacks. The 95% confidence interval is 20.8% to 39.9% likelihood of heart attack within seven years for diabetes patients taking Effluvium. What does this tell you about the safety of Effluvium?

Solution: Okay, you’re 95% confident that Effluvium takers have a 20.8% to 39.9% chance of a heart attack within seven years. If you’re 95% confident that their chance of heart attack is inside that interval, then there’s only a 5% or 0.05 probability that their chance of heart attack is outside the interval, namely below 20.8% or above 39.9%. But 20.2% is outside the interval, so there’s less than a 0.05 chance that the true probability of heart attack with Effluvium is 20.2%.

CI and HT calculations both rely on the sampling distribution. The open curve centered on 20.2% shows the sampling distribution for a hypothetical population proportion of 20.2%. Only a very small part of it extends beyond 30.3%, the proportion of heart attacks you actually found in your sample.

The chance of getting your sample, given a hypothetical proportion pₒ in the population, is the p-value. If pₒ = 20.2%, your sample with p̂ = 30.3% would be unlikely (p-value below 0.05). You would reject the null hypothesis and conclude that Effluvium takers have a different likelihood of heart attack from other diabetes patients, at the 0.05 significance level. Further, the entire confidence interval is above the baseline value, so you know that Effluvium increases the likelihood of heart attack in diabetes patients.

At significance level 0.05, a two-tailed test against any value outside the 95% confidence interval (the shaded curve) would lead to rejecting the null hypothesis. And you can say the same thing for any other significance level α and confidence level 1−α.
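The link between the interval and the test in Example 14 can be checked numerically. The sketch below is illustrative and uses only Python’s standard library; the variable names are hypothetical. It computes the 95% interval with a standard error based on p̂ and the two-tailed z test with a standard error based on pₒ, then compares the two verdicts.

```python
from math import sqrt
from statistics import NormalDist

x, n = 27, 89          # heart attacks among Effluvium patients
p0 = 0.202             # baseline rate from the example
alpha = 0.05

p_hat = x / n
std_normal = NormalDist()

# Confidence interval: standard error from the sample proportion p-hat
z_crit = std_normal.inv_cdf(1 - alpha / 2)
se_ci = sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - z_crit * se_ci, p_hat + z_crit * se_ci)

# Two-tailed test of H0: p = p0, with standard error from p0
se_ht = sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / se_ht
p_value = 2 * (1 - std_normal.cdf(abs(z)))

baseline_in_ci = ci[0] <= p0 <= ci[1]
reject_h0 = p_value < alpha

print(f"95% CI: ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"z = {z:.2f}, two-tailed p-value = {p_value:.4f}")
print(f"baseline inside CI: {baseline_in_ci}, reject H0: {reject_h0}")
```

Here the baseline falls outside the interval and the test rejects H0, so the two methods agree. The two standard errors differ slightly, which is why agreement for binomial data is only approximate near the borderline.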
What if the interval does include the baseline or hypothetical value? Then you fail to reject the null hypothesis.

Example 15: A machine is supposed to be turning out something with a mean value of 100.00 and SD of 6.00, and you take a random sample of 36 objects produced by the machine. If your sample mean is 98.4 and SD is 5.9, your 95% confidence interval is 96.4 to 100.4. Now, can you make any conclusion about whether the machine is working properly?

Solution: Well, you’re 95% confident that the machine’s true mean output is somewhere between 96.4 and 100.4. With this sample, you can rule out a true population mean below 96.4 or above 100.4, at the 0.05 significance level; but you can’t rule out a true population mean between 96.4 and 100.4 at α = 0.05. A hypothesis test would fail to reject the hypothesis that µ = 100. You can’t determine whether the true mean output of the machine is equal to 100 or not.

When µₒ or pₒ is inside the 1−α CI, the two-tailed p-value is > α. Your sample does not contradict H0, and you fail to reject H0.

When µₒ or pₒ is outside the 1−α CI, the two-tailed p-value is < α. Your sample contradicts H0, and you reject H0.

Leaving the symbols aside, when you test a null hypothesis your sample either is surprising (and you reject the null hypothesis) or is not surprising (and you fail to reject the null). Any null hypothesis value inside the confidence interval is close enough to your sample that it would not get rejected, and any null hypothesis value outside the interval is far enough from the sample that it would get rejected.

Special Note for Binomial Data

For numeric data, the CI and HT are exactly equivalent. But for binomial data, the CI and HT are only approximately equivalent. Why? Because with binomial data, the HT uses a standard error derived from pₒ in the null hypothesis, but the CI uses a standard error derived from p̂, the sample proportion. Since the standard errors are slightly different, right around the borderline they might get different answers. But when the hypothetical pₒ is a fair distance outside the CI, as it was in the drug example, the p-value will definitely be less than α.

What about One-Tailed Tests?

Good question! A confidence interval is symmetric (for the cases you study in this course), so it’s intrinsically two-tailed. A one-tailed HT for < or > at α = 0.01 corresponds to a two-tailed HT for ≠ at α = 0.02, so the CI for a one-tailed HT at α = 0.01 is a 98% CI, not a 99% CI. The confidence level for a one-tailed test is 1−2α, not 1−α.

Correspondence between Significance Level and Confidence Level

α        Tails    C-Level
0.05     1        1−2×.05 = 90%
0.05     2        1−.05 = 95%
0.01     1        1−2×.01 = 98%
0.01     2        1−.01 = 99%
0.001    1        1−2×.001 = 99.8%
0.001    2        1−.001 = 99.9%
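The correspondence above is just the rule "1−2α for one tail, 1−α for two tails". As a small sketch (the function name `confidence_level` is an illustrative choice, not something from the book), you could compute it like this:

```python
def confidence_level(alpha, tails):
    """Confidence level of the CI that matches a test at significance level alpha."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    # A one-tailed test at alpha matches a two-tailed test at 2*alpha.
    return 1 - 2 * alpha if tails == 1 else 1 - alpha

for alpha in (0.05, 0.01, 0.001):
    for tails in (1, 2):
        level = confidence_level(alpha, tails)
        print(f"alpha = {alpha}, tails = {tails}: C-Level = {level:.1%}")
```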

If the baseline value is outside the confidence interval, you can say (at the appropriate significance level) that the true value of µ or p is different from the baseline, and then go on to say whether it’s bigger or smaller, so you get your one-tailed result. On the other hand, if the baseline value is inside the confidence interval, you can’t say whether the true µ or p is equal to the baseline or different from it, and if you can’t say whether they’re different then you can’t say which one is bigger than the other.
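The special note for binomial data says the CI and HT can disagree right around the borderline, because they use different standard errors. Here is a hedged sketch of that effect using the normal approximation. The numbers (17 successes in a sample of 100, tested against po = 0.10) are invented for illustration and do not come from the book, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

def prop_test_two_tailed(x, n, p0):
    """Two-tailed p-value for H0: p = p0, with the standard error built from p0."""
    p_hat = x / n
    se = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def prop_ci(x, n, conf=0.95):
    """Confidence interval for p, with the standard error built from p-hat."""
    p_hat = x / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return p_hat - z_star * se, p_hat + z_star * se

# Hypothetical borderline data: 17 successes out of 100, baseline p0 = 0.10
p_value = prop_test_two_tailed(17, 100, 0.10)
low, high = prop_ci(17, 100)
print(f"two-tailed p-value = {p_value:.4f}")
print(f"95% CI: {low:.4f} to {high:.4f}")
```

With these made-up numbers the test rejects H0 at α = 0.05 even though 0.10 falls just inside the 95% interval, which is the kind of borderline disagreement the note describes.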

10E. Testing a Non-Random Sample

Though most hypothesis tests are to find out something about a population, sometimes you just want to know whether this sample is significantly different from a population. In this case, you don’t need a random sample, but the other requirements must still be met.

Example 16: At Wossamatta University, instructors teach the statistics course independently but all sections take the same final exam. (There are several hundred students.) One semester, the mean score on the exam is 74. In one section of 30 students, the mean was 68.2 and the SD was 10.4. The students felt that they had not been adequately prepared for the exam by the instructor. Can they make their case?

Solution: In effect, they are saying that their section performance was significantly below the performance of students in the course overall. This is a testable hypothesis. But the hypothesis is not about the population that these 30 students were drawn from; we already know about that population. Instead, it is a test whether this sample, as a sample, is different from the population.

(1) H0: This section’s mean was no different from the course mean.
    H1: This section’s mean was significantly below the course mean.

(2) α = 0.05

(RC) (Omit the requirement for a random sample.) 10n = 10×30 = 300 is less than the “several hundred students” in the course. Sample size is ≥30, so the sampling distribution is normal.

(3/4) TTest: µo = 74, x̄ = 68.2, s = 10.4, n = 30, µ < µo
      Outputs: t = −3.05, p-value = 0.0024

(5) p < α. Reject H0 and accept H1.

(6) This section’s average exam score was less than the overall course average (p-value = 0.0024).
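To see where the TTest outputs in steps 3/4 come from, here is a rough Python sketch of the same left-tailed one-sample t test from summary statistics. It is an illustration only: the book uses a calculator's TTest, while this version approximates the t distribution numerically with the standard library, and the names `t_cdf` and `one_sample_t_left` are invented for the example.

```python
import math

def t_cdf(t, df, steps=2000):
    """P(T <= t) for Student's t, integrating the density with Simpson's rule."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t / steps  # signed step: a negative t gives a negative area
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + total * h / 3

def one_sample_t_left(xbar, s, n, mu_0):
    """Test statistic and left-tailed p-value for H1: mu < mu_0."""
    t = (xbar - mu_0) / (s / math.sqrt(n))
    return t, t_cdf(t, n - 1)

# Example 16: section mean 68.2, SD 10.4, n = 30, course mean 74
t_stat, p_value = one_sample_t_left(68.2, 10.4, 30, 74)
alpha = 0.05
print(f"t = {t_stat:.2f}, p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

The test statistic is simply the distance from the section mean to the course mean, measured in standard errors; the p-value is the area in the left tail beyond it.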

Okay, there was a real difference. This section’s mean exam score was not only below the average for the whole course, but too far below for random chance to be enough of an explanation. But did the students prove their case? Their case was not just that their average score was lower, but that the difference was the result of poor teaching. Statistics can’t answer that question so easily. Maybe it was poor teaching; maybe these were weaker students; maybe it was environmental factors like classroom temperature or the time of day; maybe it was all of the above.

What Have You Learned?

Key ideas:

• You don’t know the proportion or mean of a population. You want to test whether it is different from some baseline number. You take a sample, and then compute how likely that sample would be if the true proportion or mean in the population is equal to that baseline. If the sample is too unlikely, you reject the null hypothesis and conclude that the true proportion or mean must be different from that baseline number.
• Know the seven steps of hypothesis tests. Know them by heart, and write them on your cheat sheet if you need to.
• Know whether you have binomial or numeric data. This totally determines which type of test you will do, so think before you act! When you have numeric data, you test for the mean of a population (hypotheses about µ). When you have binomial data in a count of successes, you test for the proportion in a population (hypotheses about p).
• Understand one-tailed versus two-tailed tests. When should you use which one? How do you interpret the results in step 6?
• Understand the significance level α. Know how to pick an appropriate level.
• Understand the p-value. It’s the probability, if H0 is true, of getting the sample you got (or one even further away from H0).
• Know how to write conclusions (if p-value < α) or non-conclusions (if p-value > α).
• Understand Type I and Type II errors. Describe what each one means in specific situations.
• Understand the relationship between a confidence interval and a hypothesis test. How can you relate the endpoints of a CI to whether you do or don’t have a statistically significant result, so that H0 would or wouldn’t be rejected?

Study aids:

• Inferential Statistics: Basic Cases
• Interactive: Triage: Which Inferential Stats Case Should I Use?
• Seven Steps of Hypothesis Tests
• Top 10 Mistakes of Hypothesis Tests
• Statistics Symbol Sheet

Because this textbook helps you, please donate at BrownMath.com/donate.

Exercises for Chapter 10

Write out your solutions to these exercises, showing your work for all computations. Then check your solutions against the solutions page and get help with anything you don’t understand.

Caution! If you don’t see how to start a problem, don’t peek at the solution; you won’t learn anything that way. Ask your instructor or a tutor for a hint. Or just leave it and go on to a different problem for now. You may find when you return to that “impossible” problem that you see how to do it after all.

Problem Set 1

1

List the seven steps of every hypothesis test.

2

Why must you select a significance level before computing a p-value?

3

Explain the p-value in your own words.

4

You’ve tested the hypothesis that the new accelerant makes a difference to the time to dry paint, using α = 0.05. What is wrong with each conclusion, based on the p-value? Write a correct conclusion for that p-value. (a) p = 0.0214. You conclude, “The accelerant may make a difference, at the 0.05 significance level.” (b) p = 0.0714. You conclude, “The accelerant makes no difference, at the 0.05 significance level.”

5

You are testing whether the new accelerant makes your paint dry faster. (You have already eliminated the possibility that it makes your paint dry slower.) (a) What conclusion would be a Type I error? What wrong action would a Type I error lead you to take? (b) What conclusion would be a Type II error? What wrong action would a Type II error lead you to take?

6

What can you do to make a Type I error less likely at a given sample size? What’s the unfortunate side effect of that?

7

Explain in your own words the difference between “accept H0” (wrong) and “fail to reject H0” (correct) when your p-value is > α.

8

The engineering department claims that the average battery lifetime is 500 minutes. Write both hypotheses in symbols.

9

Suppose H0 is “the directors are honest” and H1 is “the directors are stealing from the company.” Write conclusions, in Statistics and in English, if … (a) if p = 0.0405 and α = 0.01 (b) if p = 0.0045 and α = 0.01

10

Are Type I and Type II errors actually mistakes? What one thing can you do to prevent both of them, or at least make them both less likely?

11

In your hypothesis test, H0 is “the defendant is innocent” and H1 is “the defendant is guilty”. The crime carries the death penalty. Out of 0.05, 0.01, and 0.001, which is the most appropriate significance level, and why?

12

When Keith read the AAA’s statement that 10% of drivers on Friday and Saturday nights are impaired, he believed the proportion was actually higher for TC3 students. He took a systematic sample of 120 students and, on an anonymous questionnaire, 18 of them admitted being alcohol impaired the last Friday or Saturday night that they drove. Can he prove his point, at the 0.05 significance level?

13

In 2006–2008 there was controversy about creating a sewer district in south Lansing, where residents have had their own septic tanks for years. The Sewer Committee sent out an opinion poll to every household in the proposed sewer district. In a letter to the editor, published 3 Feb 2007 in the Ithaca Journal, John Schabowski wrote, in part: The Jan. 4 Journal article about the sewer reported that “only” 380 of 1366 households receiving the survey responded, with 232 against it, 119 supporting it, and 29 neutral. ... The survey results are statistically valid and accurate for predicting that the sewer project would be voted down by a large margin in an actual referendum.

Can you do a hypothesis test to show that more than half of Lansing households in the proposed district were against the sewer project? (You’re trying to show a majority against, so combine “supporting” and “neutral” since those are not against.)

14

Esperanza wanted to determine whether more than 40% of grocery shoppers — specifically, the primary grocery shoppers in their households — regularly use manufacturers’ coupons. She conducted a random telephone survey and contacted 500 people. (For this exercise, let’s assume that telephone subscribers are representative of grocery shoppers.) Of the 500 she contacted, 325 do the grocery shopping in their households. Of those 325, 182 said they regularly use manufacturers’ coupons. (a) What is the size of the sample? (Think before you answer!) (b) What is the population, and how large is it? (c) What does the number 182 represent? (d) Don’t do a hypothesis test. But if you did, what would p o be? (e) Is it a source of bias that she considered only each household’s primary grocery shopper?

15

Doubting Thomas remembered the Monty Hall example from Chapter 5, but he didn’t believe the conclusion that switching doors would improve the chance of winning to 2/3. (It’s okay if you don’t remember the example. All the facts you need are right here.) Thomas watched every Let’s Make a Deal for four weeks. (Though this isn’t a random sample, treat it as one. There’s no reason why the show should operate differently in these four weeks from any others.) In that time, 30 contestants switched doors, and 18 of them won. (a) At the 0.05 significance level, is it true or false that your chance of winning is 2/3 if you switch doors? (b) At the 95% confidence level, estimate your chance of winning if you switch doors. (c) If you don’t switch doors, your chance of winning is 1/3. Using your answer to (b), is switching doors definitely a good strategy, or is there some doubt?

16

Most of us have spam filters on our email. The filter decides whether each incoming piece of mail is spam. Heather trusts her spam filter, and she sets it to just delete spam rather than save it to a folder. (a) What would Heather’s spam filter do if it makes a Type I error? What would it do if it makes a Type II error? (b) Which is more serious here, a Type I error or a Type II error? Should the significance level be set higher or lower?

17

Rosario read in Chapter 6 [URL: https://BrownMath.com/swt/chap06.htm#c06_GeometricDist] that 30.4% of US households own cats. She felt like dogs were a lot more visible than cats in Ithaca, so she decided to test whether the true proportion of cat ownership in Ithaca was less than the national proportion. She took a systematic sample of Wegmans shoppers one day, and during the same time period a friend took a systematic sample of Tops shoppers. (They counted groups shopping together, not individual shoppers, so they didn’t have to worry about getting the same household twice.) Together, they accumulated a sample of 215 households, and of those 54 owned cats. Did she prove her case, at the 0.05 significance level?

Problem Set 2

18

What is wrong with each pair of hypotheses? Correct the error. (a) H0 = 14.2; H1 > 14.2 (b) H0 : µ < 25; H1 : µ > 25 (c) You’re testing whether batteries have a mean life of greater than 750 hours. You take a sample, and your sample mean is 762 hours. You write H0 :µ=762 hr; H1 :µ>762 hr. (d) Your conventional paint takes 4.3 hours to dry, on average. You’ve developed a drying accelerant and you want to test whether adding it makes a difference to drying time. You write H0 : µ=4.3 hr; H1 : µ < 4.3 hr.

19

This year, water pollution readings at State Park Beach seem to be lower than last year. A sample of 10 readings was randomly selected from this year’s daily readings: 3.5 3.9 2.8 3.1 3.1 3.4 3.2 2.5 3.5 3.1 Does this sample provide sufficient evidence, at the 0.01 level, to conclude that the mean of this year’s pollution readings is significantly lower than last year’s mean of 3.8?

20

Dairylea Dairy sells quarts of milk, which by law must contain an average of at least 32 fl. oz. You obtain a random sample of ten quarts and find an average of 31.8 fl. oz. per quart, with SD 0.60 fl. oz. Assuming that the amount delivered in quart containers is normally distributed, does Dairylea have a legal problem? Choose an appropriate significance level and explain your choice.

21

You’re in the research department of StickyCo, and you’re developing a new glue. You want to compare your new glue against StickyCo’s best seller, which has a bond strength of 870 lb/in². You take 30 samples of your new glue, at random, and you find an average strength of 892.2 lb/in², with SD 56.0. At the 0.05 significance level, is there a difference in your new glue’s strength?

22

New York Quick Facts from the Census Bureau (2014b) [see “Sources Used” at end of book] says that 32.8% of residents of New York State aged 25 or older had at least a bachelor’s degree in 2008–2012. Let’s assume the figure hasn’t changed today. You conduct a random sample of 120 residents of Tompkins County aged 25+, and you find that 52 of them have at least a bachelor’s degree. (a) Construct a 95% confidence interval for the proportion of Tompkins County residents aged 25+ with at least a bachelor’s degree. (b) Don’t do a full hypothesis test, but use your answer for (a) to determine whether the proportion of bachelor’s degrees in Tompkins County is different from the statewide proportion, at the 0.05 significance level.

23

You’re thinking of buying new Whizzo bungee cords, if the new ones are stronger than your current Stretchie ones. You test a random sample of Whizzo and find these breaking strengths, in pounds: 679 599 678 715 728 678 699 624 At the 0.01 level of significance, is Whizzo stronger on average than Stretchie? (Stretchies have mean strength of 625 pounds.)

24

For her statistics project, Jennifer wanted to prove that TC3 students average more than six hours a week in volunteer work. She gathered a systematic sample of 100 students and found a mean of 6.75 hours and SD of 3.30 hours. Can she make her case, at the 0.05 significance level?

25

As a POW in World War II, John Kerrich flipped a coin 10,000 times and got 5067 heads. At the 0.05 level of significance, was the coin fair?

26

People who take aspirin for headache get relief in an average of 20 minutes (let’s suppose). Your company is testing a new headache remedy, PainX, and in a random sample of 45 headache sufferers you find a mean time to relief of 18 minutes with SD of 8 minutes. (a) Construct a 95% confidence interval for the mean time to relief of PainX. (b) Don’t do a full hypothesis test, but use your answer for (a) to determine at the 0.05 significance level whether PainX offers headache relief to the average person in a different time than aspirin.

Updates and new info: https://BrownMath.com/swt/
