Hypothesis Tests - Mathematical and Computer Sciences - Heriot-Watt ... [PDF]


Topic 5

Hypothesis Tests

Contents

5.1 Introduction to Tests of Hypothesis
    5.1.1 Type 1 and 2 Errors
    5.1.2 One-tailed and two-tailed tests
    5.1.3 Different Significance Levels
5.2 Single mean - large samples
5.3 Single proportion - large samples
5.4 Difference of two means - large samples
5.5 Difference of two proportions - large samples
5.6 Small Samples
    5.6.1 Single mean
    5.6.2 Confidence Intervals with Small Samples
    5.6.3 Difference of 2 Means from Small Samples
    5.6.4 Paired t test
5.7 The Chi-Squared Distribution
    5.7.1 Checking for Association - Hair and Eye Colour
    5.7.2 Limitations of Chi-squared test
    5.7.3 Goodness of Fit Tests
5.8 Coursework 1
5.9 Summary and assessment

Learning Objectives

- identify situations in experimentation where a hypothesis test will produce a useful result
- appreciate the ideas of null and alternative hypotheses
- use the standardised Normal distribution in hypothesis tests involving large samples
- use the Student's t distribution in hypothesis tests involving small samples
- explain Type 1 and Type 2 Errors
- use the formulae for standard error and test statistic in the cases of
  a) single mean - large samples
  b) single proportion - large samples
  c) difference between two means - large samples
  d) difference between proportions - large samples
  e) single mean - small samples
  f) difference between two means - small samples
- decide when to use a One or Two Tailed Test
- appreciate the concept of degrees of freedom
- calculate a confidence interval for a population mean based on a sample mean from a small sample
- use a paired t test

© Heriot-Watt University 2003

5.1 Introduction to Tests of Hypothesis

In the last Topic it was seen that a sample could be used to infer a confidence interval for the mean of the population that it was taken from. A very useful fact is that this method can be turned on its head: instead of being used to estimate a property of the population, a sample can be used to test whether it is plausible that a population has a particular mean value (or proportion). In this chapter most of the worked examples will start off by suggesting a hypothesis (assumption) and then deciding whether the sample evidence is consistent with it or gives grounds to reject it.

Imagine that a company states in its sales pitch that a particular model of its mobile phones lasts for 150 hours before it next needs to be charged. If you were thinking of buying one, you may like to obtain some evidence that this assertion is true. One way of doing this is to take a sample of phones, make a number of measurements and then calculate the mean number of hours between charging. It would be impossible to do this for every phone produced since the population is so large, so the best that can be done is to calculate a sample mean.

Suppose that a sample of 40 was taken and this produced a mean value of 147.4 hours. Does this mean that the manufacturer's claim has been disproved? Clearly 147.4 is less than 150 so it looks as if the manufacturer is over-estimating the time between charging. However, it must be appreciated that this was just one sample; it was shown in the last topic that if another sample was taken it might give a very different result (for example, it could give a value of 152.3 hours, in which case the phones are doing better than the manufacturer claims!).

The method of hypothesis testing starts by making an assertion about the population, usually an assumption that the mean is equal to a stated result. In this case it is hypothesised that the population mean, $\mu$, for the mobile phones is 150 hours.
The Central Limit Theorem will next be used, and to do this a value for the standard deviation is required. Assume that in this case the population standard deviation is 12 hours.



From the last topic, 95% of all sample means lie between

$$\mu \pm 1.96 \frac{\sigma}{\sqrt{n}}$$

The term $\sigma/\sqrt{n}$ (which is the standard deviation of the sample means) is often called the Standard Error (S.E.). In this case it is equal to $12/\sqrt{40} = 1.90$. So the upper and lower bounds calculate as $150 \pm 3.72$.

So 95% of the sample means lie between 146.28 and 153.72. All the calculations so far have been based on the population; it is only now that the sample mean value needs to be used. Recall that this was calculated as 147.4 hours. This is within the range of values that 95% of sample means are expected to fall between, so it has not been possible to disprove the hypothesis that the mean is 150. In other words, if the population mean really is 150, a sample mean would fall outside this range only 5% of the time, and 147.4 is not such a value. This is known as a significance test with level 0.05. There is no evidence to dispute the manufacturer's claim at the 5% level.

The supposition that the population mean is equal to 150 can be written as

$H_0: \mu = 150$

This is called the Null Hypothesis.

To decide whether or not this assertion is true, it is necessary to have a comparison with an alternative hypothesis (so that one or the other will be true). This is written as

$H_1: \mu \neq 150$

It is usual to then draw a Normal distribution curve and shade in the appropriate significance level (here 5%). The whole calculation can then be expressed more briefly in a diagram as:

[Figure: Normal distribution curve with the 5% significance level shaded, split between the two tails]

Since the sample mean, 147.4, is not in the shaded region, H0 is accepted. There is no evidence at the 0.05 level of significance that the population mean is not 150.
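The arithmetic of the mobile phone test can be checked with a few lines of Python. This is only a minimal sketch using the figures quoted in the example; the function name `acceptance_bounds` is an illustrative choice and not part of the course material.

```python
import math

def acceptance_bounds(mu0, sigma, n, z):
    """Return (lower, upper) bounds mu0 -/+ z * S.E. for the sample mean."""
    se = sigma / math.sqrt(n)          # standard error of the sample mean
    return mu0 - z * se, mu0 + z * se

# Figures from the mobile phone example
mu0, sigma, n = 150, 12, 40
sample_mean = 147.4

lower, upper = acceptance_bounds(mu0, sigma, n, z=1.96)
reject_h0 = not (lower <= sample_mean <= upper)

print(f"S.E. = {sigma / math.sqrt(n):.2f}")
print(f"bounds: {lower:.2f} to {upper:.2f}")
print("reject H0" if reject_h0 else "no evidence against H0")
```

The sample mean lies between the two bounds, so the sketch reaches the same conclusion as the text.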

5.1.1 Type 1 and 2 Errors

Since probabilities are used in the hypothesis tests, there is always the chance of an error in the conclusion being made. In the mobile phone example it is only being said that the sample mean value is consistent with a population mean of 150 hours at the 5% significance level; this does not prove that the population mean is exactly 150 hours. If the population mean is, in fact, not 150 hours but the hypothesis test resulted in accepting H0, it is said that a Type 2 Error has occurred. Conversely, if H0 is actually true, but the sample mean resulted in it being rejected, it is said that a Type 1 Error has been made; at the 5% significance level this happens for 5% of samples. This can be summarised in the table below.

                    State of Nature
Decision        H0 is true          H0 is false
Accept H0       correct decision    Type 2 Error
Reject H0       Type 1 Error        correct decision
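The two kinds of error can be illustrated with a small simulation of the mobile phone test. This is a rough sketch rather than part of the original notes: the alternative "true" mean of 147 hours, the number of trials and the random seed are all assumptions chosen purely for illustration.

```python
import math
import random

mu0, sigma, n = 150, 12, 40
se = sigma / math.sqrt(n)
lower, upper = mu0 - 1.96 * se, mu0 + 1.96 * se    # two-tailed 5% bounds

def rejection_rate(true_mean, trials=20000, seed=1):
    """Fraction of simulated sample means that fall outside the bounds."""
    rng = random.Random(seed)
    rejected = 0
    for _ in range(trials):
        # By the Central Limit Theorem the sample mean is roughly
        # Normal with mean true_mean and standard deviation S.E.
        sample_mean = rng.gauss(true_mean, se)
        if not (lower <= sample_mean <= upper):
            rejected += 1
    return rejected / trials

# H0 true: every rejection is a Type 1 Error
type1_rate = rejection_rate(true_mean=150)
# H0 false (hypothetical true mean of 147): every acceptance is a Type 2 Error
type2_rate = 1 - rejection_rate(true_mean=147)

print(f"Type 1 error rate: {type1_rate:.3f}")
print(f"Type 2 error rate: {type2_rate:.3f}")
```

The Type 1 rate should settle close to the significance level, whereas the Type 2 rate depends entirely on how far the assumed true mean is from 150.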

5.1.2 One-tailed and two-tailed tests

In the mobile phone example, recall that the diagram of the normal distribution curve had a 5% area shaded and this was split between both "tails". This will always be the case when the alternative hypothesis has a "not equal to" sign and is called a two-tailed test (for obvious reasons!).





In some hypothesis tests, the alternative hypothesis is given as "$\mu$ is less than" or "$\mu$ is greater than" some value. In cases like this, only one side of the normal distribution curve is shaded and, not surprisingly, the test is called a one-tailed test.

The example will now be re-worked as a one-tailed test. A competing mobile phone manufacturer wishes to prove that the time between charging for his rival's phone is less than 150 hours. The hypotheses (plural of hypothesis) now become

$H_0: \mu = 150$
$H_1: \mu < 150$

The normal distribution curve in this case now has only one side shaded.

[Figure: Normal distribution curve with 5% shaded in the left-hand tail]

To calculate the "cut-off" point, this time it is not $\mu \pm 1.96 \times \text{S.E.}$ that is used, but $\mu - 1.64 \times \text{S.E.}$ (From tables, the value of 1.64 gives an area under the normal distribution of approximately 0.05, whereas 1.96 gave 0.025.)

This means that the lower bound is

$150 - 1.64 \times 1.90 \approx 146.9$

Using the same sample value as before, the sample mean, 147.4, is not in the shaded region, so again the null hypothesis is accepted. There is no evidence at the 0.05 level of significance that the population mean is less than 150.

Notice that for one-tailed tests with "<" in the alternative hypothesis it is the left hand side of the Normal distribution curve that is shaded, whilst if it is ">" in the alternative hypothesis the right hand side is shaded.
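The one-tailed version of the mobile phone test can be sketched in the same way. The variable names are illustrative; the only change from the two-tailed calculation is that a single cut-off, on the left, is used with z = 1.64.

```python
import math

mu0, sigma, n = 150, 12, 40
sample_mean = 147.4
se = sigma / math.sqrt(n)

z_one_tail = 1.64                       # 5% in a single tail (1.96 leaves only 2.5%)
lower_bound = mu0 - z_one_tail * se     # H1: mu < 150, so only the left tail is shaded

# H0 is rejected only if the sample mean falls below the cut-off
reject_h0 = sample_mean < lower_bound

print(f"lower bound = {lower_bound:.1f}")
print("reject H0" if reject_h0 else "no evidence that the mean is less than 150")
```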

5.1.3 Different Significance Levels

The significance level of 5% (or 0.05 in decimals) has been used in the mobile phone example. This is a very common value to use but it is not the only one that can be employed. It implies that there is a 5% chance of making a Type 1 Error, that is, of rejecting a null hypothesis that is actually true. However, if it is necessary for this risk to be smaller (in medical matters, say) then the level can be reduced to 1% or even 0.1% (or, indeed, any other value). Changing the significance level will have an effect on the "cut-off" point. For example, in the mobile phone example for a two-tailed test and a significance level of 1%, the upper and lower bounds would be calculated as

$150 \pm 2.58 \times \text{S.E.}$, i.e. from 145.10 to 154.90
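The effect of the significance level on the bounds can be seen by repeating the calculation for both levels. This is a small sketch using the two-tailed z values quoted in the text (1.96 and 2.58); the dictionary names are illustrative.

```python
import math

mu0, sigma, n = 150, 12, 40
se = sigma / math.sqrt(n)

# Two-tailed z values quoted in the text for the 5% and 1% levels
z_for_level = {0.05: 1.96, 0.01: 2.58}

bounds = {
    level: (round(mu0 - z * se, 2), round(mu0 + z * se, 2))
    for level, z in z_for_level.items()
}

for level, (lo, hi) in bounds.items():
    print(f"{level:.0%} level: {lo:.2f} to {hi:.2f}")
```

The 1% bounds are wider than the 5% bounds, so a sample mean has to be further from 150 before the null hypothesis is rejected.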

The lower the significance level, the more difficult it is to prove the alternative hypothesis (which is often what you hope to do). If an alternative hypothesis is proved at the 5% level it is said to be significant; a result at the 1% level is termed highly significant, whilst a result at the 0.1% level is deemed to be very highly significant. To help in calculations at different significance levels, for the general result $\mu \pm Z \times \text{S.E.}$ the appropriate Z values are given in the table below.

   α       Z        α       Z        α       Z        α         Z
 .50    0.0000    .040   1.7507    .020   2.0537    .005      2.5758
 .45    0.1257    .038   1.7744    .019   2.0749    .004      2.6521
 .40    0.2533    .036   1.7991    .018   2.0969    .003      2.7478
 .35    0.3853    .034   1.8250    .017   2.1201    .002      2.8782
 .30    0.5244    .032   1.8522    .016   2.1444    .001      3.0902
 .25    0.6745    .030   1.8808    .015   2.1701    .050      1.6449
 .20    0.8416    .029   1.8957    .014   2.1973    .010      2.3263
 .15    1.0364    .028   1.9110    .013   2.2262    .001      3.0902
 .10    1.2816    .027   1.9268    .012   2.2571    .0001     3.7190
 .05    1.6449    .026   1.9431    .011   2.2904    .00001    4.2649
 .050   1.6449    .025   1.9600    .010   2.3263    .025      1.9600
 .048   1.6646    .024   1.9774    .009   2.3656    .005      2.5758
 .046   1.6849    .023   1.9954    .008   2.4089    .0005     3.2905
 .044   1.7060    .022   2.0141    .007   2.4573    .00005    3.8906
 .042   1.7279    .021   2.0335    .006   2.5121    .000005   4.4172

α: significance level (the area in one tail; for a two-tailed test look up half the significance level, e.g. .025 for a 5% test)
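Values like those in the table can also be reproduced with the Python standard library instead of being looked up. The sketch below uses `statistics.NormalDist` (available from Python 3.8); the helper name `critical_z` is an illustrative choice.

```python
from statistics import NormalDist

def critical_z(alpha):
    """z value leaving an area alpha in the upper tail of the standard Normal."""
    return NormalDist().inv_cdf(1 - alpha)

for alpha in (0.05, 0.025, 0.01, 0.005, 0.001):
    print(f"alpha = {alpha:<6} Z = {critical_z(alpha):.4f}")
```

For a two-tailed test at the 5% level, `critical_z(0.025)` gives the familiar value of 1.96.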

5.2 Single mean - large samples

Hypothesis tests can be carried out on many different types of experimental data but the method of implementation is always the same. The main points to note are that the analysis should always begin by stating the null and alternative hypotheses, an appropriate measure of standard error should then be calculated, and finally the sample value should be plotted on the appropriate distribution curve. Depending on where it lies, the null or the alternative hypothesis will be accepted.

Comparisons with the Normal distribution curve are only valid if the sample size is greater than 30; when this is the case the sample is categorised as large. Small samples will be considered later. The formula for the Standard Error in problems involving one large sample comes straight from the Central Limit Theorem given earlier.

$$\text{S.E.} = \frac{\sigma}{\sqrt{n}}$$

Examples

1. The time between server failures in an organisation is recorded for a sample of 32 failures and the mean value calculates as 992 hours. The organisation works on the assumption that the mean time between server failures is 1000 hours with a standard deviation of 20. Is it justified to use this figure of 1000 hours? Use a significance level of 0.05.

$H_0: \mu = 1000$
$H_1: \mu \neq 1000$

$$\text{S.E.} = \frac{\sigma}{\sqrt{n}} = \frac{20}{\sqrt{32}} = 3.536$$

This is a two-tailed test with 2.5% shaded on each side of the Normal distribution so the cut-off points are given by $1000 \pm 1.96 \times 3.536$, i.e. 993.070 and 1006.930.

This is shown on the diagram below, together with the sample mean of 992.

[Figure: Normal distribution curve with cut-off points 993.070 and 1006.930 and the sample mean 992 marked]

Since the sample mean is in the shaded area, the null hypothesis is rejected and so the alternative hypothesis accepted. This means that there is evidence at the 5% level that the population mean is not 1000, so the organisation might like to review their specification for the server which, in fact, is performing worse than they indicate (failures are occurring sooner than every 1000 hours on average).

It is often the case when performing hypothesis tests that a test statistic is calculated from the sample value and this is compared with the standardised normal curve. This is doing exactly the same thing that was shown in Chapter 2 when converting Normal distributions into a form that could be compared with the tables. In this case, the test statistic is

$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

This gives $z = (992 - 1000)/3.536 = -2.26$. This is now compared with the standardised Normal curve, whose cut-off points for a two-tailed test at the 5% level are $\pm 1.96$.

It is clearly seen that the test statistic falls in the shaded area so H1 is accepted as before. The two previous diagrams show that the two methods are identical but simply involve considering different scales.

2. It is suspected that in a particular experiment the method used gives an underestimate of the boiling point of a liquid. 50 determinations of the boiling point of water were made in an experiment in which the standard deviation was known to be 0.9 degrees C. The mean value is calculated to be 99.6 degrees C. The correct boiling point of water is 100 degrees C. Use a significance level of 0.01.

Since it would be desirable to prove that the population mean is less than 100, it is sensible to use a one-tailed test with alternative hypothesis $\mu < 100$. So the hypotheses become

$H_0: \mu = 100$
$H_1: \mu < 100$

The standard error is

$$\text{S.E.} = \frac{\sigma}{\sqrt{n}} = \frac{0.9}{\sqrt{50}} = 0.127$$

Test statistic = (99.6 - 100)/0.127 = -3.15

The standardised normal distribution curve gives a z value of -2.33 for an area of 0.01. Thus the diagram, with test statistic marked in, has the appearance:

[Figure: standardised Normal curve with 1% shaded in the left-hand tail below z = -2.33 and the test statistic -3.15 marked]

Since the test statistic is in the shaded region, H1 is accepted. There is evidence at the 1% level that the population mean is less than 100. It is logical to assume, then, that the method of the experiment is underestimating the boiling point.

In the above examples the population standard deviation was known. Often this will not be the case, but as long as the samples are large (n > 30) it is acceptable to estimate this value by using the sample values (as was done in Topic 3 with the confidence intervals).
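Both worked examples follow the same steps, so they can be wrapped in one small function. This is a sketch under the assumptions of this section (large sample, known population standard deviation); the function name `z_test_mean` and its `tail` argument are illustrative and not taken from the notes.

```python
import math
from statistics import NormalDist

def z_test_mean(sample_mean, mu0, sigma, n, alpha, tail):
    """Large-sample test of H0: mu = mu0. tail is 'two', 'less' or 'greater'."""
    se = sigma / math.sqrt(n)                      # standard error
    z = (sample_mean - mu0) / se                   # test statistic
    if tail == "two":
        cutoff = NormalDist().inv_cdf(1 - alpha / 2)
        reject = abs(z) > cutoff
    elif tail == "less":
        cutoff = NormalDist().inv_cdf(alpha)       # negative cut-off, left tail
        reject = z < cutoff
    elif tail == "greater":
        cutoff = NormalDist().inv_cdf(1 - alpha)   # positive cut-off, right tail
        reject = z > cutoff
    else:
        raise ValueError("tail must be 'two', 'less' or 'greater'")
    return z, cutoff, reject

# Example 1: server failures, two-tailed at the 5% level
z1, c1, r1 = z_test_mean(992, 1000, 20, 32, alpha=0.05, tail="two")
# Example 2: boiling point, one-tailed at the 1% level
z2, c2, r2 = z_test_mean(99.6, 100, 0.9, 50, alpha=0.01, tail="less")

print(f"Example 1: z = {z1:.2f}, cut-off = +/-{c1:.2f}, reject H0: {r1}")
print(f"Example 2: z = {z2:.2f}, cut-off = {c2:.2f}, reject H0: {r2}")
```

For Example 2 the unrounded standard error gives a test statistic of about -3.14 rather than the -3.15 obtained with the rounded value 0.127; the conclusion is the same.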

Hypothesis testing

Q1: A particular questionnaire is designed so that it can be completed in 2 minutes. Over a number of days a researcher measures the time taken by everyone who fills in the form. The results are given in the table below. Take a random sample and carry out a hypothesis test to check whether the 2-minute expected completion time is valid. Times are given in minutes.

2.44 2.71 2.46 2.53 1.76 2.52 2.60 2.26 1.87 1.80
2.20 2.48 2.14 2.71 1.89 2.97 2.32 1.46 2.16 2.16
1.49 2.99 2.95 2.33 2.54 1.37 2.01 2.04 2.21 1.93
2.39 1.92 2.19 2.12 2.62 0.91 1.25 1.53 2.19 2.29
2.59 2.59 1.67 2.12 1.86 2.99 1.79 1.78 2.08 2.49
2.63 3.21 2.31 2.08 2.05 2.87 1.84 1.87 2.24 1.26
2.20 1.92 2.38 1.73 1.03 2.22 2.03 1.98 2.40 2.34
2.66 1.73 1.52 2.41 2.49 2.58 2.10 1.72 1.73 2.12
2.16 1.60 1.90 2.04 1.77 2.02 2.19 1.49 1.94 2.29
2.12 2.23 1.88 1.75 1.81 2.41 2.21 2.03 2.96 2.12
1.73 1.99 2.19 1.06 1.77 1.66 2.54 1.69 2.26 1.13
1.81 2.80 2.22 1.95 1.50 1.83 2.38 1.97 1.99 2.22
1.83 2.25 2.56 2.35 2.67 2.57 2.32 2.58 2.23 2.12
2.29 1.97 1.50 2.06 2.23 1.95 2.07 2.42 2.39 1.99
1.51 1.91 1.44 2.59 2.58 1.88 2.57 2.19 2.04 2.04

5.3 Single proportion - large samples

It was shown in Topic 3 that sample proportions also follow the theory of the Central Limit Theorem. The standard deviation of the proportions (which will now be referred to as the Standard Error) was given by the formula

$$\text{S.E.} = \sqrt{\frac{\pi(1-\pi)}{n}}$$

where $\pi$ is the population proportion and n is the sample size (again considered to be greater than 30). Hypothesis tests can be carried out in much the same way as before.

Examples

1. A survey of the first beverage that residents of the UK take when they wake up in the morning has shown that 17% have a cup of tea. It is thought that this figure might be higher in the county of Yorkshire, so a random sample of 550 Yorkshire residents is questioned and out of that number 115 said they had tea first thing. Using a significance level of 0.05, test the idea that the tea figure is higher in Yorkshire.

The population proportion is thought to be 17% (or 0.17 as a decimal) so this is the figure that must be used in the hypotheses (as in the "mean" case, where it was always the population mean that was mentioned in the null hypothesis). Since it is hoped that it can be proved that the Yorkshire figure is higher than average, the alternative hypothesis must have the form $\pi > 0.17$.

The hypotheses are therefore:

$H_0: \pi = 0.17$
$H_1: \pi > 0.17$

Now calculate the Standard Error (S.E.). In this case,

$$\text{S.E.} = \sqrt{\frac{\pi(1-\pi)}{n}} = \sqrt{\frac{0.17(1-0.17)}{550}} = 0.016$$

The test statistic here is comparable to the one for means:

$$z = \frac{P - \pi}{\text{S.E.}} = \frac{0.209 - 0.17}{0.016} = 2.44$$

where P is the sample proportion, in this case 115/550 or 0.209.

The standardised Normal curve can be drawn as before, with the value of 2.33 being used as the cut-off point for 1% (or 0.01). This is a stricter test than the 5% level asked for, whose cut-off point is 1.645.

Since the test statistic for the sample proportion is in the shaded area, there is enough evidence to reject the null hypothesis and accept the alternative one. In other words, there is evidence at the 1% level that the proportion taking tea first thing is higher in Yorkshire than the national figure. Note that it is harder to prove a fact using a significance level of 1% than it is for 5%, so it can be said that this is a highly significant result.

2. In an ESP test, a subject has to identify which of five shapes appears on a card. In a test consisting of 100 cards, would you be fairly convinced that a subject does better than just guessing if he gets 30 correct? Test at the 1% and 0.1% significance levels.

If he just guesses, the proportion of times he would get the answer right is 1/5 = 0.2. So it is hoped to prove that the sample corresponds to a population proportion greater than 0.2. The hypotheses are therefore:

$H_0: \pi = 0.2$
$H_1: \pi > 0.2$

Now calculate the Standard Error (S.E.)

$$\text{S.E.} = \sqrt{\frac{\pi(1-\pi)}{n}} = \sqrt{\frac{0.2(1-0.2)}{100}}$$
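The proportion test can be sketched as a short function and applied to both examples. The function name `z_test_proportion` is illustrative. The text available here stops at the standard error formula for the ESP example, so the ESP figures produced below are simply what the same formula gives and are not results quoted from the notes.

```python
import math
from statistics import NormalDist

def z_test_proportion(successes, n, pi0, alpha):
    """Large-sample, one-tailed test of H0: pi = pi0 against H1: pi > pi0."""
    p = successes / n                        # sample proportion
    se = math.sqrt(pi0 * (1 - pi0) / n)      # S.E. uses the hypothesised proportion
    z = (p - pi0) / se                       # test statistic
    cutoff = NormalDist().inv_cdf(1 - alpha)
    return z, cutoff, z > cutoff

# Example 1: 115 tea drinkers out of 550, tested against 0.17 at the 1% level
tea = z_test_proportion(115, 550, 0.17, alpha=0.01)

# Example 2: 30 correct out of 100, tested against 0.2 at the 1% and 0.1% levels
esp_1pct = z_test_proportion(30, 100, 0.2, alpha=0.01)
esp_01pct = z_test_proportion(30, 100, 0.2, alpha=0.001)

for name, (z, cutoff, reject) in [("tea, 1%", tea),
                                  ("ESP, 1%", esp_1pct),
                                  ("ESP, 0.1%", esp_01pct)]:
    print(f"{name}: z = {z:.2f}, cut-off = {cutoff:.2f}, reject H0: {reject}")
```

On these figures the ESP result would be significant at the 1% level but not at the 0.1% level.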
