Statistics for Molecular Medicine -- Statistical inference and statistical tests
Barbara Kollerits & Claudia Lamina, Medical University Innsbruck, Division of Genetic Epidemiology
Molekulare Medizin, SS 2016
The principles of statistical testing:
Formulating Hypothesis & Test-statistics & p-values
Formulating Hypothesis & Statistical Tests
Steps in conducting a statistical test:
■ Quantify the scientific problem from a clinical / biological perspective
■ Formulate the model assumptions (distribution of the variable of interest)
■ Formulate the problem as a statistical testing problem: null hypothesis versus alternative hypothesis
■ Define the "error" you are willing to tolerate
■ Calculate the appropriate test statistic
■ Decide for or against the null hypothesis
Formulating Hypothesis & Statistical Tests
Hypothesis formulation:
■ Null hypothesis H0: the conservative hypothesis you want to reject
■ Alternative hypothesis H1: the hypothesis you want to prove
■ Examples:
Scientific hypothesis: A new therapy is assumed to prevent myocardial infarctions (MI) in risk patients better than the old therapy.
Statistical hypothesis: H0: π_new ≥ π_old versus H1: π_new < π_old,
with π_new / π_old the proportion of patients experiencing an MI during the study under the new / old therapy.
Scientific hypothesis: Women and men achieve equally good scores in the EMS-AT test.
Statistical hypothesis: H0: μ_men = μ_women versus H1: μ_men ≠ μ_women,
with μ_men / μ_women the mean scores for men / women.
Formulating Hypothesis & Statistical Tests
Possible decisions in statistical tests:

              Decide for H0                       Decide for H1
Reality H0    Correct decision                    Wrong decision: Type I error (α)
Reality H1    Wrong decision: Type II error (β)   Correct decision: Power (1-β)
■ Type I and Type II error cannot be minimized simultaneously
■ Statistical tests are constructed such that the probability of a Type I error is not bigger than the significance level α (typically set to 0.01 or 0.05)

Example:
■ Test the new MI therapy on patients at a significance level of 5%.
■ In reality, H0 is true and there is no difference between the therapies.
■ If the study is repeated 100 times on 100 different samples, the statistical test rejects the null hypothesis in at most 5 of the 100 tests.
Formulating Hypothesis & Statistical Tests
The one-sample test of the mean: Gauß-Test (also called z-Test)
■ Situation: Compare the sample mean (μ_sample) with a specified mean (μ_0)
■ Assumption: normal distribution of the sample
■ Hypothesis: H0: μ_sample = μ_0 versus H1: μ_sample ≠ μ_0

Example: From a former population-based sample, you know that the mean of the non-fasting cholesterol level was 230. Now you have finished the measurements in your new sample. Since you want to conduct a study on cholesterol levels that is comparable to the old study, you test whether the mean in the new study equals the mean in the old study:
One-sample test of the mean: H0: μ_sample = 230 vs. H1: μ_sample ≠ 230
Assuming normal distribution and assuming H0 is true, standardization gives the test statistic

T = √n · (X̄ - μ_0) / σ ~ N(0,1)

If the test statistic is "too extreme", it is not very likely that H0 is true → reject H0 if T is "too extreme".
Formulating Hypothesis & Statistical Tests
■ You cannot avoid a Type I error, but you can control it, since it can only occur if H0 is true. Since the test statistic follows a known distribution under H0, the probability of each value given H0 can be calculated.
The probability distribution given H0 is N(0,1):
[Figure: standard normal density with the acceptance region (area = 1-α) between the critical values -z_{1-α/2} and z_{1-α/2}, and a rejection region of area α/2 in each tail]
■ All values falling in the rejection region (|T| ≥ z_{1-α/2}) do not support the null hypothesis (significance level α).
Formulating Hypothesis & Statistical Tests
The one-sample test of the mean revisited: Gauß-Test (also called z-Test)
■ Situation: Compare the sample mean (μ_sample) with a specified mean (μ_0)
■ Assumption: normal distribution of the sample
■ Hypothesis: H0: μ_sample = μ_0 versus H1: μ_sample ≠ μ_0

Assuming normal distribution and assuming H0 is true:
Test statistic: T = √n · (X̄ - μ_0) / σ ~ N(0,1)
Test decision: |T| > z_{1-α/2}: Reject H0 → the test is "significant" at level α
Formulating Hypothesis & Statistical Tests
■ Attention: Rejection of H0 is not a decision for H1, but a decision against H0, since no distribution can be specified if H1 is true.
■ The (1-α/2)-quantiles can be found in tables or calculated by computers; e.g. the 97.5% quantile of the standard normal distribution, used for a 5% two-sided significance test, is ≈ 1.96.
■ As for confidence intervals: if σ is not known and the sample size is not large enough, estimate σ by S.
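The decision rule above can be sketched in a few lines; the sample values, the (assumed known) σ = 40 and the helper name z_test are illustrative assumptions, not taken from the slides:

```python
import math
from statistics import NormalDist

def z_test(sample, mu0, sigma, alpha=0.05):
    """Two-sided one-sample Gauß test (z-test) with known sigma."""
    n = len(sample)
    xbar = sum(sample) / n
    t = math.sqrt(n) * (xbar - mu0) / sigma       # T ~ N(0,1) under H0
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # critical value z_{1-alpha/2}
    return t, z_crit, abs(t) > z_crit

# Hypothetical cholesterol values; test H0: mu = 230 with sigma = 40 assumed known
t, z_crit, reject = z_test([225, 240, 250, 231, 219, 247, 255, 236], 230, 40)
```

Here |T| stays below z_{1-α/2} ≈ 1.96, so H0 is not rejected.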
Formulating Hypothesis & Statistical Tests
■ So far, a statistical test gives you a test statistic. If the test statistic is more extreme than the critical value, decide against the null hypothesis; if it is smaller than the critical value, decide for the null hypothesis.
■ This is a simple yes/no decision rule; it does not give you the certainty / strength of the decision.
■ Assume two two-sided z-tests at α = 5%: one gives T = 2, the other T = 2.6.
■ Both test statistics are > z_{1-α/2} (≈ 1.96) → both are "significant".
■ However, 2.6 is "more extreme" than 2, given the truth of the null hypothesis!
[Figure: standard normal density with the values -2.6, -2, 2 and 2.6 marked on the x-axis]
Formulating Hypothesis & Statistical Tests
The p-value:
■ The probability of estimating an effect that is as extreme as the observed effect or even more extreme, under the assumption that the null hypothesis (= no association) is true.
■ For a two-sided test, the areas under the curve in both tails of the distribution function have to be added.
[Figure: standard normal density; each tail area beyond ±2 is ≈ 0.025, each tail area beyond ±2.6 is ≈ 0.005]
For T = 2, the two-sided p-value is ≈ 0.025 + 0.025 = 0.05; for T = 2.6 it is ≈ 0.005 + 0.005 = 0.01.
Formulating Hypothesis & Statistical Tests
■ The p-value p is a measure of certainty against the null hypothesis.
Example: A one-sample z-test comparing the sample mean to μ_0 (H0: μ_sample = μ_0; H1: μ_sample ≠ μ_0) results in a test statistic T = 2.6, which corresponds to a p-value of 0.01.
A popular interpretation, but wrong: "The probability that the sample mean is different from μ_0 is 1%." The sample mean does not have a probability: it equals μ_0 or it does not!
Correct interpretation: "A different random sample is drawn 100 times from the population of interest. The population mean is μ_0 (= null hypothesis). At most 1 of the 100 experiments results in a test statistic ≥ |2.6|." The randomness lies in the sample!
■ If p < α, reject H0; this is equivalent to the decision rule |T| > z_{1-α/2}.
Cholesterol example, three equivalent decisions:
|T| > z_{1-α/2}: 6.3 > 1.96 → reject H0
p = 2.97e-10 < 0.05 → reject H0
230 not in the 95% confidence interval [234.89, 239.31] → reject H0
→ The sample mean is significantly different from the specified mean.
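The two-sided p-values quoted in this section follow directly from the standard normal distribution; a small sketch using only the Python standard library:

```python
from statistics import NormalDist

def two_sided_p(t):
    """Two-sided p-value for a z statistic: tail area beyond |t| on both sides."""
    return 2 * (1 - NormalDist().cdf(abs(t)))

p_20 = two_sided_p(2.0)   # ~ 0.046 (the "just significant" case)
p_26 = two_sided_p(2.6)   # ~ 0.01
p_63 = two_sided_p(6.3)   # ~ 3e-10 (the cholesterol example)
```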
Formulating Hypothesis & Statistical Tests: Exercise
The manufacturer of a laboratory measurement device claims that one measurement will take 5 minutes on average. You want to test that statement with 10 measurements (mean = 5.3, standard deviation = 0.3).
What hypothesis do you want to test? H0: ___ versus H1: ___
Determine the test statistic: T = ___
What is the critical value (significance level 5%)? Test decision?

Quantiles of the standard normal distribution:
p    0.95   0.975   0.995
z_p  1.64   1.96    2.58
The most common statistical tests:
Testing measures of location

The most common statistical tests

Quantitative / Continuous outcome variable:
                    Normal distribution             Any other distribution
Compare 2 groups    t-test                          Wilcoxon test / Mann-Whitney U-test
Compare >2 groups   Analysis of Variance (ANOVA)    Kruskal-Wallis test

Qualitative / Categorical outcome variable:
                    Expected frequency in each      Expected frequency in each
                    cell of the crosstable "high"   cell of the crosstable "low"
Compare 2 groups    Chi-square test                 Fisher's exact test
Compare >2 groups   Chi-square test                 Fisher's exact test

Testing measures of location: Does the mean/median differ between the groups?
Testing frequencies in a crosstable: Are the rows and columns independent of each other?
Testing measures of location
The one-sample t-test (the "standard test" for mean comparisons):
■ Situation: Compare the sample mean (μ_sample) with a specified mean (μ_0)
■ Assumption: normal distribution of the sample, σ is not known
■ Hypothesis: H0: μ_sample = μ_0 versus H1: μ_sample ≠ μ_0
■ Test statistic: T = √n · (X̄ - μ_0) / S ~ t(n-1) under H0
■ Test decision for a two-sided test: |T| > t_{n-1, 1-α/2}: Reject H0
■ Test decision for a one-sided test: |T| > t_{n-1, 1-α}: Reject H0
■ The Gauß test is the general form of the t-test, if σ were known
■ The t-test approximates the Gauß test for large n
■ Quantiles of the t-distribution are needed to decide for or against the null hypothesis
■ But in practice: statistical programs give out p-values
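In practice a statistics package computes T and the p-value directly; a minimal SciPy sketch (the cholesterol values below are hypothetical illustration data):

```python
from scipy import stats

# Hypothetical cholesterol measurements; test H0: mu = 230 with sigma unknown
sample = [225, 240, 250, 231, 219, 247, 255, 236]
t, p = stats.ttest_1samp(sample, popmean=230)  # two-sided by default
# reject H0 at alpha = 0.05 only if p < 0.05
```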
Testing measures of location
Comparing the normal and t-distribution:
[Figure: the t-distribution has heavier tails than N(0,1); with increasing degrees of freedom it approaches the standard normal distribution]

Quantiles of the t-distribution:
0.95 quantile: critical value for a one-sided test at significance level 5%; example, n = 16: t0.95(15) = 1.753
0.975 quantile: critical value for a two-sided test at significance level 5%; example, n = 823: t0.975(822) ≈ 1.96 ≈ the quantile of the standard normal distribution

df (= n-1)   0.95    0.975
1            6.314   12.706
2            2.920   4.303
3            2.353   3.182
4            2.132   2.776
5            2.015   2.571
6            1.943   2.447
7            1.895   2.365
8            1.860   2.306
9            1.833   2.262
10           1.812   2.228
11           1.796   2.201
12           1.782   2.179
13           1.771   2.160
14           1.761   2.145
15           1.753   2.131
16           1.746   2.120
17           1.740   2.110
18           1.734   2.101
19           1.729   2.093
20           1.725   2.086
21           1.721   2.080
22           1.717   2.074
23           1.714   2.069
24           1.711   2.064
25           1.708   2.060
26           1.706   2.056
27           1.703   2.052
28           1.701   2.048
29           1.699   2.045
30           1.697   2.042
120          1.658   1.980
∞            1.645   1.960
Testing measures of location: Exercise
Determine the following critical values:

One-sided or two-sided test   Significance level   n        Critical value
two                           0.05                 >> 120   ___
one                           0.05                 >> 120   ___
two                           0.05                 15       ___
one                           0.05                 28       ___
two                           0.01                 10       ___
one                           0.01                 20       ___
two                           0.01                 >> 120   ___

Quantiles of the t-distribution:
df (= n-1)   0.95    0.975    0.99     0.995
1            6.314   12.706   31.821   63.657
2            2.920   4.303    6.965    9.925
3            2.353   3.182    4.541    5.841
4            2.132   2.776    3.747    4.604
5            2.015   2.571    3.365    4.032
6            1.943   2.447    3.143    3.707
7            1.895   2.365    2.998    3.499
8            1.860   2.306    2.896    3.355
9            1.833   2.262    2.821    3.250
10           1.812   2.228    2.764    3.169
11           1.796   2.201    2.718    3.106
12           1.782   2.179    2.681    3.055
13           1.771   2.160    2.650    3.012
14           1.761   2.145    2.624    2.977
15           1.753   2.131    2.602    2.947
16           1.746   2.120    2.583    2.921
17           1.740   2.110    2.567    2.898
18           1.734   2.101    2.552    2.878
19           1.729   2.093    2.539    2.861
20           1.725   2.086    2.528    2.845
21           1.721   2.080    2.518    2.831
22           1.717   2.074    2.508    2.819
23           1.714   2.069    2.500    2.807
24           1.711   2.064    2.492    2.797
25           1.708   2.060    2.485    2.787
26           1.706   2.056    2.479    2.779
27           1.703   2.052    2.473    2.771
28           1.701   2.048    2.467    2.763
29           1.699   2.045    2.462    2.756
30           1.697   2.042    2.457    2.750
120          1.658   1.980    2.358    2.617
∞            1.645   1.960    2.326    2.576
Testing measures of location
The two-sample t-test for unpaired samples:
■ Situation: Compare the means (μ_1, μ_2) of two unpaired samples
■ Assumption: normal distribution of both samples, σ is not known. Here: equal σ is assumed, but there are methods (Welch t-test) for unequal σ
■ Hypothesis: H0: μ_1 = μ_2 versus H1: μ_1 ≠ μ_2
■ Test statistic: T = (X̄_1 - X̄_2) / √( s² · (1/n_1 + 1/n_2) ) ~ t(n_1 + n_2 - 2) under H0,
with the pooled variance s² = ( (n_1 - 1)·S_1² + (n_2 - 1)·S_2² ) / (n_1 + n_2 - 2)
■ Test decision for a two-sided test: |T| > t_{n1+n2-2, 1-α/2}: Reject H0
■ Test decision for a one-sided test: |T| > t_{n1+n2-2, 1-α}: Reject H0
Testing measures of location
Example: A biotech company claims that their new biomarker XY can distinguish diseased from non-diseased. A pilot study on 10 diseased and 10 healthy persons gives the following results:

Lab parameter XY   Diseased   Healthy
                   8.70       3.36
                   11.28      18.35
                   13.24      5.19
                   8.37       8.35
                   12.16      13.1
                   11.04      15.65
                   10.47      4.29
                   11.16      11.36
                   4.28       9.09
                   19.54      (missing)
X̄                  11.024     9.86
S²                 15.227     27.038
Pooled variance: s² = (9·15.227 + 8·27.038) / 17 = 20.78512

T = (11.024 - 9.86) / √(20.78512 · (1/10 + 1/9)) = 0.556 ~ t(17) under H0

Critical value of a t(17)-distribution at α = 5% (two-sided test) = 2.11
0.556 < 2.11 → XY does not differ significantly between diseased and non-diseased (two-sided p ≈ 0.59; the one-sided p is 0.29)
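The hand calculation can be cross-checked with SciPy (pooled-variance Student t-test, as assumed on the slide):

```python
from scipy import stats

diseased = [8.70, 11.28, 13.24, 8.37, 12.16, 11.04, 10.47, 11.16, 4.28, 19.54]
healthy = [3.36, 18.35, 5.19, 8.35, 13.1, 15.65, 4.29, 11.36, 9.09]  # one value missing

# Student two-sample t-test with pooled variance, two-sided
t, p = stats.ttest_ind(diseased, healthy, equal_var=True)
# t ~ 0.556, far below the critical value 2.11 -> do not reject H0
```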
Testing measures of location: Exercise
Comparing the cholesterol levels between physically active (= 1) and inactive (= 0) patients:

Patient ID   Physically active   Cholesterol level
1            0                   195
2            0                   159
3            0                   166
4            0                   244
5            0                   169
6            0                   168
7            0                   222
8            0                   238
9            0                   216
10           0                   180
11           1                   146
12           1                   145
13           1                   147
14           1                   208
15           1                   182
16           1                   145
17           1                   187
18           1                   206
19           1                   218
20           1                   161

Mean(inactive) = ___   Mean(active) = ___
What hypothesis do you want to test? H0: ___ versus H1: ___
The following information is given: S(inactive) = 31.94; S(active) = 29.27; pooled S² = 938.48
Test statistic T = ___
Critical value (two-sided test, α = 5%): ___
Test decision?
Testing measures of location
The two-sample t-test for paired samples:
■ Situation: Compare the means of two paired samples, e.g. compare the means of variables in the same patients before a treatment and after the treatment
■ Assumption: normal distribution of both samples, σ is not known
■ Hypothesis: H0: μ_before = μ_after versus H1: μ_before ≠ μ_after
Calculate d = x_before - x_after for each patient
→ new hypothesis: H0: the mean of the difference is 0 (μ_d = 0) versus H1: the mean of the difference is ≠ 0 (μ_d ≠ 0)
→ same situation as the one-sample t-test
t-tests can also be used approximately for any distribution that is not too skewed.
Testing measures of location
Example: A wannabe health guru claims that he has invented the perfect weight-loss method. A pilot study on 10 obese individuals gives the following results:

ID   kg at baseline   kg after 6 months   Difference
1    108              90                  18
2    97               97                  0
3    88               91                  -3
4    120              111                 9
5    98               94                  4
6    95               91                  4
7    87               82                  5
8    85               77                  8
9    99               103                 -4
10   134              127                 7

X̄    101.1            96.3                4.8
S²   242.767          209.122             41.07 (s = 6.41)
Testing measures of location
Paired t-test = one-sample t-test on the difference:

T = √10 · (4.8 - 0) / 6.41 = 2.368 ~ t(9) under H0
t0.975(9) = 2.262 (two-sided) → H0 can be rejected (p = 0.042)

Since you want to prove that kg(before) > kg(after), a one-sided test is more appropriate (more power):
t0.95(9) = 1.833, p = 0.021 (= 0.5 · two-sided p) → H0 can be rejected

An unpaired t-test would have yielded non-significant results: p = 0.24 (one-sided test)
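The paired example can be cross-checked with SciPy; ttest_rel is the paired test, and the unpaired ttest_ind is run alongside to reproduce the comparison on the slide:

```python
from scipy import stats

before = [108, 97, 88, 120, 98, 95, 87, 85, 99, 134]
after = [90, 97, 91, 111, 94, 91, 82, 77, 103, 127]

t_paired, p_paired = stats.ttest_rel(before, after)      # paired, two-sided
t_unpaired, p_unpaired = stats.ttest_ind(before, after)  # ignoring the pairing
# paired: t ~ 2.368, two-sided p ~ 0.042 (one-sided p ~ 0.021)
# unpaired: one-sided p = p_unpaired / 2 ~ 0.24 -> not significant
```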
Testing measures of location
Analysis of Variance (ANOVA)
■ Situation: Compare the means of k samples (k > 2)
■ Assumption: normal distribution of the population, σ_1 = … = σ_k
■ Hypothesis: H0: μ_1 = μ_2 = … = μ_k versus H1: μ_i ≠ μ_j for at least one pair (i ≠ j): at least two of the means differ

How to construct the test statistic?
Idea from the two-sample t-test: relate the variability (difference) between the groups to the variability within the groups.
Testing measures of location
Analysis of Variance (ANOVA): Variance partitioning
■ There are k groups (j = 1, …, k)
■ X_1j, X_2j, …, X_njj are the observed values of the variable of interest for i = 1, …, n_j patients in the jth group
■ X̄ is the overall mean of the variable; X̄_j are the means within the groups

SST = SSM + SSE

SST = Sum of squares Total: SST = Σ_j Σ_i (X_ij - X̄)²
SSE = Sum of squares Error (or SSR = Sum of squares Residual) ~ variance "within": SSE = Σ_j Σ_i (X_ij - X̄_j)²
SSM = Sum of squares Model ~ variance "between": SSM = Σ_j n_j (X̄_j - X̄)²

[Figure: all observations x_ij in groups 1-3, with the group means X̄_1, X̄_2, X̄_3 and the overall mean X̄]

Mean of squares: MSM = SSM / (k - 1), MSE = SSE / (n - k)
Testing measures of location
■ Test statistic: F = MSM / MSE ~ F(k-1, n-k) under H0
■ Test decision: F > F_{k-1, n-k, 1-α}: Reject H0
■ Since F is always positive, there are no one-sided tests
■ If H0 is rejected, you can tell that there are at least two groups which differ from each other significantly. You can't tell which groups differ!
→ perform pairwise t-tests after the overall F-test (see closed test procedure)

Example: There are 3 different medications (Med1, Med2, Med3), which are intended to increase the HDL-cholesterol levels in patients:
1. Perform ANOVA as an overall test of whether there is a difference between the groups
2. If the F-test was significant, you know that there is a difference
3. Test Med1 against Med2, Med1 against Med3, Med2 against Med3
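A minimal sketch of the overall F-test; the HDL values and group sizes are invented for illustration:

```python
from scipy import stats

# Hypothetical HDL-cholesterol levels under three medications (illustration only)
med1 = [52, 48, 55, 50, 47]
med2 = [61, 58, 64, 60, 59]
med3 = [51, 49, 54, 52, 50]

f, p = stats.f_oneway(med1, med2, med3)  # overall test of H0: mu1 = mu2 = mu3
# if p < alpha, follow up with pairwise t-tests (closed test procedure)
```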
Testing measures of location
All tests so far assumed a normally distributed variable → parametric tests. If the assumption does not hold → nonparametric tests.

Parametric test   Nonparametric counterpart
t-test            Wilcoxon test / Wilcoxon rank-sum test / Mann-Whitney U-test (different names for the same test): tests whether two independent samples come from the same distribution ≈ tests equality of the medians
ANOVA             Kruskal-Wallis test: tests equality of the population medians between groups

Characteristics of nonparametric tests:
■ Robust against outliers and skewed distributions
■ However: parametric tests should be preferred over nonparametric tests, if appropriate, since they have the higher power.
Testing measures of location
Two-sample test on equality of distributions: Wilcoxon test
■ Situation: Compare location measures of two unpaired samples X and Y if the assumptions of a t-test do not hold
■ Assumption: the form of the continuous distributions of the variables X and Y is the same → a test on equality of distributions = a test on equality of the medians
■ Hypothesis: H0: x_med = y_med versus H1: x_med ≠ y_med
■ The test is based on ranks. What are ranks?

Example: Wilcoxon test
Original values: X = {1, 2, 4, 6, 9, 9, 11}, x_med = 6; Y = {1, 3, 4, 5, 6, 7, 8}, y_med = 5
■ Sort both variables into one sequence: 1/1, 2, 3, 4/4, 5, 6/6, 7, 8, 9/9, 11 (ties marked with /)
■ Ranking (tied values get the average rank): 1.5, 1.5, 3, 4, 5.5, 5.5, 7, 8.5, 8.5, 10, 11, 12.5, 12.5, 14
■ A test statistic is calculated from these ranks by the computer!

Efficiency of the Wilcoxon test (Kruskal-Wallis test) compared to the t-test (ANOVA):
■ To achieve the same power as a t-test/ANOVA, a higher sample size is needed!
■ The parametric tests are in general more powerful (if their assumptions are fulfilled).
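The ranking scheme and the rank-based test can be reproduced with SciPy:

```python
from scipy import stats

X = [1, 2, 4, 6, 9, 9, 11]
Y = [1, 3, 4, 5, 6, 7, 8]

# Tied values receive the average of the ranks they occupy
ranks = stats.rankdata(sorted(X + Y))

# Rank-based two-sample test (Mann-Whitney U / Wilcoxon rank-sum)
u, p = stats.mannwhitneyu(X, Y, alternative="two-sided")
```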
The most common statistical tests:
Testing frequencies

Testing frequencies
One-sample test on frequencies: χ² goodness-of-fit test for categorical traits: compare the frequencies of a categorical variable to specified proportions
■ Example: The quality manager of a gummy-bear factory claims that the 5 flavours have the same proportions in each package; you want to test this claim. → proportions under H0: π_1 = π_2 = π_3 = π_4 = π_5 = 1/5
■ Hypothesis: H0: P(X = i) = π_i versus H1: P(X = i) ≠ π_i for at least one i, i = 1, …, k (k = number of categories)
■ Idea: Compare the observed numbers (O_i = h_i) in each category with the expected numbers (E_i = n·π_i) in each category
■ Assumption: n·π_i ≥ 1 for all i and n·π_i ≥ 5 for at least 80% of the categories
■ Test statistic: χ² = Σ_i (O_i - E_i)² / E_i ~ χ²(k-1) under H0
■ Test decision: χ² > χ²_{k-1, 1-α}: Reject H0
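A sketch of the goodness-of-fit test; the flavour counts are hypothetical:

```python
from scipy import stats

# Hypothetical flavour counts in a package of n = 100 gummy bears (k = 5)
observed = [28, 17, 20, 22, 13]
expected = [20, 20, 20, 20, 20]  # E_i = n * pi_i under H0: pi_i = 1/5

chi2, p = stats.chisquare(observed, f_exp=expected)  # df = k - 1 = 4
# chi2 = 6.3 < 9.49 = chi2_{4, 0.95} -> do not reject H0 at the 5% level
```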
Testing frequencies
Excursion: What are degrees of freedom (df)?
Definition: the number of values that can vary freely.
Example: The χ² goodness-of-fit test with k categories has (k - 1) degrees of freedom. Why? There is the constraint that all proportions sum up to 1: Σ_i π_i = 1. π_1, π_2, …, π_{k-1} can be chosen freely; π_k is then fixed.
Example with real data: suppose you have three groups; then with π_1 = 0.20 and π_2 = 0.50, π_3 = 1 - 0.70 = 0.30.
Testing frequencies
Two-sample test on frequencies: χ² test of independence
■ Situation: Compare the frequencies between two or more groups. Or: test whether two categorical variables X (i = 1, …, k) and Y (j = 1, …, m) depend on each other.
→ All such situations can be grouped into contingency tables:

            Y = 1   …   Y = m   Row sum
X = 1       h11     …   h1m     h1.
X = 2       h21     …   h2m     h2.
:           :           :       :
X = k       hk1     …   hkm     hk.
Column sum  h.1     …   h.m     n

■ A possible scenario: Compare the number of smokers, ex-smokers and never-smokers between men and women.
Testing frequencies
Two-sample test on frequencies: χ² test of independence
■ Hypothesis: H0: X and Y are independent of each other; H1: X and Y are dependent on each other (are associated)
■ Assumption: expected frequencies ≥ 1 for all cells and expected frequencies ≥ 5 for at least 80% of the cells → none of the cells should have a very rare expectation; if the assumption is not fulfilled → use Fisher's exact test
■ Idea to construct the test statistic: Compare the observed numbers in each cell with the numbers expected if H0, and therefore independence of the two factor variables, is assumed.
Testing frequencies
Table of observed numbers:

            Y = 1   …   Y = m   Row sum
X = 1       h11     …   h1m     h1.
X = 2       h21     …   h2m     h2.
:           :           :       :
X = k       hk1     …   hkm     hk.
Column sum  h.1     …   h.m     n

h1. … hk. and h.1 … h.m are the marginal sums.

Example (smoking status by gender):
Gender         Current Smoker   Ex-Smoker   Never Smoker   Row Total
Men            144              310         268            722
Women          117              143         475            735
Column Total   261              453         743            1457
Testing frequencies
Table of expected numbers: expected number in each cell = (row sum · column sum) / total sum

            Y = 1      …   Y = m      Row sum
X = 1       h1.h.1/n   …   h1.h.m/n   h1.
X = 2       h2.h.1/n   …   h2.h.m/n   h2.
:           :              :          :
X = k       hk.h.1/n   …   hk.h.m/n   hk.
Column sum  h.1        …   h.m        n

Smoking example: expected number in the upper left cell: 722 · 261 / 1457 = 129.336
Testing frequencies
With O_ij the observed and E_ij = h_i. · h_.j / n the expected number in cell (i, j):

Test statistic: χ² = Σ_i Σ_j (O_ij - E_ij)² / E_ij ~ χ²((k-1)·(m-1)) under H0
Test decision: χ² > χ²_{(k-1)(m-1), 1-α}: Reject H0
Testing frequencies
Example (smoking status by gender):

Observed:
Gender         Current Smoker   Ex-Smoker   Never Smoker   Row Total
Men            144              310         268            722
Women          117              143         475            735
Column Total   261              453         743            1457

Expected:
Gender         Current Smoker   Ex-Smoker   Never Smoker   Row Total
Men            129.336          224.479     368.185        722
Women          131.664          228.521     374.815        735
Column Total   261              453         743            1457

χ² = (144 - 129.336)²/129.336 + (310 - 224.479)²/224.479 + (268 - 368.185)²/368.185
   + (117 - 131.664)²/131.664 + (143 - 228.521)²/228.521 + (475 - 374.815)²/374.815
   = 121.9218

Critical value: χ²_{(2-1)(3-1), 0.95} = χ²_{2, 0.95} = 5.99
121.9218 >> 5.99 → the test is significant (p = 3.3e-27) → the null hypothesis that gender and smoking status are independent can be rejected. The test itself does not tell you, however, whether men smoke more than women etc.
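The same result can be obtained with SciPy's chi2_contingency:

```python
from scipy import stats

observed = [[144, 310, 268],
            [117, 143, 475]]

# Pearson chi-square test of independence; correction=False matches the
# hand calculation (no Yates continuity correction)
chi2, p, dof, expected = stats.chi2_contingency(observed, correction=False)
# chi2 ~ 121.92, dof = 2, expected[0][0] ~ 129.336
```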
Testing frequencies
Two-sample test on frequencies if the expected numbers in the cells are rare: Fisher's exact test
■ Situation: the assumptions of a χ² test do not hold (number of expected values in each cell ≥ 1 and number of expected values ≥ 5 in 80% of the cells)
■ Idea of the test: enumerate all tables that are possible with the observed marginal sums. Then count all the tables that are "more extreme" relative to the null hypothesis than the observed table: p = number of more extreme tables / number of all tables → computer-intensive and not really solvable "by hand"
■ Test decision: p < α: Reject H0
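A sketch with SciPy, which implements Fisher's exact test for 2×2 tables; the counts are hypothetical:

```python
from scipy import stats

# Hypothetical sparse 2x2 table (e.g. treatment x outcome) with small counts
table = [[1, 9],
         [8, 2]]

odds_ratio, p = stats.fisher_exact(table)  # SciPy handles 2x2 tables only
# reject H0 of independence if p < alpha
```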
Testing frequencies: Exercise
In an experiment with mice, you want to test the development of insulin resistance under three different diets (D1, D2, D3). After following up the mice for several months, you observe the following:

                        D1   D2   D3   Σ
insulin resistant       6    1    3    10
not insulin resistant   4    9    7    20
Σ                       10   10   10   30

Calculate the expected numbers in each cell under the assumption of independence between diet and insulin resistance:

                        D1   D2   D3   Σ
insulin resistant       __   __   __   10
not insulin resistant   __   __   __   20
Σ                       10   10   10   30

Which diet results in more insulin-resistant mice than expected, which in fewer? Which statistical test should be performed?
The multiple testing problem
The multiple testing problem
The situation:
■ Consider a dataset with 100 independent parameters which do not play a role in the etiology of the disease of interest (which you don't know, of course)
■ 100 statistical tests are performed at a significance level of α = 0.05
■ The tests are constructed such that at most 5 of the 100 tests reject the null hypothesis although it is true
→ You expect 5 tests to be significant just by chance!!!
The multiple testing problem
■ The probability of getting at least one Type I error increases with an increasing number of tests.
■ Family-wise error rate (the error rate for the complete family of k tests performed): α* = 1 - (1 - α)^k, with α being the comparison-wise error rate. α* is the probability of getting one or more false discoveries (Type I errors).

k     α* (α = 0.05)
1     0.05
5     0.226
10    0.401
100   0.994

→ The significance level has to be modified for multiple testing situations.
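The α* column can be reproduced directly from the formula:

```python
# Family-wise error rate alpha* = 1 - (1 - alpha)^k for k independent tests
alpha = 0.05
fwer = {k: 1 - (1 - alpha) ** k for k in (1, 5, 10, 100)}
# fwer[5] ~ 0.226, fwer[10] ~ 0.401, fwer[100] ~ 0.994
```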
The multiple testing problem
The Bonferroni correction method:
■ Control the comparison-wise error rate: reject H0 if p < α
■ Control the family-wise error rate (including k tests): reject H0 if p < α/k. This is equivalent to p_Bonferroni = p·k < α
■ Advantage: simple
■ Problem: the Bonferroni correction increases the probability of a Type II error → the power of detecting a true association is reduced. Disadvantage: too conservative
■ Can be used if tests are dependent, but is just too conservative for this case

k     α/k (α = 0.05)
1     0.05
5     0.05/5 = 0.01
10    0.005
100   0.0005
The multiple testing problem
Some modifications of the Bonferroni method: the Bonferroni-Holm method
■ You have a list of k p-values → sort it so that the minimal p-value (p(1)) comes first: p(1), p(2), p(3), p(4), …, p(k)
■ If p(1) ≥ α/k → stop and accept all null hypotheses. If p(1) < α/k → reject H0(1) and continue with the reduced list: p(2), p(3), p(4), …, p(k)
■ If p(2) ≥ α/(k-1) → stop and accept all remaining null hypotheses. If p(2) < α/(k-1) → reject H0(2) and continue with the reduced list: p(3), p(4), …, p(k)
■ Continue until the hypothesis with the smallest remaining p-value cannot be rejected → stop and accept all hypotheses that have not been rejected before
■ Less conservative than Bonferroni
■ Can also be used if tests are dependent
■ There are multiple other methods, e.g. the Hochberg method etc.
In R: function p.adjust()
The multiple testing problem
Example: Assume you have this vector of p-values (10 independent tests at α = 0.05); unadjusted, all except 0.1 would be "significant" if multiple testing is not accounted for:

Unadjusted p:      0.0001, 0.001, 0.002, 0.005, 0.007, 0.01, 0.012, 0.02, 0.04, 0.1
With Bonferroni:   significant: 0.0001, 0.001, 0.002
With B.-Holm:      significant: 0.0001, 0.001, 0.002, 0.005, 0.007

E.g. Bonferroni: 0.05/10 = 0.005 → all p-values smaller than 0.005 are significant.
E.g. B.-Holm: first p-value (0.0001): 0.0001 < (0.05/10 = 0.005), continue! Next p-value (0.001): 0.001 < (0.05/9 ≈ 0.0056), continue! … until: p-value (0.007): 0.007 < (0.05/6 ≈ 0.0083), then stop, because the next p-value (0.01) satisfies 0.01 ≥ (0.05/5 = 0.01).
All these methods are still too conservative if the tests are not independent (e.g. highly correlated parameters)!
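The two correction rules can be written out in a few lines of plain Python (holm here returns the rejections in sorted p-value order, which is enough for counting):

```python
def bonferroni(pvals, alpha=0.05):
    """Reject H0 for every p-value below alpha / k."""
    k = len(pvals)
    return [p < alpha / k for p in pvals]

def holm(pvals, alpha=0.05):
    """Step-down Holm procedure on the sorted p-values."""
    k = len(pvals)
    reject = [False] * k
    for i, p in enumerate(sorted(pvals)):
        if p < alpha / (k - i):
            reject[i] = True
        else:
            break  # stop: this and all larger p-values are accepted
    return reject

pvals = [0.0001, 0.001, 0.002, 0.005, 0.007, 0.01, 0.012, 0.02, 0.04, 0.1]
n_bonf = sum(bonferroni(pvals))  # 3 rejections
n_holm = sum(holm(pvals))        # 5 rejections
```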
Multiple testing: Exercise
You conduct an experiment with knock-out mice and compare the mean values of 5 different parameters between wildtype (WT) and knockout (KO) mice. In the following list, the p-values for all t-tests are given.

Parameter   Uncorrected p-value   Significant at α = 5%?   Significant after Bonferroni correction?   Significant after Bonferroni-Holm correction?
1           0.04
2           0.001
3           0.012
4           0.015
5           0.1

Significance level after Bonferroni correction? α = ___
Conclusion:
The multiple testing problem
The closed testing principle
■ Applies in cases where a set of hypotheses can be tested simultaneously ("overall" or "omnibus" tests)
■ Suppose there are 3 hypotheses H1, H2, H3 to be tested: then H1 can be rejected at level α if all intersections including H1 can be rejected, i.e. H1 ∩ H2 ∩ H3, H1 ∩ H2, H1 ∩ H3 and H1 can all be rejected at level α
■ Example: There are 4 different medications (Med1, Med2, Med3, Med4) which are intended to lower the HDL-cholesterol levels in patients:
1. Perform ANOVA as an overall test of whether there is a difference between the groups
2. If the F-test was significant, perform an ANOVA on all possible intersections
3. If there was any significant F-test, perform pairwise t-tests for all medications included in this significant F-test, etc.
■ Attention: with more than 3 groups it is not sufficient to run an ANOVA first and then pairwise t-tests if the ANOVA was significant → also test all intersections in between
The multiple testing problem
■ Example: Testing the following hypotheses (each individual test at significance level α):
1. Step, ANOVA: μ1 = μ2 = μ3 = μ4
2. Step, if significant, ANOVA for all intersections: μ1 = μ2 = μ3; μ1 = μ2 = μ4; μ1 = μ3 = μ4; μ2 = μ3 = μ4
3. Step, if significant, pairwise t-tests: μ1 = μ2; μ1 = μ3; μ1 = μ4; μ2 = μ3; μ2 = μ4; μ3 = μ4
→ Significant at family-wise error rate α* = α
But: very complicated to perform if there are more than 3 groups → see e.g. Tukey's test (in R)