Estimating with Confidence, Part II

Review
• We use y-bar to estimate a population mean, µ.
• When sampling from a population with true mean µ, the mean of the sampling distribution of y-bar is µ.
• On average, means from larger samples are closer to the true mean than means from smaller samples.

• When sampling from a population with true standard deviation σ, the standard deviation of the distribution of y-bar (its standard error) is σ/√n.

• Lastly, one nearly magical property of the sample mean, y-bar, is that it is normally distributed (at least approximately) no matter what the original distribution of y, as long as one of these conditions is met:
  – the original distribution is normal, or
  – the sample size is large.
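The σ/√n result can be checked by brute force: draw many samples of size n, compute each y-bar, and take the standard deviation of those means. This is a minimal sketch that assumes a normal population with the example's µ = 205.7 and σ = 45.9194; the function name `sd_of_sample_means` is illustrative, not from any library.

```python
import random
import statistics


def sd_of_sample_means(mu, sigma, n, reps=5000, seed=1):
    """Simulate `reps` samples of size n from an assumed Normal(mu, sigma)
    population and return the standard deviation of their sample means."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(reps)
    ]
    return statistics.stdev(means)


mu, sigma = 205.7, 45.9194  # mean and SD of the example population in these notes
for n in (4, 9, 36):
    simulated = sd_of_sample_means(mu, sigma, n)
    print(f"n={n:2d}  SD of y-bar: simulated {simulated:5.2f}, sigma/sqrt(n) {sigma / n ** 0.5:5.2f}")
```

Quadrupling n should roughly halve the spread of y-bar, which is the 1/√n behaviour in the formula.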

Sample Size
• The rule of thumb is that, in most practical situations, n = 30 is satisfactory.
• As a practical matter, though, if the original distribution is severely non-normal, it may take many more than 30 observations to ensure that the sample mean is approximately normally distributed.

Central Limit Theorem
• More formally, what we've been discussing are the implications of the Central Limit Theorem.
• The CLT is the only theorem we'll cover in BST 621 (because it's that important).

CLT
• Draw a simple random sample of size n from any population whatsoever with mean µ and finite standard deviation σ.
• When n is large, the sampling distribution of the sample mean y-bar is approximately normal with mean µ and standard deviation σ/√n (Daniel, p. 134).

• It's not surprising that when sampling from a normal population the means will be normally distributed.
• It's far more useful to know that, whatever the underlying distribution, the sample means will be approximately normally distributed as long as n is sufficiently large.
• How large must n be? It depends on the underlying distribution, but the rule of thumb is 30.
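One way to watch the CLT at work is to track the skewness of y-bar as n grows. This sketch assumes an exponential population (mean 1, skewness 2) as a stand-in for a strongly skewed variable; for that population the skewness of the mean of n draws is 2/√n, so the simulated values should fall toward 0 (the value for a normal curve). The helper names are illustrative.

```python
import random
import statistics


def skewness(xs):
    """Moment coefficient of skewness: 0 for a symmetric distribution."""
    m = statistics.fmean(xs)
    m2 = statistics.fmean((x - m) ** 2 for x in xs)
    m3 = statistics.fmean((x - m) ** 3 for x in xs)
    return m3 / m2 ** 1.5


def simulated_means(n, reps=10000, seed=2):
    """Sample means of size-n samples from an assumed exponential population."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]


for n in (1, 5, 30):
    print(f"n={n:2d}  skewness of y-bar = {skewness(simulated_means(n)):.2f}  (theory: {2 / n ** 0.5:.2f})")
```

Even at n = 30 some skewness remains for this population, which is why "30" is a rule of thumb rather than a guarantee.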

• However, this theorem cannot save us from an ill-conceived sampling methodology.
• If we draw a simple random sample, we can trust that the CLT will hold.
• Say we didn't draw a simple random sample; are we in trouble? Not necessarily, as long as the data can plausibly be regarded as observations taken at random from the population.
• If the data are representative, we're probably OK.

• However, there is no way to rescue a study that uses haphazardly collected data.
• Such data will have unknown biases, and no fancy formula can rescue badly produced data.
• Garbage in, garbage out.

• Let's assume the data are representative.
• So far, our estimation methods have produced point estimates.
• Confidence intervals are even more useful.

Confidence Intervals
• Confidence intervals combine a point estimate with an estimate of dispersion to form an interval estimate.
• Recall that estimating a parameter with an interval involves three components:

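CIs
1. The point estimate of µ. This is the sample mean, y-bar.
2. When the population standard deviation is known to be σ, the standard error of y-bar is σ/√n.
3. The reliability coefficient: for 100(1 - α)% confidence, we use the z value that cuts off α/2 in the upper tail of the standard normal distribution, that is, its (1 - α/2) quantile.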

Reliability Coefficients
• For 90% confidence, use z = 1.645.
• For 95% confidence, use z = 1.96.
• For 99% confidence, use z = 2.575.
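These coefficients are just standard normal quantiles at 1 - α/2, so they can be computed rather than looked up. A small sketch using the standard library's `statistics.NormalDist`; the function name `z_reliability` is illustrative. The 99% value comes out as 2.5758 to four decimals; tables list it as 2.575 (as above) or 2.576.

```python
from statistics import NormalDist


def z_reliability(confidence):
    """Two-sided z reliability coefficient: the 1 - alpha/2 quantile of N(0, 1)."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_reliability(level):.4f}")
```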

General Form
Estimate ± (reliability coefficient) × (standard error)
This yields two values, a lower limit and an upper limit, around the point estimate. Over repeated samples, intervals built this way contain the true (unknown) population mean with the specified reliability (for example, 95% of the time).

Known Variance
• So, if we know the population standard deviation, σ, then a 95% confidence interval for the population mean is:
  y-bar ± 1.96 × σ/√n

Examples
• In our example population, the known σ is 45.9194.
• Using a sample of size n = 9, the first simulated experiment yielded a y-bar of 217.6, giving the 95% confidence interval 217.6 ± 1.96 × 45.9194/√9 = 217.6 ± 30.0, or [187.6, 247.6].
• Notice that this interval covers the true mean of 205.7.
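The known-σ interval for this example can be reproduced in a few lines. A minimal sketch; `z_interval` is an illustrative name, and it uses the exact quantile 1.95996 rather than the rounded 1.96, which makes no difference at one decimal place here.

```python
import math
from statistics import NormalDist


def z_interval(ybar, sigma, n, confidence=0.95):
    """CI for the mean when sigma is known: y-bar +/- z * sigma/sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    half_width = z * sigma / math.sqrt(n)
    return ybar - half_width, ybar + half_width


lo, hi = z_interval(217.6, 45.9194, 9)
print(f"95% CI: [{lo:.1f}, {hi:.1f}]")  # prints 95% CI: [187.6, 247.6]
print("covers the true mean 205.7:", lo <= 205.7 <= hi)
```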

Problem
• There is a major problem with this method for calculating confidence intervals.
• It requires knowledge of the population standard deviation, σ.

Unknown Variance
• In practice, we essentially never know σ.
• The obvious solution is to use the standard deviation, s, estimated from our sample.
• But this does not work on its own: the reliability coefficient (1.96) is then wrong.
• It's wrong because two random quantities now enter the confidence interval, y-bar and s.
• Both are subject to random fluctuation.
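A quick simulation shows why plugging s into the z formula fails for small samples. This sketch assumes a normal population with the example's µ and σ and uses n = 4 to make the effect easy to see; for comparison it also uses 3.182, the tabled t.975 value for df = 3, from the Student t approach that follows. The nominal coverage is 95%.

```python
import math
import random
import statistics

Z_95 = 1.96
T_975_DF3 = 3.182  # t.975 for df = 3, from a standard t table


def coverage(multiplier, n=4, mu=205.7, sigma=45.9194, reps=4000, seed=3):
    """Fraction of intervals y-bar +/- multiplier * s/sqrt(n) that cover mu,
    for samples drawn from an assumed Normal(mu, sigma) population."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        ybar = statistics.fmean(sample)
        half = multiplier * statistics.stdev(sample) / math.sqrt(n)
        hits += ybar - half <= mu <= ybar + half  # True counts as 1
    return hits / reps


print(f"1.96 with s:  coverage = {coverage(Z_95):.3f}")
print(f"3.182 with s: coverage = {coverage(T_975_DF3):.3f}")
```

With 1.96 the intervals should miss noticeably more than 5% of the time; the larger t multiplier restores roughly 95% coverage.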

Solution
• Gosset, a statistician who worked at the Guinness brewery, figured out the solution to this problem: the t-distribution.
• But to keep from getting fired, he had to publish the work under the pseudonym "Student."
• Thus, you may have seen a reference to "Student's t."
• The t-distribution is very close to the z, but it has heavier tails, reflecting the extra variability from estimating σ with s that the z ignores.

• The degrees of freedom for the t-distribution when estimating a single mean are df = n - 1.
• It's no accident that this is the denominator used to calculate s, the estimated standard deviation.
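The n - 1 denominator is visible in Python's standard library: `statistics.stdev` divides the sum of squared deviations by n - 1 (the sample s used in the t interval), while `statistics.pstdev` divides by n. The data below are a small made-up set chosen only to make the arithmetic easy.

```python
import statistics

# A small made-up data set, used only to illustrate the two divisors.
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)
mean = statistics.fmean(data)
ss = sum((x - mean) ** 2 for x in data)  # sum of squared deviations from the mean

s = (ss / (n - 1)) ** 0.5  # divisor n - 1, as in s
print("divisor n - 1:", s, statistics.stdev(data))
print("divisor n:    ", (ss / n) ** 0.5, statistics.pstdev(data))
```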

New CI
• So, the correct formula for the 100(1 - α)% confidence interval on a population mean, when estimating both the mean and the standard deviation, is:
  y-bar ± t(1 - α/2) × s/√n, where t(1 - α/2) comes from the t-distribution with df = n - 1

• Appendix Table E in Daniel gives the appropriate t values for various df.
• If you use this table, use the value labeled t.975 for a 95% CI.
• That is, for a 95% CI, α = 0.05, so 1 - α/2 = 0.975.

• Notice that as n (and therefore df) gets larger, the t value gets closer to the z value.
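To see t approach z without a printed table, the t quantile can be computed numerically. This stdlib-only sketch integrates the Student t density with Simpson's rule and inverts the CDF by bisection; it is an illustration, not a production routine (libraries such as SciPy provide `scipy.stats.t.ppf` for this). The function names are illustrative.

```python
import math
from statistics import NormalDist


def t_cdf(x, df, steps=1000):
    """CDF of Student's t for x >= 0: 0.5 plus the integral of the density
    from 0 to x, computed with Simpson's rule (steps must be even)."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    h = x / steps
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + total * h / 3


def t_quantile(p, df, hi=100.0):
    """Upper quantile (p > 0.5) of Student's t, found by bisection on t_cdf."""
    lo = 0.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


z = NormalDist().inv_cdf(0.975)
for n in (2, 5, 9, 20, 30, 100, 1000):
    print(f"n={n:5d}  df={n - 1:4d}  t.975={t_quantile(0.975, n - 1):.3f}  z={z:.3f}")
```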

Using JMP
• JMP automatically calculates the 95% confidence interval on the mean and shows it in the Distribution of Y report window.
• For instance, consider the Moments report from the first n = 9 cholesterol sample.
• The 95% confidence interval from this sample is [173.4, 261.7].

Sample Size and Confidence
• A 95% confidence interval means that, in repeated sampling, 95% of the intervals computed this way cover the true (but unknown) mean; in that sense, we're 95% sure that our interval covers it.
• On the other hand, it also means that 5% of the intervals we calculate will not cover the true mean.
• Provided the method's assumptions hold, this is true whether we use n = 2 or n = 2,000,000.

• What changes with sample size is the width of the interval.
• With larger samples the interval is narrower; we're still going to be wrong 5% of the time, but by smaller amounts.
• Let's look at confidence intervals when the population is not normal.
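The claim that coverage stays near 95% while the width shrinks can be simulated. This sketch assumes a normal population with the example's µ = 205.7 and σ = 45.9194 and uses tabled t.975 values (2.306 for df = 8, 1.984 for df = 99); `simulate` is an illustrative name.

```python
import math
import random
import statistics

T_975 = {8: 2.306, 99: 1.984}  # table values of t.975 for df = 8 and df = 99


def simulate(n, mu=205.7, sigma=45.9194, reps=4000, seed=4):
    """Return (coverage, mean width) of 95% t-intervals for samples of size n
    from an assumed Normal(mu, sigma) population."""
    rng = random.Random(seed)
    t = T_975[n - 1]
    hits, widths = 0, []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        ybar = statistics.fmean(sample)
        half = t * statistics.stdev(sample) / math.sqrt(n)
        hits += ybar - half <= mu <= ybar + half
        widths.append(2 * half)
    return hits / reps, statistics.fmean(widths)


for n in (9, 100):
    cov, width = simulate(n)
    print(f"n={n:3d}  coverage={cov:.3f}  average width={width:.1f}")
```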

CIs for Triglyceride
• Now let's look at 95% confidence intervals using the triglyceride population, the non-normal population.
• Just as before, we simulate 100 studies, each with a different sample size.

• Notice how much more variable the widths are.
• The first sample's y-bar was 164.6, with estimated standard deviation s = 101.2.
• The second sample's y-bar was 323.2, with s = 383.3.
• The much larger estimates in the second sample show the effect an outlier can have.

• With larger n, the intervals are narrower.
• The effect of outliers is diminished.
• Here, the sample size is large enough to rely on the Central Limit Theorem.
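The triglyceride population itself is not available here, so this sketch swaps in a hypothetical lognormal population (log-scale parameters 5.0 and 1.0), a convenient stand-in for a strongly right-skewed variable. It estimates how often the 95% t-interval covers the true mean at n = 9 and at n = 100, using tabled t.975 values. Under these assumptions, coverage should typically fall short of 95% at n = 9 and move closer to it at n = 100.

```python
import math
import random
import statistics

T_975 = {8: 2.306, 99: 1.984}  # table values of t.975 for df = 8 and df = 99
MU_LOG, SIGMA_LOG = 5.0, 1.0    # hypothetical lognormal parameters (right-skewed)
TRUE_MEAN = math.exp(MU_LOG + SIGMA_LOG ** 2 / 2)  # mean of a lognormal distribution


def coverage(n, reps=4000, seed=5):
    """Fraction of 95% t-intervals covering TRUE_MEAN for lognormal samples of size n."""
    rng = random.Random(seed)
    t = T_975[n - 1]
    hits = 0
    for _ in range(reps):
        sample = [rng.lognormvariate(MU_LOG, SIGMA_LOG) for _ in range(n)]
        ybar = statistics.fmean(sample)
        half = t * statistics.stdev(sample) / math.sqrt(n)
        hits += ybar - half <= TRUE_MEAN <= ybar + half
    return hits / reps


for n in (9, 100):
    print(f"n={n:3d}  coverage of nominal 95% interval = {coverage(n):.3f}")
```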

Summary
• Sample estimates have distributions that are affected by the underlying distribution and the sample size.
• Estimates may be totally worthless if obtained from a haphazard "sample" with unknowable bias.
• But if the data are representative of the population, then we can rely on the sample mean to estimate the center (mean) of the distribution.
• The sample mean is unbiased.

Summary (cont.)
• Further, if the population is known to be normal, then the sample mean will also be normally distributed.
• If the population distribution has an unknown shape, then with a sufficient sample size we can rely on the CLT and trust that the sample mean will be approximately normally distributed.

Assessing Normality
• Use the Normal Quantile Plot in JMP to assess whether a distribution appears normal.
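Outside JMP, the points of a normal quantile plot are the sorted data paired with theoretical normal quantiles; a roughly straight line suggests normality. This sketch uses plotting positions (i - 0.5)/n, one common convention that may differ from JMP's, and summarizes straightness with the correlation of the plotted points (a probability-plot correlation, swapped in here as a numeric stand-in for eyeballing the plot). The data are simulated, and the lognormal sample is hypothetical.

```python
import random
from statistics import NormalDist, fmean


def normal_scores(n):
    """Theoretical N(0, 1) quantiles at plotting positions (i - 0.5)/n."""
    return [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]


def qq_correlation(data):
    """Pearson correlation between sorted data and normal scores;
    values near 1 mean the normal quantile plot is close to a straight line."""
    xs = sorted(data)
    zs = normal_scores(len(xs))
    mx, mz = fmean(xs), fmean(zs)
    sxz = sum((x - mx) * (z - mz) for x, z in zip(xs, zs))
    sxx = sum((x - mx) ** 2 for x in xs)
    szz = sum((z - mz) ** 2 for z in zs)
    return sxz / (sxx * szz) ** 0.5


rng = random.Random(6)
normal_sample = [rng.gauss(205.7, 45.9194) for _ in range(200)]
skewed_sample = [rng.lognormvariate(5.0, 1.0) for _ in range(200)]  # hypothetical skewed data
print(f"normal sample: r = {qq_correlation(normal_sample):.3f}")
print(f"skewed sample: r = {qq_correlation(skewed_sample):.3f}")
```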

SD versus SE
• The standard error of the sample mean is smaller with larger samples.
• The standard error describes the variability of the sample mean, not the variability of the sample data.
• If the standard deviation of the data is σ, then the standard error of the mean is σ/√n.
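The difference shows up clearly if we compute both quantities from samples of increasing size. This sketch assumes a normal population with the example's µ and σ; `sd_and_se` is an illustrative name. The SD should hover near σ at every n, while the SE shrinks like 1/√n.

```python
import math
import random
import statistics


def sd_and_se(n, mu=205.7, sigma=45.9194, seed=7):
    """Draw one sample of size n from an assumed Normal(mu, sigma) population
    and return (sample SD, estimated standard error of the mean)."""
    rng = random.Random(seed)
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    sd = statistics.stdev(sample)  # spread of the data
    return sd, sd / math.sqrt(n)   # spread of y-bar


for n in (10, 100, 10000):
    sd, se = sd_and_se(n)
    print(f"n={n:5d}  SD={sd:6.2f}  SE={se:6.2f}")
```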

CIs
• The confidence interval on the population mean, obtained from a sample of n observations, is:
  y-bar ± t(1 - α/2) × s/√n

• Here, y-bar is the sample mean, s is the sample standard deviation, and the t reliability coefficient is the (1 - α/2) quantile of the t-distribution with df = n - 1.
• When reporting a confidence interval in a sentence or table, be sure to state the level of confidence and the sample size.

• Always be aware that the shape of the underlying distribution and the size of your sample directly affect the believability of your point and interval estimates.

Example write-ups
• Suppose you judge that the distribution is markedly non-normal (skewed); say we begin with the following raw data:

• Since the sample was small and the distribution was skewed, the sample is described by the median, the range, and the quartiles:
• A random sample of n = 20 subjects was assessed for serum triglycerides. The median triglyceride was 115, and the values ranged from 31 to 755. Half of the values were between 91.25 and 195.0.
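These descriptive statistics are straightforward to compute. The raw triglyceride data are not reproduced here, so this sketch uses a small hypothetical right-skewed data set. The quartiles use `statistics.quantiles` with `method="exclusive"` (positions based on n + 1); other software may use a different convention and give slightly different quartiles.

```python
import statistics

# Hypothetical right-skewed values (NOT the triglyceride data from the example).
values = [12, 15, 17, 18, 20, 22, 25, 31, 48, 97]

median = statistics.median(values)
q1, _, q3 = statistics.quantiles(values, n=4, method="exclusive")
print(f"n = {len(values)}, median = {median}, range = {min(values)} to {max(values)}")
print(f"middle half of the values: {q1} to {q3}")
print(f"mean = {statistics.fmean(values):.1f} (pulled upward by the largest values)")
```

Here the median (21.0) sits well below the mean (30.5), which is why the median and quartiles are the better summary for skewed data.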

Another example

• One example write-up: A random sample of n = 20 subjects was assessed for serum cholesterol. The average cholesterol was 201.8, SD = 53.25. We are 95% confident that the interval 176.8 to 226.7 includes the population mean.
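The interval in this write-up follows from the t formula with df = 19. A minimal sketch using the tabled value t.975 = 2.093; `t_interval` is an illustrative name. Because the reported mean and SD are themselves rounded, the result matches the stated limits only to within rounding (about 176.9 to 226.7 from the rounded inputs).

```python
import math

T_975_DF19 = 2.093  # t.975 for df = 19, from a standard t table


def t_interval(ybar, s, n, t_crit):
    """100(1 - alpha)% CI for the mean: y-bar +/- t * s/sqrt(n)."""
    half_width = t_crit * s / math.sqrt(n)
    return ybar - half_width, ybar + half_width


lo, hi = t_interval(201.8, 53.25, 20, T_975_DF19)
print(f"95% CI (n = 20): {lo:.1f} to {hi:.1f}")
```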
