Idea Transcript
12/23/2012
Confidence Intervals: Sampling Distribution
STAT 101 Dr. Kari Lock Morgan 9/13/12
SECTIONS 3.1, 3.2
• Sampling Distributions (3.1)
• Confidence Intervals (3.2)
Statistics: Unlocking the Power of Data
Lock5

The Big Picture
[Figure: Population → Sampling → Sample → Statistical Inference]
Statistical Inference
Statistical inference is the process of drawing conclusions about the entire population based on information in a sample.

Statistic and Parameter
A parameter is a number that describes some aspect of a population.
A statistic is a number that is computed from data in a sample.
We usually have a sample statistic and want to use it to make inferences about the population parameter.
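To make the parameter/statistic distinction concrete, here is a minimal Python sketch. The population is entirely hypothetical (10,000 simulated heights with mean 66 and SD 4, invented for illustration). In a real study we would not have the whole population, so μ would be unknown and x̄ would be our only handle on it.

```python
import random
import statistics

random.seed(1)
# HYPOTHETICAL population: 10,000 simulated heights (inches), invented for illustration.
population = [random.gauss(66, 4) for _ in range(10_000)]

# Parameter: a number describing the whole population (the population mean, mu).
mu = statistics.mean(population)

# Statistic: a number computed from sample data (the sample mean, x-bar, for n = 50).
sample = random.sample(population, 50)
x_bar = statistics.mean(sample)

print(f"parameter mu    = {mu:.2f}")
print(f"statistic x-bar = {x_bar:.2f}")
```

Rerunning with a different seed changes x̄ but leaves μ fixed, which is the point of the next few slides.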
The Big Picture
[Figure: Population → Sampling → Sample → Statistical Inference]

Parameter versus Statistic
[Figure: PARAMETERS (population): μ, σ, ρ; STATISTICS (sample): x̄, p̂]
Election Polls
Over the weekend (9/7/12 – 9/9/12), 1000 registered voters were asked who they plan to vote for in the 2012 presidential election.
What proportion of voters plan to vote for Obama?
p̂ = 0.50
p = ???
http://www.politico.com/p/2012-election/polls/president

Point Estimate
We use the statistic from a sample as a point estimate for a population parameter.
Point estimates will not match population parameters exactly, but they are our best guess, given the data.
Election Polls
Actually, several polls were conducted over the weekend (9/7/12 – 9/9/12):
http://www.politico.com/p/2012-election/polls/president

IMPORTANT POINTS
• Sample statistics vary from sample to sample (they will not match the parameter exactly).
• KEY QUESTION: For a given sample statistic, what are plausible values for the population parameter? How much uncertainty surrounds the sample statistic?
• KEY ANSWER: It depends on how much the statistic varies from sample to sample!
Reese’s Pieces
• What proportion of Reese’s Pieces are orange?
• Take a random sample of 10 Reese’s Pieces.
• What is your sample proportion? (class dotplot)
• Give a range of plausible values for the population proportion.

Sampling Distribution
A sampling distribution is the distribution of sample statistics computed for different samples of the same size from the same population.
A sampling distribution shows us how the sample statistic varies from sample to sample.
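The class dotplot can be imitated by simulation, much as StatKey does. A minimal sketch, assuming a HYPOTHETICAL true proportion of orange pieces of 0.5 (the real value is unknown; this number is invented so the simulation can run). Each simulated sample of 10 produces one p̂, so each entry of `sampling_distribution` is one dot.

```python
import random
import statistics
from collections import Counter

random.seed(2012)
P_ORANGE = 0.5       # ASSUMED true proportion of orange pieces (hypothetical)
SAMPLE_SIZE = 10     # 10 pieces per sample, as in the class activity
NUM_SAMPLES = 5_000  # number of samples, i.e. number of dots

def sample_proportion(n: int, p: float) -> float:
    """Draw n pieces at random; return the proportion that are orange (p-hat)."""
    orange = sum(random.random() < p for _ in range(n))
    return orange / n

# Each entry is one sample statistic, i.e. one dot in the dotplot.
sampling_distribution = [sample_proportion(SAMPLE_SIZE, P_ORANGE) for _ in range(NUM_SAMPLES)]

# Crude text dotplot: one '*' per 50 samples with that p-hat (rounded down).
counts = Counter(sampling_distribution)
for value in sorted(counts):
    print(f"{value:.1f} {'*' * (counts[value] // 50)}")
print("center of the sampling distribution:", round(statistics.mean(sampling_distribution), 3))
```

With random sampling, the center lands close to the assumed parameter, even though individual p̂ values range widely.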
Sampling Distribution
Reese’s Pieces
• www.lock5stat.com/statkey
In the Reese’s Pieces sampling distribution, what does each dot represent?
a) One Reese’s piece b) One sample statistic
Random Samples
• If you take random samples, the sampling distribution will be centered around the true population parameter.
• If sampling bias exists (if you do not take random samples), your sampling distribution may give you bad information about the true parameter.

Sample Size Matters!
As the sample size increases, the variability of the sample statistics tends to decrease and the sample statistics tend to be closer to the true value of the population parameter.
For larger sample sizes, you get less variability in the statistics, so less uncertainty in your estimates.
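The effect of sample size can be seen by simulating sampling distributions for several n and comparing their spreads. A minimal sketch, assuming a HYPOTHETICAL population proportion of 0.5 and 2,000 simulated samples per sample size (both choices are for illustration only).

```python
import random
import statistics

random.seed(7)
P_TRUE = 0.5  # ASSUMED population proportion, for illustration only

def simulate_p_hats(n: int, p: float, reps: int = 2_000) -> list[float]:
    """Return `reps` sample proportions, each from a random sample of size n."""
    return [sum(random.random() < p for _ in range(n)) / n for _ in range(reps)]

spread = {}
for n in (10, 100, 1000):
    p_hats = simulate_p_hats(n, P_TRUE)
    spread[n] = statistics.stdev(p_hats)  # spread of this sampling distribution
    print(f"n = {n:>4}: SD of sample proportions = {spread[n]:.3f}")
```

The standard deviation shrinks as n grows, so larger samples give statistics that cluster more tightly around the parameter.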
Lincoln’s Gettysburg Address
Interval Estimate
An interval estimate gives a range of plausible values for a population parameter.
Margin of Error
One common form for an interval estimate is
statistic ± margin of error
where the margin of error reflects the precision of the sample statistic as a point estimate for the parameter.
How do we determine the margin of error???

Sampling Distribution
• We can use the spread of the sampling distribution to determine the margin of error for a statistic.
Margin of Error
The higher the standard deviation of the sampling distribution, the ______ the margin of error.
a) higher b) lower

Election Polling
Why is the margin of error smaller for the Gallup poll than the ABC News poll?
http://www.realclearpolitics.com/epolls/2012/president/us/general_election_romney_vs_obama-1171.html
Interval Estimate
Election Polling
Using the Gallup poll, calculate an interval estimate for the proportion of registered voters who plan to vote for Obama.
Confidence Interval
A confidence interval for a parameter is an interval computed from sample data by a method that will capture the parameter for a specified proportion of all samples.
The success rate (proportion of all samples whose intervals contain the parameter) is known as the confidence level.
A 95% confidence interval will contain the true parameter for 95% of all samples.

Confidence Intervals
www.lock5stat.com/StatKey
• The parameter is fixed.
• The statistic is random (depends on the sample).
• The interval is random (depends on the statistic).
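The claim that a 95% confidence interval captures the parameter for 95% of all samples can be checked by simulation. A minimal sketch, assuming a HYPOTHETICAL population with p = 0.5 and samples of size 100 (both invented for illustration): it estimates SE from a simulated sampling distribution, then builds statistic ± 2 × SE from many fresh samples and counts how many intervals contain p. Expect a capture rate near, not exactly, 95%: statistic ± 2 × SE is an approximation, and with n = 100 the sample proportion only moves in steps of 0.01.

```python
import random
import statistics

random.seed(95)
P_TRUE = 0.5   # ASSUMED parameter: fixed, and known here only because we simulate
N = 100        # sample size (assumed for illustration)
REPS = 4_000   # number of samples, and therefore number of intervals

def one_p_hat(n: int, p: float) -> float:
    """Proportion of successes in one random sample of size n."""
    return sum(random.random() < p for _ in range(n)) / n

# SE = standard deviation of the (simulated) sampling distribution.
se = statistics.stdev([one_p_hat(N, P_TRUE) for _ in range(REPS)])

hits = 0
for _ in range(REPS):
    p_hat = one_p_hat(N, P_TRUE)                   # the statistic is random...
    lower, upper = p_hat - 2 * se, p_hat + 2 * se  # ...so the interval is random
    if lower <= P_TRUE <= upper:                   # did this interval capture p?
        hits += 1

coverage = hits / REPS
print(f"SE ~ {se:.3f}; {coverage:.1%} of {REPS} intervals captured p = {P_TRUE}")
```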
If you had access to the sampling distribution, how would you find the margin of error to ensure that intervals of the form
statistic ± margin of error
would capture the parameter for 95% of all samples?

Standard Error
The standard error of a statistic, SE, is the standard deviation of the sample statistic.
The standard error can be calculated as the standard deviation of the sampling distribution.
[Figure: Sampling Distribution]

95% Confidence Interval
If the sampling distribution is relatively symmetric and bell-shaped, a 95% confidence interval can be estimated using
statistic ± 2 × SE

Economy
A survey of 1,502 Americans in January 2012 found that 86% consider the economy a “top priority” for the president and congress this year.
The standard error for this statistic is 0.01.
What is the 95% confidence interval for the true proportion of all Americans that considered the economy a “top priority” at that time?
(a) (0.85, 0.87) (b) (0.84, 0.88) (c) (0.82, 0.90)
statistic ± 2×SE
0.86 ± 2×0.01
0.86 ± 0.02
(0.84, 0.88)
http://www.people-press.org/2012/01/23/public-priorities-deficit-rising-terrorismslipping/
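The economy calculation can be checked with a small helper. The name `interval_95` is hypothetical, and the rounding to four decimals is only there to hide floating-point noise; the interval itself is exactly statistic ± 2 × SE as on the slide.

```python
def interval_95(statistic: float, se: float) -> tuple[float, float]:
    """Rough 95% CI: statistic ± 2 × SE (assumes a symmetric, bell-shaped sampling distribution)."""
    margin_of_error = 2 * se
    # Rounding only suppresses floating-point artifacts such as 0.8400000000000001.
    return (round(statistic - margin_of_error, 4), round(statistic + margin_of_error, 4))

# Economy survey: statistic = 0.86, SE = 0.01
print(interval_95(0.86, 0.01))  # (0.84, 0.88)
```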
Interpreting a Confidence Interval
95% of all samples yield intervals that contain the true parameter, so we say we are “95% sure” or “95% confident” that one interval contains the truth.
“We are 95% confident that the true proportion of all Americans that considered the economy a ‘top priority’ in January 2012 is between 0.84 and 0.88”
Reese’s Pieces
The standard error for p̂, the proportion of orange Reese’s Pieces in a random sample of 10, is closest to
a) 0.05 b) 0.15 c) 0.25 d) 0.35

Reese’s Pieces
Each of you will create a 95% confidence interval based on your sample. If you all sampled randomly and all created your CIs correctly, what percentage of your intervals do you expect to include the true p?
a) 95% b) 5% c) All of them d) None of them

Reese’s Pieces
• Use StatKey to more precisely estimate the SE.
• Use this estimated SE and your p̂ to create a 95% confidence interval based on your data.
• Come up to the board and draw your interval.
• How many include (our best guess at) the truth?

Reese’s Pieces
Did your 95% confidence interval include the true p?
a) Yes b) No

Confidence Intervals
If context were added, which of the following would be an appropriate interpretation for a 95% confidence interval:
a) “we are 95% sure the interval contains the parameter”
b) “there is a 95% chance the interval contains the parameter”
c) Both (a) and (b)
d) Neither (a) nor (b)
95% of all samples yield intervals that contain the true parameter, so we say we are “95% sure” or “95% confident” that one interval contains the truth. We can’t make probabilistic statements such as (b) because the interval either contains the truth or it doesn’t, and also the 95% pertains to all intervals that could be generated, not just the one you’ve created.
Summary
• To create a plausible range of values for a parameter:
  o Take many random samples from the population, and compute the sample statistic for each sample
  o Compute the standard error as the standard deviation of all these statistics
  o Use statistic ± 2SE

Confidence Intervals
[Figure: Population → Sample, Sample, …, Sample → Calculate statistic for each sample → Sampling Distribution → Standard Error (SE): standard deviation of sampling distribution → Margin of Error (ME) (95% CI: ME = 2×SE) → statistic ± ME]
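The three summary steps can be written as one function. This is a sketch under loud assumptions: the names `ci_from_sampling_distribution` and `proportion`, the 0/1 coding (1 = orange) and the 50/50 population are all hypothetical, invented for illustration. Note that the function needs the entire population to draw repeated samples, which is exactly the problem raised on the Reality slide.

```python
import random
import statistics
from typing import Callable, Sequence

def proportion(values: Sequence[int]) -> float:
    """Sample proportion p-hat for 0/1 data (1 = success, e.g. orange)."""
    return sum(values) / len(values)

def ci_from_sampling_distribution(
    population: Sequence[int],
    observed_sample: Sequence[int],
    statistic: Callable[[Sequence[int]], float],
    reps: int = 2_000,
) -> tuple[float, float]:
    """95% CI via statistic ± 2 × SE, with SE from a simulated sampling distribution.

    Requires the whole population, which in practice we do not have.
    """
    n = len(observed_sample)
    # Step 1: many random samples of the same size, one statistic per sample.
    simulated = [statistic(random.sample(population, n)) for _ in range(reps)]
    # Step 2: SE = standard deviation of all these statistics.
    se = statistics.stdev(simulated)
    # Step 3: statistic ± 2 × SE, centered at the statistic from our one sample.
    center = statistic(observed_sample)
    return center - 2 * se, center + 2 * se

random.seed(3)
population = [1] * 5_000 + [0] * 5_000  # HYPOTHETICAL Reese's Pieces: 1 = orange
my_sample = random.sample(population, 10)
interval = ci_from_sampling_distribution(population, my_sample, proportion)
print(f"95% CI for p: ({interval[0]:.2f}, {interval[1]:.2f})")
```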
Reality
• One small problem… WE ONLY HAVE ONE SAMPLE!!!!
• How do we know how much sample statistics vary, if we only have one sample?!? … to be continued

To Do
• Read Sections 3.1, 3.2
• Do Homework 2 (due Tuesday, 9/18)