Introduction to Statistical Methods for Data Analysis

Dr Lorenzo Moneta CERN PH-SFT CH-1211 Geneva 23

sftweb.cern.ch root.cern.ch


Outline
• Probability definition
• Probability Density Functions
• Some typical distributions
• Bayes Theorem
• Parameter Estimation
• Hypothesis Testing

Data Analysis Tutorial at UERJ 2015: Introduction to Statistics

References
• Much of the material for this introduction to statistical methods is taken from a course:
  – Statistical Methods for Data Analysis (Luca Lista, INFN Napoli)
  – Material also available in his book:
    • Statistical Methods for Data Analysis in Particle Physics (Springer)
      – http://www.springer.com/us/book/9783319201757
• Another suggested book:
  – Data Analysis in High Energy Physics (Wiley)

Definition Of Probability
• Two main definitions:
  – Frequentist
    • Probability is the ratio of the number of occurrences of an event to the total number of experiments, in the limit of a very large number of repeatable experiments.
    • Can only be applied to a specific class of events (repeatable experiments)
    • Meaningless to state: “probability that the lightest SUSY particle’s mass is less than 1 TeV”
  – Bayesian
    • Probability measures someone’s degree of belief that something is or will be true: would you bet?
      – Probability that Barcelona will win the next Champions League

Classical Probability
• Assume all accessible cases are equally probable
• Valid for discrete cases only
  – Problem in continuous cases (definition of the metric)

Binomial Distribution
• Distribution of the number of successes in N trials
  – e.g. tossing a coin or rolling a die N times
• Each trial has a probability p of success
• Average: ⟨n⟩ = Np
• Variance: σ² = Np(1−p)
• Used for efficiency estimation
• Available in ROOT as ROOT::Math::binomial_pdf(n, p, N)
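The average and variance quoted above can be checked numerically. The sketch below assumes the standard binomial probability P(n; N, p) = N!/(n!(N−n)!) pⁿ (1−p)^(N−n), which is not written out in the extracted slide text. It uses only the C++ standard library; the function names are illustrative and this is not ROOT's implementation.

```cpp
#include <cassert>
#include <cmath>

// Binomial probability of n successes in N trials, each with success probability p.
// Evaluated in log space (via lgamma) so that large N does not overflow.
double binomial_pmf(unsigned n, double p, unsigned N) {
    assert(p >= 0.0 && p <= 1.0);
    if (n > N) return 0.0;
    if (p == 0.0) return n == 0 ? 1.0 : 0.0;
    if (p == 1.0) return n == N ? 1.0 : 0.0;
    const double log_coeff =
        std::lgamma(N + 1.0) - std::lgamma(n + 1.0) - std::lgamma(N - n + 1.0);
    return std::exp(log_coeff + n * std::log(p) + (N - n) * std::log1p(-p));
}

// Mean of the distribution, by explicit summation over n = 0..N.
double binomial_mean(double p, unsigned N) {
    double sum = 0.0;
    for (unsigned n = 0; n <= N; ++n) sum += n * binomial_pmf(n, p, N);
    return sum;
}

// Variance of the distribution, by explicit summation over n = 0..N.
double binomial_variance(double p, unsigned N) {
    const double mean = binomial_mean(p, N);
    double sum = 0.0;
    for (unsigned n = 0; n <= N; ++n) {
        sum += (n - mean) * (n - mean) * binomial_pmf(n, p, N);
    }
    return sum;
}
```

For example, with p = 0.3 and N = 20 the two sums should agree with Np = 6 and Np(1−p) = 4.2 up to rounding.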


Frequentist Probability
• Law of large numbers
• this also means that
• Circular definition of probability
  – a phenomenon can be proven to be random only if we observe an infinite number of cases
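The formulas for this slide are missing from the extracted text. In its usual (weak) form, the law of large numbers states that for any ε > 0 the probability P(|N_E/N − p| < ε) tends to 1 as N → ∞, where N_E is the number of occurrences of an event of probability p in N trials. The sketch below illustrates this with a simulation under that assumption; the function name and the seed are arbitrary choices.

```cpp
#include <cassert>
#include <cstddef>
#include <random>

// Fraction of successes observed in N independent trials, each succeeding
// with probability p. The seed is fixed so that repeated calls with the same
// arguments give the same answer.
double observed_frequency(double p, std::size_t N, unsigned seed = 12345) {
    assert(N > 0);
    std::mt19937 rng(seed);
    std::bernoulli_distribution trial(p);
    std::size_t successes = 0;
    for (std::size_t i = 0; i < N; ++i) {
        if (trial(rng)) ++successes;
    }
    return static_cast<double>(successes) / static_cast<double>(N);
}
```

Calling observed_frequency(0.3, N) for N = 100, 10 000 and 1 000 000 should give values that scatter around 0.3 with a spread shrinking roughly as 1/√N. Any finite run only approximates p, which is the point of the circularity remark above.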


Conditional Probability
• Probability of A, given B: P(A | B)
  – probability that an event known to belong to set B is also a member of set A
  – P(A | B) = P(A ∩ B) / P(B)
  – A is independent of B if the conditional probability of A given B is equal to the probability of A:
    • P(A | B) = P(A)
  – Hence, if A is independent of B:
    • P(A ∩ B) = P(A) P(B)
  – If A is independent of B, then B is independent of A
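As a small worked case of these relations, the sketch below enumerates the six equally probable faces of a fair die and evaluates P(A | B) = P(A ∩ B) / P(B). The die, the events and the helper names are illustrative assumptions, not part of the original slides.

```cpp
#include <cassert>
#include <functional>

// An event is a predicate on the face shown by the die (1..6).
using Event = std::function<bool(int)>;

// Probability of an event for one roll of a fair six-sided die,
// counting equally probable outcomes.
double prob(const Event& e) {
    int count = 0;
    for (int face = 1; face <= 6; ++face) {
        if (e(face)) ++count;
    }
    return count / 6.0;
}

// P(A | B) = P(A and B) / P(B); only defined when P(B) > 0.
double conditional_prob(const Event& a, const Event& b) {
    const double p_b = prob(b);
    assert(p_b > 0.0);
    return prob([&](int face) { return a(face) && b(face); }) / p_b;
}
```

With A = "face is even" and B = "face > 3", P(A | B) = 2/3 while P(A) = 1/2, so A and B are not independent. With C = "face ≤ 4", P(A | C) = 1/2 = P(A) and P(A ∩ C) = 1/3 = P(A) P(C), so A and C are independent.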


Prob. Density Functions (PDF)


Gaussian (Normal) Distribution
• Average = μ
• Variance = σ²
• Widely used because of the central limit theorem
• In ROOT:
    TMath::Gaus(x, μ, σ, true)
    ROOT::Math::normal_pdf(x, σ, μ)
    TF1 f("f", "gausn", xmin, xmax);
    x = gRandom->Gaus(μ, σ);
  N.B. "gausn" for a normalised (PDF) Gaussian

[Figure: Gaussian PDF(x) for x from −5 to 5, with curves for (μ=0, σ=0.3), (μ=0, σ=1), (μ=0, σ=3) and (μ=−2, σ=1)]
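The formula of the normalised Gaussian density is missing from the extracted slide text. For reference, the standard form (a textbook result, not transcribed from the slide) is:

```latex
f(x;\mu,\sigma) = \frac{1}{\sigma\sqrt{2\pi}}
  \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right),
\qquad
\int_{-\infty}^{+\infty} f(x;\mu,\sigma)\,\mathrm{d}x = 1
```

The peak height is 1/(σ√(2π)), about 0.40 for σ = 1, so narrower curves are taller.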


Central limit theorem
• The sum of n random variables x_1, …, x_n converges to a Gaussian, irrespective of the original distributions of the variables (only some basic regularity conditions must hold)
  – ∑ x_i → Gaussian
  – Example: adding n flat distributions (x is uniform in [0,10]), for n = 2 and n = 5

[Figure: histograms for n = 2 and n = 5 over the range 0 to 10, each with a Gaussian fit]
  Fit results   χ²/ndf       Constant       Mean            Sigma
  n = 2         422.9 / 97   190.8 ± 2.3    4.989 ± 0.022   2.031 ± 0.015
  n = 5         87.47 / 83   306.4 ± 3.7    5.011 ± 0.013   1.293 ± 0.009
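The example can be reproduced with a short simulation. The fitted widths quoted for the figure (about 2.03 for n = 2 and 1.29 for n = 5) are close to 10/√(12n), which suggests that the histograms show the average of n uniform numbers rather than their plain sum; the sketch below makes that assumption. The function name, the struct and the seed are illustrative.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>

struct Moments {
    double mean;
    double sigma;
};

// Generate `samples` values, each the average of n numbers drawn uniformly
// in [a, b), and return the sample mean and standard deviation of those values.
Moments average_of_uniforms(int n, std::size_t samples, double a = 0.0,
                            double b = 10.0, unsigned seed = 4357) {
    assert(n > 0 && samples > 1 && a < b);
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> flat(a, b);
    double sum = 0.0, sum_sq = 0.0;
    for (std::size_t i = 0; i < samples; ++i) {
        double avg = 0.0;
        for (int k = 0; k < n; ++k) avg += flat(rng);
        avg /= n;
        sum += avg;
        sum_sq += avg * avg;
    }
    const double mean = sum / samples;
    const double var = sum_sq / samples - mean * mean;
    return {mean, std::sqrt(var)};
}
```

For n = 2 the distribution of the average is triangular, which is consistent with the poor χ² of the Gaussian fit quoted above; by n = 5 the shape is already close to Gaussian.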

Uniform (“flat”) distribution
• Standard deviation: σ = (b − a)/√12 for x uniform in [a, b]
• Model for the position of rain drops, the time of cosmic ray passage, etc.
• Basic distribution for pseudo-random number generation
• In ROOT:
    ROOT::Math::uniform_pdf(x, a, b)
    x = gRandom->Uniform(a, b);


Cumulative Distribution
• Given a PDF f(x), the cumulative distribution is defined as F(x) = ∫_{−∞}^{x} f(x′) dx′
• F(x) is itself a random variable, uniformly distributed in [0,1]
• By inverting the cumulative distribution one can generate pseudo-random numbers according to any distribution
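As an illustration of the inversion method, the sketch below generates exponentially distributed numbers. The exponential case is chosen here only because its cumulative distribution F(x) = 1 − exp(−λx) can be inverted in closed form, x = −ln(1 − u)/λ; it is not an example taken from the slides, and the function names are illustrative.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Inverse of the exponential cumulative distribution F(x) = 1 - exp(-lambda * x).
double exponential_quantile(double u, double lambda) {
    assert(u >= 0.0 && u < 1.0 && lambda > 0.0);
    return -std::log1p(-u) / lambda;
}

// Draw one exponentially distributed number by passing a uniform number
// in [0, 1) through the inverse cumulative distribution.
template <class Generator>
double sample_exponential(Generator& rng, double lambda) {
    std::uniform_real_distribution<double> flat(0.0, 1.0);
    return exponential_quantile(flat(rng), lambda);
}
```

Averaging many calls of sample_exponential(rng, lambda) with a std::mt19937 generator should approach the exponential mean 1/λ. When the cumulative distribution cannot be inverted analytically, a numerical inversion or a different method such as accept-reject is needed.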


Example of Cumulative Distributions
• Probability density function
  – ROOT::Math::normal_pdf(x, σ, μ)
• Cumulative distribution and its complement (right tail integral)
  – ROOT::Math::normal_cdf(x, σ, μ)
  – ROOT::Math::normal_cdf_c(x, σ, μ)
• Inverse of the cumulative distributions (quantile distributions)
  – ROOT::Math::normal_quantile(p, σ)
  – ROOT::Math::normal_quantile_c(p, σ)

[Figure: three panels: normal_pdf versus x; normal_cdf and normal_cdf_c (p) versus x; normal_quantile and normal_quantile_c (x) versus p]
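For readers without ROOT at hand, the density, the cumulative distribution and its complement can be sketched with the standard library's complementary error function. The names gauss_pdf, gauss_cdf and gauss_cdf_c are hypothetical, and the argument order used here is (x, mu, sigma), which differs from the (x, σ, μ) order of the ROOT functions listed above.

```cpp
#include <cassert>
#include <cmath>

// Normal probability density with mean mu and standard deviation sigma.
double gauss_pdf(double x, double mu, double sigma) {
    assert(sigma > 0.0);
    const double inv_sqrt_2pi = 0.3989422804014327;  // 1 / sqrt(2 * pi)
    const double z = (x - mu) / sigma;
    return inv_sqrt_2pi / sigma * std::exp(-0.5 * z * z);
}

// Cumulative distribution: integral of the density from -infinity to x.
double gauss_cdf(double x, double mu, double sigma) {
    assert(sigma > 0.0);
    return 0.5 * std::erfc(-(x - mu) / (sigma * std::sqrt(2.0)));
}

// Complement of the cumulative distribution: right-tail integral from x to +infinity.
double gauss_cdf_c(double x, double mu, double sigma) {
    assert(sigma > 0.0);
    return 0.5 * std::erfc((x - mu) / (sigma * std::sqrt(2.0)));
}
```

Computing the right tail directly with erfc, rather than as 1 − gauss_cdf, avoids the loss of precision that the subtraction would cause for very small tail probabilities.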

Poisson Distribution
• Probability to have n entries in x, a subset of X, with X >> x
• Limit of the binomial distribution when p = x/X = ν/N
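The binomial limit can be checked numerically. The sketch below assumes the standard Poisson probability P(n; ν) = νⁿ e^(−ν) / n!, which is not written out in the extracted text, and compares it with a binomial of success probability p = ν/N for increasing N. All function names are illustrative.

```cpp
#include <cassert>
#include <cmath>

// Poisson probability of observing n entries when nu are expected.
double poisson_pmf(unsigned n, double nu) {
    assert(nu > 0.0);
    return std::exp(n * std::log(nu) - nu - std::lgamma(n + 1.0));
}

// Binomial probability of n successes in N trials with success
// probability p, restricted here to 0 < p < 1.
double binomial_probability(unsigned n, double p, unsigned N) {
    assert(p > 0.0 && p < 1.0);
    if (n > N) return 0.0;
    const double log_coeff =
        std::lgamma(N + 1.0) - std::lgamma(n + 1.0) - std::lgamma(N - n + 1.0);
    return std::exp(log_coeff + n * std::log(p) + (N - n) * std::log1p(-p));
}

// Largest absolute difference between Binomial(N, nu/N) and Poisson(nu)
// over n = 0..n_max.
double max_difference(double nu, unsigned N, unsigned n_max) {
    double worst = 0.0;
    for (unsigned n = 0; n <= n_max; ++n) {
        const double d =
            std::fabs(binomial_probability(n, nu / N, N) - poisson_pmf(n, nu));
        if (d > worst) worst = d;
    }
    return worst;
}
```

For a fixed ν, max_difference should shrink as N grows, for example when going from N = 10 to N = 100 000 at ν = 3.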
