Introduction to Statistical Methods for Data Analysis

Dr Lorenzo Moneta CERN PH-SFT CH-1211 Geneva 23

sftweb.cern.ch root.cern.ch


Outline
• Probability definition
• Probability Density Functions
• Some typical distributions
• Bayes Theorem
• Parameter Estimation
• Hypothesis Testing

Lorenzo Moneta CERN PH-SFT

Data Analysis Tutorial at UERJ 2015: Introduction to Statistics


References
• A lot of the material for this introduction to statistical methods is extracted from a course:
  – Statistical Methods for Data Analysis (Luca Lista, INFN Napoli)
  – Material available also in his book
    • Statistical Methods for Data Analysis in Particle Physics (Springer)
      – http://www.springer.com/us/book/9783319201757
• Another suggested book is
  – Data Analysis in High Energy Physics (Wiley)


Definition of Probability
• Two main definitions:
  – Frequentist
    • Probability is the ratio of the number of occurrences of an event to the total number of experiments, in the limit of a very large number of repeatable experiments.
    • Can only be applied to specific classes of events (repeatable experiments)
    • Meaningless to state: “probability that the lightest SUSY particle’s mass is less than 1 TeV”
  – Bayesian
    • Probability measures someone’s degree of belief that something is or will be true: would you bet?
      – Probability that Barcelona will win the next Champions League


Classical Probability
• Assume all accessible cases are equally probable
• Valid for discrete cases only
  – Problem in continuous cases (definition of metrics)
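As a minimal sketch of the classical definition (favourable cases divided by all equally probable cases), the Python snippet below enumerates the outcomes of two six-sided dice. The helper name `classical_probability` and the dice example are illustrative assumptions, not part of the original slides.

```python
from fractions import Fraction
from itertools import product


def classical_probability(outcomes, is_favourable):
    """Classical probability: favourable cases / all equally probable cases."""
    outcomes = list(outcomes)
    favourable = sum(1 for outcome in outcomes if is_favourable(outcome))
    return Fraction(favourable, len(outcomes))


# All 36 equally probable outcomes of rolling two six-sided dice.
two_dice = list(product(range(1, 7), repeat=2))

# Probability that the two dice sum to 7 (6 favourable cases out of 36).
p_sum_7 = classical_probability(two_dice, lambda roll: sum(roll) == 7)
print(p_sum_7)  # 1/6
```

The counting only works because the sample space is finite and every outcome is equally probable, which is the restriction to discrete cases noted above.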


Binomial Distribution
• Distribution of the number of successes n in N trials
  – e.g. tossing a coin or rolling a die N times
• Each trial has a probability p of success
• Average: ⟨n⟩ = Np
• Variance: σ² = Np(1−p)
• Used for efficiencies
• Available in ROOT as ROOT::Math::binomial_pdf(n,p,N)
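The binomial probability behind the mean and variance quoted above is P(n; N, p) = C(N, n) pⁿ (1−p)^(N−n). The sketch below checks both moments numerically in plain Python. The function `binomial_pdf` here is a hypothetical stand-in that only mirrors the (n, p, N) argument order quoted for ROOT; it is not the ROOT implementation, and the values N = 20, p = 0.3 are chosen for illustration.

```python
from math import comb


def binomial_pdf(n, p, N):
    """Probability of n successes in N trials, each with success probability p."""
    return comb(N, n) * p**n * (1.0 - p) ** (N - n)


N, p = 20, 0.3
probabilities = [binomial_pdf(n, p, N) for n in range(N + 1)]

# Moments computed directly from the distribution.
mean = sum(n * P for n, P in enumerate(probabilities))
variance = sum((n - mean) ** 2 * P for n, P in enumerate(probabilities))

print(mean, N * p)                # both close to 6.0
print(variance, N * p * (1 - p))  # both close to 4.2
```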


Frequentist Probability
• Law of large numbers
• this means also that
• circular definition of probabilities
  – a phenomenon can be proven to be random only if we observe infinite cases
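A small simulation can illustrate the frequentist idea that the observed frequency of an event approaches its probability as the number of trials grows. This is a sketch in plain Python with an assumed event probability of 0.25 and a fixed seed; it shows the behaviour numerically for a finite number of trials and does not prove anything about the limit.

```python
import random


def observed_frequency(p, n_trials, rng):
    """Fraction of n_trials in which an event of probability p occurs."""
    occurrences = sum(1 for _ in range(n_trials) if rng.random() < p)
    return occurrences / n_trials


rng = random.Random(12345)  # fixed seed so the run is reproducible
p_true = 0.25               # assumed probability, chosen for illustration

for n_trials in (100, 10_000, 1_000_000):
    frequency = observed_frequency(p_true, n_trials, rng)
    # The deviation from p_true typically shrinks like 1 / sqrt(n_trials).
    print(n_trials, frequency, abs(frequency - p_true))
```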


Conditional Probability
• Probability of A, given B: P(A|B)
  – probability that an event known to belong to set B is also a member of set A
  – P(A | B) = P(A ∩ B) / P(B)
  – A is independent of B if the conditional probability of A given B is equal to the probability of A:
    • P(A | B) = P(A)
  – Hence, if A is independent of B
    • P(A ∩ B) = P(A) P(B)
  – If A is independent of B, B is independent of A
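The definition and the independence condition can be checked by direct counting on a finite sample space. The sketch below uses two six-sided dice; the events A, B and C and the helper names `prob` and `conditional` are assumptions introduced for illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space: 36 equally probable outcomes of two six-sided dice.
omega = set(product(range(1, 7), repeat=2))


def prob(event):
    """Probability of an event (a subset of omega) with equally probable outcomes."""
    return Fraction(len(event), len(omega))


def conditional(a, b):
    """P(A | B) = P(A ∩ B) / P(B)."""
    return prob(a & b) / prob(b)


A = {o for o in omega if o[0] == 6}      # first die shows 6
B = {o for o in omega if o[1] % 2 == 0}  # second die is even
C = {o for o in omega if sum(o) >= 10}   # sum is at least 10

print(conditional(A, B) == prob(A))      # True: A is independent of B
print(prob(A & B) == prob(A) * prob(B))  # True: product rule for independent events
print(conditional(A, C), prob(A))        # 1/2 1/6: A is not independent of C
```

Exact fractions are used instead of floats so that the equalities are tested without rounding.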


Prob. Density Functions (PDF)


Gaussian (Normal) Distribution

• Average = µ
• Variance = σ²
• Widely used because of the central limit theorem
TMath::Gaus(x, μ, σ, true)
ROOT::Math::normal_pdf(x, σ, μ)
TF1 f("f", "gausn", xmin, xmax);
x = gRandom->Gaus(μ, σ);

[Figure: Gaussian PDF. PDF(x) versus x for −5 ≤ x ≤ 5, with curves for µ=0 σ=0.3, µ=0 σ=1, µ=0 σ=3 and µ=−2 σ=1]

N.B. “gausn” for a normalised (PDF) Gaussian
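As a rough illustration of the normalised Gaussian, the plain-Python sketch below evaluates the density and draws random numbers. The function `normal_pdf` is a hypothetical stand-in that keeps the (x, σ, μ) argument order quoted for ROOT::Math::normal_pdf, and `random.gauss` is used here in place of gRandom->Gaus; neither is the ROOT implementation.

```python
import math
import random


def normal_pdf(x, sigma=1.0, mu=0.0):
    """Normalised Gaussian density with argument order (x, sigma, mu)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


# The peak height is 1 / (sigma * sqrt(2 pi)): narrower curves are taller.
for sigma in (0.3, 1.0, 3.0):
    print(sigma, normal_pdf(0.0, sigma, 0.0))

# Random numbers with mean mu = -2 and standard deviation sigma = 1.
rng = random.Random(42)  # fixed seed for reproducibility
sample = [rng.gauss(-2.0, 1.0) for _ in range(50_000)]
sample_mean = sum(sample) / len(sample)
print(sample_mean)  # close to -2
```

The peak value for σ = 0.3 is about 1.33, which is why a plot of these curves needs a vertical axis extending above 1.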


Central limit theorem
• The sum of n random variables x_n converges to a Gaussian, irrespective of the original distributions of the variables x_n (only some basic regularity conditions must hold)
  – ∑x_n → Gaussian
  – Example: adding n flat distributions (x is uniform in [0,10]), for n = 2 and n = 5

[Figure: two histograms with Gaussian fits, for n = 2 and n = 5, x axis from 0 to 10]
Fit results, n = 2: χ²/ndf = 422.9 / 97, Constant = 190.8 ± 2.3, Mean = 4.989 ± 0.022, Sigma = 2.031 ± 0.015
Fit results, n = 5: χ²/ndf = 87.47 / 83, Constant = 306.4 ± 3.7, Mean = 5.011 ± 0.013, Sigma = 1.293 ± 0.009
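The example with flat distributions can be reproduced approximately with a short simulation. The sketch below assumes that the plotted quantity is the average of n uniform numbers in [0,10] (the x axis of the plots spans 0 to 10, which suggests an average and not a raw sum); the sample size and seed are arbitrary choices.

```python
import random
import statistics


def mean_of_uniforms(n, rng, low=0.0, high=10.0):
    """Average of n independent uniform numbers in [low, high]."""
    return sum(rng.uniform(low, high) for _ in range(n)) / n


rng = random.Random(2015)  # fixed seed for reproducibility
n_samples = 20_000

for n in (2, 5):
    values = [mean_of_uniforms(n, rng) for _ in range(n_samples)]
    expected_sigma = 10.0 / (12.0 * n) ** 0.5  # (b - a) / sqrt(12 n)
    print(n, statistics.mean(values), statistics.stdev(values), expected_sigma)
```

Under this assumption the expected widths are about 2.04 for n = 2 and 1.29 for n = 5, close to the fitted Sigma values quoted in the fit results. The much better χ²/ndf for n = 5 is consistent with the distribution already being closer to a Gaussian.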

Uniform (“flat”) distribution

• Standard deviation for x uniform in [a, b]: σ = (b − a)/√12
• Model for position of rain drops, time of cosmic ray passage, etc.
• Basic distribution for pseudo-random number generation
ROOT::Math::uniform_pdf(x, a, b)
x = gRandom->Uniform(a, b);


Cumulative Distribution
• Given a PDF f(x), the cumulative distribution is defined as F(x) = ∫_{−∞}^{x} f(x′) dx′
• If x is distributed according to f(x), F(x) is uniformly distributed in [0,1]
• By inverting the cumulative distribution one can generate pseudo-random numbers according to any distribution
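The inversion method can be sketched for a case where the cumulative distribution has a closed-form inverse. The example below assumes an exponential PDF f(x) = λ e^(−λx), for which F(x) = 1 − e^(−λx); the choice of distribution, the rate value and the function names are illustrative assumptions.

```python
import math
import random


def exponential_quantile(u, rate):
    """Inverse of the exponential cumulative F(x) = 1 - exp(-rate * x)."""
    return -math.log(1.0 - u) / rate


def sample_exponential(rate, n, rng):
    """Draw n numbers by passing uniform u in [0, 1) through the inverse cumulative."""
    return [exponential_quantile(rng.random(), rate) for _ in range(n)]


rng = random.Random(7)  # fixed seed for reproducibility
rate = 2.0              # assumed rate, chosen for illustration
sample = sample_exponential(rate, 100_000, rng)
print(sum(sample) / len(sample))  # close to 1 / rate = 0.5
```

For distributions without an analytic inverse, the same idea is usually applied with a numerical inversion or a lookup table of the cumulative distribution.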


Example of Cumulative Distributions
• Probability density function
  – ROOT::Math::normal_pdf(x,σ,μ)
• Cumulative distribution and its complement (right tail integral)
  – ROOT::Math::normal_cdf(x,σ,μ)
  – ROOT::Math::normal_cdf_c(x,σ,μ)
• Inverse of the cumulative distributions (quantile distributions)
  – ROOT::Math::normal_quantile(p,σ)
  – ROOT::Math::normal_quantile_c(p,σ)
[Figure: three panels. normal_pdf versus x; normal_cdf and normal_cdf_c (p versus x); normal_quantile and normal_quantile_c (x versus p)]
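The relation between the density, the cumulative distribution, its complement and the quantile can be tried out without ROOT. The sketch below uses `statistics.NormalDist` from the Python standard library as a stand-in; mapping its methods onto the ROOT functions listed above is an assumption made for illustration.

```python
from statistics import NormalDist

sigma, mu = 1.0, 0.0
gauss = NormalDist(mu=mu, sigma=sigma)

x = 1.5
p = gauss.cdf(x)  # cumulative: integral of the PDF from -infinity to x
p_c = 1.0 - p     # complement: right-tail integral from x to +infinity

x_back = gauss.inv_cdf(p)          # quantile: inverts the cumulative
x_back_c = gauss.inv_cdf(1 - p_c)  # quantile expressed through the right tail

print(gauss.pdf(x), p, p_c, x_back, x_back_c)
```

Computing the complement as 1 − p loses precision far in the tail, which is one reason a library may offer a dedicated complement function.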

Poisson Distribution
• Probability to have n entries in x, a subset of X ≫ x
• Limit of binomial distribution when p = x/X = ν/N
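The binomial limit can be checked numerically. The sketch below assumes the standard form of the statement, namely that N grows while ν = Np stays fixed and that the Poisson probability is P(n; ν) = νⁿ e^(−ν) / n!; the value ν = 3 and the function names are illustrative choices.

```python
from math import comb, exp, factorial


def binomial_pdf(n, p, N):
    """Probability of n successes in N trials with success probability p."""
    return comb(N, n) * p**n * (1.0 - p) ** (N - n)


def poisson_pdf(n, nu):
    """Poisson probability of n entries when nu entries are expected."""
    return nu**n * exp(-nu) / factorial(n)


nu = 3.0  # assumed expected number of entries, chosen for illustration
for N in (10, 100, 10_000):
    p = nu / N
    max_diff = max(abs(binomial_pdf(n, p, N) - poisson_pdf(n, nu)) for n in range(11))
    print(N, max_diff)  # the difference shrinks as N grows
```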
