Asif Tamuri EMBL-EBI Molecular Phylogenetics Course Agricultural [PDF]

Yang, Z. & Nielsen, R. (1998) Journal of Molecular Evolution 46:409-418. ..... Ch. 5 in Statistical Methods in Molec

0 downloads 6 Views 3MB Size

Report

Download PDF

PNG Network

Recommend Stories

Molecular Phylogenetics of Crassulaceae

Goodbyes are only for those who love with their eyes. Because for those who love with heart and soul

Molecular Phylogenetics and Evolution

If you want to go quickly, go alone. If you want to go far, go together. African proverb

ASIF ALI

Live as if you were to die tomorrow. Learn as if you were to live forever. Mahatma Gandhi

ASİF ATA

You can never cross the ocean unless you have the courage to lose sight of the shore. Andrè Gide

ASİF ATA

Don't count the days, make the days count. Muhammad Ali

ASIF type C

Almost everything will work again if you unplug it for a few minutes, including you. Anne Lamott

ASIF type C

Forget safety. Live where you fear to live. Destroy your reputation. Be notorious. Rumi

Molecular Phylogenetics and Evolution of Maternal Care in Membracine Treehoppers

Love only grows by sharing. You can only have more for yourself by giving it away to others. Brian

Molecular phylogenetics of the bee-eaters (Aves: Meropidae)

Courage doesn't always roar. Sometimes courage is the quiet voice at the end of the day saying, "I will

Molecular evolution, phylogenetics and biogeography in southern hemispheric bryophytes with

And you? When will you begin that long journey into yourself? Rumi

Idea Transcript

Hypothesis Testing to Detect Positive Selection Asif Tamuri EMBL-EBI Molecular Phylogenetics Course Agricultural Institute of Slovenia, Ljubljana

Outline • Introduction • Codon substitution models & ω • Tests - Lineage-specific adaptation - Site-specific adaptation - Branch-site models • Final words

Natural selection • Phenotypic evolution

driven mostly by selection

• Fitness of the phenotype/ morphological trait

Mutations and fitness fWT = 1 fMUT = 1 + s s = fMUT − fWT

s

0 <

s> s≈0

S = 2Ns

0

Neutral theory

Neutral

Neutral Nearly Neutral 0.5

Mutation

Neutral : Adaptive 999 : 1

4

3

2

Mutation

5

Neutral : Adaptive 0.5 Neutral 999 :: 1Adaptive : Adaptive 0.4 0.5 Deleterious 0.3 0.2 0.1

1

0

0

1

0 0

10

20

Substitution

Substitution

Neutral : Adaptive 24 : 1

999 : 1 9:1

0.4 0.4 0.3 0.3 0.2 0.2 0.1 0.1 0 0

Neutral : Adaptive : Adaptive 1.5 1.5Neutral Deleterious 1.5 24 24 : 1: 1 : Adaptive 3:1 1 1 1

0.5

0.5 0.5

Nearly Neutral

Nearly Neutral Adaptive

Adaptiv

0.5

Deleterious : Adaptive 0.5 Deleterious :9Adaptive :1 0.5 Neutral : Adaptive 0.4 9:1 9:1

0.4 0.4 0.3 0.3 0.3 0.2 0.2 0.2 0.1 0.1 0.1 00 0

1.5 1.5 1.5 1

0.5

N

0.4 0.3 0.2 0.1 0

Deleterious : Adaptive Deleterious : Adaptive Neutral3: :Adaptive 1:1 3 1: 4

N

1.5

1

1

0.5 0.5 0.5

0.5

1

0 0 0 00 000 –30 20 –10 10 –20 –10 –20 30 –30 –30 –10 –10 0 –20 –20 –30 –20 –10 00 0 10 10 20 20 30 30–30 –20 –10 30 –30

Ns

A

Ns

Ns

0 20 –10 10 30 –30 –20 10 –20 2020 3030 –30 10

0

–10

Fig. 8.1 Distributions of fitness effects (Ns) among mutations and substitutions under neutral, nearly neutral, and ad Akashi, H. (1999). Gene, 238(1), 39–51. ularamong evolution. One of possible distribution is shown for each model, area under the summing to 1. Ta ness effects (Ns) mutations and substitutions under neutral, nearly neutral, andthe adaptive models ofcurves molecFig. 8.1 Distributions fitness effects (Ns) among mutations and with substitutions under neutral, nearly neutral,

Sequence diversity •

What is the underlying cause of the sequence diversity we observe?

•

Natural selection

•

-

Most mutations are deleterious Substitution are adaptive

Neutral theory

-

Most mutations are deleterious Most substitutions are nearly neutral Fate of neutral mutations is determined by drift

Molecular adaptation • Defensive immunity • Evading defensive immunity

• Reproduction • Digestion • Many more... Yang, Z., & Bielawski, J. P. (2000). Trends in ecology & evolution, 15(12), 496–503. Pedersen et al. (2012) BMC Genomics 2012, 13:694!

Synonymous and nonsynonymous substitutions

Measuring selection using ω

•d

S (Ks)

site

: synonymous substitutions per

•d

N (KA):

nonsynonymous substitutions per site

•ω=d

N/

dS : nonsynonymous/ synonymous rate ratio

Measuring selection using ω •

•

Amino acid change:

-

neutral: fixed at the same rate as a synonymous mutations, ω = 1.

-

deleterious: purifying selection will reduce its fixation rate, ω < 1.

-

selective advantage: fixed with a higher rate than a synonymous mutations, ω > 1.

ω significantly higher than one is evidence for diversifying selection.

Codon substitution rates Synonymous transition Synonymous transversion Nonsynonymous transition Nonsynonymous transversion

Transition: Purine Purine (A Pyrimidine

GTG ATG TTG

G)

Pyrimidine (C

Adapted from Yang (2006)

T)

CTT

CTC

CTG

CTA CCG

CGG CAG

Basic model of codon substitution if i and j differ at more than one position,

0, j,

qij

for synonymous transversi on,

j,

for synonymous transition ,

j,

for nonsynonym ous transversi on, j,

for nonsynonym ous transition.

• κ: transition/transversion rate ratio • π : biased codon usage for codon j j

Goldman and Yang (1994) Mol Biol Evol 11:725-736

Codon frequencies Fequal

1/61

F1x4

f(C) f(T) f(G)

F3x4

f1(C) f2(T) f3(G)

F61

f(CTG)

ML estimation of dN and dS • Transition probabilities P(t) = {p (t)} = exp(Qt) • Imagine two sequences: - Probability of a site with ij

codons i and j: πi pij(t)

-

Given t, κ and ω, calculate the log-likelihood = log{πi pij(t)}

-

Maximise

i

t

j

Maximum likelihood estimation lnL

#

"

Selective pressure on phylogenies • Adaptive evolution is difficult to detect using only two sequences

• It averages selective pressure over entire evolutionary history & over all sites

• To increase power of the method, allow selection pressure to vary over sites or branches of multiple sequences on a phylogeny

Lineage-specific adaptation •

Most selection pressure is episodic, happening at some particular moment in evolutionary history

•

A lineage that underwent Darwinian selection may have a dN/dS rate different from other lineages or greater than one.

•

Models allowing variable dN/dS ratios among lineages

-

Ratio for few lineages different from background Ratio of lineage of interest is greater than 1 Variable among evolutionary lineages

Yang, Z. (1998). Mol Biol Evol, 15(5), 568–573. Yang, Z. & Nielsen, R. (1998) Journal of Molecular Evolution 46:409-418.

Branch model • •

H0: Same selective pressure across entire tree. H1: Selective pressure change in one part of the tree.

-

Two set of branches with independent ω ratios Two different Q rate matrices → two P(t)-s Maximise log-likelihood wrt all parameters

ω

vs.

ω0

ω1

LRT ω

• • •

vs.

ω0

ω1

H0: Same selective pressure across entire tree. H1: Selective pressure change in one part of the tree. Likelihood Ratio Test:

-

H0 has p parameters with log-likelihood ℓ0 H1 has q parameters with log-likelihood ℓ1 Compare 2(ℓ1 - ℓ0) with χ2 distribution with d.o.f. = p - q to test whether to reject null

Lysozyme example !c

!h

Yang, Z. (1998). Mol Biol Evol, 15(5), 568–573.

Table 1 Log Likelihood Values and Parameter Estimates Under Different Models Model Large data set (n 19) A. One ratio: 0 H B. Two ratios: 0 H, C. Two ratios: 0 C, D. Two ratios: 0, H E. Three ratios: 0, H, F. Two ratios: G. Two ratios: H. Two ratios: I. Three ratios: J. Three ratios:

p

ˆ

ˆ0

......... C .......... H .......... C .......... C ...........

35 36 36 36 37

1043.84 1041.70 1039.92 1037.59 1037.04

4.157 4.163 4.186 4.199 4.196

0.574 0.489 0.484 0.392 0.392

1. . . . . . . 1 ...... 0 C, H 1 ...... 0, H C 1. . . . . . . . 0, H, C 1, C. . . . . . . . 0, H

35 35 35 36 36

1042.50 1042.29 1040.32 1037.92 1039.49

4.074 4.058 3.974 4.101 4.063

0.488 0.484 0.392 0.392 0.392

0

H,

C

C

ˆH ˆ0 ˆ0 7.166 ˆ0 1 1 1

ˆC ˆ0 3.383 ˆ0 ˆH 3.516 1 ˆ0 1 1 3.448

Yang, Z. (1998). Mol Biol Evol, 15(5), 568–573.

Table 2 Likelihood Ratio Statistics (2

Null Hypothesis Tested A. ( H B. C C. C D. H E. H A.( H B. C C. C D. H E. H

C)

Assumption Models Made Compared

0

........ 0 ........ 0 ........ 0 ........ 0

C)

1.. 1.. .. ..... 1.. .. ..... 1. . . . . . . . . 1.........

* t (P ** Extremely

) for Testing Hypotheses

5%;

H

C

H

0

free

H C C

0

free

H

C

H

0

free

H C C

0

free

2 1

t (P

3.84). 1%;

A A C A B

and D and B and E and C and E

D and H B and F E and I C and G E and J 2 1

Large Data Set (n 19)

Small Data Set (n 7)

12.50** 4.28* 5.76* 7.84** 9.32**

8.78** 2.76 3.96* 5.88* 7.08**

5.46* 1.60 1.76 4.74* 4.90*

5.46* 1.68 1.84 4.60* 4.76*

6.63).

Yang, Z. (1998). Mol Biol Evol, 15(5), 568–573.

Site-specific adaptation •

Previous model assumes all sites are under the same selection pressure (averaging ω across all sites).

•

Unrealistic; sites have various functional and structural roles, and so likely to be under different selective pressure.

•

Many sites might be under strong purifying selection, ω close to 0.

‘Site model’ allows for variable selection intensity among amino acid sites.

Nielsen, R. & Yang, Z. (1998) Genetics 148:929-936 Yang, Z. et al. (2000). Genetics, 155(1), 431–449.

Random-sites model • Use statistical distribution to describe variation of ω among sites.

- e.g. classes of sites with different ω • Test: 1. Do sites exist where ω > 1? (LRT) 2. If so, identify positively selected sites (Bayes)

Simple model • • •

H0: M1 (neutral)

-

proportion p0 of conserved sites, ω0 = 0 proportion p1 of neutral sites, ω1 = 1

H1: M2 (selection)

-

as H0, and... proportion p2 of sites with ω2 estimated from data

Does M2 fit data significantly better than M1 (LRT) and is M2 estimated ω2 greater than one?

Nielsen, R. & Yang, Z. (1998) Genetics 148:929-936

More models TABLE 2

Models of variable

p

Model code M0 M1 M2 M3

(one-ratio) (neutral) (selection) (discrete)

Parameters

M5 (gamma) M6 (2gamma) M7 (beta) M8 (beta& ) M9 (beta&gamma) M10 (beta&gamma 1) M11 (beta&normal 1)

1 1 3 2K 1 (K 3) K 1 (K 5) 2 4 2 4 5 5 5

M12 (0&2normal 1)

5

p0, p1,

2

M13 (3normal 0)

6

p0, p1,

2

M4 (freqs)

p, number of parameters in the

ratios among sites

Notes One ratio for all sites p1 1 p0, 0 0, 1 1 p2 1 p0 p1, 0 0, 1 1 pK 1 1 p0 p1 . . . pK 2

p0 p0, p1, 2 p0, p1, . . . , pK 2, 0, 1, . . . , K 1 p0, p1, . . . , pK 2 , p0, 0, p, q p0, p, q, p0, p, q, p0, p, q, p0, p, q,

0

,

The

1

, , , ,

,

,

1

,

0

2

,

1

2

k

are fixed at 0, 1⁄3, 2⁄3, 1, and 3

From ( , ) p0 from ( 0, 0) and p1 1 p0 from ( 1, 1) From (p, q) p0 from (p, q) and 1 p0 with p0 from (p, q) and 1 p0 from ( , ) p0 from (p, q) and 1 p0 from 1 ( , ) p0 from (p, q) and 1 p0 from ( , 2), truncated to 1 p0 with 0 0 and 1 p0 from the mixture: p1 from (1, 21), and 1 p1 from ( 2, 22), both normals truncated to 1 p0 from (0, 20), p1 from (1, 21), and p2 1 p0 p1 from ( 2, 22), all normals truncated to 1

distribution.

Yang, Z. et al. (2000). Genetics, 155(1), 431–449.

Beta distribution

https://en.wikipedia.org/wiki/Beta_distribution

Common tests • •

*

Variable selective pressure among sites?

-

M0 (one ratio) and M3 (discrete)

Sites evolving by positive selection?

-

M1 (neutral) and M2 (selection) [ω0 = 0]

-

M7 (beta) and M8 (beta&ω)

M1a (NearlyNeutral) and M2a (PositiveSelection) [ω0 < 1] *

M8a (beta&ωs=1) and M8 (beta&ω) ✝

Yang, Z. et al. (2005). Mol Biol Evol, 22(4), 1107–1118. ✝ Swanson, W. J. et al. (2003). Mol Biol Evol, 20(1), 18–20.

Identification of sites evolving under positive selection • LRT is significant for alternative model and estimated ω > 1

• Empirical Bayes (EB) approach - Calculate posterior probability of each site for each ω class - which class does the site most likely belong to?

-

Positive selection acting on sites with high posterior probability for the “ω > 1” site class

EB procedure • Imagine K site classes with ω ratios (ω , 0

ω1, ..., ωk-1) and proportions (p0, p1, ..., pk-1)

• Posterior probability that site with data x is from site class i: P(x | ω i )pi P(ω i | x) = = P(x)

∑

P(x | ω i )pi

K −1 j=0

P(x | ω j )p j

Class I MHC example Table 2 Log-likelihood Values and Parameter Estimates Under Random-sites Models for the Class I MHC Alleles Model Code

p

M7 (beta). . . . . . . . . M8 (beta & ) . . . .

393 395

7498.97 7232.68

= 0.88

pˆ 0.103, qˆ 0.354 pˆ0 0.900, (pˆ1 0.100), pˆ 0.710, ˆ 5.122

Positively Selected Sites 0.168, qˆ

Not allowed 9F 24A 45M 62G 63E 67V 69A 70H 71S 77D 80T 81L 82R 94T 95V 97R 99Y 113Y 114H 116Y 151H 152V 156L 163T 167W

= 5.12

Probability

< 0.55

Estimates of Parameters

Site Yang, Z., & Swanson, W. J. (2002). Mol Biol Evol, 19(1), 49–57. Yang, Z. (2002). Current Opinion in Genetics & Development, 12(6), 688–694.

Variations of site model • Random-effects model • Fixed-sites model - Sites are partitioned by user • One ω per site - e.g. SLR Massingham & Goldman (2005). Genetics, 169(3), 1753–1762.

Branch-site models • • • •

Site class 0 - conserved Site class 1 - neutral Site class 2a - variable class 1 Site class 2b - variable class 2

•

Null:

• •

LRT

Yang, Z., & Nielsen, R. (2002). Mol Biol Evol, 19(6), 908–917. Yang, Z. et al. (2005). Mol Biol Evol, 22(4), 1107–1118. Yang, Z., & dos Reis, M. (2011). Mol Biol Evol, 28(3), 1217–1228.

!2 = 1 (fixed) Alternative: !2 >= 1 (estimated)

Notes

• Two sequences are not enough! • Branch models use one ω for all sites • Site models use distribution of ω for sites but unchanged in time

-

Good power if this is true

• Branch-site models - Some power in detecting episodic adaptation - Sensitive to sequence and alignment errors • Sequences need to be selected carefully

•

•

•

Parameters on the boundary affects the LRT

-

χ2 approximation is not expected to hold The test is conservative i.e. false positive rate will be less than significance level

Bayesian prediction of sites

-

no practical from a few highly similar sequences to increase accuracy, increase number of lineages sensitive to sample errors in parameter estimates and ω distribution

LRT seems robust to model assumptions, but test!

-

Use multiple models (M0-M3, M1a-M2a, M7-M8) Check for local optima Check effect of ω distribution, or codon frequency method, on parameters estimation and site identification

Limitations • Episodic selection • At a positively selected site, all amino acids are assumed to be equally advantageous

• ω > 1 is a conservative measure

(averaging across branches/sites)

• Ignores errors in the phylogeny

References Yang, Z. (2012) PAML User Guide Yang, Z., & Reis, dos, M. (2011). Statistical properties of the branch-site test of positive selection. Molecular biology and evolution, 28(3), 1217–1228. Yang, Z. (2006). Computational molecular evolution. Oxford University Press. Bielawski, J. P. & Yang, Z. (2005). Maximum Likelihood Methods for Detecting Adaptive Protein Evolution. Ch. 5 in Statistical Methods in Molecular Evolution ed. Nielsen, R. Springer, NY. Yang, Z., Wong, W. S. W., & Nielsen, R. (2005). Bayes empirical bayes inference of amino acid sites under positive selection. Molecular biology and evolution, 22(4), 1107– 1118. Yang, Z., & Nielsen, R. (2002). Codon-substitution models for detecting molecular adaptation at individual sites along specific lineages. Molecular biology and evolution, 19(6), 908–917. Yang, Z., & Bielawski, J. P. (2000). Statistical methods for detecting molecular adaptation. Trends in ecology & evolution, 15(12), 496–503.

Yang, Z. et al. (2000). Codon-substitution models for heterogeneous selection pressure at amino acid sites. Genetics, 155(1), 431–449. Nielsen, R. & Yang, Z. (1998). Likelihood models for detecting positively selected amino acid sites and applications to the HIV-1 envelope gene. Genetics 148:929-936. Yang, Z. (1998). Likelihood ratio tests for detecting positive selection and application to primate lysozyme evolution. Molecular biology and evolution, 15(5), 568–573. Yang, Z. & Nielsen, R. (1998). Synonymous and nonsynonymous rate variation in nuclear genes of mammals. Journal of Molecular Evolution 46:409-418. Goldman, N., & Yang, Z. (1994). A codon-based model of nucleotide substitution for protein-coding DNA sequences. Molecular biology and evolution, 11(5), 725–736.

Asif Tamuri EMBL-EBI Molecular Phylogenetics Course Agricultural [PDF]

Recommend Stories

Idea Transcript

Helpful Links

Smile Life

Get in touch