Causal Network Inference by Optimal Causation Entropy


International Seminar and Workshop Causality, Information Transfer and Dynamic Networks Max Planck Institute for the Physics of Complex Systems

Causal Network Inference by Optimal Causation Entropy
Jie Sun, Department of Mathematics & Computer Science, Clarkson University

Collaborators: Erik Bollt (Clarkson), Carlo Cafaro (Clarkson), and Dane Taylor (University of North Carolina)
Acknowledgements: Organizers Jorn Davidsen, Peter Grassberger, Bjorn Schelter, Claudia Ponisch

June 16, 2014 (Dresden, Germany)

Dynamics on Networks

The Problem of Causal Network Inference

[Figure: (a) the network structure and (b) the node dynamics generate, via simulation or experiment, time series at each node; the goal (c) is to recover the direct and indirect causal nodes of a given node $i$ from those time series.]

Given: (time series) observations at each component of a system.
Goal: identify the direct causal relationships between the components.
There are several reasons to want this. For example, in medical diagnosis one wants to identify the causes of a disease in order to suggest effective treatments.

Procedure of Causal Network Inference

1. Gather a sufficient amount of relevant data (experimental work).
2. Develop an appropriate causal inference measure (theoretical work).
3. Estimate that measure accurately and reliably (computational/statistical work).

A good causal inference measure should have:
(1) general applicability and a clean interpretation;
(2) immunity (in principle) to false positives and false negatives;
(3) accurate and fast numerical estimation.

Condition (1) → covers both linear and nonlinear interactions.
Condition (2) → correct identification of direct couplings in complex systems with more than two components.
Condition (3) → calls for appropriate statistical estimation techniques.

Correlation vs. Causality

Correlation does not imply causality.

Information vs. Physical Causality

Mathematical Assumptions

The necessity of making assumptions:

M. Eichler, Graphical modeling of multivariate time series, Probability Theory and Related Fields (2012).

Stochastic process $\{X_t^{(i)}\}_{i=1,\dots,N;\;t=1,2,\dots}$, with three assumptions:

(i) Temporally Markov: $p(X_t \mid X_{t-1}, X_{t-2}, \dots) = p(X_t \mid X_{t-1})$.

(ii) $p(X_t^{(i)} \mid X_{t-1}) = p(X_t^{(i)} \mid X_{t-1}^{(N_i)})$ for any $i$.

(iii) Identifiability: $p(X_t^{(i)} \mid X_{t-1}^{(K)}) \neq p(X_t^{(i)} \mid X_{t-1}^{(L)})$ whenever $(K \cap N_i) \neq (L \cap N_i)$.

Here $N_i$ is the set of direct causal components of node $i$. (Full observability of all the components is also assumed.)

Basic Information-theoretic Measures

Entropy: $h(X) \equiv -\int p(x)\log p(x)\,dx$.

Joint entropy: $h(Y,X) \equiv -\int p(x,y)\log p(x,y)\,dx\,dy$.
Conditional entropy: $h(Y\mid X) \equiv -\int p(x,y)\log p(y\mid x)\,dx\,dy$.

Conditional mutual information: $I(X;Y\mid Z) \equiv h(X\mid Z) - h(X\mid Y,Z) \equiv h(Y\mid Z) - h(Y\mid X,Z)$.

Cover & Thomas (2006).

Transfer Entropy, Self-Causality, and Conditioning

Transfer Entropy (TE):
$$T_{Y\to X} \equiv h(X_{t+1}\mid X_t) - h(X_{t+1}\mid X_t, Y_t),$$
where $X_{t+1}$ is the future of $X$, $X_t$ the past of $X$, and $Y_t$ the past of $Y$.

$T_{Y\to X}$ measures the reduction of uncertainty about $X_{t+1}$ given knowledge of $Y_t$ in addition to that of $X_t$.

T. Schreiber, Phys. Rev. Lett. 85, 461 (2000); A. Kaiser & T. Schreiber, Physica D (2002); M. Palus et al., Phys. Rev. E (2001).

Remarks:
1. $T_{X\to X} \equiv 0$: TE gives no indication of whether the process $\{X_t\}$ is self-causal, and is not designed to address that question. Whether a node is self-causal does, however, affect the controllability of the system [Cowan et al., PLOS ONE (2012)].
2. $T_{Y\to X}$ is a bivariate measure (from $Y$ to $X$), and is therefore not designed to infer direct causality among multiple processes.

When there are multiple processes, $Y_t$ directly causes $X_{t+1}$ only if the apparent "cause" remains after removing (conditioning on) all other candidate influences.
Frenzel and Pompe, Phys. Rev. Lett. (2007); Vejmelka and Palus, Phys. Rev. E (2008).
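The slides contain no code; as a rough illustration of the TE definition above, here is a minimal Python sketch (mine, not from the talk) that estimates $T_{Y\to X}$ under a Gaussian assumption, where every entropy reduces to the log-determinant of a sample covariance. The toy coupled process at the end is an illustrative choice.

```python
import numpy as np

def gaussian_entropy(samples):
    """Differential entropy (nats) of the columns of `samples`, assumed jointly Gaussian."""
    cov = np.atleast_2d(np.cov(samples, rowvar=False))
    d = cov.shape[0]
    return 0.5 * np.log((2 * np.pi * np.e) ** d * np.linalg.det(cov))

def transfer_entropy(x, y):
    """T_{Y->X} = h(X_{t+1}|X_t) - h(X_{t+1}|X_t, Y_t) for scalar time series x, y."""
    x_future, x_past, y_past = x[1:], x[:-1], y[:-1]
    h = gaussian_entropy
    # Each conditional entropy via h(A|B) = h(A, B) - h(B).
    h_x1_given_x = h(np.column_stack([x_future, x_past])) - h(x_past)
    h_x1_given_xy = (h(np.column_stack([x_future, x_past, y_past]))
                     - h(np.column_stack([x_past, y_past])))
    return h_x1_given_x - h_x1_given_xy

# Toy example: y drives x, but x does not drive y.
rng = np.random.default_rng(0)
T = 5000
y = rng.normal(size=T)
x = np.zeros(T)
for t in range(1, T):
    x[t] = 0.5 * x[t - 1] + 0.5 * y[t - 1] + 0.1 * rng.normal()
print(transfer_entropy(x, y))   # clearly positive
print(transfer_entropy(y, x))   # close to zero
```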

Benjamin Franklin and Clive Granger

"For want of a nail a shoe was lost,
For want of a shoe a horse was lost,
For want of a horse a battle was lost,
For want of a battle a kingdom was lost.
And all for the want of a horseshoe nail."

Conclusion: the blacksmith destroyed the kingdom; everything is connected to everything; everything causes (and is caused by) everything. ...yes... but not so useful.

Granger's notion of causality (Granger, 1969):
(1) The cause should occur before the effect (the caused).
(2) The causal process should carry information, unavailable in other processes, about the effect.

Causation Entropy

The causation entropy (CSE) from the set of components $J$ to $I$, conditioning on $K$, is defined (explicitly) as
$$C_{J\to I\mid K} = h\big(X_{t+1}^{(I)} \mid X_t^{(K)}\big) - h\big(X_{t+1}^{(I)} \mid X_t^{(K)}, X_t^{(J)}\big).$$

Remarks:
1. CSE can be used to assess whether a process is "self-causal".
2. CSE does not "solve" the causal inference problem.
3. The definition simply emphasizes that cause-and-effect is not a bivariate question; rather, it involves all three parts (cause, effect, and conditioning).
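For orientation (this is implied by the definitions on the preceding slides rather than stated here): CSE is a conditional mutual information, and it reduces to transfer entropy for the particular choice $J = \{Y\}$, $I = K = \{X\}$:
$$C_{J\to I\mid K} = I\big(X_{t+1}^{(I)};\,X_t^{(J)} \,\big|\, X_t^{(K)}\big), \qquad C_{\{Y\}\to\{X\}\mid\{X\}} = h(X_{t+1}\mid X_t) - h(X_{t+1}\mid X_t, Y_t) = T_{Y\to X}.$$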

Analytical Properties of CSE

[Figure: three schematic cases for a target node $I$ with direct causal set $N_I$, a candidate source set $J$, and a conditioning set $K$: in panels (a) and (b) $C_{J\to I\mid K} = 0$, while in panel (c) $C_{J\to I\mid K} > 0$.]

Existing Approaches to Conditioning

1. Condition on everything (test $j \to i$ given all other nodes): curse of dimensionality (estimation in the full-dimensional space).

2. Condition on subsets of increasing cardinality (PC-algorithm): combinatorial search (high computational burden).

Spirtes, Glymour, Scheines (2000); Runge, Heitzig, Petoukhov, Kurths, PRL (2012).

Forward & Backward Conditioning: Discover & Remove

(a) True network structure.
(b) Aggregative discovery of causal nodes: if $C_{j\to i\mid K}$ is maximal, then $K' = K \cup \{j\}$.
(c) Divisive removal of non-causal nodes: if $C_{j\to i\mid (K\setminus\{j\})} = 0$, then $K' = K \setminus \{j\}$.

Remarks:
1. The "forward" step (aggregative discovery) alone would in principle produce false positives that cannot be mitigated by an increased amount of data. Vlachos, Kugiumtzis, Phys. Rev. E (2010); Kugiumtzis, Phys. Rev. E (2013).
2. The "backward" step (divisive removal) can be modified into one analogous to the PC-algorithm in the statistical inference literature, with the key difference that here the enumeration of conditioning subsets only needs to be performed within $K$, rather than over all nodes. Spirtes, Glymour, Scheines (2000); Runge, Heitzig, Petoukhov, Kurths, PRL (2012).
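The slides describe the two steps pictorially; the Python sketch below (my own rendering, not code from the talk) spells out the same forward/backward logic. Here `cse(data, j, i, K)` is a hypothetical estimator of $C_{j\to i\mid K}$ and `threshold(data, j, i, K)` stands in for the significance test used to decide whether an estimate is distinguishable from zero; both are placeholders.

```python
def infer_causal_parents(data, i, candidates, cse, threshold):
    """Forward/backward conditioning: aggregative discovery, then divisive removal."""
    # Forward step: aggregative discovery of causal nodes.
    K = set()
    while True:
        remaining = [j for j in candidates if j not in K]
        if not remaining:
            break
        scores = {j: cse(data, j, i, K) for j in remaining}
        j_best = max(scores, key=scores.get)
        if scores[j_best] <= threshold(data, j_best, i, K):
            break            # no remaining candidate carries extra information
        K.add(j_best)        # K' = K ∪ {j}

    # Backward step: divisive removal of redundant (non-causal) nodes.
    for j in sorted(K):
        rest = K - {j}
        if cse(data, j, i, rest) <= threshold(data, j, i, rest):
            K = rest         # C_{j -> i | K \ {j}} ≈ 0, so drop j
    return K
```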

Benchmark Systems

$$X_t^{(i)} = \sum_{j\in N_i} A_{ij}\,X_{t-1}^{(j)} + \xi_t^{(i)}$$

(a classical communication channel; fluctuations around equilibrium), with a stable adjacency matrix $A$ and uncorrelated Gaussian noise $\xi_t$.
Ahmed and Gokhale, IEEE Trans. Info. Theor. (1989); Barnett, Phys. Rev. Lett. (2009).

The $X_t^{(i)}$ are Gaussian, with covariances
$$\Phi(\tau, t)_{ij} \equiv \mathrm{cov}\big[x^{(i)}_{t+\tau}, x^{(j)}_t\big], \qquad \Phi(0) \equiv \lim_{t\to\infty}\Phi(0,t) = \sum_{k=0}^{\infty} A^k S \big(A^k\big)^{\top}, \qquad \Phi(\tau) = A\,\Phi(\tau-1) = A^2\,\Phi(\tau-2) = \cdots = A^{\tau}\,\Phi(0),$$
where $S$ is the noise covariance matrix.
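As a small check on this setup (a sketch under assumptions of my own: a five-node directed chain with weight 0.5 and unit noise covariance $S = I$), the stationary covariance $\Phi(0)$ can also be obtained by solving the discrete Lyapunov equation $\Phi(0) = A\,\Phi(0)\,A^{\top} + S$ instead of summing the series, and compared against the sample covariance of a simulated run:

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

rng = np.random.default_rng(1)
n = 5
A = np.diag(np.full(n - 1, 0.5), k=-1)   # directed chain: node j drives node j+1
S = np.eye(n)                            # noise covariance

phi0 = solve_discrete_lyapunov(A, S)     # stationary covariance Phi(0)
phi1 = A @ phi0                          # lag-1 covariance Phi(1) = A Phi(0)

# Simulate X_t = A X_{t-1} + xi_t and compare its sample covariance with Phi(0).
T = 100_000
x = np.zeros(n)
samples = np.empty((T, n))
for t in range(T):
    x = A @ x + rng.multivariate_normal(np.zeros(n), S)
    samples[t] = x
print(np.max(np.abs(np.cov(samples, rowvar=False) - phi0)))   # small sampling error
```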

For this Gaussian process, CSE has the closed form
$$C_{J\to I\mid K} = \frac{1}{2}\,\log\frac{\det\!\Big(\Phi(0)_{II} - \Phi(1)_{IK}\,\Phi(0)_{KK}^{-1}\,\Phi(1)_{IK}^{\top}\Big)}{\det\!\Big(\Phi(0)_{II} - \Phi(1)_{I,K\cup J}\,\Phi(0)_{K\cup J,K\cup J}^{-1}\,\Phi(1)_{I,K\cup J}^{\top}\Big)}.$$
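A minimal numerical check of this closed-form expression (my own sketch; the small directed tree, the 0-based indexing, and unit noise variances are assumptions, not from the slides). It also previews the false-positive behavior discussed on the directed-tree slide below: without conditioning, an "uncle" node appears causal, and conditioning on the true parent removes it.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def gaussian_cse(phi0, phi1, J, I, K):
    """Closed-form C_{J -> I | K} (in nats) for the stationary Gaussian model."""
    I, J, K = list(I), list(J), list(K)
    KJ = K + [j for j in J if j not in K]

    def cond_cov(cond):
        # Covariance of X_{t+1}^{(I)} conditioned on X_t^{(cond)}.
        if not cond:
            return phi0[np.ix_(I, I)]
        a = phi0[np.ix_(I, I)]
        b = phi1[np.ix_(I, cond)]
        c = phi0[np.ix_(cond, cond)]
        return a - b @ np.linalg.solve(c, b.T)

    return 0.5 * np.log(np.linalg.det(cond_cov(K)) / np.linalg.det(cond_cov(KJ)))

# Small directed tree: 0 -> 1, 0 -> 2, 1 -> 3, uniform weight 0.5, unit noise.
n = 4
A = np.zeros((n, n))
A[1, 0] = A[2, 0] = A[3, 1] = 0.5
phi0 = solve_discrete_lyapunov(A, np.eye(n))
phi1 = A @ phi0

print(gaussian_cse(phi0, phi1, J=[1], I=[3], K=[]))    # true parent: positive
print(gaussian_cse(phi0, phi1, J=[2], I=[3], K=[]))    # "uncle" node: also positive (false positive)
print(gaussian_cse(phi0, phi1, J=[2], I=[3], K=[1]))   # conditioned on the parent: ~0
```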

Analytical Calculation: Directed Linear Chain ($1 \to 2 \to 3 \to \cdots \to n$)
$$C_{j\to i} = \frac{1}{2}\,\delta_{i,j+1}\,\log\!\left(1 + \frac{\sum_{k=1}^{j}\sigma_k^2}{\sigma_i^2}\right)$$

Analytical Calculation: Directed Loop ($1 \to 2 \to \cdots \to n \to 1$)
$$C_{j\to i} = \frac{1}{2}\,\delta_{p_i,j}\,\log\!\left(\frac{1}{1-w^2}\right),$$
where $p_i$ denotes the direct parent of node $i$ in the loop and $w$ the uniform edge weight.

Analytical Calculation: Directed Tree (false positives)

No conditioning:
$$C_{j\to i} = \frac{1}{2}\,\delta_{d_i,\,d_j+1}\,\log\frac{(d_i+1)(d_j+1)}{(d_i+1)(d_j+1) - (d_{p_{ij}}+1)^2},$$
where $d_i$ is the depth of node $i$ and $p_{ij}$ the nearest common ancestor of $i$ and $j$; in particular, the direct parent maximizes CSE, $p_i = \arg\max_j C_{j\to i}$.

With conditioning: $C_{j\to i\mid\{p_i\}} = 0$.

Numerical Tests for Random Directed Networks

$n$: number of nodes; $p$: connection probability; $w$: edge weights (each directed edge has weight $w$ or $-w$); $\rho(A)$: spectral radius.

[Figure: an example random directed network used in the numerical tests.]

Significance Level and Network Size

[Figure: false-positive rate $\varepsilon_+$ and false-negative rate $\varepsilon_-$ versus sample size $T$ for significance levels $\theta = 95\%$, $99\%$, $99.9\%$ ($n = 200$, $np = 10$, $\rho(A) = 0.8$), and the required sample size $T^*$ versus network size $n = 100, 200, 500$ ($np = 10$, $\rho(A) = 0.8$).]

Effects of Sample Size and Spectral Radius

[Figure: false-positive rate $\varepsilon_+$ and false-negative rate $\varepsilon_-$ versus sample size $T$ for varying density $np = 5, 10, 20$ ($n = 200$, $\rho(A) = 0.8$) and varying spectral radius $\rho(A) = 0.3, 0.6, 0.9$ ($n = 200$, $np = 10$), together with the required sample size $T^*$ as a function of $np$ and of $\rho(A)$.]

Conclusions
- oCSE allows for direct causal inference in an efficient way (it can be extended to finite-order Markov processes and also to processes with infinite but fading memory).
- There is a tradeoff between computation (choosing the conditioning set) and estimation (the resulting dimensionality of the random variables).

Challenges
- Latent variables: M. Eichler, Journal of Machine Learning Research (2010)
- Non-stationarity: Bai and Perron, Journal of Applied Econometrics (2003)
- Reliable estimation: Kraskov and Grassberger, PRE (2004); Frenzel and Pompe, PRL (2007)
- Exact statistical test: Pethel and Hahs, Entropy (2014, to appear)

Acknowledgements This work is funded by ARO Grant No. 61386-EG. We thank Dr Samuel Stanton (ARO Complex Dynamics and Systems Program) for his continuous and ongoing support.

References
J. Sun and E.M. Bollt, Physica D 267, 49 (2014).
J. Sun, D. Taylor, and E.M. Bollt, arXiv:cs.IT/1401.7574 (2014).
J. Sun, C. Cafaro, and E.M. Bollt, Entropy (to appear, 2014).

Thank you!
