International Seminar and Workshop "Causality, Information Transfer and Dynamic Networks"
Max Planck Institute for the Physics of Complex Systems

Causal Network Inference by Optimal Causation Entropy
Jie Sun, Department of Mathematics & Computer Science, Clarkson University

Collaborators: Erik Bollt (Clarkson), Carlo Cafaro (Clarkson), and Dane Taylor (University of North Carolina)
Acknowledgements to the organizers: Jörn Davidsen, Peter Grassberger, Björn Schelter, Claudia Pönisch

June 16, 2014 (Dresden, Germany)
Dynamics on Networks
The Problem of Causal Network Inference

[Figure: (a) network structure; (b) node dynamics, i.e., time series from simulation/experiment over t = 0 to 200; (c) direct & indirect causal nodes of a node i.]
Given: (time series) observations at each component of a system.
Goal: identify the direct causal relationships between the components.
There are several reasons to want this; for example, medical diagnosis: identify the causes of a disease in order to suggest effective treatments.
Procedure of Causal Network Inference
1. Gather a sufficient amount of relevant data (experimental work).
2. Develop an appropriate causal inference measure (theoretical work).
3. Estimate such a measure accurately and reliably (computational/statistical work).
A good causal inference measure should satisfy:
(1) general applicability and clean interpretation;
(2) immunity (in principle) to false positives and false negatives;
(3) accurate and fast numerical estimation.

Condition (1) → applies to both linear and nonlinear interactions.
Condition (2) → correct identification of direct couplings in complex systems with more than two components.
Condition (3) → requires appropriate statistical estimation techniques.
Correlation vs. Causality
Correlation does not imply causality.
Information vs. Physical Causality
Mathematical Assumptions
The necessity of making assumptions:
M. Eichler, Graphical modeling of multivariate time series, Probability Theory and Related Fields (2012).
Stochastic process $\{X_t^{(i)}\}_{i=1,\dots,N;\ t=1,2,\dots}$ satisfying:

(i) Temporally Markov: $p(X_t \mid X_{t-1}, X_{t-2}, \dots) = p(X_t \mid X_{t-1})$.
(ii) Spatially Markov: $p(X_t^{(i)} \mid X_{t-1}) = p(X_t^{(i)} \mid X_{t-1}^{(N_i)})$ for any $i$.
(iii) Identifiability: $p(X_t^{(i)} \mid X_{t-1}^{(K)}) \neq p(X_t^{(i)} \mid X_{t-1}^{(L)})$ whenever $(K \cap N_i) \neq (L \cap N_i)$.

$N_i$: the set of direct causal components of node $i$. (Full observability of all the components is also assumed.)
Basic Information-theoretic Measures

Entropy: $h(X) \equiv -\int p(x)\log p(x)\,dx$.
Joint entropy: $h(X,Y) \equiv h(Y,X) \equiv -\int p(x,y)\log p(x,y)\,dx\,dy$.
Conditional entropy: $h(Y \mid X) \equiv -\int p(x,y)\log p(y \mid x)\,dx\,dy$.
Conditional mutual information: $I(X;Y \mid Z) \equiv h(X \mid Z) - h(X \mid Y,Z) \equiv h(Y \mid Z) - h(Y \mid X,Z)$.

Cover & Thomas (2006).
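For jointly Gaussian variables, all of these quantities reduce to closed forms in covariance determinants, since $h = \frac{1}{2}\log[(2\pi e)^d \det\Sigma]$. A minimal Python sketch of that special case, used for illustration throughout below (function names are my own, not from the slides):

```python
import numpy as np

def gaussian_entropy(cov):
    """Differential entropy of a d-dim Gaussian with covariance `cov`:
    h = 0.5 * log((2*pi*e)^d * det(cov))."""
    cov = np.atleast_2d(cov)
    d = cov.shape[0]
    return 0.5 * np.log((2 * np.pi * np.e) ** d * np.linalg.det(cov))

def conditional_mutual_information(cov, ix, iy, iz):
    """I(X; Y | Z) = h(X,Z) + h(Y,Z) - h(Z) - h(X,Y,Z) for Gaussians.
    `cov` is the joint covariance matrix; ix, iy, iz are index lists."""
    def h(idx):
        return gaussian_entropy(cov[np.ix_(idx, idx)]) if idx else 0.0
    return h(ix + iz) + h(iy + iz) - h(iz) - h(ix + iy + iz)
```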
Transfer Entropy, Self-Causality, and Conditioning

Transfer Entropy (TE): $T_{Y\to X} \equiv h(X_{t+1} \mid X_t) - h(X_{t+1} \mid X_t, Y_t)$
(future of $X$; past of $X$; past of $Y$)

$T_{Y\to X}$ measures the reduction of uncertainty about $X_{t+1}$ given knowledge of $Y_t$ in addition to that of $X_t$.
T. Schreiber, Phys. Rev. Lett. 85, 461 (2000); A. Kaiser & T. Schreiber, Physica D (2002); M. Paluš et al., Phys. Rev. E (2001).
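Since $T_{Y\to X} = I(X_{t+1}; Y_t \mid X_t)$, the Gaussian helper above gives a quick (Gaussian-only) TE estimator; a sketch:

```python
def transfer_entropy_gaussian(x, y):
    """Estimate T_{Y->X} = I(X_{t+1}; Y_t | X_t), assuming the joint
    process is (approximately) Gaussian. x, y: 1-D sample arrays."""
    data = np.vstack([x[1:], y[:-1], x[:-1]])  # rows: X_{t+1}, Y_t, X_t
    cov = np.cov(data)
    return conditional_mutual_information(cov, ix=[0], iy=[1], iz=[2])
```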
Remarks:
1. $T_{X\to X} \equiv 0$: TE gives no indication of whether the process $\{X_t\}$ is self-causal, and is not designed to address that question. (Whether a node is self-causal affects system controllability; Cowan et al., PLOS ONE (2012).)
2. $T_{Y\to X}$ is a bivariate measure (from $Y$ to $X$), and is therefore not designed to infer direct causality among multiple processes.

When there are multiple processes, $Y_t$ directly causes $X_{t+1}$ only if such a "cause" remains after the removal of all other conditions. Frenzel and Pompe, Phys. Rev. Lett. (2007); Vejmelka and Paluš, Phys. Rev. E (2008).
Benjamin Franklin and Clive Granger

For want of a nail a shoe was lost,
For want of a shoe a horse was lost,
For want of a horse a battle was lost,
For want of a battle a kingdom was lost,
And all for the want of a horseshoe nail.

Conclusion: the blacksmith destroyed the kingdom. Everything is connected to everything; everything causes (and is caused by) everything. ...Yes, but not so useful.

Granger's notion of causality (Granger, 1969):
(1) The cause should occur before the effect.
(2) The causal process should carry information, unavailable in other processes, about the effect.
Causation Entropy

The causation entropy (CSE) from the set of components $J$ to $I$, conditioning on $K$, is defined (explicitly) as:

$C_{J\to I\mid K} \equiv h(X_{t+1}^{(I)} \mid X_t^{(K)}) - h(X_{t+1}^{(I)} \mid X_t^{(K)}, X_t^{(J)})$

Remarks:
1. CSE can be used to assess whether a process is "self-causal".
2. CSE does not by itself "solve" the causal inference problem.
3. The definition emphasizes that cause-and-effect is not a bivariate question; it involves three parts: cause, effect, and conditioning.
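Equivalently, $C_{J\to I\mid K} = I(X_{t+1}^{(I)}; X_t^{(J)} \mid X_t^{(K)})$, so under the same Gaussian assumption a plug-in estimator is straightforward; a sketch (interface and names are mine):

```python
def causation_entropy_gaussian(X, I, J, K):
    """Estimate C_{J->I|K} = I(X_{t+1}^{(I)}; X_t^{(J)} | X_t^{(K)}) for
    (approximately) Gaussian data. X: (T, N) array; I, J, K: index lists."""
    future, past = X[1:, :], X[:-1, :]
    data = np.hstack([future[:, I], past[:, J], past[:, K]]).T
    cov = np.atleast_2d(np.cov(data))
    nI, nJ, nK = len(I), len(J), len(K)
    return conditional_mutual_information(
        cov,
        ix=list(range(nI)),
        iy=list(range(nI, nI + nJ)),
        iz=list(range(nI + nJ, nI + nJ + nK)))
```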
Analytical Properties of CSE

[Figure: three panels illustrating when CSE vanishes or is positive, depending on how $J$ and $K$ relate to the set of direct causal parents $N_I$: (a) $C_{J\to I\mid K} = 0$; (b) $C_{J\to I\mid K} = 0$; (c) $C_{J\to I\mid K} > 0$.]
Existing Approaches of Conditioning

1. Condition on everything: curse of dimensionality (estimation in the full-dimensional space).
2. Condition on subsets of increasing cardinality (PC-algorithm): combinatorial search (high computational burden).

Spirtes, Glymour, Scheines (2000); Runge, Heitzig, Petoukhov, Kurths, PRL (2012).
Forward & Backward Conditioning: Discover & Remove

(a) True network structure.
(b) Aggregative discovery of causal nodes: if $C_{j\to i\mid K}$ is maximal, then $K' = K \cup \{j\}$.
(c) Divisive removal of non-causal nodes: if $C_{j\to i\mid (K\setminus\{j\})} = 0$, then $K' = K \setminus \{j\}$.
Remarks:
1. The "forward" step (aggregative discovery) alone would, in principle, produce false positives that cannot be mitigated by an increased amount of data. Vlachos, Kugiumtzis, Phys. Rev. E (2010); Kugiumtzis, Phys. Rev. E (2013).
2. The "backward" step (divisive removal) can be modified into one analogous to the PC-algorithm in the statistical inference literature, with the key difference that here the enumeration of conditioning subsets needs to be performed only within $K$, rather than over all nodes. Spirtes, Glymour, Scheines (2000); Runge, Heitzig, Petoukhov, Kurths, PRL (2012).
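A hedged sketch of how the two steps combine into an oCSE-style routine, using the Gaussian estimator above and a simple threshold `tol` in place of the permutation-based significance test used in the actual work:

```python
def ocse_parents(X, i, tol=1e-3):
    """Infer the direct causal parents of node i: forward aggregative
    discovery, then backward divisive removal. `tol` is a crude
    stand-in for a proper significance test."""
    n = X.shape[1]
    K = []
    # Forward: greedily add the candidate with maximal CSE gain.
    while True:
        candidates = [j for j in range(n) if j not in K]
        if not candidates:
            break
        gains = [causation_entropy_gaussian(X, [i], [j], K) for j in candidates]
        best = int(np.argmax(gains))
        if gains[best] <= tol:
            break
        K.append(candidates[best])
    # Backward: drop any j whose CSE vanishes given the remaining set.
    for j in list(K):
        rest = [k for k in K if k != j]
        if causation_entropy_gaussian(X, [i], [j], rest) <= tol:
            K.remove(j)
    return sorted(K)
```

Running `ocse_parents(X, i)` for each node $i$ reconstructs the whole causal network, with the backward step enumerating only within the discovered set $K$.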
Benchmark Systems

$X_t^{(i)} = \sum_{j\in N_i} A_{ij}\,X_{t-1}^{(j)} + \xi_t^{(i)}$

(classical communication channel; fluctuations around equilibrium)
$A$: stable adjacency matrix; $\xi_t$: uncorrelated Gaussian noise with covariance $S$.
Ahmed and Gokhale, IEEE Trans. Info. Theor. (1989); Barnett, Phys. Rev. Lett. (2009).

The $X_t^{(i)}$ are Gaussian, with covariances
$\Phi(\tau, t)_{ij} \equiv \mathrm{cov}[x_{t+\tau}^{(i)}, x_t^{(j)}]$,
$\Phi(0) \equiv \lim_{t\to\infty}\Phi(0,t) = \sum_{k=0}^{\infty} A^k S (A^k)^\top$,
$\Phi(\tau) = A\,\Phi(\tau-1) = A^2\,\Phi(\tau-2) = \dots = A^\tau\,\Phi(0)$.
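A minimal simulation of this benchmark (my parameter choices; the noise covariance is fixed at $S = I$), whose output can be fed to `ocse_parents` above:

```python
def simulate_linear_network(A, T, rng=None):
    """Simulate X_t = A X_{t-1} + xi_t with uncorrelated unit-variance
    Gaussian noise (S = I); A should be stable (rho(A) < 1)."""
    rng = rng or np.random.default_rng(0)
    n = A.shape[0]
    X = np.zeros((T, n))
    for t in range(1, T):
        X[t] = A @ X[t - 1] + rng.standard_normal(n)
    return X
```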
$C_{J\to I\mid K} = \frac{1}{2}\log\frac{\det\!\big[\Phi(0)_{II} - \Phi(1)_{IK}\,\Phi(0)_{KK}^{-1}\,\Phi(1)_{IK}^\top\big]}{\det\!\big[\Phi(0)_{II} - \Phi(1)_{I,K\cup J}\,\Phi(0)_{K\cup J,K\cup J}^{-1}\,\Phi(1)_{I,K\cup J}^\top\big]}$
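This closed form can be evaluated directly from $A$ and $S$; a sketch (truncating the series for $\Phi(0)$, which converges for stable $A$; names are mine):

```python
def phi0(A, S=None, terms=200):
    """Phi(0) = sum_{k>=0} A^k S (A^k)^T, truncated after `terms` terms."""
    n = A.shape[0]
    S = np.eye(n) if S is None else S
    total, Ak = np.zeros((n, n)), np.eye(n)
    for _ in range(terms):
        total += Ak @ S @ Ak.T
        Ak = Ak @ A
    return total

def cse_closed_form(A, I, J, K, S=None):
    """C_{J->I|K} for the linear Gaussian benchmark via the
    log-determinant formula, using Phi(1) = A Phi(0)."""
    P0 = phi0(A, S)
    P1 = A @ P0
    def cond_cov(cond):
        if not cond:
            return P0[np.ix_(I, I)]
        B = P1[np.ix_(I, cond)]
        return P0[np.ix_(I, I)] - B @ np.linalg.solve(P0[np.ix_(cond, cond)], B.T)
    KJ = sorted(set(K) | set(J))
    return 0.5 * np.log(np.linalg.det(cond_cov(list(K)))
                        / np.linalg.det(cond_cov(KJ)))
```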
Analytical Calculation: Directed Linear Chain

$1 \to 2 \to 3 \to \dots \to n$

$C_{j\to i} = \frac{1}{2}\,\delta_{i,j+1}\,\log\!\left(1 + \frac{\sum_{k=1}^{j}\sigma_k^2}{\sigma_i^2}\right)$
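A quick consistency check of this expression against the log-determinant formula above, for a chain with unit edge weights and unit noise variances (a hypothetical example, not from the slides):

```python
# Directed chain with unit edge weights and unit noise (S = I). With
# 0-based indices, the stationary variance of node m is m + 1, so the
# chain formula gives C_{j->j+1} = 0.5 * log(j + 2).
n = 5
A_chain = np.zeros((n, n))
for m in range(1, n):
    A_chain[m, m - 1] = 1.0   # edge m-1 -> m
j = 2
print(cse_closed_form(A_chain, I=[j + 1], J=[j], K=[]),
      0.5 * np.log(j + 2))   # both ~0.6931
```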
Analytical Calculation: Directed Loop

$1 \to 2 \to 3 \to \dots \to n \to 1$

$C_{j\to i} = \frac{1}{2}\,\delta_{p_i,j}\,\log\!\left(\frac{1}{1-w^2}\right)$

($p_i$: the parent of node $i$ in the loop; $w$: the common edge weight)
Analytical Calculation: Directed Tree

No conditioning (note the false positives):

$C_{j\to i} = \frac{1}{2}\,\delta_{d_i,\,d_j+1}\,\log\frac{(d_i+1)(d_j+1)}{(d_i+1)(d_j+1) - (d_{p_{ij}}+1)^2}$

($d_i$: depth of node $i$; $p_{ij}$: nearest common ancestor of $i$ and $j$)

With conditioning: $p_i = \arg\max_j C_{j\to i}$, and $C_{j\to i\mid\{p_i\}} = 0$.
Numerical Tests for Random Directed Networks

$n$: number of nodes; $p$: connection probability; $w$: edge weights (each directed edge has weight $w$ or $-w$); $\rho(A)$: spectral radius.

[Figure: an example random directed network with numbered nodes.]
Significance Level and Network Size

[Figure: $np = 10$, $\rho(A) = 0.8$. Panels (a)-(c): false negatives $\varepsilon_-$ and false positives $\varepsilon_+$ versus sample size $T$ (200 to 1000) for $n = 100, 200, 500$ and significance levels $\theta = 95\%$, $99\%$, $99.9\%$. Panel (d): the sample size $T^*$ required for accurate inference versus network size $n$.]
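The significance level $\theta$ enters through a statistical test of whether an estimated CSE value is distinguishable from zero. A hedged sketch of one such test, a time-shuffle permutation test (the details of the test used in the actual work may differ):

```python
def significant(X, i, j, K, theta=0.99, n_shuffles=200, rng=None):
    """Permutation test: is C_{j->i|K} above the theta-quantile of the
    null distribution built by shuffling X^{(j)} in time?"""
    rng = rng or np.random.default_rng(0)
    observed = causation_entropy_gaussian(X, [i], [j], K)
    null = []
    for _ in range(n_shuffles):
        Xs = X.copy()
        Xs[:, j] = rng.permutation(Xs[:, j])
        null.append(causation_entropy_gaussian(Xs, [i], [j], K))
    return observed > np.quantile(null, theta)
```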
Effects of Sample Size and Spectral Radius

[Figure: left, $n = 200$, $\rho(A) = 0.8$ with $np = 5, 10, 20$; right, $n = 200$, $np = 10$ with $\rho(A) = 0.3, 0.6, 0.9$. Panels (a)-(b): false negatives $\varepsilon_-$ and false positives $\varepsilon_+$ versus sample size $T$ (up to 3000). Panels (c)-(d): the required sample size $T^*$ versus $np$ and versus $\rho(A)$.]
Conclusions
— oCSE allows for direct causal inference in an efficient way (it can be extended to finite-order Markov processes and also to processes with infinite but fading memory).
— There is a tradeoff between computation (choosing the conditioning set) and estimation (the resulting dimensionality of the random variables).
Challenges
— Latent variables: M. Eichler, Journal of Machine Learning Research (2010)
— Non-stationarity: Bai and Perron, Journal of Applied Econometrics (2003)
— Reliable estimation: Kraskov and Grassberger, PRE (2004); Frenzel and Pompe, PRL (2007)
— Exact statistical tests: Pethel and Hahs, Entropy (2014, to appear)
Acknowledgements
This work is funded by ARO Grant No. 61386-EG. We thank Dr. Samuel Stanton (ARO Complex Dynamics and Systems Program) for his continued support.
References
J. Sun and E.M. Bollt, Physica D 267, 49 (2014).
J. Sun, D. Taylor, and E.M. Bollt, arXiv:1401.7574 [cs.IT] (2014).
J. Sun, C. Cafaro, and E.M. Bollt, Entropy (to appear, 2014).
Thank you!