
Chapter 8 Differential Entropy
Peng-Hua Wang
Graduate Inst. of Comm. Engineering
National Taipei University

Chapter Outline
Chap. 8 Differential Entropy
8.1 Definitions
8.2 AEP for Continuous Random Variables
8.3 Relation of Differential Entropy to Discrete Entropy
8.4 Joint and Conditional Differential Entropy
8.5 Relative Entropy and Mutual Information
8.6 Properties of Differential Entropy and Related Quantities


8.1 Definitions


Definitions
Definition 1 (Differential entropy) The differential entropy h(X) of a continuous random variable X with pdf f(x) is defined as

    h(X) = -\int_S f(x) \log f(x)\, dx,

where S is the support region of the random variable.

Example. If X ∼ U(0, a), then

    h(X) = -\int_0^a \frac{1}{a} \log \frac{1}{a}\, dx = \log a.
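Not in the original slides: a minimal NumPy sketch, in nats, that approximates the integral above with a Riemann sum and compares it with log a. The value a = 3 and all variable names are illustrative.

    import numpy as np

    # h(X) for X ~ U(0, a): the integrand is constant, so a Riemann sum recovers log a (nats).
    a = 3.0
    x = np.linspace(1e-9, a, 100_000)         # grid over the support (0, a]
    f = np.full_like(x, 1.0 / a)              # uniform density
    dx = x[1] - x[0]
    h_numeric = -np.sum(f * np.log(f)) * dx   # ≈ -∫ f log f dx
    print(h_numeric, np.log(a))               # both ≈ 1.0986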


Differential Entropy of Gaussian
Example. If X ∼ N(0, σ²) with pdf φ(x) = (2\pi\sigma^2)^{-1/2} e^{-x^2/(2\sigma^2)}, then

    h_a(X) = -\int \phi(x) \log_a \phi(x)\, dx
           = -\int \phi(x) \left[ \log_a \frac{1}{\sqrt{2\pi\sigma^2}} - \frac{x^2}{2\sigma^2} \log_a e \right] dx
           = \frac{1}{2} \log_a (2\pi\sigma^2) + \frac{\log_a e}{2\sigma^2} E_\phi[X^2]
           = \frac{1}{2} \log_a (2\pi e \sigma^2).
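A quick Monte Carlo cross-check of this result (not part of the slides; σ = 2, the seed, and the sample size are arbitrary): estimate E[−log φ(X)] from samples and compare it with (1/2) log(2πeσ²), in nats.

    import numpy as np

    # Monte Carlo estimate of h(N(0, σ²)) = E[-log φ(X)] vs. the closed form ½ log(2πeσ²), in nats.
    rng = np.random.default_rng(0)
    sigma = 2.0
    x = rng.normal(0.0, sigma, size=1_000_000)
    log_phi = -0.5 * np.log(2 * np.pi * sigma**2) - x**2 / (2 * sigma**2)   # log φ(x)
    print(-np.mean(log_phi), 0.5 * np.log(2 * np.pi * np.e * sigma**2))     # both ≈ 2.112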


Differential Entropy of Gaussian
Remark. If a random variable with pdf f(x) has zero mean and variance σ², then

    -\int f(x) \log_a \phi(x)\, dx = -\int f(x) \left[ \log_a \frac{1}{\sqrt{2\pi\sigma^2}} - \frac{x^2}{2\sigma^2} \log_a e \right] dx
                                   = \frac{1}{2} \log_a (2\pi\sigma^2) + \frac{\log_a e}{2\sigma^2} E_f[X^2]
                                   = \frac{1}{2} \log_a (2\pi e \sigma^2).


Gaussian has Maximal Differential Entropy
Suppose that a random variable X with pdf f(x) has zero mean and variance σ². What is its maximal differential entropy? Let φ(x) be the pdf of N(0, σ²). Then

    h(X) + \int f(x) \log \phi(x)\, dx = \int f(x) \log \frac{\phi(x)}{f(x)}\, dx
                                       \le \log \int f(x) \frac{\phi(x)}{f(x)}\, dx    (Jensen's inequality; the logarithm is concave)
                                       = \log \int \phi(x)\, dx = 0.

That is,

    h(X) \le -\int f(x) \log \phi(x)\, dx = \frac{1}{2} \log (2\pi e \sigma^2),

where the last equality is the remark on the preceding slide, and equality holds if f(x) = φ(x). ∎
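As a small illustration of the bound (mine, not the slides'), the known closed-form entropies of two other zero-mean, unit-variance densities can be compared with the Gaussian value (1/2) log(2πe); the uniform and Laplace parameters below are chosen so that each variance equals 1.

    import numpy as np

    # Zero-mean, unit-variance densities: their entropies stay below the Gaussian bound ½ log(2πe) (nats).
    h_gauss   = 0.5 * np.log(2 * np.pi * np.e)        # N(0, 1):                       ≈ 1.4189
    h_uniform = np.log(2 * np.sqrt(3.0))              # U(-√3, √3), variance 1:        ≈ 1.2425
    h_laplace = 1.0 + np.log(np.sqrt(2.0))            # Laplace(b = 1/√2), variance 1: ≈ 1.3466
    print(h_uniform <= h_gauss, h_laplace <= h_gauss) # True True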

8.2 AEP for Continuous Random Variables


AEP
Theorem 1 (AEP) Let X₁, X₂, ..., Xₙ be a sequence of i.i.d. random variables with common pdf f(x). Then

    -\frac{1}{n} \log f(X_1, X_2, \ldots, X_n) \to E[-\log f(X)] = h(X)

in probability.

Definition 2 (Typical set) For ε > 0, the typical set A_ε^{(n)} with respect to f(x) is defined as

    A_\epsilon^{(n)} = \left\{ (x_1, x_2, \ldots, x_n) \in S^n : \left| -\frac{1}{n} \log f(x_1, x_2, \ldots, x_n) - h(X) \right| \le \epsilon \right\}.
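A Monte Carlo sketch of the theorem (not from the slides; the source X ∼ N(0, 1), the seed, and the sample sizes are my choices): the normalized log-density −(1/n) log f(X₁, ..., Xₙ) approaches h(X) ≈ 1.4189 nats as n grows.

    import numpy as np

    # -(1/n) log f(X₁,…,Xₙ) for i.i.d. X ~ N(0,1) concentrates around h(X) = ½ log(2πe) ≈ 1.4189 nats.
    rng = np.random.default_rng(1)
    h = 0.5 * np.log(2 * np.pi * np.e)
    for n in (10, 100, 10_000):
        x = rng.normal(size=n)
        log_f = -0.5 * np.log(2 * np.pi) - 0.5 * x**2     # log of each i.i.d. factor
        print(n, -np.mean(log_f), h)                      # sample average approaches h(X)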


AEP
Definition 3 (Volume) The volume Vol(A) of a set A ⊂ ℝⁿ is defined as

    \mathrm{Vol}(A) = \int_A dx_1\, dx_2 \cdots dx_n.

Theorem 2 (Properties of the typical set)
1. Pr(A_ε^{(n)}) > 1 − ε for n sufficiently large.
2. Vol(A_ε^{(n)}) ≤ 2^{n(h(X)+ε)} for all n.
3. Vol(A_ε^{(n)}) ≥ (1 − ε) 2^{n(h(X)−ε)} for n sufficiently large.
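Continuing the sketch above for property 1 of Theorem 2 (again not part of the slides, and only suggestive): the fraction of i.i.d. blocks falling in A_ε^{(n)} is estimated by simulation and tends to 1 as n grows. The value ε = 0.1, the block count, and the Gaussian source are my choices.

    import numpy as np

    # Fraction of length-n i.i.d. N(0,1) blocks that are ε-typical: an estimate of Pr(A_ε^(n)).
    rng = np.random.default_rng(2)
    h, eps = 0.5 * np.log(2 * np.pi * np.e), 0.1
    for n in (10, 100, 1000):
        x = rng.normal(size=(20_000, n))                              # 20 000 independent blocks
        per_block = -np.mean(-0.5 * np.log(2 * np.pi) - 0.5 * x**2, axis=1)
        print(n, np.mean(np.abs(per_block - h) <= eps))               # tends to 1 as n grows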


8.4 Joint and Conditional Differential Entropy


Definitions
Definition 4 (Differential entropy) The differential entropy of jointly distributed random variables X₁, X₂, ..., Xₙ is defined as

    h(X_1, X_2, \ldots, X_n) = -\int f(x^n) \log f(x^n)\, dx^n,

where f(x^n) = f(x_1, x_2, \ldots, x_n) is the joint pdf.

Definition 5 (Conditional differential entropy) The conditional differential entropy of jointly distributed random variables X, Y with joint pdf f(x, y) is defined as, if it exists,

    h(X|Y) = -\int f(x, y) \log f(x|y)\, dx\, dy = h(X, Y) - h(Y).
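A closed-form check of h(X|Y) = h(X, Y) − h(Y) (not from the slides), assuming X and Y are jointly Gaussian with equal variances σ² and correlation ρ, so that X | Y = y ∼ N(ρy, σ²(1 − ρ²)); the numerical values are arbitrary.

    import numpy as np

    # h(X|Y) = h(X,Y) - h(Y) for jointly Gaussian X, Y with equal variances σ² and correlation ρ (nats).
    sigma2, rho = 1.5, 0.7                    # σ² and ρ, arbitrary values
    det_K = sigma2**2 * (1 - rho**2)          # |K| for K = [[σ², ρσ²], [ρσ², σ²]]
    h_xy = 0.5 * np.log((2 * np.pi * np.e)**2 * det_K)
    h_y  = 0.5 * np.log(2 * np.pi * np.e * sigma2)
    h_x_given_y = 0.5 * np.log(2 * np.pi * np.e * sigma2 * (1 - rho**2))   # X | Y=y ~ N(ρy, σ²(1-ρ²))
    print(h_xy - h_y, h_x_given_y)            # the two numbers agree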


Multivariate Normal Distribution
Theorem 3 (Entropy of a multivariate normal) Let X₁, X₂, ..., Xₙ have a multivariate normal distribution with mean vector μ and covariance matrix K. Then

    h(X_1, X_2, \ldots, X_n) = \frac{1}{2} \log (2\pi e)^n |K|.

Proof. The joint pdf of a multivariate normal distribution is

    \phi(\mathbf{x}) = \frac{1}{(2\pi)^{n/2} |K|^{1/2}} \, e^{-\frac{1}{2} (\mathbf{x} - \mu)^t K^{-1} (\mathbf{x} - \mu)}.


Multivariate Normal Distribution
Therefore,

    h(X_1, X_2, \ldots, X_n) = -\int \phi(\mathbf{x}) \log_a \phi(\mathbf{x})\, d\mathbf{x}
      = \int \phi(\mathbf{x}) \left[ \frac{1}{2} \log_a (2\pi)^n |K| + \frac{1}{2} (\mathbf{x} - \mu)^t K^{-1} (\mathbf{x} - \mu) \log_a e \right] d\mathbf{x}
      = \frac{1}{2} \log_a (2\pi)^n |K| + \frac{1}{2} (\log_a e)\, \underbrace{E\!\left[ (\mathbf{X} - \mu)^t K^{-1} (\mathbf{X} - \mu) \right]}_{=\, n}
      = \frac{1}{2} \log_a (2\pi)^n |K| + \frac{n}{2} \log_a e
      = \frac{1}{2} \log_a (2\pi e)^n |K|. ∎
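A Monte Carlo cross-check of Theorem 3 (not part of the slides): estimate E[−log φ(X)] from samples of N(0, K) and compare with (1/2) log((2πe)ⁿ|K|), in nats. The 3 × 3 covariance matrix below is an arbitrary positive-definite choice.

    import numpy as np

    # Monte Carlo estimate of h(X) for X ~ N(0, K) vs. the closed form ½ log((2πe)^n |K|), in nats.
    rng = np.random.default_rng(3)
    K = np.array([[2.0, 0.6, 0.2],
                  [0.6, 1.0, 0.3],
                  [0.2, 0.3, 1.5]])           # an arbitrary positive-definite covariance
    n = K.shape[0]
    x = rng.multivariate_normal(np.zeros(n), K, size=500_000)
    quad = np.einsum('ij,jk,ik->i', x, np.linalg.inv(K), x)            # xᵗ K⁻¹ x per sample
    log_phi = -0.5 * (n * np.log(2 * np.pi) + np.log(np.linalg.det(K)) + quad)
    print(-np.mean(log_phi), 0.5 * np.log((2 * np.pi * np.e)**n * np.linalg.det(K)))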


Multivariate Normal Distribution
Let Y = (Y₁, Y₂, ..., Yₙ)ᵗ be a random vector. If K = E[YYᵗ], then E[YᵗK⁻¹Y] = n.

Proof. Denote the columns of K and the rows of K⁻¹ by

    K = E[\mathbf{Y}\mathbf{Y}^t] = \begin{bmatrix} \mathbf{k}_1 & \mathbf{k}_2 & \cdots & \mathbf{k}_n \end{bmatrix},
    \qquad
    K^{-1} = \begin{bmatrix} \mathbf{a}_1^t \\ \mathbf{a}_2^t \\ \vdots \\ \mathbf{a}_n^t \end{bmatrix}.

We have k_i = E[Y_i \mathbf{Y}] and a_j^t k_i = \delta_{ij}.


Multivariate Normal Distribution
Now,

    \mathbf{Y}^t K^{-1} \mathbf{Y} = \mathbf{Y}^t \begin{bmatrix} \mathbf{a}_1^t \\ \mathbf{a}_2^t \\ \vdots \\ \mathbf{a}_n^t \end{bmatrix} \mathbf{Y}
      = (Y_1, Y_2, \ldots, Y_n) \begin{bmatrix} \mathbf{a}_1^t \mathbf{Y} \\ \mathbf{a}_2^t \mathbf{Y} \\ \vdots \\ \mathbf{a}_n^t \mathbf{Y} \end{bmatrix}
      = Y_1 \mathbf{a}_1^t \mathbf{Y} + Y_2 \mathbf{a}_2^t \mathbf{Y} + \cdots + Y_n \mathbf{a}_n^t \mathbf{Y},

and

    E[\mathbf{Y}^t K^{-1} \mathbf{Y}] = \mathbf{a}_1^t E[Y_1 \mathbf{Y}] + \mathbf{a}_2^t E[Y_2 \mathbf{Y}] + \cdots + \mathbf{a}_n^t E[Y_n \mathbf{Y}]
      = \mathbf{a}_1^t \mathbf{k}_1 + \mathbf{a}_2^t \mathbf{k}_2 + \cdots + \mathbf{a}_n^t \mathbf{k}_n = n. ∎
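A quick numerical check of the identity E[YᵗK⁻¹Y] = n (not in the slides; equivalently, E[YᵗK⁻¹Y] = tr(K⁻¹K) = n for any zero-mean Y with covariance K). The Gaussian samples and the 2 × 2 covariance below are illustrative choices.

    import numpy as np

    # E[Yᵗ K⁻¹ Y] = n for zero-mean Y with covariance K (here n = 2, Gaussian samples).
    rng = np.random.default_rng(4)
    K = np.array([[1.0, 0.5],
                  [0.5, 2.0]])
    y = rng.multivariate_normal(np.zeros(2), K, size=200_000)
    quad = np.einsum('ij,jk,ik->i', y, np.linalg.inv(K), y)   # yᵗ K⁻¹ y per sample
    print(np.mean(quad))                                      # ≈ 2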


8.5 Relative Entropy and Mutual Information


Definitions
Definition 6 (Relative entropy) The relative entropy (or Kullback–Leibler distance) D(f||g) between two densities f(x) and g(x) is defined as

    D(f\|g) = \int f(x) \log \frac{f(x)}{g(x)}\, dx.

Definition 7 (Mutual information) The mutual information I(X; Y) between two random variables with joint density f(x, y) is defined as

    I(X; Y) = \int f(x, y) \log \frac{f(x, y)}{f(x) f(y)}\, dx\, dy.
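Not from the slides: a grid-integration check of D(f||g) for two univariate Gaussians against the well-known closed form log(σ₂/σ₁) + (σ₁² + (μ₁ − μ₂)²)/(2σ₂²) − 1/2, in nats. The parameter values are arbitrary.

    import numpy as np

    # D(f||g) for f = N(0, 1), g = N(1, 2²): grid integration vs. the closed form (nats).
    mu1, s1, mu2, s2 = 0.0, 1.0, 1.0, 2.0
    x = np.linspace(-12, 12, 200_001)
    f = np.exp(-(x - mu1)**2 / (2 * s1**2)) / np.sqrt(2 * np.pi * s1**2)
    g = np.exp(-(x - mu2)**2 / (2 * s2**2)) / np.sqrt(2 * np.pi * s2**2)
    kl_numeric = np.sum(f * np.log(f / g)) * (x[1] - x[0])
    kl_closed  = np.log(s2 / s1) + (s1**2 + (mu1 - mu2)**2) / (2 * s2**2) - 0.5
    print(kl_numeric, kl_closed)              # both ≈ 0.4431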


Example
Let (X, Y) ∼ N(0, K), where

    K = \begin{bmatrix} \sigma^2 & \rho\sigma^2 \\ \rho\sigma^2 & \sigma^2 \end{bmatrix}.

Then h(X) = h(Y) = (1/2) log(2πeσ²) and

    h(X, Y) = \frac{1}{2} \log (2\pi e)^2 |K| = \frac{1}{2} \log (2\pi e)^2 \sigma^4 (1 - \rho^2).

Therefore,

    I(X; Y) = h(X) + h(Y) - h(X, Y) = -\frac{1}{2} \log (1 - \rho^2).
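A Monte Carlo check of this example (not in the slides), taking σ² = 1 and ρ = 0.8: average log f(x, y) − log f(x)f(y) over samples and compare with −(1/2) log(1 − ρ²), in nats.

    import numpy as np

    # Monte Carlo estimate of I(X;Y) for the bivariate normal above (σ² = 1, ρ = 0.8), in nats.
    rng = np.random.default_rng(5)
    rho = 0.8
    xy = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=500_000)
    x, y = xy[:, 0], xy[:, 1]
    log_joint = -np.log(2 * np.pi * np.sqrt(1 - rho**2)) - (x**2 - 2 * rho * x * y + y**2) / (2 * (1 - rho**2))
    log_marginals = -np.log(2 * np.pi) - 0.5 * (x**2 + y**2)               # log f(x) + log f(y)
    print(np.mean(log_joint - log_marginals), -0.5 * np.log(1 - rho**2))   # both ≈ 0.5108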


8.6 Properties of Differential Entropy and Related Quantities


Properties
Theorem 4 (Relative entropy) D(f||g) ≥ 0, with equality iff f = g almost everywhere.

Corollary 1
1. I(X; Y) ≥ 0, with equality iff X and Y are independent.
2. h(X|Y) ≤ h(X), with equality iff X and Y are independent.


Properties
Theorem 5 (Chain rule for differential entropy)

    h(X_1, X_2, \ldots, X_n) = \sum_{i=1}^{n} h(X_i \mid X_1, X_2, \ldots, X_{i-1})

Corollary 2

    h(X_1, X_2, \ldots, X_n) \le \sum_{i=1}^{n} h(X_i)

Corollary 3 (Hadamard's inequality) If K is the covariance matrix of a multivariate normal distribution, then

    |K| \le \prod_{i=1}^{n} K_{ii}.
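A small numerical illustration of Corollary 3 (not from the slides): build a random positive-definite matrix and verify |K| ≤ ∏ᵢ Kᵢᵢ. The dimension and seed are arbitrary.

    import numpy as np

    # Hadamard's inequality |K| ≤ ∏ K_ii, checked for a randomly generated covariance matrix.
    rng = np.random.default_rng(6)
    A = rng.normal(size=(4, 4))
    K = A @ A.T + 1e-6 * np.eye(4)            # positive definite by construction
    print(np.linalg.det(K) <= np.prod(np.diag(K)))   # True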


Properties
Theorem 6
1. h(X + c) = h(X).
2. h(aX) = h(X) + log|a|.
3. h(AX) = h(X) + log|det(A)| for a random vector X and a nonsingular square matrix A.
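A closed-form illustration of property 2 (not from the slides), using the Gaussian entropy formula: for X ∼ N(0, σ²), aX ∼ N(0, a²σ²), so h(aX) = h(X) + log|a|. The numerical values are arbitrary.

    import numpy as np

    # h(aX) = h(X) + log|a|, seen through the Gaussian closed form (nats).
    sigma, a = 1.3, -2.5
    h_x  = 0.5 * np.log(2 * np.pi * np.e * sigma**2)
    h_ax = 0.5 * np.log(2 * np.pi * np.e * (a * sigma)**2)   # aX ~ N(0, a²σ²)
    print(h_ax, h_x + np.log(abs(a)))                        # equal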


Gaussian has Maximal Entropy
Theorem 7 Let the random vector X ∈ ℝⁿ have zero mean and covariance K = E[XXᵗ]. Then h(X) ≤ (1/2) log(2πe)ⁿ|K|, with equality iff X ∼ N(0, K).

Proof. Let g(x) be any density satisfying \int x_i x_j g(\mathbf{x})\, d\mathbf{x} = K_{ij}. Let φ(x) be the density of N(0, K). Then

    0 \le D(g\|\phi) = \int g \log(g/\phi) = -h(g) - \int g \log \phi = -h(g) - \int \phi \log \phi = -h(g) + h(\phi),

where \int g \log \phi = \int \phi \log \phi because log φ is a quadratic form in x plus a constant, and g and φ have the same second moments. That is, h(g) ≤ h(φ), and equality holds if g = φ. ∎

