
Fourier series, Fourier transforms, and function spaces: A second course in analysis

Tim Hsu, San José State University

August 7, 2017

Contents

Introduction

1 Overture
  1.1 Mathematical motivation: Series of functions
  1.2 Physical motivation: Acoustics

Part I  Complex functions of a real variable

2 Real and complex numbers
  2.1 Axioms for the real numbers
  2.2 Complex numbers
  2.3 Metrics and metric spaces
  2.4 Sequences in C and other metric spaces
  2.5 Completeness in metric spaces
  2.6 The topology of metric spaces

3 Complex-valued calculus
  3.1 Continuity and limits
  3.2 Differentiation
  3.3 The Riemann integral: Definition
  3.4 The Riemann integral: Properties
  3.5 The Fundamental Theorem of Calculus
  3.6 Other results from calculus

4 Series of functions
  4.1 Infinite series
  4.2 Sequences and series of functions
  4.3 Uniform convergence
  4.4 Power series
  4.5 Exponential and trigonometric functions
  4.6 More about exponential functions
  4.7 The Schwartz space
  4.8 Integration on R

Part II  Fourier series and Hilbert spaces

5 The idea of a function space
  5.1 Which clock keeps better time?
  5.2 Function spaces and metrics
  5.3 Dot products

6 Fourier series
  6.1 Fourier polynomials
  6.2 Fourier series
  6.3 Real Fourier series
  6.4 Convergence of Fourier series of differentiable functions

7 Hilbert spaces
  7.1 Inner product spaces
  7.2 Normed spaces
  7.3 Orthogonal sets and bases
  7.4 The Lebesgue integral: Measure zero
  7.5 The Lebesgue integral: Axioms
  7.6 Hilbert spaces

8 Convergence of Fourier series
  8.1 Fourier series in $L^2(S^1)$
  8.2 Convolutions
  8.3 Dirac kernels
  8.4 Convergence of Fourier series
  8.5 Applications of Fourier series

Part III  Operators and differential equations

9 PDE’s and diagonalization
  9.1 Some PDE’s from classical physics
  9.2 Schrödinger’s equation
  9.3 Diagonalization

10 Operators on Hilbert spaces
  10.1 Operators on Hilbert spaces
  10.2 Hermitian and positive operators
  10.3 Eigenvectors and eigenvalues
  10.4 Eigenbases

11 Eigenbases and differential equations
  11.1 The heat equation on the circle
  11.2 The eigenbasis method
  11.3 The wave equation on the circle
  11.4 Boundary value problems
  11.5 Legendre polynomials
  11.6 Hermite functions
  11.7 The quantum harmonic oscillator
  11.8 Sturm-Liouville theory

Part IV  The Fourier transform and beyond

12 The Fourier transform
  12.1 The big picture
  12.2 Convolutions, Dirac kernels, and calculus on R
  12.3 The Fourier transform on S(R)
  12.4 Inversion and the Plancherel theorem
  12.5 The $L^2$ Fourier transform

13 Applications of the Fourier transform
  13.1 A table of Fourier transforms
  13.2 Linear differential equations with constant coefficients
  13.3 The heat and wave equations on R
  13.4 An eigenbasis for the Fourier transform
  13.5 Continuous-valued quantum observables
  13.6 Poisson summation and theta functions
  13.7 Miscellaneous applications of the Fourier transform

14 What’s next?
  14.1 What’s next: More analysis
  14.2 What’s next: Signal processing and distributions
  14.3 What’s next: Wavelets
  14.4 What’s next: Quantum mechanics
  14.5 What’s next: Spectra and number theory
  14.6 What’s next: Harmonic analysis on groups

A Rearrangements of series

B Linear algebra

C Bump functions

Index

Introduction

Life is uncertain. Eat dessert first. — Ernestine Ulmer

This book gives a different answer to the question: What should be covered in a second undergraduate course in analysis? One standard approach to undergraduate Analysis II is to study the theory of integration (either Riemann, Lebesgue, or both). This has the virtue of preparing students for graduate study, or even giving them a head start on their graduate classes. However, for students not headed to a doctoral program, and especially for students thinking about Analysis II as the last analysis class they’ll ever take (or even the last math class they’ll ever take), studying the theory of integration runs the risk of getting involved in a significant amount of technical detail without ever seeing why we would ever find Lebesgue integration useful. Instead, in this course, we eat dessert first, with the following benefits:

• Applications: One of our main goals is to show that the theory you learn in analysis can be used to solve “applied” problems rigorously in many different subjects, ranging from partial differential equations, to mathematical quantum mechanics, to signal processing. (And also number theory, if you’re willing to call that an application.)

• Theory: Nevertheless, this is definitely a theory course, in which everything is proven rigorously (with one giant exception, see below). Furthermore, students will encounter big, fundamental ideas that show up everywhere in mathematics (and statistics!): metrics, function spaces, problems in convergence, solving analysis problems in terms of linear algebra, and different kinds of approximations.

• Future motivation: And finally, even though the design of this book comes from thinking about “the last analysis course you’ll ever take”, students who continue on to graduate study in analysis will actually learn things that are complementary to their first graduate analysis courses, in that they’ll learn why the Lebesgue integral is necessary before they dig in to see how it’s defined.

Of course, there’s a catch: To get to dessert, we skip the dinner of developing the theory of the Lebesgue integral. Instead, we axiomatize the properties that we need it to have, and simply stipulate that it exists; see Section 7.5 if you want to check out how this works. Note that we lose no applications, since all of the concrete calculations we need can be done using Riemann integration, and the reader looking for rigor can fill in the (considerable!) gap later in a course on Lebesgue integration.

We now briefly describe the rest of this book. After the equivalent of the overture of a musical or opera (Chapter 1), Part I of the book proper can be thought of as a “reboot” of Analysis I: We review the fundamental theory of functions of one real variable, but we revamp the material to allow complex-valued outputs (and occasionally, inputs). Sometimes the content of real-valued functions carries over almost intact, and sometimes we have to repeat or apply the old arguments twice, but the punchline is that we obtain much the same results, with greater generality. Note that of the material in Part I, Chapter 4 is the chapter most likely to be new to the reader, and is also the chapter referred to most often in the rest of the book, as one of our main concerns is the convergence of series of functions.

Part II starts our new material by asking the question: How can we determine the “best” approximation of some particular type to a specified function? This leads us to framing the approximation problem not just in terms of (pointwise) convergence, but in terms of function spaces and the $L^2$ metric. We then present one solution to the approximation problem in the form of Fourier series, which are, in a sense to be made precise, the best possible approximations to a (sufficiently nice) periodic function as an infinite series of trigonometric functions.

In Part III, we apply the theory of function spaces and Fourier series to solve problems from partial differential equations and quantum mechanics. For much of this material, the basic idea is to express derivatives and integrals in terms of operators on function spaces, and express a given problem in terms of linear algebra, or specifically, as an eigenvalue problem.

Part IV gives an introduction to the Fourier transform, which one can think of as a continuous analogue of Fourier series. After establishing the fundamental theory, we return to applications, including second looks at PDE’s and quantum mechanics and a look at the interplay between the Fourier transform and Fourier series. We conclude the book with a brief survey of what the reader might choose to learn next.

About problems in this book

Q: What did one math book say to the other math book? A: I’ve got a lot of problems. — National Geographic Kids Almanac 2017

The reader will quickly notice that many, even most, of the results in this book do not come with proofs. Instead, the proofs are left to be done either by the reader or in class, by the instructor; indeed, proving important results of the book is the goal of the great majority of its 361 problems. The idea is that students can cement their mastery of Analysis I by applying it to build what comes next. To suggest some ground rules:

• Problems that prove results of the text are marked with a note like (Proves Theorem x.y.z). To maintain logical consistency, students should only use results appearing before Theorem x.y.z in the book in the proof of Theorem x.y.z. An exception to this rule comes when, for motivational purposes, we introduce the statement of a theorem long before its proof; in those cases, the reader can use results up to the point where the proof occurs, except, of course, for the result to be proven.

• Many problems come with suggested approaches or suggestions of key ideas to apply. We hope we have provided enough suggestions to make the problems tractable for students who understand Analysis I well.

How to use this book in a course

One reasonable approach to using the problems in this book in a lecture-based class is to do a few proofs (i.e., a few problems) from a given section in lecture, as models; assign some problems as homework, perhaps saving some for in-class exams; and omit the others for the sake of time. The structure of the book also makes it well-suited for an undergraduate capstone course or an independent study/reading course.

The amount of the book that can be covered in one semester depends on how much background the instructor chooses to rely on, as this book is designed to be accessible to students with a range of preparation from Analysis I. Specifically, most first courses will have covered through differentiation and the Mean Value Theorem (Section 3.2), albeit only in R and not in C; still others will have reached the Fundamental Theorems of Calculus (Section 3.5); and ambitious courses may have covered series of functions, perhaps up to power series (Section 4.4). In any case, we recommend covering Chapters 2 and 3 at least at the level of review, so students can become familiar with complex-valued versions of the usual Analysis I material, and again, Chapter 4 is likely to be less familiar to even well-prepared students.

With that in mind, if the review material is covered thoroughly, one should still be able to get to the heat and wave equations in Chapter 11 comfortably. If the early material is covered more lightly, one should also be able to cover the Fourier transform (Chapter 12) and possibly some of the further applications (Chapter 13).

As for which sections can be skipped without harming subsequent material: From Part I, Section 2.6 is optional, and Sections 4.7 and 4.8 are only used for the Fourier transform. From Part II, all of Section 8.5 except Section 8.5.1 is optional. In Part III, Sections 11.1 and 11.2 establish the basic methods of Chapter 11, and after that, instructors or readers can choose applications suited to their interests. Similarly, in Part IV, the applications in Chapter 13 are fairly independent of each other, as are the sections of Chapter 14.


Chapter 1

Overture

In this chapter, we briefly introduce some of the main themes found throughout this book. Specifically, in Section 1.1, we introduce some mathematical problems that motivate what we study in this book, and in Section 1.2, we introduce just one of the physical applications that motivate what we study.

1.1 Mathematical motivation: Series of functions

As you may have seen in analysis I, or in calculus, the exponential function $e^x$ is equal to the following infinite series for all real $x$:

\[ e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!} = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots. \tag{1.1.1} \]

If you’ve forgotten from analysis (or calculus) what it means for a function to be equal to an infinite series, we’ll go over all of that again in Section 4.1; in fact, in Section 4.5, we will actually use (1.1.1) to give a rigorous definition of the exponential function. For now, it’s enough to remember that an infinite series is the limit of its partial sums, that is, the limit as $N \to \infty$ of the sum of its first $N$ terms.

In some sense, one main goal of this course is to make sense of the somewhat similar-looking

\[ x = \sum_{n=1}^{\infty} (-1)^{n+1} \frac{\sin(2\pi n x)}{n\pi} = \frac{\sin(2\pi x)}{\pi} - \frac{\sin(4\pi x)}{2\pi} + \frac{\sin(6\pi x)}{3\pi} - \frac{\sin(8\pi x)}{4\pi} + \cdots, \tag{1.1.2} \]

which, as it turns out, holds for $-\frac{1}{2} < x < \frac{1}{2}$. However, at this point, there are several questions you might ask about (1.1.2):

• Why on earth would you want to replace the function $x$ with the stuff on the right-hand side of (1.1.2)?

• If you think back to what you know about trig functions, the right-hand side of (1.1.2) is periodic with period 1; how could it be equal to $x$? Moreover, it can be shown (Problem 1.1.1) that the right-hand side must be equal to 0 when $x$ is any integer multiple of $\frac{1}{2}$. (At least that explains why (1.1.2) fails for $x = \pm\frac{1}{2}$.)

• How on earth can we prove something like (1.1.2)?
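Before proving anything about (1.1.2), it can help to compute a few of its partial sums. The following Python sketch is an illustration only and not part of the book's development; the function name `sawtooth_partial_sum` and the sample points are our own choices. It evaluates the sum of the first $N$ terms at a point inside the interval where (1.1.2) is claimed to hold, at $x = \frac{1}{2}$, and at two points that differ by 1.

```python
import math


def sawtooth_partial_sum(x: float, N: int) -> float:
    """Sum of the first N terms of the right-hand side of (1.1.2)."""
    return sum(
        (-1) ** (n + 1) * math.sin(2 * math.pi * n * x) / (n * math.pi)
        for n in range(1, N + 1)
    )


if __name__ == "__main__":
    # Inside (-1/2, 1/2): the partial sums should creep toward x = 0.25.
    for N in (10, 15, 20, 2000):
        print(N, sawtooth_partial_sum(0.25, N))

    # At x = 1/2 every term is sin(pi * n) / (n * pi), which is 0 up to
    # floating-point rounding (compare Problem 1.1.1).
    print(sawtooth_partial_sum(0.5, 20))

    # Each term has period 1, so shifting x by 1 changes nothing
    # beyond rounding error.
    print(sawtooth_partial_sum(1.25, 20) - sawtooth_partial_sum(0.25, 20))
```

Numerical evidence of this kind suggests what is true but proves nothing; in particular it says nothing yet about the sense in which the series converges.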

[Figure 1.1.1: The two sides of (1.1.2), with partial sums 10, 15, and 20]

See Section 1.2 for an application that gives one answer to the first question. An answer to the second question, and another partial answer to the first, comes from comparing the graphs of $x$ and the partial sums of the series in (1.1.2), as shown in Figure 1.1.1. We see that the mysterious series of (1.1.2) is not really approximating the function $f(x) = x$; it’s really approximating the function you get by taking $f(x) = x$ for $-\frac{1}{2} < x < \frac{1}{2}$ and repeating it along the real line with period 1.

The last question is harder, so for the moment, we’ll answer it with a question few people would ask at this point:

• In what sense do we mean “=” in (1.1.2)?

The idea that there might be different ways to say that functions are equal, or approximately equal, is one of the central ideas of this book, and also leads to the very useful idea of function space that is the focus of Part II.

But we’re getting ahead of ourselves here, so let’s return to another motivating problem. Again, as you may have seen in analysis I, or calculus, we can take the derivative of the right-hand side of (1.1.1) using term-by-term differentiation:

\[ \frac{d}{dx} \sum_{n=0}^{\infty} \frac{x^n}{n!} = \sum_{n=0}^{\infty} \frac{d}{dx}\left(\frac{x^n}{n!}\right) = \sum_{n=1}^{\infty} \frac{x^{n-1}}{(n-1)!} = \sum_{k=0}^{\infty} \frac{x^k}{k!} = e^x. \tag{1.1.3} \]

Note that the $n = 0$ term drops out because we are taking the derivative of a constant, and we make the substitution $k = n - 1$ to get the final equality. In any case, assuming we can push $\frac{d}{dx}$ into an infinite sum the same as we can push $\frac{d}{dx}$ into a finite sum, we see that $e^x$ is its own derivative, or in other words, $e^x$ is a solution to the differential equation $y' = y$.

As we will see in Part III, one of the main applications of series like the one in (1.1.2) is in solving differential equations, which means that it would be very helpful to have term-by-term differentiation available to us. However, operations that work well with power series can go very wrong with other series. For example, if we try to take the derivative of the series in (1.1.2) term-by-term, we get:

\[ \frac{d}{dx} \sum_{n=1}^{\infty} (-1)^{n+1} \frac{\sin(2\pi n x)}{n\pi} \overset{?}{=} \sum_{n=1}^{\infty} (-1)^{n+1} \frac{d}{dx}\left(\frac{\sin(2\pi n x)}{n\pi}\right) = \sum_{n=1}^{\infty} (-1)^{n+1}\, 2\cos(2\pi n x). \tag{1.1.4} \]

As it turns out, the right-hand side of (1.1.4) diverges for every value of $x$ (see Problem 1.1.2); in any case, it certainly bears no resemblance to the “correct” derivative of 1 (for $x$ not an odd integer multiple of $\frac{1}{2}$). The moral here is that even if we have a series expansion for a function, as in (1.1.2), there is no reason to think we can take term-by-term derivatives of that series and still be sure that everything works. Again, finding conditions under which series like (1.1.2) are “durable” enough to survive taking derivatives is another one of the central problems of this course.
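The misbehavior of the right-hand side of (1.1.4) is easy to observe numerically. The sketch below is again only an illustration, with function names of our own choosing. It tabulates the first few partial sums at $x = 0$ and $x = \frac{1}{4}$. At $x = 0$ every cosine equals 1, so the terms alternate between $2$ and $-2$ and the partial sums alternate between $2$ and $0$.

```python
import math


def differentiated_term(x: float, n: int) -> float:
    """n-th term of the right-hand side of (1.1.4)."""
    return (-1) ** (n + 1) * 2 * math.cos(2 * math.pi * n * x)


def differentiated_partial_sum(x: float, N: int) -> float:
    """Sum of the first N terms of the right-hand side of (1.1.4)."""
    return sum(differentiated_term(x, n) for n in range(1, N + 1))


if __name__ == "__main__":
    for x in (0.0, 0.25):
        sums = [round(differentiated_partial_sum(x, N), 6) for N in range(1, 9)]
        # The partial sums keep oscillating instead of settling near 1.
        print(x, sums)
```

A table of partial sums is not a proof of divergence; the actual argument, via the nth term test, is the subject of Problem 1.1.2.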

Problems

1.1.1. Use your calculus knowledge of trig functions to prove that the right-hand side of (1.1.2) is equal to 0 whenever $x = k/2$ for some $k \in \mathbb{Z}$.

1.1.2. Use your calculus knowledge of trig functions to prove that if $x = p/q$, where $p, q \in \mathbb{Z}$, $q \neq 0$, then $\cos(2\pi n x) = 1$ for infinitely many $n > 0$. (Note that it follows that $\lim_{n \to \infty} \cos(2\pi n x) \neq 0$, and therefore, by the nth term test from calculus, analysis I, or Corollary 4.1.10, the series in (1.1.4) diverges.) Note: If $x$ is irrational, one can also show, with more effort, that $\cos(2\pi n x)$ is very close to 1 for infinitely many $n > 0$, with much the same consequences.

1.2 Physical motivation: Acoustics

For real-valued functions, the fundamental mathematical problem of Part II of this book can be expressed in the following (slightly simplified) manner:

Suppose we have a function $f$ on $\mathbb{R}$ that is periodic with period 1 (i.e., $f(t + 1) = f(t)$ for all $t \in \mathbb{R}$). When can we express $f$ as a series of the form

\[ f(t) = \sum_{n=1}^{\infty} \bigl(a_n \cos(2\pi n t) + b_n \sin(2\pi n t)\bigr)? \tag{1.2.1} \]

A series of the form (1.2.1), or its complex-valued generalization that we will see in Chapter 6, is called a Fourier series. While the above question is both concise and relatively precise, it may seem somewhat unmotivated to the first-time reader. One motivation for this question comes from the study of acoustics, or the study of sound waves. In mathematical acoustics, we model an idealized periodic sound wave, or tone, as a function f : R → R of time t that is periodic with period 1. See Figure 1.2.1 for a simulation of what this might look like.

[Figure 1.2.1: Two cycles of a simulated sound wave]

Acoustically, we may then interpret (1.2.1) as follows.

• The fact that $f$ has domain $S^1$ (that is, $f$ is periodic with period 1) means that the fundamental frequency of the tone represented by $f$ is 1. (The reader should not worry that this somehow limits our discussion, as we may use this same setting to study tones having any fundamental frequency we want, by adjusting the units of time $t$ to be the reciprocal of the fundamental frequency.)

• The summand $a_1 \cos(2\pi t) + b_1 \sin(2\pi t)$ is called the first harmonic of the tone, and the quantity $a_1^2 + b_1^2$ represents the amount of energy of the tone contained in its first harmonic.

• Similarly, the summand $a_2 \cos(4\pi t) + b_2 \sin(4\pi t)$ is called the second harmonic of the tone, and $a_2^2 + b_2^2$ represents the amount of energy contained in the second harmonic. Using calculations the reader may recall from precalculus, we see that the frequency of the second harmonic is twice the fundamental frequency.

• In general, the summand $a_n \cos(2\pi n t) + b_n \sin(2\pi n t)$ is called the nth harmonic of the tone, $a_n^2 + b_n^2$ represents the amount of energy contained in the nth harmonic, and the frequency of the nth harmonic is $n$ times the fundamental frequency.

• The infinite sum $\sum_{n=1}^{\infty} (a_n^2 + b_n^2)$, which, as we shall see, converges given only mild assumptions on $f$, represents the total energy of the tone.

One of the central ideas of acoustics (see the epigraph to Chapter 6) is that the distinctive sound quality, or timbre, of a given tone, is determined by the relative strengths of its harmonics. Higher harmonics can be exhibited physically on most any musical stringed instrument. For example, the first picture in Figure 1.2.2 shows the C string of a cello being played at a fundamental frequency of roughly 65.4 Hz (cycles per second). In the second picture, we see how lightly placing a finger exactly halfway down the C string suppresses the odd harmonics, leaving only the natural even harmonics of the C string and producing a sound that not only has a fundamental frequency of 130.8 Hz (in musical terms, up one octave) but also has a timbre that is purer, or at least less complex, than the ordinary sound of the cello C string. For mathematical details of this explanation, see Remark 11.4.6.
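The acoustic reading of (1.2.1) can be tried out on a toy example. In the sketch below, the coefficients in `EXAMPLE_COEFFS` are invented purely for illustration (they do not describe a real cello), and the names `tone` and `harmonic_energy` are our own. The code builds a finite sum of harmonics, reports the energy $a_n^2 + b_n^2$ of each one, and then drops the odd harmonics to mimic the effect described above of touching the string at its midpoint.

```python
import math

# Hypothetical coefficients: harmonic number n -> (a_n, b_n).
EXAMPLE_COEFFS = {1: (0.0, 1.0), 2: (0.5, 0.0), 3: (0.0, 0.25), 4: (0.125, 0.0)}

# Keep only the even harmonics, as when the odd ones are suppressed.
EVEN_ONLY = {n: ab for n, ab in EXAMPLE_COEFFS.items() if n % 2 == 0}


def tone(t: float, coeffs: dict) -> float:
    """Finite version of (1.2.1): a sum of the listed harmonics at time t."""
    return sum(
        a * math.cos(2 * math.pi * n * t) + b * math.sin(2 * math.pi * n * t)
        for n, (a, b) in coeffs.items()
    )


def harmonic_energy(coeffs: dict) -> dict:
    """Energy a_n^2 + b_n^2 carried by each harmonic."""
    return {n: a * a + b * b for n, (a, b) in coeffs.items()}


if __name__ == "__main__":
    energies = harmonic_energy(EXAMPLE_COEFFS)
    print("energy per harmonic:", energies)
    print("total energy:", sum(energies.values()))

    t = 0.1
    # The full tone repeats after time 1 but not after time 1/2 ...
    print(tone(t + 1.0, EXAMPLE_COEFFS) - tone(t, EXAMPLE_COEFFS))
    print(tone(t + 0.5, EXAMPLE_COEFFS) - tone(t, EXAMPLE_COEFFS))
    # ... while the even-harmonic tone already repeats after time 1/2,
    # i.e. its fundamental frequency has doubled.
    print(tone(t + 0.5, EVEN_ONLY) - tone(t, EVEN_ONLY))
```

With only even harmonics present, every summand has period $\frac{1}{2}$, which matches the doubling from roughly 65.4 Hz to 130.8 Hz in the cello example.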

[Figure 1.2.2: Harmonics in action on a cello]

We hope this brief discussion gives some idea of how obtaining a decomposition like (1.2.1) might provide a lot of information about a given tone. For a readable and mathematically sound introduction to the rest of the subject of Fourier series and acoustics, much of which should be accessible to a reader who understands Part II of this book, see Alm and Walker [AW02].


Part I

Complex functions of a real variable


Chapter 2

Real and complex numbers

Feynman used to tell the most complex stories — part real, part imaginary. — Gen. Donald Kutyna, quoted in What Do You Care What People Think? by Richard Feynman

In this chapter, after briefly reviewing the axiomatic characterization of the real numbers (Section 2.1), we define the complex numbers (Section 2.2) and establish the complex-valued versions of sequences and their key properties (Section 2.4). We also introduce the general concept of a metric space (Section 2.3), including completeness (Section 2.5) and the topology of metric spaces (Section 2.6).

2.1 Axioms for the real numbers

A first course in analysis explains how the axioms for the real numbers (Definition 2.1.1) imply the usual properties of the real numbers, including calculus. In this section, we briefly summarize those axioms and some of their immediate consequences. Note that we present this material in condensed form to serve as a quick reference, so the reader should feel free to skim this section for now and refer back to it as necessary. For a more in-depth discussion, see, for example, Ross [Ros13] or Rudin [Rud76].

Definition 2.1.1. Let $R$ be a set on which two binary operations $+$ and $\cdot$ and a binary relation $\leq$ are defined. Consider the following axioms on the system $(R, +, \cdot, \leq)$.

(A1) For all $a, b, c \in R$, $(a + b) + c = a + (b + c)$.
(A2) For all $a, b \in R$, $a + b = b + a$.
(A3) There exists $0 \in R$ such that for all $a \in R$, $a + 0 = a$.
(A4) For all $a \in R$, there exists $(-a) \in R$ such that $a + (-a) = 0$.
(M1) For all $a, b, c \in R$, $(a \cdot b) \cdot c = a \cdot (b \cdot c)$.
(M2) For all $a, b \in R$, $a \cdot b = b \cdot a$.
(M3) There exists $1 \in R$ such that for all $a \in R$, $a \cdot 1 = a$.
(DL) For all $a, b, c \in R$, $a \cdot (b + c) = a \cdot b + a \cdot c$.
(F1) For all $a \neq 0$ in $R$, there exists $a^{-1} \in R$ such that $a \cdot a^{-1} = 1$.
(F2) $1 \neq 0$.
(O1) For all $a, b \in R$, either $a \leq b$ or $b \leq a$.
(O2) For all $a, b \in R$, if $a \leq b$ and $b \leq a$, then $a = b$.
(O3) For all $a, b, c \in R$, if $a \leq b$ and $b \leq c$, then $a \leq c$.
(O4) For all $a, b, c \in R$, if $a \leq b$, then $a + c \leq b + c$.
(O5) For all $a, b, c \in R$, if $a \leq b$ and $0 \leq c$, then $ac \leq bc$.
(OC) (Order completeness) Every nonempty subset of $R$ that has an upper bound also has a least upper bound, or supremum.

To give a bit more detail about (OC), for $S \subseteq R$, $S \neq \emptyset$, to say that $u$ is an upper bound for $S$ means that for all $x \in S$, $x \leq u$; and to say that $U$ is the supremum of $S$ means both that $U$ is an upper bound for $S$ and also that for any other upper bound $u$ for $S$, $U \leq u$.

If $R$ satisfies axioms (A1)–(A4), we say that $R$ is an additive abelian group, with additive identity 0. If $R$ satisfies (A1)–(A4), (M1)–(M3), and (DL), we say that $R$ is a commutative ring with unity, where the unity is 1. If $R$ satisfies (A1)–(A4), (M1)–(M3), (DL), and (F1)–(F2), we say that $R$ is a field.

Suppose $R$ is a field (i.e., $R$ satisfies (A1)–(A4), (M1)–(M3), (DL), and (F1)–(F2)). A relation $\leq$ on $R$ that satisfies (O1)–(O3) is called a total order on $R$; (O4) means that the ordering is preserved under addition, and (O5) means that the ordering is preserved under multiplication by “nonnegative” elements. If there exists some relation $\leq$ on $R$ that satisfies (O1)–(O5), we say that $R$ is orderable, and we say that $R$ with $\leq$ is an ordered field; if no such relation exists, we say that $R$ is not orderable.

Given the usual properties of the integers, it can be shown that there exists a unique algebraic object that satisfies all of the axioms of Definition 2.1.1; for example, this can be done by means of Dedekind cuts (see Rudin [Rud76, App. to Ch. 1]). We call this (unique) algebraic object $\mathbb{R}$, the field of real numbers.

We note some other facts about Definition 2.1.1.

• The element 0 is unique in an additive abelian group, and the element 1 is unique in a commutative ring with unity (Problem 2.1.1).

• The classes of commutative rings with unity, fields, ordered fields, and $\mathbb{R}$ form a sequence of proper containments (Problem 2.1.2). In other words, every field is a commutative ring with unity, but not every commutative ring with unity is a field, and so on.


• We use the relation $\leq$ to define all of the other usual kinds of inequalities. For example, to say $a \geq b$ means that $b \leq a$, and to say that $a < b$ means that $a \leq b$ and $a \neq b$.

• The reader may have previously seen axiom (OC) called simply completeness, and not order completeness. We use the term “order completeness” to distinguish from a more general notion of completeness that will be important later (Definition 2.5.4).

Returning to our main discussion, we have the following consequences of the axioms of an ordered field.

Theorem 2.1.2. Let $R$ be an ordered field and $a, b, c \in R$.
1. $a \geq 0$ if and only if $-a \leq 0$.
2. If $a \leq b$ and $c \leq 0$, then $ac \geq bc$.
3. $a^2 \geq 0$.
4. $-1 < 0 < 1$.

Proof. Problem 2.1.3.

As mentioned earlier, much of a first course in analysis is concerned with consequences of (order) completeness. First, we recall, without proof, two initial consequences of completeness: the Archimedean Property and the density of the rationals in the reals.

Theorem 2.1.3. In $\mathbb{R}$, we have that:
1. (Archimedean Property) For any $a \in \mathbb{R}$, there exists an integer $n$ such that $n > a$.
2. (Density of $\mathbb{Q}$ in $\mathbb{R}$) For $a, b \in \mathbb{R}$ such that $a < b$, there exists some $r \in \mathbb{Q}$ such that $a < r < b$.

We also note two characterizations of suprema that will be useful later.

Theorem 2.1.4 (Arbitrarily Close Criterion). Suppose $S$ is a nonempty subset of $\mathbb{R}$, and suppose $u$ is an upper bound for $S$. Then the following are equivalent:
1. For every $\epsilon > 0$, there exists some $s \in S$ such that $u - s < \epsilon$.
2. $u = \sup S$.

Proof. Problem 2.1.5.

Theorem 2.1.5 (Sup Inequality Trick). If $S$ is a nonempty bounded subset of $\mathbb{R}$, then $\sup S \leq u$ if and only if $u$ is an upper bound for $S$.

Proof. Problem 2.1.6.
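To see the Arbitrarily Close Criterion (Theorem 2.1.4) at work on a concrete set, consider $S = \{1 - \frac{1}{n} : n \geq 1\}$ with upper bound $u = 1$; this example is ours, not the book's. The following sketch, whose function name `close_element` is hypothetical, uses exact rational arithmetic to produce, for a given $\epsilon > 0$, an element $s \in S$ with $u - s < \epsilon$. Choosing an integer $n > 1/\epsilon$ is exactly an application of the Archimedean Property.

```python
from fractions import Fraction


def close_element(eps: Fraction) -> Fraction:
    """Return some s in S = {1 - 1/n : n >= 1} with 1 - s < eps (eps > 0)."""
    # Archimedean Property: pick an integer n strictly greater than 1/eps.
    n = int(1 / eps) + 1
    # Then 1/n < eps, so s = 1 - 1/n lies within eps of the upper bound 1.
    return 1 - Fraction(1, n)


if __name__ == "__main__":
    for eps in (Fraction(1, 2), Fraction(1, 10), Fraction(3, 1000)):
        s = close_element(eps)
        print(f"eps = {eps}: s = {s}, 1 - s = {1 - s}")
```

Since such an $s$ exists for every $\epsilon > 0$, condition 1 of the theorem holds for $u = 1$, and the theorem then gives $\sup S = 1$. By contrast, the upper bound $u = 2$ fails condition 1 (take $\epsilon = \frac{1}{2}$), consistent with $2 \neq \sup S$.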


Problems

2.1.1. (a) Prove that if R is an additive abelian group, and 0 and 0′ both satisfy (A3), then 0 = 0′. (b) Prove that if R is a commutative ring with unity, and 1 and 1′ both satisfy (M3), then 1 = 1′.

2.1.2. This problem shows that the containments of classes of algebraic objects in Definition 2.1.1 are proper. (a) Prove that the integers Z are a commutative ring with unity, but not a field. (b) Prove that the set {0, 1}, with addition and multiplication (mod 2), is a field, but is not orderable. (c) Prove that the rational numbers Q are an ordered field, but do not satisfy the order-completeness axiom (OC).

2.1.3. (Proves Theorem 2.1.2) Let R be an ordered field and a, b, c ∈ R. In the following, you may use the field axioms freely (i.e., you may assume that arithmetic operations work in the usual way in R). (a) Prove that a ≥ 0 if and only if −a ≤ 0. (b) Prove that if a ≤ b and c ≤ 0, then ac ≥ bc. (c) Prove that a² ≥ 0. (Suggestion: Consider the cases a > 0, a = 0, a < 0.) (d) Prove that −1 < 0 < 1.

2.1.4. (a) Give definitions of lower bound and greatest lower bound (or infimum) analogous to the definitions of upper bound and least upper bound in Definition 2.1.1. (b) Prove that any nonempty subset of R that is bounded below has a greatest lower bound (i.e., an infimum).

2.1.5. (Proves Theorem 2.1.4) Suppose S is a nonempty subset of R, and suppose u is an upper bound for S. Prove that the following are equivalent:
• For every ε > 0, there exists some s ∈ S such that u − s < ε.
• u = sup S.
(Suggestion: Prove that the negations of the two conditions are equivalent.)

2.1.6. (Proves Theorem 2.1.5) Prove that if S is a nonempty bounded subset of R, then sup S ≤ u if and only if u is an upper bound for S.

2.2 Complex numbers

Compared with the logical jump from rational numbers to real numbers, the jump from real numbers to complex numbers is relatively straightforward.

Definition 2.2.1. The complex numbers C are defined as follows.
1. Set: Formally, C is the set of all pairs (a, b), where a, b ∈ R. However, instead of (a, b), we write a + bi.
2. Operations: Addition and multiplication of complex numbers are defined like the addition and multiplication of polynomials in the variable i, but with the additional rule i² = −1. Formally, that means:

(a, b) + (c, d) = (a + c, b + d),
(a, b) · (c, d) = (ac − bd, ad + bc),   (2.2.1)

or in other words,

(a + bi) + (c + di) = (a + c) + (b + d)i,
(a + bi) · (c + di) = ac + adi + bci + bdi²
                    = (ac − bd) + (ad + bc)i.   (2.2.2)

Since, formally speaking, elements of C are pairs of real numbers, we draw x + yi ∈ C as the point (x, y) ∈ R². This picture is called the complex plane (Figure 2.2.1).
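The pair formulas in (2.2.1) can be tried out directly. The following is a minimal sketch, assuming complex numbers are stored as Python tuples (a, b); the function names add_pairs and mul_pairs are illustrative and do not come from the text.

```python
# A minimal sketch of Definition 2.2.1: complex numbers as pairs of reals.
# The names add_pairs and mul_pairs are hypothetical helpers for illustration.

def add_pairs(z, w):
    """(a, b) + (c, d) = (a + c, b + d), as in (2.2.1)."""
    (a, b), (c, d) = z, w
    return (a + c, b + d)

def mul_pairs(z, w):
    """(a, b) * (c, d) = (ac - bd, ad + bc), as in (2.2.1)."""
    (a, b), (c, d) = z, w
    return (a * c - b * d, a * d + b * c)

z, w = (3, -2), (1, 4)      # 3 - 2i and 1 + 4i
print(add_pairs(z, w))      # (4, 2)
print(mul_pairs(z, w))      # (11, 10)

# The pair (0, 1) plays the role of i, and its square is (-1, 0), i.e. -1.
print(mul_pairs((0, 1), (0, 1)))  # (-1, 0)

# Python's built-in complex type follows the same rules.
print(complex(*z) * complex(*w) == complex(11, 10))  # True
```

Working with pairs makes it plain that the rule i² = −1 is a consequence of the definition of multiplication rather than an extra assumption.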

Figure 2.2.1: The point 3 − 2i in the complex plane

It is relatively straightforward, and fairly uninteresting, to verify that Definition 2.2.1 gives C the structure of a commutative ring with zero element 0 = 0 + 0i and unity 1 = 1 + 0i (Problem 2.2.1). Verifying that C is a field is a bit more interesting, and uses the following important ideas.

Definition 2.2.2. Let a + bi be a complex number. The complex conjugate, or simply conjugate, of a + bi is defined to be

$\overline{a + bi} = a - bi$.   (2.2.3)


The absolute value, or norm, of a + bi is defined to be

|a + bi| = √(a² + b²).   (2.2.4)

Note that if x is a real number, then the absolute value |x| = √(x²) is consistent with the usual absolute value in the reals.
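Definition 2.2.2 is easy to experiment with numerically. The sketch below assumes complex numbers are stored as pairs (a, b); the helper names conj and norm are illustrative only and are cross-checked against Python's built-in complex type.

```python
import math

# Illustrative sketch of Definition 2.2.2; conj and norm are hypothetical
# helper names. A complex number a + bi is stored as the pair (a, b).

def conj(z):
    """Complex conjugate: (a, b) -> (a, -b), as in (2.2.3)."""
    a, b = z
    return (a, -b)

def norm(z):
    """Absolute value: sqrt(a^2 + b^2), as in (2.2.4)."""
    a, b = z
    return math.sqrt(a * a + b * b)

z = (3.0, -4.0)
print(conj(z))  # (3.0, 4.0)
print(norm(z))  # 5.0

# For a real number x, stored as (x, 0), the norm is the usual |x|.
x = -7.0
print(norm((x, 0.0)) == abs(x))  # True

# Cross-check against Python's built-in complex type.
w = complex(*z)
print(w.conjugate() == complex(*conj(z)))  # True
print(abs(w) == norm(z))                   # True
```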

Complex conjugation and absolute values have the following straightforward but crucial properties.

Theorem 2.2.3. For z, w ∈ C, we have that
1. $\overline{\overline{z}} = z$;
2. $\overline{z + w} = \overline{z} + \overline{w}$ and $\overline{zw} = \overline{z} \cdot \overline{w}$;
3. $z\overline{z} = |z|^2$;
4. |zw| = |z| |w|;
5. $|\overline{z}| = |z|$;
6. |z| ≥ 0, and |z| = 0 if and only if z = 0; and
7. If z ≠ 0, then $z \cdot \dfrac{\overline{z}}{|z|^2} = 1$.
Consequently, C is a field.
Proof. Problem 2.2.2.

On a related note, we introduce the following terminology and notation.

Definition 2.2.4. For z = a + bi ∈ C, we define […]

[…] d(x, z) + d(z, y), or in other words, one cannot get a shortcut from x to y by travelling through a third point z. The truth of this property for triangles in the (complex) plane is shown in Figure 2.3.1.

Figure 2.3.1: The triangle inequality (no shortcuts via extra destinations)

Returning to our main discussion, as mentioned above, we must prove that Examples 2.3.2–2.3.3 actually define metrics, which we do in the following theorem.

Theorem 2.3.4. For z, w ∈ C, we have the following.
1. (Cauchy-Schwarz inequality) […]

[…] ε > 0, there exists some N(ε) such that if n > N(ε), then |a_n − L| < ε. To prove property (1), choose some integer K > N(1). By the definition of limit, for n > K, we know that |a_n − L| < 1, which means that

|a_n| ≤ |a_n − L| + |L| < |L| + 1,   (2.4.1)

by the triangle inequality. Therefore, since {|a_1|, …, |a_K|} is a finite set, we see that for all n, |a_n| < M, where M = 1 + max{|a_1|, …, |a_K|, |L|}. For property (2), see Problem 2.4.1.

The limit laws for sequences are also proven in a manner similar to their real-valued versions.

Theorem 2.4.6. Let a_n and b_n be sequences in C, and suppose that lim_{n→∞} a_n = L, lim_{n→∞} b_n = M, and c ∈ C. Then we have that:
1. lim_{n→∞} c a_n = cL;
2. lim_{n→∞} (a_n + b_n) = L + M;
3. lim_{n→∞} $\overline{a_n} = \overline{L}$;
4. lim_{n→∞} a_n b_n = LM;
5. If L ≠ 0, then lim_{n→∞} 1/a_n = 1/L; and
6. If a_n is real-valued, and a_n ≤ K for all n, then lim_{n→∞} a_n = L ≤ K.


Proof. The proofs of properties (1) and (2) are in Problems 2.4.2 and 2.4.3. Property (3) follows by Remark 2.3.8 and the definition of limit.

For property (4), fix ε > 0. By Theorem 2.4.5(1), we know that there exists some K such that |b_n| < K for all n. Furthermore, by the definition of lim_{n→∞} a_n = L, there exists some N_a = N_a(ε/(2K)) such that |a_n − L| < ε/(2K) for all n > N_a, and by the definition of lim_{n→∞} b_n = M, there exists some N_b = N_b(ε/(2|L| + 1)) such that |b_n − M| < ε/(2|L| + 1) for all n > N_b. Therefore, for n > N(ε) = max(N_a, N_b), we have that

|a_n b_n − LM| = |a_n b_n − L b_n + L b_n − LM|
               ≤ |a_n − L| |b_n| + |L| |b_n − M|   (triangle inequality)
               < (ε/(2K)) · K + |L| · ε/(2|L| + 1)
               < ε/2 + ε/2 = ε.   (2.4.2)

Finally, for property (5), again fix ε > 0. By Theorem 2.4.5(2), there exists some K such that if n > K, then |a_n| > |L|/2; and by definition, there exists some N_a = N_a(ε|L|²/2) such that |a_n − L| < ε|L|²/2 for all n > N_a. Then for n > N(ε) = max(K, N_a), we see that

|1/a_n − 1/L| = |L − a_n| / (|a_n| |L|) ≤ |L − a_n| / ((|L|/2) |L|) < (2/|L|²) · (ε|L|²/2) = ε.   (2.4.3)

The theorem follows.

Next, we would like to ensure that, for example, the limit of a convergent real-valued sequence is real and the limit of a convergent nonnegative real-valued sequence is nonnegative. We may generalize both those ideas using the following terminology, which will also be useful in other circumstances.

Definition 2.4.7. For x ∈ C and r ≥ 0, the open disc of radius r around x is defined to be

N_r(x) = {y ∈ C | |y − x| < r},   (2.4.4)

and the closed disc of radius r around x is defined to be

$\overline{N_r}(x)$ = {y ∈ C | |y − x| ≤ r}.   (2.4.5)

Definition 2.4.8. Let V be a subset of C, and let V^c = C − V be the complement of V in C. To say that V is closed in C means that for every x ∈ V^c, there exists some ε > 0 such that the open disc N_ε(x) is contained in V^c. See Figure 2.4.1, where the shaded area and its boundary represent a closed set V.


Figure 2.4.1: A closed set in C and a point x in its complement

Theorem 2.4.9. If V is a closed subset of C and a_n is a sequence in V such that lim_{n→∞} a_n = L, then L is contained in V. In particular:
1. If a_n is real-valued, L is real.
2. If a_n is real-valued and a_n ≤ K for all n, then L ≤ K.
3. For x ∈ C and r > 0, if |a_n − x| ≤ r for all n, then |L − x| ≤ r.
4. If a_n is real-valued, a < b, and a_n ∈ [a, b] for all n, then L ∈ [a, b].
5. For b_1 < b_2 and c_1 < c_2, let V be the set of all x + yi ∈ C such that x ∈ [b_1, b_2] and y ∈ [c_1, c_2]. If a_n is a sequence in V, then L ∈ V.
The set in statement (5), above, is known as a rectangle, as it is drawn as the rectangle [b_1, b_2] × [c_1, c_2] in the complex plane (or in R²).
Proof. The first statement is proved in Problem 2.4.4. The other claims then reduce to showing that certain subsets of C are closed, and this is found in Problem 2.4.5.

It will also occasionally be useful for us to consider the convergence of a complex sequence in terms of its real and imaginary parts, and vice versa. The following theorem describes the necessary equivalence.

Theorem 2.4.10. Let z_n = x_n + y_n i be a complex sequence with real and imaginary parts x_n and y_n, respectively, and let L = a + bi ∈ C have real and imaginary parts a and b, respectively. Then lim_{n→∞} z_n = L if and only if lim_{n→∞} x_n = a and lim_{n→∞} y_n = b.
Proof. Problem 2.4.6.

Staying with real-valued sequences for a moment, the idea of supremum is also related to the limit of a real-valued sequence in several ways. First, we have the following refinement of the Arbitrarily Close Criterion (Theorem 2.1.4).

Theorem 2.4.11 (Arbitrarily Close Criterion, redux). Suppose S is a nonempty subset of R, and suppose u is an upper bound for S. Then the following are equivalent:
1. u = sup S.


2. For every ε > 0, there exists some s ∈ S such that u − s < ε.
3. There exists a sequence x_n in S such that lim_{n→∞} x_n = u.
Proof. Problem 2.4.7.

We also recall the following fact.

Theorem 2.4.12 (Convergence of Monotone Sequences). Let a_n be a real-valued increasing sequence that is bounded above, and let S = {a_n} (the set of all values attained by a_n). Then a_n converges to sup S.
Proof. Problem 2.4.8.

Later, we will find it useful to extend the definition of the limit of a sequence (Definition 2.4.3) to an arbitrary metric space, as follows.

Definition 2.4.13. For a sequence a_n in a metric space X and L ∈ X, to say that lim_{n→∞} a_n = L means that for every ε > 0, there exists some N(ε) ∈ R such that if n > N(ε), then d(a_n, L) < ε. The terms convergent, divergent, and so on, are also used as they are with complex-valued sequences.

Note that Definition 2.4.3 is precisely the special case of Definition 2.4.13 where d(a_n, L) = |a_n − L|. When considering limits of sequences in a general metric space, we often need a tool like the following to prove the convergence of particular examples.

Lemma 2.4.14 (Metric Squeeze Lemma). Let x_n be a sequence in a metric space X, and suppose that for some L ∈ X, there exists a sequence d_n in R such that d(x_n, L) < d_n for all n and lim_{n→∞} d_n = 0. Then lim_{n→∞} x_n = L.
Proof. Problem 2.4.9.

Finally, it will be useful to be able to describe a subset Y of a metric space X that contains points arbitrarily close to every point of X. The following theorem and definition make this idea precise.

Theorem 2.4.15. Let X be a metric space and Y a subset of X. Then the following conditions are equivalent:
1. For every x ∈ X and every ε > 0, there exists some y ∈ Y such that d(x, y) < ε.
2. For every x ∈ X, there exists some sequence y_n in Y such that lim_{n→∞} y_n = x.
Proof. Problem 2.4.10.

Definition 2.4.16. To say that a subset Y of a metric space X is dense in X means that either (and therefore, both) of the conditions of Theorem 2.4.15 hold.

Example 2.4.17. The rationals Q are a dense subset of the metric space R (Problem 2.4.11).
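Condition (1) of Theorem 2.4.15 for Q inside R can be illustrated computationally. The sketch below is an assumption-laden illustration rather than a proof: the function name rational_within is hypothetical, the input x is a floating-point number (which is itself rational, so the code only mimics the real-number argument), and the construction follows the Archimedean Property by choosing an integer n with 1/n < ε.

```python
from fractions import Fraction
import math

def rational_within(x, eps):
    """Return a Fraction r with |x - r| < eps (illustrative sketch; x is a float)."""
    # Archimedean Property: pick an integer n with n > 1/eps, so that 1/n < eps.
    n = math.floor(1 / eps) + 1
    # floor(n*x)/n is a rational number lying within 1/n of x.
    return Fraction(math.floor(n * x), n)

x = math.sqrt(2)
for eps in (0.1, 0.01, 0.001):
    r = rational_within(x, eps)
    # Print the tolerance, the rational approximation, and whether it is close enough.
    print(eps, r, abs(x - r) < eps)
```

Taking ε = 1/n for n = 1, 2, 3, … in this construction produces a sequence of rationals converging to x, which is condition (2) of the same theorem.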


Problems

In Problems 2.4.1–2.4.3, let a_n and b_n be sequences in C.

2.4.1. (Proves Theorem 2.4.5) Prove that if lim_{n→∞} a_n = L ≠ 0, then there exists some real number K such that if n > K, then |a_n| > |L|/2. (Suggestion: Choose an ε in the definition of limit that will ensure that a_n is close to L, and therefore, far enough away from 0.)

2.4.2. (Proves Theorem 2.4.6) Prove that if lim_{n→∞} a_n = L and c ∈ C, then lim_{n→∞} c a_n = cL.

2.4.3. (Proves Theorem 2.4.6) Prove that if lim_{n→∞} a_n = L and lim_{n→∞} b_n = M, then lim_{n→∞} (a_n + b_n) = L + M. (Suggestion: Use an ε/2 trick.)

2.4.4. (Proves Theorem 2.4.9) Prove that if V is a closed subset of C (Definition 2.4.8), and a_n is a sequence in V such that lim_{n→∞} a_n = L, then L ∈ V. (Suggestion: Contradiction.)

2.4.5. (Proves Theorem 2.4.9) Prove that the following subsets of C are closed (Definition 2.4.8). (a) The real line R (i.e., the x-axis in the complex plane). (b) For a fixed K > 0, the set of all x ∈ R such that x ≥ K. (c) For a fixed x ∈ C and r > 0, the closed disc $\overline{N_r}(x)$ (i.e., the set of all y ∈ C such that |y − x| ≤ r). (d) For a < b, the real interval [a, b]. (e) For b_1 < b_2 and c_1 < c_2, the set of all x + yi ∈ C such that x ∈ [b_1, b_2] and y ∈ [c_1, c_2].

2.4.6. (Proves Theorem 2.4.10) Let z_n = x_n + y_n i be a complex sequence with real and imaginary parts x_n and y_n, respectively, and let L = a + bi ∈ C have real and imaginary parts a and b, respectively. (a) Prove that if |z_n − L| < ε, then |x_n − a| < ε and |y_n − b| < ε. (b) Prove that if lim_{n→∞} z_n = L, then lim_{n→∞} x_n = a and lim_{n→∞} y_n = b. (c) Prove that if |x_n − a| < ε/2 and |y_n − b| < ε/2, then |z_n − L| < ε. (d) Prove that if lim_{n→∞} x_n = a and lim_{n→∞} y_n = b, then lim_{n→∞} z_n = L.

2.4.7. (Proves Theorem 2.4.11) Let S be a nonempty subset of R, and let u be an upper bound for S. Given Theorem 2.1.4, the following suffices to obtain Theorem 2.4.11. (a) Suppose that for every ε > 0, there exists some s ∈ S such that u − s < ε. Prove that there exists a sequence x_n in S such that lim_{n→∞} x_n = u. (Suggestion: ε = 1/n.) (b) Now suppose that u ≠ sup S and x_n is a convergent sequence in S. Prove that lim_{n→∞} x_n < u. (Suggestion: Try contradiction.)


2.4.8. (Proves Theorem 2.4.12) Let a_n be a real-valued increasing sequence that is bounded above, let S = {a_n} (the set of all values attained by a_n), and let u = sup S. Prove that lim_{n→∞} a_n = u. (Suggestion: Use the Arbitrarily Close Criterion to find some a_N close to u.)

2.4.9. (Proves Lemma 2.4.14) Let x_n be a sequence in a metric space X, and suppose that for some L ∈ X, there exists a real-valued sequence d_n such that d(x_n, L) < d_n for all n and lim_{n→∞} d_n = 0. Prove that lim_{n→∞} x_n = L.

2.4.10. (Proves Theorem 2.4.15) Prove that the conditions of Theorem 2.4.15 are equivalent. (Suggestion: ε = 1/n.)

2.4.11. Prove that Q is a dense subset of the metric space R. (Suggestion: Theorem 2.1.3.)

2.5 Completeness in metric spaces

As mentioned in Section 2.3, to study the analysis of complex-valued functions, we need a replacement for the idea of order completeness, and here it is.

Definition 2.5.1. Let a_n be a complex-valued sequence. To say that a_n is Cauchy means that for every ε > 0, there exists some N(ε) ∈ R such that if n, k > N(ε), then |a_n − a_k| < ε. More generally, if a_n is a sequence in a metric space X, to say that a_n is Cauchy means that for every ε > 0, there exists some N(ε) ∈ R such that if n, k > N(ε), then d(a_n, a_k) < ε.

Now, on the one hand, any convergent sequence is Cauchy; in fact, this holds in any metric space.

Theorem 2.5.2. Let a_n be a convergent sequence in a metric space X. Then a_n must be Cauchy.
Proof. If lim_{n→∞} a_n = L, then by definition, for any ε_a > 0, there exists some N_a(ε_a) ∈ R such that if n > N_a(ε_a), then d(a_n, L) < ε_a. For ε > 0, let N(ε) = N_a(ε/2). Then for n, k > N(ε), by the triangle inequality,

d(a_n, a_k) ≤ d(a_n, L) + d(L, a_k) < ε/2 + ε/2 = ε.   (2.5.1)

The theorem follows.

On the other hand, the following gives an example of a metric space X and a Cauchy sequence in X that does not converge to a limit in X.

Example 2.5.3. Let X = Q, with the usual metric d(x, y) = |x − y|, and let a_n be a rational-valued sequence whose limit is irrational. Then by Theorem 2.5.2, a_n is Cauchy, as it converges in R, but a_n does not have a limit in X = Q.
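A concrete instance of Example 2.5.3 can be computed with exact rational arithmetic. The sketch below is an illustration under stated assumptions: the particular sequence (Newton's iteration a_{n+1} = (a_n + 2/a_n)/2 starting from a_1 = 1, whose limit in R is the irrational number √2) is a choice made here and is not taken from the text, and the function name newton_sqrt2_terms is hypothetical. Inspecting finitely many terms does not prove that the sequence is Cauchy; it only shows the behavior the example describes.

```python
from fractions import Fraction

def newton_sqrt2_terms(count):
    """First `count` terms of a_{n+1} = (a_n + 2/a_n)/2 with a_1 = 1, as exact rationals."""
    terms = [Fraction(1)]
    while len(terms) < count:
        a = terms[-1]
        terms.append((a + 2 / a) / 2)
    return terms

terms = newton_sqrt2_terms(6)
for a in terms:
    # Every term is rational, and no term squares to exactly 2.
    print(a, a * a == 2)

# Distances between consecutive terms shrink rapidly, as expected of a Cauchy sequence.
gaps = [abs(terms[i + 1] - terms[i]) for i in range(len(terms) - 1)]
print([float(g) for g in gaps])
```

The terms stay in Q and crowd together, yet the only possible limit, √2, is missing from Q; this is the "hole" that completeness rules out.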


While Example 2.5.3 may seem like a cheap trick, it actually conveys the idea behind metric spaces in which the converse of Theorem 2.5.2 fails: Such spaces (e.g., Q) have “holes”, or put another way, are in some sense incomplete. We make the idea of completeness, or not having holes, precise with the following definition.

Definition 2.5.4. To say that a metric space X is Cauchy complete, or simply complete, means that any Cauchy sequence in X converges to some limit in X.

The main goal of the rest of this section is to prove that C is a (Cauchy) complete metric space (Corollary 2.5.8). We begin to sneak up on this result with the following lemma.

Lemma 2.5.5. If a_n is a Cauchy sequence in C, then a_n is bounded.
Proof. Problem 2.5.1.

The following theorem, which, for brevity, we merely quote without proof, is one of the deeper and more difficult results of Analysis I.

Theorem 2.5.6 (Bolzano-Weierstrass). Every bounded sequence in R has a convergent subsequence.

The Bolzano-Weierstrass Theorem is the key technical tool of Analysis I. For example, it can be used to prove the (Cauchy) completeness of the real numbers as follows.

Theorem 2.5.7. The real numbers are a complete metric space.
Proof. Let a_n be a Cauchy sequence in R, so by definition, there exists some N_a(ε_a) ∈ R such that if n, k > N_a(ε_a), then |a_n − a_k| < ε_a. By Lemma 2.5.5, a_n is bounded, so by Bolzano-Weierstrass (Theorem 2.5.6), there exists some subsequence a_{n_k} such that lim_{k→∞} a_{n_k} = L for some L ∈ R, which means that there exists some N_b(ε_b) ∈ R such that if k > N_b(ε_b), then |a_{n_k} − L| < ε_b.

Given ε > 0, let N(ε) = max(N_a(ε/2), N_b(ε/2)), and suppose that m > N(ε). Then choosing any k > m, and using the fact that n_k ≥ k > N_a(ε/2), we see that

|a_m − L| ≤ |a_m − a_{n_k}| + |a_{n_k} − L| < ε/2 + ε/2 = ε.   (2.5.2)

The theorem follows.

We may then bootstrap the completeness of R up to the completeness of C.

Corollary 2.5.8. The complex numbers are a complete metric space.
Proof. Problem 2.5.2.

Alternately, we may prove Corollary 2.5.8 by first proving that Bolzano-Weierstrass holds for sequences in C, a result of independent interest.


Theorem 2.5.9 (Bolzano-Weierstrass in C). Every bounded sequence in C has a convergent subsequence.
Proof. Problem 2.5.3.

We may then use the argument from Theorem 2.5.7 to obtain Corollary 2.5.8; see Problem 2.5.4.

Remark 2.5.10. Finally, we note that even though we have used Bolzano-Weierstrass to prove that R and C are complete metric spaces, Bolzano-Weierstrass may not hold even in a complete metric space; see Example 7.2.20.

Problems

2.5.1. (Proves Lemma 2.5.5) Let a_n be a Cauchy sequence in C. Prove that a_n is bounded. (Suggestion: Imitate the proof of Theorem 2.4.5, but replace the fact that all but finitely many terms in a_n are close to L with the fact that all but finitely many terms are close to some a_k.)

2.5.2. (Proves Corollary 2.5.8) Let z_n be a Cauchy sequence in C, and let a_n and b_n be the real and imaginary parts of z_n (i.e., z_n = a_n + b_n i). (a) Prove that a_n and b_n are (real) Cauchy sequences. (b) Prove that z_n converges to some limit L ∈ C.

2.5.3. (Proves Theorem 2.5.9) Let z_n be a sequence in C, and let a_n and b_n be the real and imaginary parts of z_n (i.e., z_n = a_n + b_n i). Assume that z_n is bounded, i.e., that there exists some M ∈ R such that |z_n| < M for all n. (a) Prove that a_n and b_n are bounded. (b) Prove that there exists some subsequence z_{n_k} such that lim_{k→∞} z_{n_k} = L for some L ∈ C. (Suggestions: Use 2.4.6. Note that by definition, a subsequence z_{n_k} has real and imaginary parts a_{n_k} and b_{n_k} with the same n_k.)

2.5.4. (Proves Corollary 2.5.8) Let z_n be a Cauchy sequence in C. Use Problem 2.5.3 to prove that z_n converges to some limit L ∈ C.

2.6 The topology of metric spaces

The material in this section will not be used in the rest of this book. However, we have already encountered, and will continue to use, special cases of some of the main ideas (e.g., closed subsets and dense subsets), so we present this digression for the reader who is interested in further context for those ideas. We begin with the following terminology.


Definition 2.6.1. For a metric space X, x ∈ X, and a real number r ≥ 0, the open r-neighborhood of x, written N_r(x), is defined to be

N_r(x) = {y ∈ X | d(x, y) < r};   (2.6.1)

and the closed r-neighborhood of x, written $\overline{N_r}(x)$, is defined to be

$\overline{N_r}(x)$ = {y ∈ X | d(x, y) ≤ r}.   (2.6.2)

In particular, if X = C, x ∈ X, and r ≥ 0, then N_r(x) is called the open disc of radius r around x, and $\overline{N_r}(x)$ is called the closed disc of radius r around x.

Let X be a metric space. The following ideas define what is known as the topology of X. (See Remark 2.6.9 for a brief discussion of topology in general.)

Definition 2.6.2. We define an open subset U of X to be a set such that, for every x ∈ U, there exists some ε > 0 such that x ∈ N_ε(x) ⊆ U.

Definition 2.6.3. We define a closed subset V of X to be a set such that, for every convergent sequence x_n in V, lim_{n→∞} x_n ∈ V.

The above definitions are complementary (pun intended) in the following precise sense.

Theorem 2.6.4. Let X be a metric space. Then U ⊆ X is open if and only if its complement X − U is closed.
Proof. Problems 2.6.1 and 2.6.2.

It follows that Definitions 2.4.8 and 2.6.3 are consistent in the case X = C. (Compare Theorem 2.4.9.)

Example 2.6.5. As one might hope, for a metric space X, x ∈ X, and a real number r ≥ 0, the open r-neighborhood N_r(x) is an open subset of X, and the closed r-neighborhood $\overline{N_r}(x)$ is a closed subset of X (Problems 2.6.3 and 2.6.4).

As discussed in Section 2.5, the Bolzano-Weierstrass Theorem (Theorem 2.5.9) is one of the key results of analysis. It can be generalized to a more abstract setting using the following idea.

Definition 2.6.6. To say that a subset C of a metric space X is compact means that every sequence in C has a subsequence that converges to an element of C.

In those terms, we have the following corollary of Theorem 2.5.9.

Corollary 2.6.7 (Generalized Bolzano-Weierstrass). Every closed and bounded subset of C is compact.
Proof. Problem 2.6.5.


Remark 2.6.8. Note that while Corollary 2.6.7 gives valuable insight into what we might expect a compact set to be like, the reader may be wondering why it is only stated for subsets of C, and not for metric spaces in general. The perhaps surprising reason is that Corollary 2.6.7 does not hold in a general metric space! See Example 7.2.20.

Remark 2.6.9. Note that if X is a metric space, then the collection of open subsets of X has the following properties:
• Both ∅ and X are open.
• If {U_α} is a (possibly uncountable) collection of open subsets of X, then the union ⋃ U_α is also an open subset of X (Problem 2.6.6).
• If {U_1, …, U_n} is a finite collection of open subsets of X, then the intersection ⋂_{i=1}^{n} U_i is also an open subset of X (Problem 2.6.7).
A topology on an arbitrary set X is a choice of a collection of subsets of X, called the open subsets of X, that have all three of the above properties. The closed subsets of X are then defined to be those sets whose complements are open. The subject of point-set topology studies how ideas like compactness and dense subspaces can be developed using the above axiomatic properties in place of, for example, a metric. The reader interested in learning more may consult, for example, Munkres [Mun00].

Problems

2.6.1. (Proves Theorem 2.6.4) Let X be a metric space and U a subset of X. Prove that if U is not open, then V = X − U is not closed. (Suggestion: Write out what it means to say that U is not open and V is not closed.)

2.6.2. (Proves Theorem 2.6.4) Let X be a metric space and V a subset of X. Prove that if V is not closed, then U = X − V is not open. (Suggestion: Write out what it means to say that V is not closed and U is not open.)

2.6.3. Let X be a metric space, x ∈ X, and r ≥ 0 a real number. Prove that the open r-neighborhood N_r(x) (Definition 2.6.1) is an open subset of X. (Suggestion: Triangle inequality.)

2.6.4. Let X be a metric space, x ∈ X, and r ≥ 0 a real number. Prove that the closed r-neighborhood $\overline{N_r}(x)$ (Definition 2.6.1) is a closed subset of X. (Suggestion: Prove that the complement of $\overline{N_r}(x)$ is open.)

2.6.5. (Proves Corollary 2.6.7) Prove that if C is a closed and bounded subset of the complex numbers, then C is compact. (Suggestion: Apply Theorem 2.5.9.)

2.6.6. Let X be a metric space and let {U_α} be a (possibly uncountable) collection of open subsets of X. Prove that the union ⋃ U_α is also an open subset of X.


2.6.7. Let X be a metric space and let {U_1, …, U_n} be a finite collection of open subsets of X. Prove that the intersection ⋂_{i=1}^{n} U_i is also an open subset of X.

Chapter 3

Complex-valued calculus

But neither Fourier nor anyone else in the early 1820s can prove that Fourier Integrals work for all f(x)’s, in part because there’s still deep confusion in math about how to define the integral. . . but anyway, the reason we’re even mentioning the F.I. problem is that A.-L. Cauchy’s work on it leads him to most of the quote-unquote rigorizing of analysis he gets credit for, some of which rigor involves defining the integral as ‘the limit of a sum’ but most (= most of the rigor) concerns the convergence problems mentioned in (b) and its little [Quick Embedded Interpolation] in the Differential Equations part of [Emergency Glossary II], specifically as those problems pertain to Fourier Series*.
* — There’s really nothing to be done about the preceding sentence except apologize.

— David Foster Wallace, Everything and More

Having established the fundamentals of real and complex numbers in Chapter 2, in this chapter, we redo the march through calculus from Analysis I, from continuity and limits (Section 3.1), to differentiation (Section 3.2), to integration (Sections 3.3 and 3.4), and finishing with the Fundamental Theorems of Calculus (Section 3.5).

The attentive reader should notice that even though our primary concern is functions f : R → C (i.e., real inputs and complex outputs), we will usually be as general as possible without extra effort. Specifically, for continuity, limits, and differentiation, we consider functions f : C → C, or more generally, f : X → C for some X ⊆ C, as functions of a complex variable can be treated just like functions of a real variable in those respects. (In fact, we will occasionally find the extra generality to be useful.) On the other hand, when we turn to integration, we see that the definition of the Riemann integral is essentially a matter of functions f : R → R, though by handling real and imaginary parts separately, we may extend integration to functions f : R → C, as desired.


3.1 Continuity and limits

Continuity and limits generalize from real-valued functions of a real variable to complex-valued functions of a complex variable almost without change. We begin with the definition of continuity, which we first state in that generality. (The reader should keep in mind that subsets of R are particular examples of subsets of C.)

Definition 3.1.1. Let X be a nonempty subset of C, let f : X → C be a function, and let a be a point in X. To say that f is continuous at a means that one of the following conditions holds:
• (Sequential continuity) For every sequence x_n in X such that lim_{n→∞} x_n = a, we have that lim_{n→∞} f(x_n) = f(a).
• (ε-δ continuity) For every ε > 0, there exists some δ(ε) > 0 such that if |x − a| < δ(ε), then |f(x) − f(a)| < ε.
To say that f is continuous on X means that f is continuous at a for all a ∈ X.

Note that Definition 3.1.1 uses only the metric properties of R and C, and not any other features. We may therefore generalize Definition 3.1.1 to arbitrary metric spaces:

Definition 3.1.2. Let X and Y be metric spaces, let f : X → Y be a function, and let a be a point in X. To say that f is continuous at a means that one of the following conditions holds:
• (Sequential continuity) For every sequence x_n in X such that lim_{n→∞} x_n = a, we have that lim_{n→∞} f(x_n) = f(a).
• (ε-δ continuity) For every ε > 0, there exists some δ(ε) > 0 such that if d(x, a) < δ(ε), then d(f(x), f(a)) < ε.
To say that f is continuous on X means that f is continuous at a for all a ∈ X.

Now, it would be perverse to have two ideas called “continuity” (sequential and ε-δ) if they were not equivalent, and indeed they are:

Theorem 3.1.3. Let X and Y be metric spaces, let f : X → Y be a function, and let a be a point in X. Then f is sequentially continuous at a if and only if f is ε-δ continuous at a. In particular, sequential continuity is equivalent to ε-δ continuity for complex-valued functions on subsets of R.
Proof. On the one hand, suppose f is ε-δ continuous at a and x_n is a sequence in X such that lim_{n→∞} x_n = a. By the definition of ε-δ continuity, for every ε > 0, there exists some δ(ε) > 0 such that if d(x, a) < δ(ε), then d(f(x), f(a)) < ε; and by the definition of convergent sequence, there exists some N_x(ε_x) such that if n > N_x(ε_x), then d(x_n, a) < ε_x. So, given


ε > 0, let N = N(ε) = N_x(δ(ε)). Then given n > N = N_x(δ(ε)), d(x_n, a) < δ(ε); and because d(x_n, a) < δ(ε), d(f(x_n), f(a)) < ε.

Conversely, suppose f is not ε-δ continuous at a, or in other words:
(*) There exists ε_0 > 0 such that for every δ > 0, there exists x ∈ X such that d(x, a) < δ and d(f(x), f(a)) ≥ ε_0.
We construct a sequence x_n that violates sequential continuity (a “bad sequence”) as follows. For each n ∈ N, by (*), we may choose x_n ∈ X such that d(x_n, a) < 1/n and d(f(x_n), f(a)) ≥ ε_0. Then lim_{n→∞} x_n = a, by the Metric Squeeze Lemma (Lemma 2.4.14), but the sequence f(x_n) never comes within ε_0 of f(a), so f(x_n) cannot converge to f(a). The theorem follows.

We will find both the sequential and ε-δ versions of continuity to be useful for different purposes. For example, ε-δ continuity will later be useful when considering the properties of a continuous function over an entire interval. On the other hand, sequential continuity makes the proof of the algebraic properties of continuity straightforward:

Theorem 3.1.4. Let X be a subset of C, let f, g : X → C be functions, and for some a ∈ X, suppose that f and g are continuous at a. Then:
1. For c ∈ C, cf(x) is continuous at a.
2. f(x) + g(x) is continuous at a.
3. $\overline{f(x)}$ is continuous at a.
4. f(x)g(x) is continuous at a.
5. If g(x) ≠ 0 for all x ∈ X, then f(x)/g(x) is continuous at a.
Proof. This essentially follows from Theorem 2.4.6. We give the details for property (4) as an example, and leave the rest for Problem 3.1.1.

For X ⊆ C, functions f, g : X → C, and a ∈ X, suppose that f and g are continuous at a, and suppose x_n is a sequence in X such that lim_{n→∞} x_n = a. By the definition of sequential continuity, lim_{n→∞} f(x_n) = f(a) and lim_{n→∞} g(x_n) = g(a), which means that by Theorem 2.4.6, part (4), lim_{n→∞} f(x_n)g(x_n) = f(a)g(a). Property (4) then follows by definition of sequential continuity.

Next, in exactly the same manner as Theorem 2.4.6 implies Theorem 3.1.4, Theorem 2.4.10 implies:

Theorem 3.1.5. Let X be a subset of C, let f : X → C be a function, let u and v be the real and imaginary parts of f, and let a be a point of X. Then f is continuous at a if and only if both u and v are continuous at a.
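The “bad sequence” construction in the proof of Theorem 3.1.3 can be made concrete. The sketch below is only an illustration: the step function, the point a = 0, and the value ε_0 = 0.5 are choices made here and do not come from the text. It checks that x_n = 1/(n + 1) satisfies d(x_n, a) < 1/n while f(x_n) stays at distance at least ε_0 from f(a).

```python
def step(x):
    """A function on R that is discontinuous at a = 0 (illustrative choice)."""
    return 0.0 if x <= 0 else 1.0

a = 0.0
eps0 = 0.5  # plays the role of epsilon_0 in condition (*)

# As in the proof, pick x_n with |x_n - a| < 1/n but |f(x_n) - f(a)| >= eps0.
bad_sequence = [1.0 / (n + 1) for n in range(1, 11)]

for n, x_n in enumerate(bad_sequence, start=1):
    assert abs(x_n - a) < 1.0 / n                 # x_n is squeezed toward a
    assert abs(step(x_n) - step(a)) >= eps0       # but f(x_n) stays away from f(a)

# The distance |f(x_n) - f(a)| does not shrink as n grows.
print(abs(step(bad_sequence[-1]) - step(a)))  # 1.0
```

Here the inputs converge to 0 while the outputs remain at distance 1 from f(0), so the step function fails sequential continuity at 0 exactly as the proof predicts for any function that fails ε-δ continuity.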


Sequential continuity is also useful in proving that the composition of continuous functions is continuous, which we do for general metric spaces.

Theorem 3.1.6. Let $X$, $Y$, and $Z$ be metric spaces, let $f : X \to Y$ and $g : Y \to Z$ be functions, let $a$ be a point in $X$, and suppose that $f$ is continuous at $a$ and $g$ is continuous at $f(a)$. Then $g \circ f$ is continuous at $a$.

Proof. Problem 3.1.2.

We briefly pause to note some examples of continuous functions, one concrete, one quite abstract.

Example 3.1.7. For $X = \mathbb{R}$ or $\mathbb{C}$, any polynomial function $p : X \to X$ is continuous (Problem 3.1.3).

Example 3.1.8. Let $X$ be any metric space, and fix a point $b \in X$. Then the function $f : X \to \mathbb{R}$ given by $f(x) = d(x, b)$ is continuous (Problem 3.1.4).

As a special case of Example 3.1.8, we have the following useful fact:

Corollary 3.1.9. The function $f : \mathbb{C} \to \mathbb{R}$ given by $f(z) = |z|$ is continuous.

Proof. Problem 3.1.4.

We now come to the version of continuity known as uniform continuity, which we will use, for example, to study integration.

Definition 3.1.10. Let $X$ be a nonempty subset of $\mathbb{C}$ and let $f : X \to \mathbb{C}$ be a function. To say that $f$ is uniformly continuous on $X$ means that for every $\epsilon > 0$, there exists some $\delta(\epsilon) > 0$ such that if $x, y \in X$ and $|x - y| < \delta(\epsilon)$, then $|f(x) - f(y)| < \epsilon$.

Note that to say that $f$ is continuous at every $a \in X$ means precisely that: For every $a \in X$ and $\epsilon > 0$, there exists some $\delta(\epsilon, a) > 0$ such that if $x \in X$ and $|x - a| < \delta(\epsilon, a)$, then $|f(x) - f(a)| < \epsilon$. Uniform continuity on $X$ is therefore generally a stricter condition than continuity on $X$, as the “degree of continuity” $\delta(\epsilon, a)$ is no longer allowed to vary with $a$. However, we have the following result, whose proof is a consequence of Bolzano-Weierstrass:

Theorem 3.1.11. If $X$ is a closed and bounded subset of $\mathbb{C}$ and $f : X \to \mathbb{R}$ is continuous, then $f$ is uniformly continuous on $X$.

Proof. Problem 3.1.5.

Remark 3.1.12. For the reader familiar with Section 2.6, we note that since Theorem 3.1.11 really only relies on Bolzano-Weierstrass (see Problem 3.1.5), by Corollary 2.6.7, the theorem can be extended to compact subsets of metric spaces (Definition 2.6.6).
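To see why the closed and bounded hypothesis in Theorem 3.1.11 matters, the following minimal sketch looks at $f(x) = 1/x$ on $(0, 1]$, which is bounded but not closed. The function is continuous at every point, yet the points $x_n = 1/n$ and $y_n = 1/(n+1)$ get arbitrarily close while their images stay exactly 1 apart, so no single $\delta(\epsilon)$ can work for $\epsilon = 1$. The helper name `witness_pair` and the choice of example are illustrative assumptions, not taken from the text.

```python
from fractions import Fraction

def f(x):
    # f(x) = 1/x is continuous at every point of (0, 1].
    return 1 / x

def witness_pair(n):
    # Hypothetical helper: two points of (0, 1] whose distance shrinks with n.
    return Fraction(1, n), Fraction(1, n + 1)

for n in (1, 10, 100, 1000):
    x, y = witness_pair(n)
    input_gap = abs(x - y)         # equals 1/(n(n+1)), which tends to 0
    output_gap = abs(f(x) - f(y))  # equals |n - (n+1)| = 1 for every n
    print(n, float(input_gap), output_gap)
```

Exact rational arithmetic is used so that the output gap is exactly 1 rather than approximately 1.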

3.1. CONTINUITY AND LIMITS


We also have the Extreme Value Theorem, which is most definitely a result on real-valued functions, as the statement makes no sense for complex-valued functions.

Theorem 3.1.13 (Extreme Value Theorem). Let $X$ be a closed and bounded subset of $\mathbb{C}$, and let $f : X \to \mathbb{R}$ be continuous. Then $f$ attains both an absolute maximum and an absolute minimum on $X$; that is, there exist $c, d \in X$ such that $f(c) \le f(x) \le f(d)$ for all $x \in X$.

Proof. Problem 3.1.6.

The Extreme Value Theorem does, however, have the following useful corollary for complex-valued functions.

Corollary 3.1.14. Let $X$ be a closed and bounded subset of $\mathbb{C}$, and let $f : X \to \mathbb{C}$ be continuous. Then $f$ is bounded; that is, there exists some $M > 0$ such that $|f(x)| < M$ for all $x \in X$.

Proof. Problem 3.1.7.

We also state the following important complement to the Extreme Value Theorem.

Theorem 3.1.15 (Intermediate Value Theorem). Let $f : [a, b] \to \mathbb{R}$ be continuous and $f(a) < d < f(b)$. Then there exists some $c \in [a, b]$ such that $f(c) = d$.

Proof. Problem 3.1.8.

We now come to limits of functions, which we need to define derivatives. Most properties of limits of functions can be defined and proven in almost exactly the same way as the analogous properties of continuous functions, and so we leave all proofs to the reader in the problems.

First, as the reader may recall from calculus, the most useful case of the limit $\lim_{x\to a} f(x)$ is when $a$ is not in the domain of $f$, which makes the following definition helpful when dealing with limits:

Definition 3.1.16. For $X$ a nonempty subset of $\mathbb{C}$, to say that $a$ is a limit point of $X$ means that there exists some sequence $x_n$ in $X$ such that $\lim_{n\to\infty} x_n = a$ and $x_n \ne a$ for all $n$. (Note that this is possible both for $a \in X$ and $a \notin X$.)

Turning to the formal definition of limit, we emphasize in bold the few places where the definition of continuity needs to be changed to get the definition of limit:

Definition 3.1.17. Let $X$ be a nonempty subset of $\mathbb{C}$, let $f : X \to \mathbb{C}$ be a function, and let $a$ be a limit point of $X$. To say that $\lim_{x\to a} f(x) = L$ means that one of the following conditions holds:

• (Sequential limit) For every sequence $x_n$ in $X$ such that $\lim_{n\to\infty} x_n = a$ and $x_n \ne a$ for all $n$, we have that $\lim_{n\to\infty} f(x_n) = L$.

• ($\epsilon$-$\delta$ limit) For every $\epsilon > 0$, there exists some $\delta(\epsilon) > 0$ such that if $|x - a| < \delta(\epsilon)$ and $x \ne a$, then $|f(x) - L| < \epsilon$.

Again, the two versions of the definition of limit are equivalent:

Theorem 3.1.18. Let $X$ be a nonempty subset of $\mathbb{C}$, let $f : X \to \mathbb{C}$ be a function, and let $a$ be a limit point of $X$. For $L \in \mathbb{C}$, the sequential and $\epsilon$-$\delta$ definitions of $\lim_{x\to a} f(x) = L$ are equivalent.

Proof. Problem 3.1.9.

Furthermore, note that comparing Definitions 3.1.1 and 3.1.17, we see that $f$ is continuous at $a$ if and only if $\lim_{x\to a} f(x) = f(a)$, a fact we will find useful later. In any case, we now state the algebraic limit laws, and the fact that limits can be broken down to their real and imaginary parts; the proofs of those results are left to the reader.

Theorem 3.1.19. Let $X$ be a nonempty subset of $\mathbb{C}$, let $f, g : X \to \mathbb{C}$ be functions, and for some limit point $a$ of $X$, suppose that $\lim_{x\to a} f(x) = L$ and $\lim_{x\to a} g(x) = M$. Then:

1. For $c \in \mathbb{C}$, $\lim_{x\to a} cf(x) = cL$.

2. $\lim_{x\to a} (f(x) + g(x)) = L + M$.

3. $\lim_{x\to a} \overline{f(x)} = \overline{L}$.

4. $\lim_{x\to a} (f(x)g(x)) = LM$.

5. If $g(x) \ne 0$ for all $x \in X$ and $M \ne 0$, then $\lim_{x\to a} \dfrac{f(x)}{g(x)} = \dfrac{L}{M}$.

Proof. Problem 3.1.10.
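As a hedged numerical illustration of the sequential definition of limit and of the product law, the sketch below takes $f(z) = (z^2 - 1)/(z - 1)$ on $X = \mathbb{C} - \{1\}$, so that $a = 1$ is a limit point of $X$ that is not in $X$, together with $g(z) = z$. Here $L = 2$ and $M = 1$. The particular functions and the sequence $z_n = 1 + i/n$ are assumptions chosen for illustration; a numerical check along one sequence is of course not a proof.

```python
def f(z):
    # Defined only for z != 1; the limit point a = 1 is outside the domain.
    return (z * z - 1) / (z - 1)

def g(z):
    return z

def z_seq(n):
    # A sequence in X converging to a = 1 from the imaginary direction,
    # with z_seq(n) != 1 for every n.
    return 1 + 1j / n

for n in (10, 100, 1000):
    z = z_seq(n)
    # Distance of f(z_n) from L = 2, and of f(z_n) g(z_n) from LM = 2.
    print(n, abs(f(z) - 2), abs(f(z) * g(z) - 2))
```

Both printed distances shrink roughly like $1/n$, consistent with $\lim_{z\to 1} f(z) = 2$ and $\lim_{z\to 1} f(z)g(z) = 2 \cdot 1$.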

Theorem 3.1.20. Let $X$ be a subset of $\mathbb{C}$, let $f : X \to \mathbb{C}$ be a function, let $u$ and $v$ be the real and imaginary parts of $f$, let $L = r + si$ ($r, s \in \mathbb{R}$), and let $a$ be a limit point of $X$. Then $\lim_{x\to a} f(x) = L$ if and only if $\lim_{x\to a} u(x) = r$ and $\lim_{x\to a} v(x) = s$.

Proof. Problem 3.1.11.

We will also need the Squeeze Lemma for functions, which the reader may recall from calculus.

Lemma 3.1.21 (Squeeze Lemma). Let $X$ be a nonempty subset of $\mathbb{C}$, let $f, g, h : X \to \mathbb{R}$ be real-valued functions such that $f(x) \le g(x) \le h(x)$ for all $x \in X$, and for some limit point $a$ of $X$, suppose

$$\lim_{x\to a} f(x) = L = \lim_{x\to a} h(x). \tag{3.1.1}$$

Then $\lim_{x\to a} g(x) = L$.
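A standard calculus instance of the Squeeze Lemma, sketched here under the assumption $X = \mathbb{R} - \{0\}$ and $a = 0$, is $-x^2 \le x^2 \sin(1/x) \le x^2$. The code below only spot-checks the two inequalities at sample points and shows the outer bounds collapsing to $L = 0$; the function names `lower`, `middle`, and `upper` are illustrative.

```python
import math

def lower(x):
    return -x * x

def middle(x):
    # Oscillates faster and faster near 0, but is trapped between the bounds.
    return x * x * math.sin(1 / x)

def upper(x):
    return x * x

# Nonzero sample points approaching a = 0 from both sides.
samples = [(-1) ** k / (k + 1) for k in range(1, 200)]

for x in samples[::40]:
    print(x, lower(x), middle(x), upper(x))
```

Since `upper(x)` and `lower(x)` both tend to 0 as $x \to 0$, the lemma gives $\lim_{x\to 0} x^2 \sin(1/x) = 0$.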


We conclude this section by defining piecewise properties of functions precisely.

Definition 3.1.22. To say that $f : [a, b] \to \mathbb{C}$ is piecewise continuous means that there exist $a_0, \dots, a_n \in [a, b]$ with $a = a_0 < a_1 < \cdots < a_n = b$ such that on each subinterval $[a_{i-1}, a_i]$ ($1 \le i \le n$):

1. The limits $\lim_{x\to a_{i-1}^+} f(x)$ and $\lim_{x\to a_i^-} f(x)$, which we denote by $f(a_{i-1}^+)$ and $f(a_i^-)$, respectively, both exist; and

2. The function

$$f_i(x) = \begin{cases} f(a_{i-1}^+) & \text{if } x = a_{i-1}, \\ f(x) & \text{if } a_{i-1} < x < a_i, \\ f(a_i^-) & \text{if } x = a_i, \end{cases} \tag{3.1.2}$$

is continuous on $[a_{i-1}, a_i]$.

Similarly, to say that $f$ is piecewise $P$ (where later, $P$ will be “differentiable” or “Lipschitz”) means that the above holds with “continuous” replaced by $P$. In all of these cases, the intervals $[a_{i-1}, a_i]$ are called the intervals of continuity of $f$.

Problems

3.1.1. (Proves Theorem 3.1.4) Prove the parts of Theorem 3.1.4 other than property (4).

3.1.2. (Proves Theorem 3.1.6) Let $X$, $Y$, and $Z$ be metric spaces, let $f : X \to Y$ and $g : Y \to Z$ be functions, let $a$ be a point in $X$, and suppose that $f$ is continuous at $a$ and $g$ is continuous at $f(a)$. Prove that $g \circ f$ is continuous at $a$. (Suggestion: Sequential continuity.)

3.1.3. Let $X = \mathbb{R}$ or $\mathbb{C}$.
(a) Prove that $f : X \to X$ defined by $f(x) = x$ is continuous.
(b) Prove that if $p(x) = a_n x^n + \cdots + a_0$ is a polynomial function with coefficients in $X$, then $p : X \to X$ is continuous. (Suggestion: Induction and Theorem 3.1.20.)

3.1.4. (Proves Corollary 3.1.9) Let $X$ be a metric space.
(a) Now fix some $b \in X$, and define $f : X \to \mathbb{R}$ by $f(x) = d(x, b)$. Prove that $f$ is continuous. (Suggestion: Use Theorem 2.3.7.)
(b) Prove that the function $f : \mathbb{C} \to \mathbb{R}$ given by $f(z) = |z|$ is continuous.

3.1.5. (Proves Theorem 3.1.11) Suppose $X$ is a closed and bounded subset of $\mathbb{C}$ and $f : X \to \mathbb{R}$ is continuous.
(a) Prove that if $f$ is not uniformly continuous on $X$, then there exists a constant $\epsilon_0 > 0$ and sequences $x_n$ and $y_n$ in $X$ such that $|x_n - y_n| < \frac{1}{n}$ and $|f(x_n) - f(y_n)| \ge \epsilon_0$ for all $n$. (Suggestion: Negate the definition of uniform continuity.)
(b) Use contradiction to prove that $f$ is uniformly continuous on $X$. (Suggestion: Use (a), Bolzano-Weierstrass, and sequential continuity to obtain a contradiction.)

3.1.6. (Proves Theorem 3.1.13) Let $X$ be a closed and bounded subset of $\mathbb{C}$, and let $f : X \to \mathbb{R}$ be continuous.
(a) Prove that $f$ is bounded. (Suggestion: Suppose $f$ is not bounded above, construct a “bad” sequence, and use Bolzano-Weierstrass and sequential continuity to obtain a contradiction.)
(b) Prove that there exists some $d \in X$ such that $f(x) \le f(d)$ for all $x \in X$. (Suggestion: Since $f$ is bounded above, use the Arbitrarily Close Criterion (Theorem 2.4.11), Bolzano-Weierstrass, and sequential continuity.)

3.1.7. (Proves Corollary 3.1.14) Let $X$ be a closed and bounded subset of $\mathbb{C}$, and let $f : X \to \mathbb{C}$ be continuous. Prove that there exists some $M > 0$ such that $|f(x)| < M$ for all $x \in X$. (Suggestion: Use continuity of the absolute value and the Extreme Value Theorem.)

3.1.8. (Proves Theorem 3.1.15) Let $f : [a, b] \to \mathbb{R}$ be continuous, and suppose $f(a) < d < f(b)$.
(a) Let $c = \sup \{x \in [a, b] \mid f(x) \le d\}$. Prove that $f(c) \le d$. (Suggestion: Arbitrarily Close Criterion and Theorem 2.4.9.)
(b) Prove that if $f(c) < d$, then there exists some $x$ such that $c < x < b$ and $f(x) < d$. Conclude that $f(c) = d$. (Suggestion: Take $\epsilon = (d - f(c))/2$.)

3.1.9. Let $X$ be a nonempty subset of $\mathbb{R}$, let $f : X \to \mathbb{C}$ be a function, let $a$ be a limit point of $X$, and fix $L \in \mathbb{C}$.
(a) Suppose $\lim_{x\to a} f(x) = L$ by the $\epsilon$-$\delta$ definition of limit. Prove that $\lim_{x\to a} f(x) = L$ by the sequential definition of limit.
(b) Now suppose that it is not the case that $\lim_{x\to a} f(x) = L$ using the $\epsilon$-$\delta$ definition of limit. Prove that it is not the case that $\lim_{x\to a} f(x) = L$ using the sequential definition of limit. (Suggestion: Bad sequence.)

3.1.10. (Proves Theorem 3.1.19) Prove Theorem 3.1.19. (Suggestion: Select just one or two of the limit laws to prove, as the others are similar.)

3.1.11. (Proves Theorem 3.1.20) Prove Theorem 3.1.20.

3.1.12. (Proves Lemma 3.1.21) Let $X$ be a nonempty subset of $\mathbb{C}$, let $f, g, h : X \to \mathbb{R}$ be real-valued functions such that $f(x) \le g(x) \le h(x)$ for all $x \in X$, and for some limit point $a$ of $X$, suppose

$$\lim_{x\to a} f(x) = L = \lim_{x\to a} h(x). \tag{3.1.3}$$

Prove that $\lim_{x\to a} g(x) = L$. (Suggestion: Each of $f$ and $h$ has an associated $\delta(\epsilon)$.)

3.2 Differentiation

As with continuity and limits, derivatives extend to complex-valued functions with complex domains with little change, except that one of the most important results in differentiation, the Mean Value Theorem, does not work as it did before (Example 3.2.11). That fact will sometimes force us to do calculus of a complex-valued function by doing calculus on its real and imaginary parts; however, once we make that adjustment, everything works just fine.

We begin with a space- and time-saving convention, followed by the basic definition.

Convention 3.2.1. For the rest of this section, we let $X$ denote a subset of $\mathbb{C}$ such that every point of $X$ is a limit point of $X$ (Definition 3.1.16), and similarly for $Y$. This ensures that $\lim_{x\to a}$ makes sense for all points $a \in X$ and $a \in Y$.

Definition 3.2.2. Let $X$ be as in Convention 3.2.1, let $f : X \to \mathbb{C}$ be a function, and let $a$ be a point in $X$. To say that $f$ is differentiable at $a$ means that the limit

$$f'(a) = \lim_{x\to a} \frac{f(x) - f(a)}{x - a} = \lim_{h\to 0} \frac{f(a + h) - f(a)}{h} \tag{3.2.1}$$

exists (where $h = x - a$). To say that $f$ is differentiable on $X$ means that for all $a \in X$, $f$ is differentiable at $a$.

Example 3.2.3. For $\alpha \in \mathbb{C}$, the function $f : \mathbb{C} \to \mathbb{C}$ defined by $f(x) = \alpha x$ is differentiable at all $x \in \mathbb{C}$, and $f'(x) = \alpha$ for all $x \in \mathbb{C}$ (Problem 3.2.1). Similarly, if we define $g : (\mathbb{C} - \{0\}) \to \mathbb{C}$ by $g(x) = x^{-1}$, then for all $x \in \mathbb{C} - \{0\}$, $g$ is differentiable at $x$ and $g'(x) = -x^{-2}$ (Problem 3.2.1).

Note that if $u(x)$ and $v(x)$ are the real and imaginary parts of $f(x)$, then

$$\lim_{x\to a} \frac{f(x) - f(a)}{x - a} = \lim_{x\to a} \left( \frac{u(x) - u(a)}{x - a} + i\,\frac{v(x) - v(a)}{x - a} \right). \tag{3.2.2}$$

Theorem 3.1.20 therefore immediately yields the following corollary.

Corollary 3.2.4. Let $X$ be as in Convention 3.2.1 and let $f : X \to \mathbb{C}$ be a function. If $u(x)$ and $v(x)$ are the real and imaginary parts of $f(x)$, then $f$ is differentiable at $a \in X$ if and only if both $u(x)$ and $v(x)$ are differentiable at $a$.

One helpful way to think of differentiability is to observe that being differentiable at $a$ is equivalent to having a sufficiently good linear approximation near $a$. More precisely:

Lemma 3.2.5. Let $X$ be as in Convention 3.2.1 and let $f : X \to \mathbb{C}$ be a function. For $a \in X$, the following are equivalent:

• $f$ is differentiable at $a$.

• There exists some $m \in \mathbb{C}$ such that if we define $E : X \to \mathbb{C}$ by

$$E(x) = \begin{cases} \dfrac{f(x) - f(a)}{x - a} - m & \text{for } x \ne a, \\ 0 & \text{for } x = a, \end{cases} \tag{3.2.3}$$

for all $x \in X$, then $E(x)$ is continuous at $a$ (i.e., $\lim_{x\to a} E(x) = 0$).

Furthermore, if either (and therefore both) of these conditions hold, $m = f'(a)$.

Proof. Problem 3.2.2.

Lemma 3.2.5 is most often used in the form of the following immediate corollary.

Corollary 3.2.6 (Local Linearity). Let $X$ be as in Convention 3.2.1 and let $f : X \to \mathbb{C}$ be differentiable at $a \in X$. Then there exists a function $E_f : X \to \mathbb{C}$ such that $E_f$ is continuous at $a$, $E_f(a) = 0$, and for all $x \in X$,

$$f(x) = f(a) + (f'(a) + E_f(x))(x - a). \tag{3.2.4}$$

If we replace $E_f(x)$ with $0$ in (3.2.4), we get the approximation

$$f(x) \approx f(a) + f'(a)(x - a). \tag{3.2.5}$$

As the reader may recall from calculus, (3.2.5) is known as the local linear approximation to $f$ at $a$. The function $E_f(x)$ in Corollary 3.2.6 is therefore sometimes called the relative error in the local linear approximation.

To give one example of the usefulness of local linearity, it quickly follows that:

Corollary 3.2.7. Let $X$ be as in Convention 3.2.1, let $f : X \to \mathbb{C}$ be a function, and suppose $f$ is differentiable at $a \in X$. Then $f$ is continuous at $a$.

Proof. Let $E_f(x)$ be as in Corollary 3.2.6. Then by (3.2.4) and the limit laws for functions, we see that

$$\begin{aligned} \lim_{x\to a} f(x) &= f(a) + \left(f'(a) + \lim_{x\to a} E_f(x)\right)\left(\lim_{x\to a} (x - a)\right) \\ &= f(a) + (f'(a) + 0)(a - a) = f(a). \end{aligned} \tag{3.2.6}$$

The corollary follows.

The algebraic properties of the derivative can now be proven in the same manner as they were for real-valued functions.

Theorem 3.2.8. Let $X$ be as in Convention 3.2.1 and let $a$ be a point in $X$. Suppose that $f, g : X \to \mathbb{C}$ are differentiable at $a$. Then:

1. For $c \in \mathbb{C}$, $cf$ is differentiable at $a$, with derivative $(cf)'(a) = cf'(a)$.

2. $f + g$ is differentiable at $a$, with derivative $(f + g)'(a) = f'(a) + g'(a)$.

3. $\overline{f}$ is differentiable at $a$, with derivative $\overline{f}\,'(a) = \overline{f'(a)}$.

4. $fg$ is differentiable at $a$, with derivative $(fg)'(a) = f'(a)g(a) + f(a)g'(a)$.

5. If $g(x) \ne 0$ for all $x \in X$, then $f/g$ is differentiable at $a$, with derivative $\left(\dfrac{f}{g}\right)'(a) = \dfrac{g(a)f'(a) - f(a)g'(a)}{g(a)^2}$.

Proof. Properties (1)–(3) follow from the corresponding limit laws in an entirely straightforward way, so we omit their proofs. Properties (4) and (5) are more interesting; see Problems 3.2.3 and 3.2.5.

Extending the chain rule is also relatively straightforward.

Theorem 3.2.9. Let $X$ and $Y$ be as in Convention 3.2.1 and let $a$ be a point in $X$. Suppose that $f : X \to Y$ is differentiable at $a$ and $g : Y \to \mathbb{C}$ is differentiable at $f(a)$. Then $g \circ f : X \to \mathbb{C}$ is differentiable at $a$, and $(g \circ f)'(a) = g'(f(a))f'(a)$.

Proof. Problem 3.2.4.

We now quote the Mean Value Theorem, which is perhaps the most important theoretical result in differentiation. The key new point to realize is that the Mean Value Theorem only holds for real-valued functions, and not complex-valued functions (Example 3.2.11).

Theorem 3.2.10 (Mean Value Theorem). Let $f : [a, b] \to \mathbb{R}$ be a real-valued function that is continuous on $[a, b]$ and differentiable on $(a, b)$. Then there exists some $c \in (a, b)$ such that

$$\frac{f(b) - f(a)}{b - a} = f'(c). \tag{3.2.7}$$

Example 3.2.11. Define $f : \mathbb{R} \to \mathbb{C}$ by $f(x) = \cos x + i \sin x$. Then

$$\frac{f(2\pi) - f(0)}{2\pi - 0} = 0,$$

but for $x \in [0, 2\pi]$, $f'(x) = -\sin x + i \cos x$ is never equal to the complex number $0$.
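The failure in Example 3.2.11 can be checked numerically. The sketch below, which assumes only Python's standard `cmath` and `math` modules, computes the secant slope of $f(x) = \cos x + i \sin x$ over $[0, 2\pi]$ and the smallest sampled value of $|f'(x)|$. It also shows what goes wrong: the real part $u = \cos$ and imaginary part $v = \sin$ each satisfy the real Mean Value Theorem, but at different points ($c = \pi$ and $c = \pi/2$, respectively).

```python
import cmath
import math

def f(x):
    return cmath.exp(1j * x)  # equals cos x + i sin x

def f_prime(x):
    return complex(-math.sin(x), math.cos(x))

# Secant slope over [0, 2*pi]: zero up to rounding error.
secant = (f(2 * math.pi) - f(0)) / (2 * math.pi - 0)

# |f'(x)| sampled on [0, 2*pi]: it is 1 everywhere, so f' never vanishes.
smallest_speed = min(abs(f_prime(2 * math.pi * k / 1000)) for k in range(1001))

# The real and imaginary parts have zero derivative at *different* points.
u_prime_at_pi = -math.sin(math.pi)
v_prime_at_half_pi = math.cos(math.pi / 2)

print(abs(secant), smallest_speed, u_prime_at_pi, v_prime_at_half_pi)
```

This is why later arguments apply the Mean Value Theorem to $u$ and $v$ separately rather than to $f$ itself.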

Fortunately, the real-valued Mean Value Theorem is good enough to get what we need for complex-valued functions. For example, the reader may recall from calculus that a function whose derivative is 0 must be constant, and the same is true for complex-valued functions on the following kind of domain. Definition 3.2.12. We define a path in C to be a differentiable function γ : [0, 1] → C. To say that X ⊆ C is path-connected means that for any x, y ∈ X, there is a finite sequence of paths in X starting at x and ending at y (Figure 3.2.1). For example, open and closed discs (Definition 2.4.7) are path-connected, as are intervals in R.

[Figure 3.2.1: A path-connected subset of $\mathbb{C}$, with the points $x$ and $y$ labeled.]

Corollary 3.2.13 (Zero Derivative Theorem). Let $X$ be a path-connected subset of $\mathbb{C}$, and let $f : X \to \mathbb{C}$ be a function. Suppose either that $f'(x) = 0$ for all $x \in X$, or $X = [a, b]$, $f$ is continuous on $[a, b]$, and $f'(x) = 0$ for all $x \in (a, b)$. Then $f(x)$ is constant on $X$.

Proof. By the usual $f(x) = u(x) + iv(x)$ argument, it suffices to consider the case of a real-valued $u : X \to \mathbb{R}$. In fact, it suffices to show that if $\gamma : [0, 1] \to X$ is a path such that $u'(\gamma(t)) = 0$ for all $t \in (0, 1)$ and $u$ is continuous at $\gamma(0)$ and $\gamma(1)$, then $u(\gamma(0)) = u(\gamma(1))$, for in that case, it will follow that $u(x) = u(y)$ for all $x, y \in X$. So let $\gamma$ be such a path, and let $f : [0, 1] \to \mathbb{R}$ be defined by $f(t) = u(\gamma(t))$. Then $f$ is continuous on $[0, 1]$, and the chain rule implies that for all $t \in (0, 1)$,

$$f'(t) = u'(\gamma(t))\gamma'(t) = 0. \tag{3.2.8}$$

Applying the Mean Value Theorem 3.2.10 to $f$ then shows that $u(\gamma(0)) = f(0) = f(1) = u(\gamma(1))$, and the corollary follows.

On a similar note:

Corollary 3.2.14. Let $X$ be a subinterval of $\mathbb{R}$ (possibly $X = \mathbb{R}$), and suppose that $f : X \to \mathbb{C}$ is differentiable and $f'$ is bounded. Then $f$ is uniformly continuous on $X$.

Proof. Problem 3.2.7.

As one more application of the Mean Value Theorem, we prove the following version of the derivative of an inverse function. Since we will only use this result once, as background in Section 4.6, we assume unnecessary hypotheses that simplify the proof; see Ross [Ros13, Thm. 29.9] for a more general version.

Theorem 3.2.15. Let $f : [a, b] \to \mathbb{R}$ be differentiable, and suppose also that $f'$ is continuous and positive on $[a, b]$. Then:

1. $f$ is strictly increasing and maps $[a, b]$ bijectively onto a closed interval $[c, d]$; and

2. If $g : [c, d] \to [a, b]$ is the inverse function of $f$, then $g$ is differentiable on $[c, d]$, and for $y \in [c, d]$ such that $g(y) = x$, $g'(y) = \dfrac{1}{f'(g(y))}$.

Proof. Problem 3.2.8.
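To make the inverse function formula concrete, here is a rough numerical sketch for the assumed example $f(x) = x^3 + x$ on $[0, 1]$, where $f'(x) = 3x^2 + 1$ is continuous and positive and $[c, d] = [0, 2]$. The inverse $g$ is computed by bisection (a choice made here for simplicity, not something the text prescribes), and a central difference quotient for $g'(y)$ is compared with $1/f'(g(y))$.

```python
def f(x):
    return x ** 3 + x

def f_prime(x):
    return 3 * x ** 2 + 1

def g(y, lo=0.0, hi=1.0, steps=60):
    # Inverse of f on [0, 1] by bisection; valid because f is strictly increasing.
    for _ in range(steps):
        mid = (lo + hi) / 2
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

y = 1.0   # a point of [c, d] = [0, 2]
h = 1e-5  # step for the central difference quotient

numeric_derivative = (g(y + h) - g(y - h)) / (2 * h)
formula_derivative = 1 / f_prime(g(y))

print(numeric_derivative, formula_derivative)
```

The two printed values should agree to several decimal places; the remaining gap comes from the finite difference step and floating-point rounding.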


Remark 3.2.16. After this section, we will generally focus on complex-valued functions of one real variable. However, to give the reader some perspective, when X is an open subset of C (Definition 2.6.2), a differentiable function f : X → C is more often called a holomorphic function, and these functions are the main topic of complex analysis. Holomorphic functions have remarkable and highly non-obvious properties above and beyond those held by differentiable complex-valued functions of a real variable. To take just one example that is easy to explain but not easy to prove, the derivative of a holomorphic function of a complex variable is itself always holomorphic (!). For this and much more, see any good text on complex analysis.

Problems

In all of the problems in this section, let $X$ and $Y$ be as in Convention 3.2.1.

3.2.1. (a) For $\alpha \in \mathbb{C}$, prove that $f : \mathbb{C} \to \mathbb{C}$ given by $f(x) = \alpha x$ is differentiable, and that $f'(x) = \alpha$.
(b) Prove that $g : \mathbb{C} - \{0\} \to \mathbb{C}$ given by $g(x) = 1/x$ is differentiable on its domain, and that $g'(a) = -a^{-2}$ for all $a \in \mathbb{C} - \{0\}$.

3.2.2. (Proves Lemma 3.2.5) Fix $f : X \to \mathbb{C}$ and $a \in X$.
(a) Assume that $f$ is differentiable at $a$. Prove that if $m = f'(a)$, then $E(x)$ (as defined in (3.2.3)) is continuous at $a$. (Suggestion: Compute $\lim_{x\to a} E(x)$.)
(b) Assume $m \in \mathbb{C}$ and $E(x)$ (as defined in (3.2.3)) is continuous at $a$ (i.e., $\lim_{x\to a} E(x) = 0$). Prove that $f$ is differentiable at $a$ and $f'(a) = m$.

3.2.3. (Proves Theorem 3.2.8, part 4) Let $a$ be a point in $X$, and suppose that $f, g : X \to \mathbb{C}$ are differentiable at $a$. Prove that $(fg)(x) = f(x)g(x)$ is differentiable at $a$, and that $(fg)'(a) = f'(a)g(a) + f(a)g'(a)$. (Suggestion: Use local linearity to approximate $f(x)g(x)$ near $a$.)

3.2.4. (Proves Theorem 3.2.9) Let $a$ be a point in $X$. Suppose that $f : X \to Y$ is differentiable at $a$ and $g : Y \to \mathbb{C}$ is differentiable at $f(a)$. Prove that $g \circ f : X \to \mathbb{C}$ is differentiable at $a$, and $(g \circ f)'(a) = g'(f(a))f'(a)$. (Suggestion: Take $y = f(x)$ and $b = f(a)$; use local linearity (3.2.4) twice.)

3.2.5. (Proves Theorem 3.2.8, part 5) Let $a$ be a point in $X$, suppose that $f, g : X \to \mathbb{C}$ are differentiable at $a$, and suppose that $g(x) \ne 0$ for all $x \in X$.
(a) Let $h(x) = 1/g(x)$. Use the chain rule and Problem 3.2.1 to prove that for $g(a) \ne 0$, $h'(a) = -\dfrac{g'(a)}{g(a)^2}$.
(b) Now use the product rule to deduce that $\left(\dfrac{f}{g}\right)'(a) = \dfrac{g(a)f'(a) - f(a)g'(a)}{g(a)^2}$.


3.2.6. For $\alpha \ge 0$, define $f_\alpha : \mathbb{R} \to \mathbb{R}$ by $f_\alpha(x) = |x|^{1+\alpha}$.
(a) Prove that $f_0(x) = |x|$ is not differentiable at $x = 0$.
(b) Prove that for $\alpha > 0$, $f_\alpha'(0) = 0$.

3.2.7. (Proves Corollary 3.2.14) Let $f : X \to \mathbb{C}$ be differentiable, and suppose that for some $C > 0$ and all $x \in X$, we have $|f'(x)| \le C$.
(a) Let $f(x) = u(x) + iv(x)$ as usual. Prove that for any $x, y \in X$, $|u(x) - u(y)| \le C|x - y|$, and similarly for $v(x)$. (Suggestion: Mean Value Theorem 3.2.10.)
(b) Prove that $f$ is uniformly continuous on $X$.

3.2.8. (Proves Theorem 3.2.15) Let $f : [a, b] \to \mathbb{R}$ be differentiable, and suppose also that $f'$ is continuous and positive on $[a, b]$.
(a) Prove that $f$ is strictly increasing on $[a, b]$. (Suggestion: Mean Value Theorem.)
(b) Prove that $f$ maps $[a, b]$ bijectively onto a closed interval $[c, d]$. (Suggestion: Intermediate Value Theorem.)
(c) Prove that there exist $m, M \in \mathbb{R}$ such that for $a \le x_0 < x_1 \le b$, we have that $0$ [...]

[...] $\epsilon > 0$, there exists a partition $P$ such that $U(v; P) - L(v; P) < \epsilon$.

3.3. THE RIEMANN INTEGRAL: DEFINITION


Furthermore, if condition (2) holds, then

$$\lim_{n\to\infty} L(v; P_n) = \int_a^b v(x)\,dx = \lim_{n\to\infty} U(v; P_n). \tag{3.3.11}$$

Note that as the name implies, Lemma 3.3.10 once again reduces integrability and computing the integral to the problem of computing the limit of some sequence of Riemann sums.

Proof. Let $\mathcal{P}$ be the set of all partitions of $[a, b]$, let $\mathcal{L} = \{L(v; P) \mid P \in \mathcal{P}\}$, and let $\mathcal{U} = \{U(v; P) \mid P \in \mathcal{P}\}$.

(1)⇒(2): Let $n$ be a positive integer. By the arbitrarily close criterion (Theorem 2.4.11), there exists some $Q_n \in \mathcal{P}$ such that

$$\int_a^b v(x)\,dx - L(v; Q_n) = \underline{\int_a^b} v(x)\,dx - L(v; Q_n) < \frac{1}{2n}, \tag{3.3.12}$$

and there exists some $Q_n' \in \mathcal{P}$ such that

$$U(v; Q_n') - \int_a^b v(x)\,dx < \frac{1}{2n}. \tag{3.3.13}$$

By Lemma 3.3.8, taking the common refinement $P_n = Q_n \cup Q_n'$ pushes the quantities $U(v; Q_n')$ and $L(v; Q_n)$ closer together, and so

$$\begin{aligned} U(v; P_n) - L(v; P_n) &\le U(v; Q_n') - L(v; Q_n) \\ &= \left(U(v; Q_n') - \int_a^b v(x)\,dx\right) + \left(\int_a^b v(x)\,dx - L(v; Q_n)\right) < \frac{1}{n}. \end{aligned} \tag{3.3.14}$$

Condition (2) follows by the Metric Squeeze Lemma (Lemma 2.4.14).

(2)⇒(3): Given $\epsilon > 0$, by the definition of limit, we may let $P = P_n$ for any sufficiently large $n$.

(3)⇒(1): Let $\Delta = \overline{\int_a^b} v(x)\,dx - \underline{\int_a^b} v(x)\,dx \ge 0$ (Theorem 3.3.9). If condition (3) holds, by (3.3.10), $\Delta < \epsilon$ for any $\epsilon > 0$, and so $\Delta = 0$.

Finally, suppose condition (2) holds. In that case, by (3.3.10), we have that for any $n$,

$$\underline{\int_a^b} v(x)\,dx - L(v; P_n) \le U(v; P_n) - L(v; P_n). \tag{3.3.15}$$

So by the Metric Squeeze Lemma (Lemma 2.4.14),

$$\lim_{n\to\infty} L(v; P_n) = \underline{\int_a^b} v(x)\,dx = \int_a^b v(x)\,dx, \tag{3.3.16}$$

and the lemma follows.
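The sequential criterion can be watched in action on a simple example. The sketch below assumes the standard definitions of upper and lower Riemann sums (the supremum, respectively infimum, of $v$ on each subinterval times the subinterval width, summed over the partition) and takes $v(x) = x^2$ on $[0, 1]$ with the uniform partition $P_n$ into $n$ subintervals. Because $v$ is increasing there, the supremum and infimum on each subinterval sit at the right and left endpoints; that shortcut is specific to this example.

```python
from fractions import Fraction

def v(x):
    return x * x

def uniform_partition(n):
    # P_n = {0, 1/n, 2/n, ..., 1}
    return [Fraction(k, n) for k in range(n + 1)]

def upper_sum(partition):
    # v is increasing on [0, 1], so the sup on [x_{i-1}, x_i] is v(x_i).
    return sum(v(partition[i]) * (partition[i] - partition[i - 1])
               for i in range(1, len(partition)))

def lower_sum(partition):
    # Likewise, the inf on [x_{i-1}, x_i] is v(x_{i-1}).
    return sum(v(partition[i - 1]) * (partition[i] - partition[i - 1])
               for i in range(1, len(partition)))

for n in (1, 10, 100, 1000):
    P = uniform_partition(n)
    print(n, float(lower_sum(P)), float(upper_sum(P)), upper_sum(P) - lower_sum(P))
```

For this example $U(v; P_n) - L(v; P_n) = 1/n$ exactly, so condition (2) holds, and both sums close in on the value $1/3$ of the integral, as (3.3.11) predicts.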


Turning to complex-valued functions, we define integrability for such functions in terms of their real and imaginary parts.

Definition 3.3.11. Let $f : [a, b] \to \mathbb{C}$ be bounded, and let $u$ and $v$ be the real and imaginary parts of $f$. To say that $f$ is integrable means that both $u$ and $v$ are integrable, in which case we define

$$\int_a^b f(x)\,dx = \int_a^b u(x)\,dx + i \int_a^b v(x)\,dx. \tag{3.3.17}$$

We will also need a fully complex version of Lemma 3.3.10.

Lemma 3.3.12 (Sequential criterion for complex integrability). Let $f : [a, b] \to \mathbb{C}$ be bounded, and for any partition $P = \{x_0, \dots, x_n\}$ of $[a, b]$ and $1 \le i \le n$, define

$$\mu(f; P, i) = \sup \{|f(x) - f(y)| \mid x, y \in [x_{i-1}, x_i]\}, \tag{3.3.18}$$

$$E(f; P) = \sum_{i=1}^{n} \mu(f; P, i)(\Delta x)_i. \tag{3.3.19}$$

Then the following are equivalent.

1. $f$ is integrable on $[a, b]$.

2. For any $\epsilon > 0$, there exists a partition $P$ such that $E(f; P) < \epsilon$.

Proof. Problem 3.3.3.

Remark 3.3.13. We may interpret Lemma 3.3.12 as saying that $f : [a, b] \to \mathbb{C}$ is integrable if and only if $f$ is “uniformly continuous on average,” in the following sense. If we think of $\mu(f; P, i)$ from (3.3.18) as the supremum of the “wiggle” of $f$ on subinterval $i$, then $\mu(f; P, i)(\Delta x)_i$ is the wiggle on subinterval $i$, weighted by the size of subinterval $i$, and $E(f; P)$ is the “weighted total wiggle” of $f$ on $[a, b]$. In these terms, uniform continuity (Definition 3.1.10) gives a uniform bound on the wiggle of $f$, upon subdividing $[a, b]$ into sufficiently small subintervals, whereas the condition $E(f; P) < \epsilon$ does something similar, but with the possibility of there being some very small subintervals where the wiggle is large. For a more precise expression of the idea of “integrable means uniformly continuous except on some very small subintervals”, see the proof of Lemma 3.4.7.

Later, we will describe (but not prove) a necessary and sufficient condition for integrability, expressed in terms of continuity and some more sophisticated ideas; see Remark 7.5.7.

We conclude with a single, admittedly silly example. Nevertheless, the reader should carry out this computation, not just because this result actually becomes useful later, but also as an exercise in understanding the many layers of definitions established in this section.

Lemma 3.3.14. Constant functions are integrable, and for $c \in \mathbb{C}$, $\int_a^b c\,dx = c(b - a)$.

Proof. This follows from the real-valued case; see Problem 3.3.4.

See also Problem 3.3.5 for an example of a nonintegrable function.
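The “weighted total wiggle” $E(f; P)$ of Lemma 3.3.12 can be estimated numerically. The sketch below uses the assumed example $f(x) = e^{ix}$ on $[0, 1]$ and approximates each supremum $\mu(f; P, i)$ from below by sampling finitely many points of the subinterval, so it is an estimate rather than an exact evaluation. The function names `wiggle` and `weighted_total_wiggle` are illustrative and not from the text.

```python
import cmath

def f(x):
    return cmath.exp(1j * x)

def wiggle(func, left, right, samples=20):
    # Approximates mu(f; P, i) from below: the largest |f(x) - f(y)| over
    # a finite grid of sample points in [left, right], endpoints included.
    pts = [left + (right - left) * k / samples for k in range(samples + 1)]
    vals = [func(x) for x in pts]
    return max(abs(p - q) for p in vals for q in vals)

def weighted_total_wiggle(func, partition):
    # Approximates E(f; P): each wiggle weighted by its subinterval width.
    return sum(wiggle(func, partition[i - 1], partition[i])
               * (partition[i] - partition[i - 1])
               for i in range(1, len(partition)))

def uniform_partition(a, b, n):
    return [a + (b - a) * k / n for k in range(n + 1)]

for n in (4, 16, 64):
    print(n, weighted_total_wiggle(f, uniform_partition(0.0, 1.0, n)))
```

Since $|e^{ix} - e^{iy}| \le |x - y|$, each wiggle on a subinterval of width $1/n$ is at most $1/n$, so $E(f; P_n) \le 1/n$ and condition (2) of the lemma is satisfied.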


Problems

3.3.1. (Proves Lemma 3.3.8) Let $v : [a, b] \to \mathbb{R}$ be bounded and let $P$ be a partition of $[a, b]$. Prove that $L(v; P) \le U(v; P)$.

3.3.2. (Proves Theorem 3.3.9) Let $v : [a, b] \to \mathbb{R}$ be bounded, let $\mathcal{P}$ be the set of all partitions of $[a, b]$, let $\mathcal{L} = \{L(v; P) \mid P \in \mathcal{P}\}$, and let $\mathcal{U} = \{U(v; P) \mid P \in \mathcal{P}\}$.
(a) For $P, Q \in \mathcal{P}$, prove that $L(v; P) \le U(v; Q)$. (Suggestion: Use common refinements.)
(b) Prove that (3.3.10) holds for any $P \in \mathcal{P}$. (Suggestion: The middle inequality is the interesting part; by part (a), every upper sum is an upper bound for $\mathcal{L}$ and every lower sum is a lower bound for $\mathcal{U}$.)

3.3.3. (Proves Lemma 3.3.12) Let $f : [a, b] \to \mathbb{C}$ be bounded, let $u(x)$ and $v(x)$ be the real and imaginary parts of $f(x)$, let $P = \{x_0, \dots, x_n\}$ be a partition of $[a, b]$, and let $1 \le i \le n$. We also retain the notation of Definition 3.3.4 and Lemma 3.3.12.
(a) Prove that if $z = a + bi$ and $w = c + di$ are complex numbers, then $|a - c| \le |z - w|$ and $|z - w| \le |a - c| + |b - d|$. (Suggestion: Use a right triangle.)
(b) Prove that if $S$ is a nonempty bounded subset of $\mathbb{R}$, then

$$\sup \{|s - t| \mid s, t \in S\} = \sup S - \inf S. \tag{3.3.20}$$

(Suggestion: Use the arbitrarily close criterion (Theorem 2.4.11).)
(c) Prove that $\sup \{|u(x) - u(y)| \mid x, y \in [x_{i-1}, x_i]\} = M(u; P, i) - m(u; P, i)$.
(d) Prove that

$$M(u; P, i) - m(u; P, i) \le \mu(f; P, i) \tag{3.3.21}$$

and

$$\mu(f; P, i) \le (M(u; P, i) - m(u; P, i)) + (M(v; P, i) - m(v; P, i)). \tag{3.3.22}$$

(e) Prove Lemma 3.3.12.

3.3.4. (Proves Lemma 3.3.14) For $c \in \mathbb{R}$, prove that the constant function $v(x) = c$ is integrable, and prove that $\int_a^b c\,dx = c(b - a)$.

3.3.5. Define $f : [0, 1] \to \mathbb{R}$ by

$$f(x) = \begin{cases} 1 & \text{if } x \text{ is rational}, \\ 0 & \text{if } x \text{ is irrational}. \end{cases} \tag{3.3.23}$$

Prove that $f$ is not integrable. (Suggestion: Prove that all upper Riemann sums $U(f; P)$ are equal and that all lower Riemann sums $L(f; P)$ are equal.)

3.4 The Riemann integral: Properties

In this section, we prove the “ordinary” properties of the integral, that is, the ones that are not directly related to the Fundamental Theorem of Calculus. We begin with some useful estimates.

Lemma 3.4.1. Let $v, w : [a, b] \to \mathbb{R}$ be bounded, let $P$ be a partition of $[a, b]$, and $c > 0$. Then, in the notation of Definition 3.3.4, we have:

$$m(v; P, i) + m(w; P, i) \le m(v + w; P, i) \le M(v + w; P, i) \le M(v; P, i) + M(w; P, i), \tag{3.4.1}$$

$$c\,m(v; P, i) = m(cv; P, i) \le M(cv; P, i) = c\,M(v; P, i), \tag{3.4.2}$$

$$m(-v; P, i) = -M(v; P, i) \le -m(v; P, i) = M(-v; P, i). \tag{3.4.3}$$

Furthermore, if $v(x) \le w(x)$ for all $x \in [a, b]$, then

$$M(v; P, i) \le M(w; P, i). \tag{3.4.4}$$

Proof. We prove only (3.4.1), and leave the other estimates as Problem 3.4.1. Let $[x_{i-1}, x_i]$ be the $i$th subinterval of $P$. By the definitions of $M(v; P, i)$ and $M(w; P, i)$, for any $x \in [x_{i-1}, x_i]$, we have $v(x) + w(x) \le M(v; P, i) + M(w; P, i)$. In other words, $M(v; P, i) + M(w; P, i)$ is an upper bound for $S = \{v(x) + w(x) \mid x \in [x_{i-1}, x_i]\}$, which means that $M(v + w; P, i) \le M(v; P, i) + M(w; P, i)$, since $M(v + w; P, i) = \sup S$ (by the sup inequality trick 2.1.5). The last inequality follows, and the first inequality follows by an analogous argument.

Theorem 3.4.2. Let $v, w : [a, b] \to \mathbb{R}$ be bounded and integrable on $[a, b]$ and $c > 0$. Then $v + w$, $cv$, and $-v$ are integrable on $[a, b]$, and

$$\int_a^b (v(x) + w(x))\,dx = \int_a^b v(x)\,dx + \int_a^b w(x)\,dx, \tag{3.4.5}$$

$$\int_a^b cv(x)\,dx = c \int_a^b v(x)\,dx, \tag{3.4.6}$$

$$\int_a^b (-v(x))\,dx = -\int_a^b v(x)\,dx. \tag{3.4.7}$$

Furthermore, if $v(x) \le w(x)$ for all $x \in [a, b]$,

$$\int_a^b v(x)\,dx \le \int_a^b w(x)\,dx. \tag{3.4.8}$$

The basic idea for each property is to sum Lemma 3.4.1 over all of the subintervals of a partition to obtain the theorem for Riemann sums, and then take a limit of a well-chosen sequence of Riemann sums.


Proof. By Lemma 3.3.10, there exist sequences $Q_n$, $Q_n'$ of partitions of $[a, b]$ such that

$$\lim_{n\to\infty} (U(v; Q_n) - L(v; Q_n)) = \lim_{n\to\infty} (U(w; Q_n') - L(w; Q_n')) = 0. \tag{3.4.9}$$

In fact, by taking the common refinement $P_n = Q_n \cup Q_n'$, we may replace both $Q_n$ and $Q_n'$ with $P_n$, since refinements only make these sequences converge faster (Lemma 3.3.8).

Turning first to (3.4.5), summing (3.4.1) over all subintervals of $P_n$, we see that

$$L(v; P_n) + L(w; P_n) \le L(v + w; P_n) \le U(v + w; P_n) \le U(v; P_n) + U(w; P_n). \tag{3.4.10}$$

It follows that

$$0 \le U(v + w; P_n) - L(v + w; P_n) \le (U(v; P_n) - L(v; P_n)) + (U(w; P_n) - L(w; P_n)), \tag{3.4.11}$$

and since the right-hand side converges to $0$, the Squeeze Lemma implies that

$$\lim_{n\to\infty} (U(v + w; P_n) - L(v + w; P_n)) = 0. \tag{3.4.12}$$

Therefore, by Lemma 3.3.10, $v + w$ is integrable. Furthermore, since $U(v + w; P_n) \le U(v; P_n) + U(w; P_n)$ for all $n$, taking $\lim_{n\to\infty}$ of both sides, we see that

$$\int_a^b (v(x) + w(x))\,dx \le \int_a^b v(x)\,dx + \int_a^b w(x)\,dx;$$

and since $L(v; P_n) + L(w; P_n) \le L(v + w; P_n)$ for all $n$,

$$\int_a^b v(x)\,dx + \int_a^b w(x)\,dx \le \int_a^b (v(x) + w(x))\,dx.$$

Equation (3.4.5) follows.
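The chain of inequalities (3.4.10) and the bound (3.4.11) can be checked exactly on a small example. The sketch below assumes $v(x) = x^2$ and $w(x) = -x$ on $[0, 1]$ and uses uniform partitions that always contain the point $1/2$, so that $v$, $w$, and $v + w$ are each monotone on every subinterval and their suprema and infima occur at endpoints. That monotonicity shortcut is an assumption of this example, not a general method for computing Riemann sums.

```python
from fractions import Fraction

def v(x):
    return x * x

def w(x):
    return -x

def s(x):
    # The sum v + w, which decreases on [0, 1/2] and increases on [1/2, 1].
    return v(x) + w(x)

def riemann_sums(func, partition):
    # Returns (lower sum, upper sum), assuming func is monotone on every
    # subinterval, so that its inf and sup there are attained at endpoints.
    lower = upper = Fraction(0)
    for left, right in zip(partition, partition[1:]):
        lo, hi = sorted((func(left), func(right)))
        lower += lo * (right - left)
        upper += hi * (right - left)
    return lower, upper

def even_partition(n):
    # 2n equal subintervals of [0, 1]; the point 1/2 is always included.
    return [Fraction(k, 2 * n) for k in range(2 * n + 1)]

for n in (1, 5, 50):
    P = even_partition(n)
    Lv, Uv = riemann_sums(v, P)
    Lw, Uw = riemann_sums(w, P)
    Ls, Us = riemann_sums(s, P)
    print(n, float(Lv + Lw), float(Ls), float(Us), float(Uv + Uw))
```

In each printed row the four numbers are nondecreasing from left to right, matching (3.4.10), and the outer two approach each other as $n$ grows, which is what forces the middle two together in (3.4.12).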

Similar arguments (Problem 3.4.2) give the other integrability results and formulas.

A computation then yields the full complex version of the algebraic parts of Theorem 3.4.2.

Corollary 3.4.3. Let $f, g : [a, b] \to \mathbb{C}$ be bounded and integrable on $[a, b]$ and let $c, d \in \mathbb{C}$. Then $cf + dg$ is integrable on $[a, b]$ and

$$\int_a^b (cf(x) + dg(x))\,dx = c \int_a^b f(x)\,dx + d \int_a^b g(x)\,dx. \tag{3.4.13}$$

Proof. We observe that integrability of $cf + dg$ and the complex versions of (3.4.5) and (3.4.7) hold by decompositions into real and imaginary parts. It will therefore suffice to prove the complex version of (3.4.6). So suppose that $f(x) = u(x) + iv(x)$ is integrable on $[a, b]$ and $c + di \in \mathbb{C}$. Then by Theorem 3.4.2,

$$\begin{aligned} \int_a^b (c + di)(u(x) + iv(x))\,dx &= \int_a^b (cu(x) - dv(x) + i(du(x) + cv(x)))\,dx \\ &= c \int_a^b u(x)\,dx - d \int_a^b v(x)\,dx + id \int_a^b u(x)\,dx + ic \int_a^b v(x)\,dx \\ &= (c + di) \int_a^b (u(x) + iv(x))\,dx. \end{aligned} \tag{3.4.14}$$


The corollary follows.

Now, up to this point, we have not really used the full flexibility of allowing arbitrary (uneven) partitions. It is precisely in the proof of the next result where we see that come to fruition.

Theorem 3.4.4. For $a < b < c$, let $f : [a, c] \to \mathbb{C}$ be integrable on $[a, b]$ and $[b, c]$. Then $f$ is integrable on $[a, c]$ and

$$\int_a^c f(x)\,dx = \int_a^b f(x)\,dx + \int_b^c f(x)\,dx. \tag{3.4.15}$$

Proof. This follows from the real-valued case; see Problem 3.4.3 for the proof.

Theorem 3.4.5. Let $f : [a, b] \to \mathbb{C}$ be continuous. Then $f$ is integrable on $[a, b]$.

Proof. This follows from the real-valued case; see Problem 3.4.4 for the proof.

A little more work then yields the following improved version of Theorem 3.4.5.

Corollary 3.4.6. Let $f : [a, b] \to \mathbb{C}$ be bounded and continuous except at finitely many points in $[a, b]$. Then $f$ is integrable on $[a, b]$.

As the details are not terribly interesting, we only sketch the proof.

Sketch of proof. This follows from the real-valued case, so suppose $v : [a, b] \to \mathbb{R}$ is continuous except possibly at the points $a = x_0 < \cdots < x_n = b$ in $[a, b]$, and suppose that for some $M \in \mathbb{R}$, $|v(x)| < M$ for all $x \in [a, b]$. Given $\epsilon > 0$, choose a partition $P$ of $[a, b]$ as follows:

• (Small subintervals) Make sure that each of the points $x_0, \dots, x_n$ is contained in only one “small” subinterval of $P$, either by putting $x_i$ in the center of its subinterval, by putting it at the left endpoint (for $x_0$), or by putting it at the right endpoint (for $x_n$). Also make sure that these small subintervals are disjoint and have total length at most $\dfrac{\epsilon}{4M}$.

• (Continuous subintervals) The remainder of $[a, b]$, plus boundary points, is the union of $n$ disjoint closed intervals, and $v$ is continuous on each of those intervals. By Theorem 3.4.5, $v$ is integrable on each of those intervals, so by Lemma 3.3.10, we may choose a partition $P_i$ on the $i$th such interval such that $U(v; P_i) - L(v; P_i) < \dfrac{\epsilon}{2n}$.

Then the small subintervals and continuous subintervals each contribute at most $\dfrac{\epsilon}{2}$ to the total $U(v; P) - L(v; P)$, and so $U(v; P) - L(v; P) < \epsilon$.

We will also need a “combination of integrable functions is integrable” result beyond the linear combinations of Corollary 3.4.3. The result we want (Theorem 3.4.8) is mostly a consequence of the following somewhat technical lemma.


Lemma 3.4.7. If $f : [a, b] \to \mathbb{C}$ is integrable, and $\varphi : \mathbb{C} \to \mathbb{C}$ is continuous, then $\varphi \circ f : [a, b] \to \mathbb{C}$ is integrable.

Now, while the proof of this lemma requires somewhat fancier epsilonics than usual, we include the details because it also gives a more precise formulation of the idea that an integrable function is “uniformly continuous on average” (Remark 3.3.13).

Proof. Fix $\epsilon > 0$. First, since $f$ is bounded (by definition of integrable), there exists some $M_1 > 0$ such that $|f(x)| < M_1$ for all $x \in [a, b]$. Furthermore, since the closed disk $D = \overline{N_{M_1}(0)}$ is closed and bounded (Theorem 2.4.9), we see that:

• $\varphi$ is uniformly continuous on $D$ (Theorem 3.1.11), or more specifically, there exists some $\delta > 0$ such that if $z, w \in D$ and $|z - w| < \delta$, then $|\varphi(z) - \varphi(w)| < \dfrac{\epsilon}{3(b - a)}$; and

• $\varphi$ is bounded on $D$ (Corollary 3.1.14), or more specifically, there exists some $M > 0$ such that $|\varphi(z)| < M$ for all $z \in D$, and therefore, $|\varphi(f(x))| < M$ for all $x \in [a, b]$.

Recall the notation of Lemma 3.3.12:
\[ \mu(f; P, i) = \sup \{ |f(x) - f(y)| \mid x, y \in [x_{i-1}, x_i] \}, \tag{3.4.16} \]
\[ E(f; P) = \sum_{i=1}^{n} \mu(f; P, i)(\Delta x)_i. \tag{3.4.17} \]

In those terms, choose a partition $P$ of $[a, b]$ such that $E(f; P) < \epsilon_0 = \dfrac{\epsilon\delta}{6M}$. We divide the subintervals $\sigma_i = [x_{i-1}, x_i]$ of $P$ into two disjoint categories.

1. We say that $\sigma_i$ is in the class $S$ of stable subintervals if $\mu(f; P, i) < \delta$.
2. Otherwise, if $\mu(f; P, i) \ge \delta$, then we say that $\sigma_i$ is in the class $T$ of thin subintervals (a name to be justified momentarily).

The reader may find it helpful to consider a heuristic picture like Figure 3.4.1, in which the shaded regions represent the area between upper and lower Riemann sums. So now, let $w(P)$ be the sum of the widths of all thin subintervals, or in other words,
\[ w(P) = \sum_{\sigma_i \in T} (\Delta x)_i. \tag{3.4.18} \]

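It follows that
\[ \epsilon_0 > E(f; P) \ge \sum_{\sigma_i \in T} \mu(f; P, i)(\Delta x)_i \ge \sum_{\sigma_i \in T} \delta (\Delta x)_i = \delta\, w(P), \tag{3.4.19} \]
which means that
\[ w(P) < \frac{\epsilon_0}{\delta} = \frac{\epsilon}{6M}, \tag{3.4.20} \]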

justifying the name “thin”. It remains to compute E(ϕ ◦ f ; P ), which we do separately for the S and T subintervals:


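Figure 3.4.1: Stable and thin subintervals

• On each subinterval $\sigma_i \in S$, for $x, y \in \sigma_i$, by the definition of $S$, we have $|f(x) - f(y)| \le \mu(f; P, i) < \delta$, which means that $|\varphi(f(x)) - \varphi(f(y))| < \dfrac{\epsilon}{3(b - a)}$. It follows that $\mu(\varphi \circ f; P, i) \le \dfrac{\epsilon}{3(b - a)}$ for all $\sigma_i \in S$, which means that
\[ \sum_{\sigma_i \in S} \mu(\varphi \circ f; P, i)(\Delta x)_i \le \sum_{\sigma_i \in S} \frac{\epsilon}{3(b - a)} (\Delta x)_i \le \frac{\epsilon}{3(b - a)} (b - a) = \frac{\epsilon}{3}. \tag{3.4.21} \]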

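• On each subinterval $\sigma_i \in T$, for $x, y \in \sigma_i$, by the choice of $M$ and the triangle inequality, we have $|\varphi(f(x)) - \varphi(f(y))| \le 2M$, and therefore, $\mu(\varphi \circ f; P, i) \le 2M$. It follows that
\[ \sum_{\sigma_i \in T} \mu(\varphi \circ f; P, i)(\Delta x)_i \le \sum_{\sigma_i \in T} 2M (\Delta x)_i \le 2M\, w(P) \le \frac{\epsilon}{3}. \tag{3.4.22} \]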

Combining both types of subintervals, we see that $E(\varphi \circ f; P) < \epsilon$, and the lemma follows by Lemma 3.3.12.

Theorem 3.4.8. If $f, g : [a, b] \to \mathbb{C}$ are integrable, then $|f(x)|$, $f(x)^2$, and $f(x)g(x)$ are also integrable on $[a, b]$. If we also assume that $f$ and $g$ are real-valued, then $\min(f(x), g(x))$ and $\max(f(x), g(x))$ are integrable on $[a, b]$. Furthermore, for both real- and complex-valued integrable functions $f$,
\[ \left| \int_a^b f(x)\,dx \right| \le \int_a^b |f(x)|\,dx. \tag{3.4.23} \]

Proof. Everything up to, and including, the real-valued case of (3.4.23) is proven in Problem 3.4.5.

For the complex-valued case of (3.4.23), let $C = \int_a^b f(x)\,dx$. Since (3.4.23) holds if $C = 0$, we may assume $C \ne 0$ and let $\gamma = |C|/C$. We then have $\gamma C = |C|$, and so
\[ \left| \int_a^b f(x)\,dx \right| = \gamma \int_a^b f(x)\,dx = \int_a^b \gamma f(x)\,dx. \tag{3.4.24} \]

Let $u$ be the real part of $\gamma f$. Since (3.4.24) is real, $\int_a^b \gamma f(x)\,dx = \int_a^b u(x)\,dx$, and so applying the real case of (3.4.23), we get
\[ \left| \int_a^b f(x)\,dx \right| = \int_a^b u(x)\,dx \le \int_a^b |u(x)|\,dx \le \int_a^b |f(x)|\,dx, \tag{3.4.25} \]

where the last inequality follows because $|u(x)| \le |f(x)|$. The theorem follows.

We conclude this section with a property of the integral of a nonnegative continuous function that will come in handy later.

Lemma 3.4.9. Let $f : [a, b] \to \mathbb{R}$ be a continuous nonnegative function (i.e., $f(x) \ge 0$). If $\int_a^b f(x)\,dx = 0$, then $f(x) = 0$ for all $x \in [a, b]$.

Proof. Problem 3.4.6.

Problems

3.4.1. (Proves Lemma 3.4.1) Let $v, w : [a, b] \to \mathbb{R}$ be bounded, let $P$ be a partition of $[a, b]$, and let $c \in \mathbb{R}$ be positive.
(a) Prove (3.4.2).
(b) Prove (3.4.3).
(c) Now assume that $v(x) \le w(x)$ for all $x \in [a, b]$. Prove (3.4.4).

3.4.2. (Proves Theorem 3.4.2) Let $v, w : [a, b] \to \mathbb{R}$ be bounded and integrable and let $c \in \mathbb{R}$ be positive.
(a) Prove that $cv$ is integrable and prove (3.4.6).
(b) Prove that $-v$ is integrable and prove (3.4.7).
(c) Now assume that $v(x) \le w(x)$ for all $x \in [a, b]$. Prove (3.4.8).

3.4.3. (Proves Theorem 3.4.4) For $a < b < c$ in $\mathbb{R}$, let $v : [a, c] \to \mathbb{R}$ be integrable on $[a, b]$ and $[b, c]$. Prove that $v$ is integrable on $[a, c]$ and (3.4.15) holds. (Suggestion: Use Lemma 3.3.10 and the fact that if $P$ is a partition of $[a, b]$ and $Q$ is a partition of $[b, c]$, then $P \cup Q$ is a partition of $[a, c]$.)

3.4.4. (Proves Theorem 3.4.5) Let $v : [a, b] \to \mathbb{R}$ be continuous.
(a) Suppose $v$ is continuous on $[x_0, x_1]$ and satisfies the condition that for $x, y \in [x_0, x_1]$, we have that $|v(x) - v(y)| < \epsilon_0$. Prove that if
\[ M = \sup \{ v(x) \mid x \in [x_0, x_1] \}, \qquad m = \inf \{ v(x) \mid x \in [x_0, x_1] \}, \]
then $|M - m| < \epsilon_0$.


(b) Use the uniform continuity of $v$ to show that given $\epsilon > 0$, for sufficiently large $n$, if $P_n$ is the $n$th standard partition of $[a, b]$ (Example 3.3.2), then for $x$ and $y$ contained in the same subinterval of $P_n$, we have $|v(x) - v(y)| < \dfrac{\epsilon}{b - a}$.
(c) Prove that $v$ is integrable on $[a, b]$. (Suggestion: Use Lemma 3.3.10.)

3.4.5. (Proves Theorem 3.4.8) Let $f, g : [a, b] \to \mathbb{C}$ be integrable.
(a) Prove that $|f(x)|$ and $f(x)^2$ are integrable on $[a, b]$. (Suggestion: Lemma 3.4.7.)
(b) Prove that $f(x)g(x)$ is integrable on $[a, b]$. (Suggestion: Consider $(f(x) + g(x))^2$.)
(c) Prove that for $c, d \in \mathbb{R}$, $\max(c, d) = \frac{1}{2}(c + d + |c - d|)$, and find and prove a similar formula for $\min(c, d)$.
(d) Now suppose also that $f$ and $g$ are real-valued. Prove that $\min(f(x), g(x))$ and $\max(f(x), g(x))$ are integrable on $[a, b]$.
(e) Suppose again that $f$ is real-valued. Prove that
\[ \left| \int_a^b f(x)\,dx \right| \le \int_a^b |f(x)|\,dx. \tag{3.4.26} \]
(Suggestion: Consider $f_+(x) = \max(f(x), 0)$ and $f_-(x) = \max(-f(x), 0)$, and express both sides of (3.4.26) in terms of $f_+$ and $f_-$.)

3.4.6. (Proves Lemma 3.4.9) Suppose $f : [a, b] \to \mathbb{R}$ is continuous and nonnegative (i.e., $f(x) \ge 0$ for all $x \in [a, b]$), and suppose that $f(c) > 0$ for some $c \in [a, b]$.
(a) Prove that there exists some $\delta > 0$ such that for $c - \delta \le x \le c + \delta$, we have $f(x) > f(c)/2$.
(b) Prove that $\int_a^b f(x)\,dx > 0$. (Suggestion: Separate $[a, b]$ into $[c - \delta, c + \delta]$ and its complement.)

3.5 The Fundamental Theorem of Calculus

Our complex-ified review/recovery/reboot of calculus now culminates in the Fundamental Theorems of Calculus. First, we need two definitions.

Definition 3.5.1. For $b < a$, if $f(x)$ is integrable on $[b, a]$, we define the symbol $\int_a^b f(x)\,dx$ to be
\[ \int_a^b f(x)\,dx = -\int_b^a f(x)\,dx. \tag{3.5.1} \]

In other words, a definite integral “travelled backwards” is defined to be the negative of the corresponding forwards integral.


Definition 3.5.2. Let $I$ be a subinterval (not necessarily closed) of $\mathbb{R}$. For $f : I \to \mathbb{C}$ such that $f$ is integrable on any closed subinterval of $I$, we define an indefinite integral of $f$ to be any function of the form
\[ F(x) = \int_a^x f(t)\,dt, \tag{3.5.2} \]

where $a \in I$ is fixed. Note that we only use $t$ as the “inner” variable to distinguish it from the “outer” variable $x$ that appears as a limit of integration in (3.5.2). Note also that if $x < a$, then (3.5.2) must be interpreted in the sense of Definition 3.5.1.

We now state and prove the Fundamental Theorems of Calculus.

Theorem 3.5.3 (FTC I). Let $I$ be an interval, $a \in I$, let $f : I \to \mathbb{C}$ be integrable on any closed subinterval of $I$, and let
\[ F(x) = \int_a^x f(t)\,dt. \tag{3.5.3} \]
Then $F$ is (uniformly) continuous on $I$, and furthermore, if $f$ is continuous at some $b \in I$, then $F$ is differentiable at $b$ and $F'(b) = f(b)$.

Proof. Turning first to the continuity of $F$, since $f$ is bounded, choose $M > 0$ such that $|f(x)| < M$ for all $x \in I$. Fixing $\epsilon > 0$, let $\delta(\epsilon) = \epsilon/M$, and suppose $x, y \in I$ and $|x - y| < \delta$. By symmetry, we may assume $x < y$, in which case
\[ \begin{aligned} |F(y) - F(x)| &= \left| \int_a^y f(t)\,dt - \int_a^x f(t)\,dt \right| \\ &= \left| \int_x^y f(t)\,dt \right| \le \int_x^y |f(t)|\,dt < M(y - x) < \epsilon, \end{aligned} \tag{3.5.4} \]

by Theorems 3.4.4, 3.4.8, and 3.4.2 and Lemma 3.3.14. It follows that $F$ is uniformly continuous on $I$.

As for differentiability, suppose that $f$ is continuous at $b$; more specifically, suppose that for any $\epsilon_0 > 0$, there exists some $\delta_0(\epsilon_0) > 0$ such that if $|x - b| < \delta_0(\epsilon_0)$, then $|f(x) - f(b)| < \epsilon_0$. To prove that $F'(b) = f(b)$, we will prove the local linearity condition (Lemma 3.2.5) for $m = f(b)$ by showing that if
\[ E(x) = \frac{F(x) - F(b)}{x - b} - f(b), \tag{3.5.5} \]
then $\lim_{x \to b} E(x) = 0$.

Now, by Theorem 3.4.4 (possibly in the extended sense of Definition 3.5.1, see Problem 3.5.1), we have that
\[ F(x) - F(b) = \int_a^x f(t)\,dt - \int_a^b f(t)\,dt = \int_b^x f(t)\,dt. \tag{3.5.6} \]
Furthermore, by Lemma 3.3.14, we know that
\[ \int_b^x f(b)\,dt = f(b)(x - b). \tag{3.5.7} \]

So given $\epsilon > 0$, let $\delta(\epsilon) = \delta_0\!\left(\dfrac{\epsilon}{2}\right)$. Then for $|x - b| < \delta(\epsilon)$, $x \ne b$, we have
\[ \begin{aligned} |E(x)| &= \left| \frac{F(x) - F(b)}{x - b} - f(b) \right| \\ &= \frac{\left| \int_b^x f(t)\,dt - \int_b^x f(b)\,dt \right|}{|x - b|} \\ &\le \frac{\left| \int_b^x |f(t) - f(b)|\,dt \right|}{|x - b|} \\ &\le \frac{\left| \int_b^x (\epsilon/2)\,dt \right|}{|x - b|} = \epsilon/2 < \epsilon, \end{aligned} \tag{3.5.8} \]
where the first inequality holds by Theorem 3.4.8, and the second holds because $|x - b| < \delta_0\!\left(\dfrac{\epsilon}{2}\right)$. The theorem follows by the $\epsilon$-$\delta$ definition of the limit of a function.

In the following, we use the notation $\dfrac{dF}{dx} = F'(x)$ to create a greater visual contrast with $F(x)$ and emphasize that the right-hand side of (3.5.9), below, is the integral of the derivative of $F(x)$.

Theorem 3.5.4 (FTC II). Let $F : [a, b] \to \mathbb{C}$ be continuous on $[a, b]$ and differentiable on $(a, b)$, and suppose that $\dfrac{dF}{dx}$ is continuous on $(a, b)$. Then
\[ F(b) - F(a) = \int_a^b \frac{dF}{dx}\,dx. \tag{3.5.9} \]

Proof. Problem 3.5.2.

One familiar and useful consequence of FTC II is the following theorem.

Theorem 3.5.5. Let $X$ be a subset of $\mathbb{C}$, and let $u : [a, b] \to X$ and $f : X \to \mathbb{C}$ be continuously differentiable. Then
\[ \int_a^b f'(u(x))\,u'(x)\,dx = f(u(b)) - f(u(a)). \tag{3.5.10} \]
If we further assume that $X$ is a subinterval of $\mathbb{R}$ and $g : X \to \mathbb{C}$ is continuous, then
\[ \int_a^b g(u(x))\,u'(x)\,dx = \int_{u(a)}^{u(b)} g(u)\,du. \tag{3.5.11} \]


Proof. Problem 3.5.3.

As the reader may (fondly?) recall, (3.5.11) provides a useful notation for doing multiple substitutions; e.g., we might next substitute $w = w(u)$, and so on.

We conclude our discussion of FTC by recovering (complex-valued) integration by parts. While this may seem like an afterthought, the reader unfamiliar with our subsequent material may be surprised at how useful, or even crucial, it turns out to be for us.

Theorem 3.5.6 (Integration by parts). Let $f, g : [a, b] \to \mathbb{C}$ be continuous on $[a, b]$ and continuously differentiable on $(a, b)$. Then
\[ \int_a^b f(x)\,g'(x)\,dx = f(b)g(b) - f(a)g(a) - \int_a^b g(x)\,f'(x)\,dx. \tag{3.5.12} \]

Proof. Problem 3.5.4.
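As a numerical sanity check on FTC II and integration by parts for complex-valued functions, the following sketch compares both sides of (3.5.9) and (3.5.12) for one illustrative choice of functions. The functions $f(x) = e^{ix}$ and $g(x) = x^2$, the interval $[0, 2]$, and the helper `simpson` (a composite Simpson rule standing in for the Riemann integral) are assumptions made only for this example; agreement up to quadrature error illustrates the identities but proves nothing.

```python
import cmath

def simpson(func, a, b, n=1000):
    """Composite Simpson rule for a complex-valued func on [a, b]; n must be even."""
    h = (b - a) / n
    total = func(a) + func(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(a + k * h)
    return total * h / 3

# Hypothetical test functions chosen for illustration only.
f = lambda x: cmath.exp(1j * x)        # f(x) = e^{ix}
df = lambda x: 1j * cmath.exp(1j * x)  # f'(x) = i e^{ix}
g = lambda x: x * x                    # g(x) = x^2
dg = lambda x: 2 * x                   # g'(x) = 2x

a, b = 0.0, 2.0

# FTC II (3.5.9): F(b) - F(a) versus the integral of dF/dx.
ftc_gap = abs((f(b) - f(a)) - simpson(df, a, b))

# Integration by parts (3.5.12): left side versus right side.
lhs = simpson(lambda x: f(x) * dg(x), a, b)
rhs = f(b) * g(b) - f(a) * g(a) - simpson(lambda x: g(x) * df(x), a, b)
parts_gap = abs(lhs - rhs)

print(ftc_gap, parts_gap)  # both gaps should be at the level of quadrature error
```

Note that the real and imaginary parts are handled together here, which mirrors how the complex-valued statements reduce to the real-valued case componentwise.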

Problems

3.5.1. (Proves Theorem 3.5.3) We know that Theorem 3.4.4 holds when $a < b < c$. Using Definition 3.5.1, prove that it also holds for the other five possible orderings of $\{a, b, c\}$.

3.5.2. (Proves Theorem 3.5.4) Let $F : [a, b] \to \mathbb{C}$ be continuous on $[a, b]$ and differentiable on $(a, b)$, and suppose that $\dfrac{dF}{dx}$ is continuous on $(a, b)$.
(a) Let $G(x) = \int_a^x F'(t)\,dt$ and let $H(x) = F(x) - G(x)$. Find the value of $H'(x)$ for $x \in (a, b)$ (with proof). What conclusion can you draw, and why?
(b) Prove (3.5.9). (Suggestion: What is $H(a)$?)

3.5.3. (Proves Theorem 3.5.5) Let $X$ be a subset of $\mathbb{C}$, and let $u : [a, b] \to X$ and $f : X \to \mathbb{C}$ be continuously differentiable.
(a) Prove (3.5.10). (Suggestion: Chain rule.)
(b) Now further assume that $X$ is a subinterval of $\mathbb{R}$ and $g : X \to \mathbb{C}$ is continuous. Prove (3.5.11). (Suggestion: Apply (3.5.10) to some $F(x)$ such that $F'(x) = g(x)$.)

3.5.4. (Proves Theorem 3.5.6) Let $f, g : [a, b] \to \mathbb{C}$ be continuous on $[a, b]$ and continuously differentiable on $(a, b)$. Prove (3.5.12). (Suggestion: Combine the product rule and FTC II.)

3.6 Other results from calculus

In this section, we discuss miscellaneous results from calculus that are tangential to our main story, but will nevertheless be useful. The key definitions and results are:

• The asymptotic behavior of functions and sequences (Definition 3.6.7 and Theorems 3.6.9 and 3.6.12);
• Fubini’s Theorem (Theorem 3.6.21); and
• Differentiating under the integral sign (Theorem 3.6.23).

Because Fubini’s Theorem and differentiating under the integral sign both involve functions of two variables, we also briefly review some facts from multivariable calculus. In any case, we recommend, at least on a first reading, that the reader merely absorb the definitions and theorem statements listed above, and not worry about the details and proofs, other than perhaps out of sheer curiosity.

3.6.1 Asymptotics and L’Hôpital’s Rule

We begin our discussion of asymptotics by defining infinite limits of various types.

Definition 3.6.1. For some $a \in \mathbb{R}$, let $f : (a, +\infty) \to \mathbb{C}$ be a function, and let $L$ be a complex number. To say that $\lim_{x \to +\infty} f(x) = L$ means that for every $\epsilon > 0$, there exists some $N(\epsilon) > 0$ such that if $x > N(\epsilon)$ then $|f(x) - L| < \epsilon$.

Comparing Definition 2.4.3, we see that if $\lim_{x \to +\infty} f(x) = L$, then $\lim_{n \to \infty} f(n) = L$ a fortiori. We will also need to consider infinite-valued limits.

Definition 3.6.2. For a real-valued sequence $a_n$, to say that $\lim_{n \to \infty} a_n = +\infty$ means that for every $M > 0$, there exists some $N(M) \in \mathbb{R}$ such that if $n > N(M)$, then $a_n > M$.

Definition 3.6.3. For $f : X \to \mathbb{R}$, with $X \subseteq \mathbb{R}$ and $a$ a limit point of $X$, to say that $\lim_{x \to a} f(x) = +\infty$ means that for every $M > 0$, there exists some $\delta(M) > 0$ such that if $|x - a| < \delta(M)$ and $x \ne a$, then $f(x) > M$. Similarly, for $a \in \mathbb{R}$ and $f : (a, +\infty) \to \mathbb{R}$, to say that $\lim_{x \to +\infty} f(x) = +\infty$ means that for every $M > 0$, there exists some $N(M) > 0$ such that if $x > N(M)$ then $f(x) > M$.

We observe that:

Lemma 3.6.4. Retaining the notation of Definition 3.6.3, for $a \in \mathbb{R}$ or $a = +\infty$, if $\lim_{x \to a} f(x) = +\infty$, then $\lim_{x \to a} \dfrac{1}{f(x)} = 0$.

Proof. Problem 3.6.1.

L’Hôpital’s Rule is quite general, but we will only use the following cases. Note that in any case, this is definitely a result about real-valued functions.

Theorem 3.6.5 (L’Hôpital’s Rule). Let $f$ and $g$ be real-valued differentiable functions on some $X \subseteq \mathbb{R}$ such that $g'(x) \ne 0$ for all $x \in X$ and $g(x)$ is strictly monotone (i.e., either strictly increasing or strictly decreasing) on $X$.

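1. If $X = (a, b)$ and for some $L \in \mathbb{R}$,
\[ \lim_{x \to a^+} f(x) = 0, \qquad \lim_{x \to a^+} g(x) = 0, \qquad \lim_{x \to a^+} \frac{f'(x)}{g'(x)} = L, \tag{3.6.1} \]
then $\lim_{x \to a^+} \dfrac{f(x)}{g(x)} = L$.

2. If $X = (a, +\infty)$ and for some $L \in \mathbb{R}$,
\[ \lim_{x \to +\infty} f(x) = +\infty, \qquad \lim_{x \to +\infty} g(x) = +\infty, \qquad \lim_{x \to +\infty} \frac{f'(x)}{g'(x)} = L, \tag{3.6.2} \]
then $\lim_{x \to +\infty} \dfrac{f(x)}{g(x)} = L$.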

The reader may safely skip the proof of L’Hôpital’s Rule, but we include it for completeness.

Proof. We will require the following extension of the Mean Value Theorem: Let $f, g : [a, b] \to \mathbb{R}$ be differentiable on $(a, b)$ and continuous on $[a, b]$. Then there exists some $c \in (a, b)$ such that
\[ (f(b) - f(a))\,g'(c) = (g(b) - g(a))\,f'(c). \tag{3.6.3} \]
See Problem 3.6.2 for a proof. Note that under our hypotheses, (3.6.3) becomes
\[ \frac{f(b) - f(a)}{g(b) - g(a)} = \frac{f'(c)}{g'(c)}. \tag{3.6.4} \]

In case (1), given $\epsilon > 0$, since $\lim_{x \to a^+} \dfrac{f'(x)}{g'(x)} = L$, there exists some $\delta > 0$ such that
\[ \left| \frac{f'(t)}{g'(t)} - L \right| < \epsilon/2 \tag{3.6.5} \]
for all $t \in (a, a + \delta)$. Therefore, by (3.6.4), we see that for $a < x < y < a + \delta$, we have
\[ \left| \frac{f(y) - f(x)}{g(y) - g(x)} - L \right| = \left| \frac{f'(t)}{g'(t)} - L \right| < \epsilon/2 \tag{3.6.6} \]
for some $t \in (x, y)$. Therefore, taking $\lim_{x \to a^+}$ of both sides of (3.6.6), we see that
\[ \left| \frac{f(y)}{g(y)} - L \right| \le \epsilon/2 < \epsilon, \tag{3.6.7} \]
and case (1) follows.

Similarly, for case (2), given $\epsilon > 0$, there exists some $M_1 > 0$ such that
\[ \left| \frac{f'(t)}{g'(t)} - L \right| < \epsilon/3 \tag{3.6.8} \]
for all $t > M_1$. Therefore, again, by (3.6.4), we see that for $M_1 < M_2 < y$, we have
\[ \left| \frac{f(y) - f(M_2)}{g(y) - g(M_2)} - L \right| < \epsilon/3. \tag{3.6.9} \]
We now fix $M_2 > M_1$ as above, and let
\[ h(y) = \frac{g(y) - g(M_2)}{g(y)}, \tag{3.6.10} \]
observing that $\lim_{y \to +\infty} h(y) = 1$. In particular, there exists some $M_3 > 0$ such that $h(y) < 3/2$ for $y > M_3$. So now, multiplying (3.6.9) on both sides by $|h(y)|$, we see that
\[ \left| \frac{f(y) - f(M_2)}{g(y)} - L\,h(y) \right| < \frac{\epsilon}{3}\,|h(y)| \tag{3.6.11} \]
for all $y > M_2$. Therefore, if we let
\[ E(y) = \left| \frac{f(M_2)}{g(y)} \right| + |L|\,|h(y) - 1|, \tag{3.6.12} \]
then because $\lim_{y \to +\infty} E(y) = 0$, there exists some $M_4 > 0$ such that $E(y) < \epsilon/3$ for all $y > M_4$. Therefore, if $y > \max\{M_1, M_2, M_3, M_4\}$, by the triangle inequality, we see that
\[ \begin{aligned} \left| \frac{f(y)}{g(y)} - L \right| &\le \left| \frac{f(y) - f(M_2)}{g(y)} - L\,h(y) \right| + \left| \frac{f(M_2)}{g(y)} \right| + |L\,h(y) - L| \\ &< \frac{\epsilon}{3}\,|h(y)| + E(y) < \frac{\epsilon}{3} \cdot \frac{3}{2} + \frac{\epsilon}{3} < \epsilon. \end{aligned} \tag{3.6.13} \]
The theorem follows.

Remark 3.6.6. In fact, the hypothesis that $g(x)$ is strictly monotone on $X$ is redundant, but harmless for our purposes. See Ross [Ros13, Thm. 30.2] for the details, and for a more complete statement and proof of L’Hôpital’s Rule.

The main reason we are interested in L’Hôpital’s Rule is to study the following phenomenon.

Definition 3.6.7. For positive-valued functions $f, g : (a, +\infty) \to \mathbb{R}$, to say $f(x)$ […] $\epsilon > 0$, there exists some $N(\epsilon)$ independent of $x \in X$ such that for any $x \in X$ and $n \in \mathbb{Z}$ such that $n > N(\epsilon)$, we have $|f(x) - f_n(x)| < \epsilon$. Also, as usual, uniform convergence of a series $\sum_{n=0}^{\infty} g_n(x)$ is defined in terms of the uniform convergence of its sequence of partial sums $f_N(x) = \sum_{n=0}^{N} g_n(x)$.

For comparison, to say that $f_n$ converges pointwise to $f$ on $X$ means that for any $x \in X$ and any $\epsilon > 0$, there exists some $N(\epsilon, x)$ such that for any $n \in \mathbb{Z}$ such that $n > N(\epsilon, x)$, we have $|f(x) - f_n(x)| < \epsilon$. In other words, the difference between pointwise and uniform convergence is that in uniform convergence, there is some worst-case “rate of convergence” $N(\epsilon)$ that holds for all $x \in X$ simultaneously.

To start our discussion of uniform convergence, we first note that the usual algebraic rules apply to uniform convergence.

Theorem 4.3.2. For a nonempty $X \subseteq \mathbb{C}$, let $f_n, g_n : X \to \mathbb{C}$ be sequences of functions, let $f, g : X \to \mathbb{C}$ be functions, and suppose that $f_n$ and $g_n$ converge uniformly to $f$ and $g$, respectively. Then $f_n + g_n$ converges uniformly to $f + g$, and for $c \in \mathbb{C}$, $c f_n$ converges uniformly to $c f$.

Proof. Problem 4.3.1.

Also as usual, we can show that uniform convergence can be broken down to real and imaginary parts.

Lemma 4.3.3. Let $X$ be a nonempty subset of $\mathbb{C}$, let $f_n : X \to \mathbb{C}$ be a sequence of functions, and let $f : X \to \mathbb{C}$ be a function. Let $f_n(x) = u_n(x) + i v_n(x)$ and $f(x) = u(x) + i v(x)$ be the respective real-imaginary decompositions. Then $f_n$ converges uniformly to $f$ if and only if $u_n$ converges uniformly to $u$ and $v_n$ converges uniformly to $v$.

Proof. Problem 4.3.2.

The completeness of $\mathbb{C}$ gives a necessary and sufficient condition for uniform convergence for which we need the following analogue to Definition 4.3.1.

Definition 4.3.4. Let $X$ be a nonempty subset of $\mathbb{C}$ and let $f_n : X \to \mathbb{C}$ be a sequence of functions. To say that the sequence $f_n$ is uniformly Cauchy means that for any $\epsilon > 0$, there exists some $N(\epsilon)$ independent of $x \in X$ such that for any $x \in X$ and $n, k \in \mathbb{Z}$ such that $n, k > N(\epsilon)$, we have $|f_n(x) - f_k(x)| < \epsilon$.

Theorem 4.3.5. Let $X$ be a nonempty subset of $\mathbb{C}$ and let $f_n : X \to \mathbb{C}$ be a sequence of functions. Then $f_n$ converges uniformly to some $f : X \to \mathbb{C}$ if and only if $f_n$ is uniformly Cauchy.

Proof. On the one hand, if $f_n$ converges uniformly to some $f : X \to \mathbb{C}$, then applying the proof of Theorem 2.5.2 independently of $x \in X$, we see that $f_n$ must be uniformly Cauchy. For the converse, see Problem 4.3.3.
The following necessary and sufficient condition is also sometimes useful in proving uniform or non-uniform convergence.

Lemma 4.3.6. Let $X$ be a nonempty subset of $\mathbb{C}$, let $f_n : X \to \mathbb{C}$ be a sequence of functions, let $f : X \to \mathbb{C}$ be a function, and let
\[ d_n = \sup \{ |f_n(x) - f(x)| \mid x \in X \}. \tag{4.3.1} \]
(If $\{ |f_n(x) - f(x)| \mid x \in X \}$ is unbounded, we set $d_n = +\infty$.) Then $f_n$ converges uniformly to $f$ if and only if $\lim_{n \to \infty} d_n = 0$.


As we shall see, Lemma 4.3.6 is really a rewriting of the definition of uniform convergence. The advantage in particular examples is that when $f_n$ and $f$ are differentiable on a closed and bounded $X$, then $d_n$ becomes a maximum that we can compute using calculus; see Problem 4.3.6 for an example. Also, see Section 5.2 for another interpretation of $d_n$.

Proof. On the one hand, if $\lim_{n \to \infty} d_n = 0$, since $|f_n(x) - f(x)| \le d_n$ for all $x \in X$, $f_n(x)$ converges to $f(x)$ with rate of convergence independent of $x$. Conversely, suppose that for any $\epsilon > 0$, there exists some $N(\epsilon)$ such that for any $x \in X$ and $n \in \mathbb{Z}$ such that $n > N(\epsilon)$, we have $|f(x) - f_n(x)| < \epsilon$. Then if $n > N(\epsilon/2)$, we have that $\epsilon/2$ is an upper bound for the set $D_n = \{ |f_n(x) - f(x)| \mid x \in X \}$, so because $d_n$ is the least upper bound for $D_n$, $d_n \le \epsilon/2 < \epsilon$. The lemma follows.

The criterion we will use most often to prove uniform convergence is the Weierstrass $M$-test for uniform convergence of series. As a bonus, we will see that the $M$-test also yields absolute convergence, which will prove useful later.

Theorem 4.3.7 (Weierstrass $M$-test). Let $X$ be a nonempty subset of $\mathbb{C}$, let $g_n : X \to \mathbb{C}$ be a sequence of functions, and suppose that $M_n$ is a sequence of nonnegative real numbers such that $\sum M_n$ converges (absolutely) and
\[ |g_n(x)| \le M_n \text{ for all } x \in X. \tag{4.3.2} \]
Then $\sum_{n=0}^{\infty} g_n(x)$ converges absolutely and uniformly to some $f : X \to \mathbb{C}$.

Note that a key feature of the $M$-test is that we do not need to know anything about $f(x)$ beforehand to prove uniform convergence.

Proof. For any $k, m \in \mathbb{Z}$, $k < m$, and any $x \in X$, we have that
\[ \left| \sum_{n=k}^{m} g_n(x) \right| \le \sum_{n=k}^{m} |g_n(x)| \le \sum_{n=k}^{m} M_n = \left| \sum_{n=k}^{m} M_n \right|. \tag{4.3.3} \]
By Corollary 4.1.5, we know that $\sum M_n$ satisfies the Cauchy criterion for series, so (4.3.3) implies that $\sum g_n(x)$ satisfies the Cauchy criterion as well. Furthermore, the estimate (4.3.3) is independent of $x$, so $\sum g_n(x)$ is uniformly Cauchy, and therefore, by Theorem 4.3.5, uniformly convergent. Absolute convergence also follows because (4.3.2) relies only on $|g_n(x)|$, and the theorem follows.

In any case, given uniform convergence, we can ensure a “yes” answer to many of the questions of Section 4.2. For example:

Theorem 4.3.8 (Uniform YES: QC). Let $X$ be a nonempty subset of $\mathbb{C}$ and let $f_n : X \to \mathbb{C}$ be a sequence of functions, each continuous on $X$, such that $f_n$ converges uniformly on $X$ to some $f : X \to \mathbb{C}$. Then $f$ is continuous on $X$.


Proof. Problem 4.3.4.
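To make the $x$-independent estimate behind the Weierstrass $M$-test (Theorem 4.3.7) concrete, here is a small numerical sketch. The series $g_n(x) = e^{inx}/n^2$ with $M_n = 1/n^2$, the sample grid on $[-3, 3]$, and the cutoffs $k = 50$, $m = 500$ are all assumptions chosen for illustration; they do not come from the text. The sketch checks that a block of the series is bounded, at every sampled $x$, by the corresponding block of $\sum M_n$.

```python
import cmath

def partial_sum(x, k, m):
    """Sum of the hypothetical terms g_n(x) = exp(i n x) / n**2 for n = k, ..., m."""
    return sum(cmath.exp(1j * n * x) / n**2 for n in range(k, m + 1))

def majorant_sum(k, m):
    """Sum of the majorants M_n = 1 / n**2 for n = k, ..., m."""
    return sum(1.0 / n**2 for n in range(k, m + 1))

xs = [-3.0 + 0.06 * j for j in range(101)]  # sample points in [-3, 3]
k, m = 50, 500

# Largest block size |g_k(x) + ... + g_m(x)| seen over the sample points.
worst = max(abs(partial_sum(x, k, m)) for x in xs)
# The single bound that works for every x at once.
bound = majorant_sum(k, m)

print(worst, bound)
```

The point of the comparison is that `bound` does not depend on $x$, which is what turns the Cauchy criterion for $\sum M_n$ into a uniform Cauchy criterion for $\sum g_n(x)$. A finite grid can only illustrate this; the proof is what covers all $x \in X$.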

Theorem 4.3.9 (Uniform YES: QI1). Let fn : [a, b] → C be a sequence of functions, each integrable on [a, b], such that fn converges uniformly on [a, b] to some f : [a, b] → C. Then f is integrable on [a, b]. Before reading this proof, the reader may wish to review Lemma 3.3.12 and the definitions of µ(f ; P, i) and E(f ; P ) given there.

Proof. Fix $\epsilon > 0$. First, since $f_n$ converges uniformly to $f$, for any $\epsilon_0 > 0$, there exists $N(\epsilon_0)$ such that if $n > N(\epsilon_0)$, then $|f_n(x) - f(x)| < \epsilon_0$ for any $x \in [a, b]$. Therefore, we may choose some $n > N\!\left(\dfrac{\epsilon}{3(b - a)}\right)$ such that for any $x \in [a, b]$, $|f_n(x) - f(x)| < \dfrac{\epsilon}{3(b - a)}$. Next, since $f_n$ is integrable, by Lemma 3.3.12, we may choose a partition $P = \{x_0 = a, \dots, x_m = b\}$ of $[a, b]$ such that $E(f_n; P) < \epsilon/3$.

Figure 4.3.1: A three-piece path from $f(x)$ to $f(y)$

The key observation is that for $x, y \in [x_{i-1}, x_i]$ (the $i$th subinterval of $P$),
\[ \begin{aligned} |f(x) - f(y)| &\le |f(x) - f_n(x)| + |f_n(x) - f_n(y)| + |f_n(y) - f(y)| \\ &< \frac{\epsilon}{3(b - a)} + \mu(f_n; P, i) + \frac{\epsilon}{3(b - a)} \\ &= \mu(f_n; P, i) + \frac{2\epsilon}{3(b - a)}, \end{aligned} \tag{4.3.4} \]
where the first inequality follows by the triangle inequality (see Figure 4.3.1), and the second by our choice of $f_n$ and the definition of $\mu(f_n; P, i)$. It follows that $\mu(f_n; P, i) + \dfrac{2\epsilon}{3(b - a)}$ is an upper bound for $\{ |f(x) - f(y)| \mid x, y \in [x_{i-1}, x_i] \}$, which means that
\[ \mu(f; P, i) \le \mu(f_n; P, i) + \frac{2\epsilon}{3(b - a)}. \tag{4.3.5} \]

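Therefore,
\[ \begin{aligned} E(f; P) &= \sum_{i=1}^{m} \mu(f; P, i)(\Delta x)_i \\ &\le \sum_{i=1}^{m} \left( \mu(f_n; P, i) + \frac{2\epsilon}{3(b - a)} \right) (\Delta x)_i \\ &= \sum_{i=1}^{m} \mu(f_n; P, i)(\Delta x)_i + \sum_{i=1}^{m} \frac{2\epsilon}{3(b - a)} (\Delta x)_i \\ &= E(f_n; P) + \frac{2\epsilon}{3(b - a)} \sum_{i=1}^{m} (\Delta x)_i \\ &< \frac{\epsilon}{3} + \frac{2\epsilon}{3} = \epsilon, \end{aligned} \tag{4.3.6} \]
where the last (strict) inequality follows by our choice of $P$ and the fact that $\sum_{i=1}^{m} (\Delta x)_i = b - a$. The theorem follows.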

Theorem 4.3.10 (Uniform YES: QI2). Let $f_n : [a, b] \to \mathbb{C}$ be a sequence of functions, each integrable on $[a, b]$, such that $f_n$ converges uniformly on $[a, b]$ to some $f : [a, b] \to \mathbb{C}$. Then
\[ \int_a^b f(x)\,dx = \int_a^b \lim_{n \to \infty} f_n(x)\,dx = \lim_{n \to \infty} \int_a^b f_n(x)\,dx. \tag{4.3.7} \]
In other words, the integral of the limit is the limit of the integrals.

Proof. Problem 4.3.5.

Example 4.3.11 (Still NO: QD2 and QD1). First, we note that the convergence of $\dfrac{x^{n+1}}{n+1}$ to $0$ (Example 4.2.3) is uniform on $[0, 1]$, which means that the answer to question QD2 of Section 4.2 is still NO, even if $f_n$ converges uniformly to $f$. As for QD1, let $X = [-1, 1]$, and consider (Figure 4.3.2)
\[ f_n(x) = |x|^{1 + (1/n)}, \qquad f(x) = |x|. \tag{4.3.8} \]

For fixed $x \ge 0$, $x^t$ is a continuous function of $t \in \mathbb{R}$, so $\lim_{n \to \infty} f_n(x) = f(x)$. However, from Problem 3.2.6, each $f_n(x)$ is differentiable on $X$, but $f(x)$ is not differentiable at $0$. It is nevertheless true, but harder to show, that $f_n$ converges uniformly to $f$. One approach is to restrict our attention to $x \ge 0$ by symmetry, and let
\[ d_n = \max \left\{ \left| x - x^{1 + (1/n)} \right| \;\middle|\; x \in [0, 1] \right\}. \]
We can then use calculus to prove that $\lim_{n \to \infty} d_n = 0$ (Problem 4.3.6), implying that convergence is uniform (Lemma 4.3.6).
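Before doing the calculus, one can get a feel for the claim $\lim_{n \to \infty} d_n = 0$ numerically. The sketch below estimates $d_n$ by taking a maximum over an evenly spaced grid on $[0, 1]$; the function name `sup_gap` and the grid size are assumptions made for this illustration, and a grid maximum can only underestimate the true maximum, so this is evidence rather than a proof.

```python
def sup_gap(n, samples=10_001):
    """Grid estimate of d_n = max |x - x**(1 + 1/n)| over x in [0, 1]."""
    p = 1.0 + 1.0 / n
    grid = (i / (samples - 1) for i in range(samples))
    return max(abs(x - x**p) for x in grid)

# The estimates should shrink toward 0 as n grows.
for n in (1, 10, 100, 1000):
    print(n, sup_gap(n))
```

For $n = 1$ the function is $x - x^2$, whose maximum on $[0, 1]$ is $1/4$ at $x = 1/2$, which gives a quick check that the estimate is behaving sensibly.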


Figure 4.3.2: Differentiable functions converging uniformly to a non-differentiable function

To overcome the kind of problems we see in Examples 4.2.3 and 4.3.11 and obtain differentiability of the limit function, instead of assuming uniform convergence of the sequence $f_n$, we assume the uniform convergence of $f_n'$. The following theorem is not the best possible one we could obtain at this point, but it suffices for our purposes and avoids certain complications (see Remark 4.3.15).

Theorem 4.3.12 (Uniform derivative YES: QD1 and QD2). Let $X$ be a nonempty open subset of $\mathbb{C}$, and for fixed $c \in X$ and $L \in \mathbb{C}$, let $f_n : X \to \mathbb{C}$ be a sequence of differentiable functions that converges pointwise to $f : X \to \mathbb{C}$. Suppose each $f_n'$ is continuous and the sequence $f_n'$ converges uniformly to some $g : X \to \mathbb{C}$. Then $f$ is differentiable on $X$ and $f'(x) = g(x)$, or in other words,
\[ \frac{d}{dx} \left( \lim_{n \to \infty} f_n(x) \right) = \lim_{n \to \infty} \left( \frac{d}{dx} f_n(x) \right). \tag{4.3.9} \]
See Definitions 2.6.1 and 2.6.2 for relevant definitions and notation.

Proof. Fix $a \in X$. Since $X$ is open, to compute $\lim_{x \to a} \dfrac{f(x) - f(a)}{x - a}$, it suffices to consider only those $x \in N_r(a)$ (Definition 2.4.7) for some $r > 0$, so we assume $X = N_r(a)$ for the rest of the proof. First, for fixed $x \in N_r(a)$, observe that the function $u_x : [0, 1] \to \mathbb{C}$ given by
\[ u_x(t) = t x + (1 - t) a \tag{4.3.10} \]
is a function such that for all $t \in [0, 1]$, $u_x'(t) = x - a$ and $|u_x(t) - a| \le |x - a|$ (Problem 4.3.7), with $u_x(0) = a$ and $u_x(1) = x$. It follows that the image of $u_x$ is contained entirely within $N_r(a)$.

Figure 4.3.3: A path inside the $r$-neighborhood of $a$

We now come to the key idea, which is to compute
\[ \lim_{n \to \infty} \int_0^1 f_n'(u_x(t))\,u_x'(t)\,dt \tag{4.3.11} \]

in two different ways. If we first apply substitution (Theorem 3.5.5), we get
\[ \begin{aligned} \lim_{n \to \infty} \int_0^1 f_n'(u_x(t))\,u_x'(t)\,dt &= \lim_{n \to \infty} \left[ f_n(u_x(1)) - f_n(u_x(0)) \right] \\ &= \lim_{n \to \infty} (f_n(x) - f_n(a)) = f(x) - f(a). \end{aligned} \tag{4.3.12} \]

On the other hand, by Theorem 4.3.10 and the uniform convergence of $f_n'$ to $g$, we may also exchange $\lim_{n \to \infty}$ and $\int_0^1$ in (4.3.11) and obtain:
\[ \int_0^1 \lim_{n \to \infty} f_n'(u_x(t))\,u_x'(t)\,dt = \int_0^1 g(u_x(t))(x - a)\,dt = (x - a) \int_0^1 g(u_x(t))\,dt. \tag{4.3.13} \]

Equating (4.3.12) and (4.3.13), we see that for $x \ne a$,
\[ \begin{aligned} \frac{f(x) - f(a)}{x - a} - g(a) &= \int_0^1 g(u_x(t))\,dt - g(a) \\ &= \int_0^1 g(u_x(t))\,dt - \int_0^1 g(a)\,dt \\ &= \int_0^1 (g(u_x(t)) - g(a))\,dt. \end{aligned} \tag{4.3.14} \]

It remains to prove that $\lim_{x \to a} \dfrac{f(x) - f(a)}{x - a} = g(a)$. First, by Theorem 4.3.8 and the uniform convergence of the continuous functions $f_n'$ to $g$, we see that $g$ is continuous at $a$, and therefore, for any $\epsilon > 0$, there exists some $\delta(\epsilon) > 0$ such that if $|x - a| < \delta(\epsilon)$, then $|g(x) - g(a)| < \epsilon$. It follows from (4.3.14) and Theorem 3.4.8 that whenever $|x - a| < \delta(\epsilon/2)$ and $x \ne a$, we have
\[ \begin{aligned} \left| \frac{f(x) - f(a)}{x - a} - g(a) \right| &= \left| \int_0^1 (g(u_x(t)) - g(a))\,dt \right| \\ &\le \int_0^1 |g(u_x(t)) - g(a)|\,dt \\ &\le \int_0^1 (\epsilon/2)\,dt = \epsilon/2 < \epsilon. \end{aligned} \tag{4.3.15} \]

The theorem follows.

Remark 4.3.13. Note that the case of Theorem 4.3.12 that we will use most often is the case where $\sum g_n'(x)$ converges uniformly and $\sum g_n(x)$ converges, in which case we have
\[ \frac{d}{dx} \sum g_n(x) = \sum g_n'(x). \tag{4.3.16} \]

We call the interchange of infinite sum and derivative in (4.3.16) term-by-term differentiation, and we can paraphrase Theorem 4.3.12 as saying that uniform convergence of the derivative series plus convergence of the original series imply that term-by-term differentiation is valid.

As an alternative to Theorem 4.3.12, we have the following real-domain version.

Theorem 4.3.14. Let $I$ be an interval in $\mathbb{R}$, and for fixed $c \in I$ and $L \in \mathbb{C}$, let $f_n : I \to \mathbb{C}$ be a sequence of differentiable functions such that $\lim_{n \to \infty} f_n(c) = L$. Suppose each $f_n'$ is continuous and the sequence $f_n'$ converges uniformly to some $g : I \to \mathbb{C}$. Then $f_n$ converges to some $f : I \to \mathbb{C}$ that is differentiable on $I$, and $f'(x) = g(x)$.

Proof. Problem 4.3.8.

Remark 4.3.15. Note that Theorems 4.3.12 and 4.3.14 each have their virtues: Theorem 4.3.12 allows for complex domains, but assumes the pointwise convergence of $f_n$ on the entire domain, whereas Theorem 4.3.14 only makes sense for real domains, but also only assumes the convergence of $f_n$ at a single point $c$. We could try to combine the two theorems, but we would first have to develop the theory of integrating a complex-valued function along a (piecewise continuously differentiable) path, as well as a sufficient understanding of the geometry of domains in the complex plane, both of which are beyond the scope of this book. See a good textbook on complex analysis for an account of both (e.g., Ahlfors [Ahl79] or Conway [Con78, Con96]).

Problems

4.3.1. (Proves Theorem 4.3.2) Suppose X ⊆ C is nonempty, let $f_n, g_n$ : X → C be sequences of functions, let f, g : X → C be functions, and suppose that $f_n$ and $g_n$ converge uniformly to f and g, respectively.

(a) Prove that $f_n + g_n$ converges uniformly to f + g.
(b) Prove that for c ∈ C, $cf_n$ converges uniformly to cf.

4.3.2. (Proves Lemma 4.3.3) Let X be a nonempty subset of C, let $f_n$ : X → C be a sequence of functions, and let f : X → C be a function. Let $f_n(x) = u_n(x) + iv_n(x)$ and $f(x) = u(x) + iv(x)$ be the respective real-imaginary decompositions. Prove that $f_n$ converges uniformly to f if and only if $u_n$ converges uniformly to u and $v_n$ converges uniformly to v.

4.3.3. (Proves Theorem 4.3.5) Let X be a nonempty subset of C and let $f_n$ : X → C be a uniformly Cauchy sequence of functions.

(a) Prove that $f_n(x)$ converges pointwise to some f : X → C; in other words, prove that for any x ∈ X, $\lim_{n\to\infty} f_n(x)$ exists. (Suggestion: Completeness of C.)
(b) Fix x ∈ X, k ∈ Z, and $\epsilon_0 > 0$. Prove that if $|f_n(x) - f_k(x)| < \epsilon_0$ for all n > k, then $|f(x) - f_k(x)| \le \epsilon_0$. (Suggestion: Theorem 2.4.9.)
(c) Prove that $f_n$ converges uniformly to f. (Suggestion: Use part (b) of this problem and a well-chosen $\epsilon_0$.)

4.3.4. (Proves Theorem 4.3.8) Let X be a nonempty subset of C and let $f_n$ : X → C be a sequence of functions, each continuous on X, such that $f_n$ converges uniformly on X to some f : X → C.

(a) Suppose there exists some U ⊂ X, some a ∈ U, and some α, β > 0 such that $|f(x) - f_n(x)| < \alpha$ and $|f_n(x) - f_n(a)| < \beta$ for all x ∈ U. Find the best possible upper bound for $|f(x) - f(a)|$ that applies to all x ∈ U. (Suggestion: Figure 4.3.1.)
(b) Prove that for any a ∈ X, f is continuous at a. (Suggestion: Given ε > 0, first choose an n, then a δ.)

4.3.5. (Proves Theorem 4.3.10) Let $f_n$ : [a, b] → C be a sequence of functions, each continuous on [a, b], such that $f_n$ converges uniformly on [a, b] to some f : [a, b] → C. Prove that

$$\int_a^b f(x)\,dx = \lim_{n\to\infty} \int_a^b f_n(x)\,dx. \tag{4.3.17}$$

(Suggestion: Use Theorem 3.4.8 to get an upper bound for $\left| \int_a^b f_n(x)\,dx - \int_a^b f(x)\,dx \right|$.)

4.3.6. Let $d_n = \max\left\{ \left| x - x^{1+(1/n)} \right| \;\middle|\; x \in [0, 1] \right\}$.

(a) Fix n, and use ordinary calculus to find an expression for $d_n$.
(b) Prove that $\lim_{n\to\infty} d_n = 0$.

4.3.7. (Proves Theorem 4.3.12) Fix a, x ∈ C and let $u_x$ : [0, 1] → C be given by $u_x(t) = tx + (1 - t)a$. Prove that for all t ∈ [0, 1], $u_x'(t) = x - a$ and $|u_x(t) - a| \le |x - a|$.


4.3.8. (Proves Theorem 4.3.14) Let I be an interval in R, and for fixed c ∈ I and L ∈ C, let $f_n$ : I → C be a sequence of differentiable functions such that $\lim_{n\to\infty} f_n(c) = L$. Suppose each $f_n'$ is continuous and the sequence $f_n'$ converges uniformly to some g : I → C.

(a) Simplify $\int_c^x f_n'(t)\,dt$, with proof.
(b) Use Theorem 4.3.10 to prove that $f_n$ converges to some f : I → C.
(c) Prove that for all x ∈ I, $f'(x) = g(x)$.
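As a numerical companion to Problem 4.3.6, the following Python sketch approximates the maximum of $|x - x^{1+(1/n)}|$ on [0, 1] by sampling an evenly spaced grid. It is only an illustration: the function name `d_n` and the sample count are choices made for this sketch, a grid maximum is assumed to be an adequate stand-in for the true maximum, and the printed values prove nothing.

```python
def d_n(n: int, samples: int = 10_001) -> float:
    """Approximate max |x - x**(1 + 1/n)| over an evenly spaced grid on [0, 1]."""
    best = 0.0
    for j in range(samples):
        x = j / (samples - 1)          # grid point in [0, 1]
        best = max(best, abs(x - x ** (1 + 1 / n)))
    return best


if __name__ == "__main__":
    # The sampled maxima shrink as n grows, consistent with part (b).
    for n in (1, 2, 5, 10, 100, 1000):
        print(n, d_n(n))
```

The values only suggest the behavior of $d_n$; the calculus argument in part (a) is still needed to pin down the maximum exactly.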

4.4 Power series

We now apply what we have developed in this chapter so far to the following important special case.

Definition 4.4.1. A power series is a (complex-valued) series of the form

$$f(x) = \sum_{n=0}^{\infty} a_n x^n,$$

where the $a_n \in$ C are the coefficients of the power series, and we interpret $x^0$ as the constant function 1.

The reader may recall that a power series is governed by its radius of convergence. For our purposes, we will only require the following “ratio test” version of the radius of convergence. (See Remark 4.4.4 for a description of the full version.)

Theorem 4.4.2. Let $f(x) = \sum_{n=0}^{\infty} a_n x^n$ be a power series such that $\rho = \lim_{n\to\infty} \left| \frac{a_{n+1}}{a_n} \right|$ exists, and let $R = \frac{1}{\rho}$, where we define R = ∞ when ρ = 0. Then:

1. For any $R_0$ such that $0 \le R_0 < R$, the power series f(x) converges uniformly on the closed disc $\overline{N_{R_0}(0)}$.
2. It follows that f(x) converges pointwise (but not necessarily uniformly) on the open disc $N_R(0)$.
3. Let $b_n = na_n$. Then $\lim_{n\to\infty} \left| \frac{b_{n+1}}{b_n} \right| = \rho$ as well.
4. It follows that f(x) is differentiable on $N_R(0)$, and that

$$f'(x) = \sum_{n=1}^{\infty} n a_n x^{n-1} = \sum_{k=0}^{\infty} (k+1) a_{k+1} x^k \tag{4.4.1}$$

for any $x \in N_R(0)$.


Proof. Claim (1) is proven in Problem 4.4.1. Therefore, for |x| < R, if we let $R_0 = (|x| + R)/2$, f(x) converges uniformly on $\overline{N_{R_0}(0)}$, and in particular, pointwise at x. (Problem 4.4.2 shows that convergence on $N_R(0)$ need not be uniform.)

Claim (3) is proven in Problem 4.4.3. If we let $c_n = (n + 1)a_{n+1}$, it then follows from Claims (1)–(3) that $\sum_{n=0}^{\infty} c_n x^n$ converges pointwise on $N_R(0)$ and uniformly on $\overline{N_{R_0}(0)}$ for any $R_0$ such that $0 \le R_0 < R$. Therefore, for any fixed x such that |x| < R, taking $R_0 = (|x| + R)/2$, we may apply term-by-term differentiation (Theorem 4.3.12) on the open set $N_{R_0}(0) \subseteq \overline{N_{R_0}(0)}$ to obtain (4.4.1). The theorem follows.

Definition 4.4.3. The quantity R in Theorem 4.4.2 is called the radius of convergence of f(x).

Remark 4.4.4. As mentioned in Remark 4.1.15, the root test gives an unconditional version of Theorem 4.4.2; in particular, every power series has a radius of convergence. More precisely, for the power series $f(x) = \sum_{n=0}^{\infty} a_n x^n$, let $\rho = \limsup_{n\to\infty} \sqrt[n]{|a_n|}$, which always exists if we allow the possibility of ρ = ∞ (Remark 4.1.15). Then the rest of Theorem 4.4.2 still holds, and is proven in the same way, substituting the root test for the ratio test; we leave the details to the interested reader (or see Ross [Ros13, Thm. 23.1]).

Remark 4.4.5. The reader may also wonder why we are making a distinction between closed and open discs in the statement of Theorem 4.4.2, when we could just make all of the discs open without really changing the theorem. The reason is that the phenomenon of converging uniformly on compact sets (see Definition 2.6.6 and Corollary 2.6.7) is useful elsewhere in analysis; see, for example, the study of families of holomorphic functions in complex analysis (Ahlfors [Ahl79, Ch. 5], Conway [Con78, Ch. VII]).
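To make Theorem 4.4.2 concrete, here is a small Python sketch for one example series. The coefficients $a_n = (n+1)2^n$ are an assumption chosen for illustration (they do not come from the text), and all function names are hypothetical. For this choice ρ = 2 and R = 1/2, and on the open disc the series sums to $1/(1-2x)^2$, which gives something to compare the partial sums and the term-by-term derivative (4.4.1) against.

```python
def a(n: int) -> float:
    """Example coefficients a_n = (n + 1) * 2**n (an assumption made for this sketch)."""
    return (n + 1) * 2.0 ** n


def ratio(n: int) -> float:
    """The quotient |a_{n+1} / a_n| whose limit is rho in the ratio-test theorem."""
    return abs(a(n + 1) / a(n))


def partial_sum(x: complex, terms: int) -> complex:
    """Sum of a_n * x**n for n = 0, ..., terms - 1."""
    return sum(a(n) * x ** n for n in range(terms))


def derivative_partial_sum(x: complex, terms: int) -> complex:
    """Sum of (k + 1) * a_{k+1} * x**k for k = 0, ..., terms - 1 (term-by-term derivative)."""
    return sum((k + 1) * a(k + 1) * x ** k for k in range(terms))


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, ratio(n))                       # tends to rho = 2, so R = 1/2
    x = 0.3 + 0.2j                               # a point with |x| < 1/2
    print(partial_sum(x, 200), 1 / (1 - 2 * x) ** 2)
    print(derivative_partial_sum(x, 200), 4 / (1 - 2 * x) ** 3)
```

The closed forms used for comparison are specific to this example; for a general power series only the ratio estimate carries over.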

Problems

4.4.1. (Proves Theorem 4.4.2) Let $f(x) = \sum_{n=0}^{\infty} a_n x^n$ be a power series such that $\rho = \lim_{n\to\infty} \left| \frac{a_{n+1}}{a_n} \right| \in$ R (i.e., ρ < ∞).

(a) Suppose ρ > 0, let R = 1/ρ, and choose $R_0 \in$ R such that $0 \le R_0 < R$. Prove that f(x) converges uniformly on the closed disc $C(R_0) = \{z \in \mathbf{C} \mid |z| \le R_0\}$. (Suggestion: First prove that $f(R_0)$ converges absolutely, and then use the M-test.)
(b) Suppose ρ = 0, and choose any real $R_0 > 0$. Prove that f(x) converges uniformly on the closed disc $C(R_0)$.

4.4.2. Let

$$f_N(x) = \sum_{n=0}^{N} x^n = \frac{1 - x^{N+1}}{1 - x}, \qquad f(x) = \frac{1}{1 - x}. \tag{4.4.2}$$


In this problem, we show that $f_N$ does not converge uniformly to f on (0, 1), even though it converges pointwise on (−1, 1).

(a) Fix N, and let $d_N = \sup\{|f(x) - f_N(x)| \mid x \in X\}$. Prove $d_N > \frac{1}{2}$. (Suggestion: In fact, $d_N = \infty$, but you do not need to prove that; it suffices to find some x ∈ [0, 1) such that $|f(x) - f_N(x)| > \frac{1}{2}$.)
(b) Prove that $f_N$ does not converge uniformly to f on (0, 1). (Suggestion: Lemma 4.3.6.)

4.4.3. (Proves Theorem 4.4.2) Prove that if $a_n$ is a sequence in C such that $\rho = \lim_{n\to\infty} \left| \frac{a_{n+1}}{a_n} \right|$ exists, and $b_n = na_n$, then $\lim_{n\to\infty} \left| \frac{b_{n+1}}{b_n} \right| = \rho$ as well.
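The contrast in Problem 4.4.2 between pointwise and uniform convergence can be seen numerically. The Python sketch below evaluates the gap $|f(x) - f_N(x)|$ at a point that moves toward 1 as N grows; the particular point $x = (1/2)^{1/(N+1)}$ and the function names are choices made for this illustration only, and the output is a hint, not a proof.

```python
def f(x: float) -> float:
    """The limit function 1 / (1 - x)."""
    return 1.0 / (1.0 - x)


def f_N(x: float, N: int) -> float:
    """The partial sum 1 + x + ... + x**N."""
    return sum(x ** n for n in range(N + 1))


def witness(N: int):
    """Return a point x in (0, 1) with x**(N + 1) = 1/2, together with the gap there."""
    x = 0.5 ** (1.0 / (N + 1))
    return x, abs(f(x) - f_N(x, N))


if __name__ == "__main__":
    # At a fixed point such as x = 0.5 the gap shrinks as N grows ...
    for N in (5, 20, 60):
        print("fixed x = 0.5, N =", N, "gap =", abs(f(0.5) - f_N(0.5, N)))
    # ... but for every N there is a point in (0, 1) where the gap stays large.
    for N in (1, 10, 100):
        print("N =", N, "(x, gap) =", witness(N))
```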

4.5 Exponential and trigonometric functions

As a benchmark of the progress we have made in understanding series of functions, we now define the complex exponential function and derive its usual properties, including the famous Euler formula $e^{ix} = \cos x + i \sin x$. We begin with a definition.

Definition 4.5.1. For x ∈ C, we define E(x) to be the power series

$$E(x) = \sum_{n=0}^{\infty} \frac{x^n}{n!} = \sum_{n=0}^{\infty} \frac{1}{n!} x^n. \tag{4.5.1}$$

In other words, instead of thinking of (4.5.1) as a formula derived from the study of Taylor series, we use (4.5.1) to define the exponential function; then, once the standard properties of E(x) are fully established, we will write E(x) as $e^x$. However, note that unlike the exponential function from ordinary calculus, we allow arbitrary complex exponents in $e^x$, as the exponential functions we will use most often are the functions that we will later call $e_n(x) = e^{2\pi i n x}$.

We now turn to the basic properties of E(x); all of the proofs (save a few uninteresting ones) are left as problems for the reader. We first apply our results on power series from Section 4.4.

Theorem 4.5.2. The power series E(x) has radius of convergence R = ∞. Furthermore, E(0) = 1, $\overline{E(x)} = E(\overline{x})$, and for all x ∈ C, $E'(x) = E(x)$.

Proof. Problem 4.5.1.

We may therefore think of E(x) as a function E : C → C that is infinitely differentiable, or smooth, meaning that the nth derivative $E^{(n)}(x)$ exists for every natural number n (and is equal to E(x)). Note also that by the (complex-valued) chain rule and Problem 3.2.1, for any α ∈ C, we have the innocent-looking but important formula

$$\frac{d}{dx}\left(E(\alpha x)\right) = \alpha E(\alpha x). \tag{4.5.2}$$
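The definition of E(x) by a power series can be tried out directly. The Python sketch below sums the first terms of (4.5.1) and spot-checks the properties in Theorem 4.5.2 at a sample point. The truncation at 60 terms, the step size in the difference quotient, and the comparison with the standard library's `cmath.exp` are all assumptions of this sketch and play no role in the development here.

```python
import cmath


def E(x: complex, terms: int = 60) -> complex:
    """Partial sum of the exponential power series: x**n / n! for n = 0, ..., terms - 1."""
    total = 0j
    term = 1 + 0j                      # x**0 / 0!
    for n in range(terms):
        total += term
        term *= x / (n + 1)            # turns x**n / n! into x**(n+1) / (n+1)!
    return total


if __name__ == "__main__":
    x = 1 + 2j
    h = 1e-6
    print(E(0))                                        # equals 1
    print(E(x), cmath.exp(x))                          # agree closely for moderate |x|
    print(E(x).conjugate(), E(x.conjugate()))          # conjugation property
    print((E(x + h) - E(x - h)) / (2 * h), E(x))       # difference quotient vs. E(x)
```

For large |x| more terms would be needed; the fixed cutoff is a convenience, not part of the definition.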


Next, we show that E(x) only takes on nonzero values.

Theorem 4.5.3. For any x ∈ C, E(x) ≠ 0.

Proof. Problem 4.5.2 shows that for any x ∈ C, E(x)E(−x) = 1, and the theorem follows immediately.

Theorem 4.5.4. For x, y ∈ C, we have that E(x + y) = E(x)E(y).

Proof. Problem 4.5.3.

Turning to trigonometric functions, instead of deriving the Euler formula from the properties of the sine and cosine functions, we will use the Euler formula to define the cosine and sine functions and derive their properties. To avoid circular reasoning, however, we will only call these functions C(x) and S(x) until their usual properties have been proven.

Definition 4.5.5. We define functions C : R → R and S : R → R to be the real and imaginary parts of E(ix), or in other words, by definition,

$$E(ix) = C(x) + iS(x) \tag{4.5.3}$$

for all x ∈ R.

Remark 4.5.6. By (2.2.5) and Theorem 4.5.2, we see that

$$C(x) = \frac{e^{ix} + e^{-ix}}{2}, \qquad S(x) = \frac{e^{ix} - e^{-ix}}{2i}. \tag{4.5.4}$$

We will mostly use these formulas to make it clear that certain a priori complex expressions are actually real, though the reader may later find it helpful in complex analysis to use (4.5.4) to extend C and S to all of C.

Theorem 4.5.7. For any x ∈ R, we have that:

1. $C(x) = \sum_{n=0}^{\infty} \frac{(-1)^n x^{2n}}{(2n)!}$ and $S(x) = \sum_{n=0}^{\infty} \frac{(-1)^n x^{2n+1}}{(2n+1)!}$.
2. C(0) = 1 and S(0) = 0.
3. C and S are even and odd functions, respectively; that is, C(−x) = C(x) and S(−x) = −S(x).
4. |E(ix)| = 1 and $C(x)^2 + S(x)^2 = 1$.
5. $C'(x) = -S(x)$ and $S'(x) = C(x)$.

Proof. Since

$$E(ix) = \sum_{n=0}^{\infty} \frac{(ix)^n}{n!} = 1 + ix - \frac{x^2}{2!} - \frac{ix^3}{3!} + \cdots, \tag{4.5.5}$$

Claim (1) follows from the pattern +1, +i, −1, −i in $i^n$, and Claim (2) is equivalent to the fact that E(0) = 1 + 0i. The other claims are proven in Problem 4.5.4.
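The series in Claim (1) are easy to evaluate, which gives a quick numerical sanity check of the remaining claims. The following Python sketch is an illustration only: the cutoff of 40 terms and the step size in the difference quotients are assumptions of the sketch, and agreement at a few sample points is no substitute for Problem 4.5.4.

```python
def C(x: float, terms: int = 40) -> float:
    """Partial sum of (-1)**n * x**(2n) / (2n)! for n = 0, ..., terms - 1."""
    total, term = 0.0, 1.0
    for n in range(terms):
        total += term
        term *= -x * x / ((2 * n + 1) * (2 * n + 2))   # next even-power term
    return total


def S(x: float, terms: int = 40) -> float:
    """Partial sum of (-1)**n * x**(2n+1) / (2n+1)! for n = 0, ..., terms - 1."""
    total, term = 0.0, x
    for n in range(terms):
        total += term
        term *= -x * x / ((2 * n + 2) * (2 * n + 3))   # next odd-power term
    return total


if __name__ == "__main__":
    h = 1e-5
    for x in (0.0, 0.7, 2.0, -3.0):
        print(x,
              C(x) ** 2 + S(x) ** 2,                    # close to 1
              (C(x + h) - C(x - h)) / (2 * h) + S(x),   # close to 0
              (S(x + h) - S(x - h)) / (2 * h) - C(x))   # close to 0
```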


Definition 4.5.8. We define V = {x ∈ R | x > 0 and C(x) = 0}, i.e., V is the set of all positive zeros of C(x).

Lemma 4.5.9. The set V is nonempty, i.e., there exists some x > 0 such that C(x) = 0.

Proof. Problem 4.5.5.

Definition 4.5.10. We define π = 2 inf V, or in other words, we define π/2 to be the infimum of the set V of all positive zeros of C(x).

Theorem 4.5.11. We have that:

1. C(π/2) = 0.
2. S(π/2) = 1, and therefore, E(πi/2) = i.
3. E(2πi) = 1.
4. For any x ∈ R, E(i(x + 2π)) = E(ix).

Note that (3) is precisely Euler’s identity $e^{2\pi i} = 1$, and (4) says that E(ix) is periodic with period 2π. Note also that after proving Theorem 4.5.11, we are now justified in using the name cos x for C(x) and sin x for S(x), and we do so in the sequel.

Proof. Once we know that C(π/2) = 0 and S(π/2) = 1, it follows that E(πi/2) = 0 + 1i = i. Also, given Claim (3), Claim (4) follows by Theorem 4.5.4. The rest of the theorem is proven in Problem 4.5.6.

The reader seeing complex exponentials for the first time may find it helpful to memorize the values of $e^{ix}$ shown in Figure 4.5.1.

[Figure 4.5.1: Special values of $e^{ix}$ on the unit circle: $e^{0}$, $e^{\pi i/2}$, $e^{\pi i}$, and $e^{3\pi i/2}$.]
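Since π is defined here as twice the smallest positive zero of C, one can approximate it without ever mentioning circles. The Python sketch below locates a zero of the series for C by bisection on [0, 2]. It assumes, without proof, that C(0) > 0 > C(2) and that the zero found is the smallest positive one (compare Problem 4.5.6(c)); the function names and the comparison with `math.pi` are additions made for this illustration.

```python
import math


def C(x: float, terms: int = 40) -> float:
    """Partial sum of (-1)**n * x**(2n) / (2n)! for n = 0, ..., terms - 1."""
    total, term = 0.0, 1.0
    for n in range(terms):
        total += term
        term *= -x * x / ((2 * n + 1) * (2 * n + 2))
    return total


def zero_by_bisection(lo: float = 0.0, hi: float = 2.0, iterations: int = 60) -> float:
    """Bisection for a zero of C in [lo, hi], assuming C(lo) > 0 > C(hi)."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if C(mid) > 0:
            lo = mid                   # the sign change lies to the right of mid
        else:
            hi = mid                   # the sign change lies at or to the left of mid
    return (lo + hi) / 2


if __name__ == "__main__":
    print(2 * zero_by_bisection(), math.pi)
```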

Problems

4.5.1. (Proves Theorem 4.5.2) Define E : C → C by (4.5.1).

(a) Prove that E(0) = 1.
(b) Prove that the radius of convergence of E is R = ∞.
(c) Prove that for all x ∈ C, $\overline{E(x)} = E(\overline{x})$. (Suggestion: First consider partial sums.)
(d) Prove that for all x ∈ C, $E'(x) = E(x)$. (Justify convergence carefully.)

Suggestion for subsequent problems: From here on out, you should be able to proceed using only the properties of E(x) proved in this problem, without having to refer to the power series that defines E(x).

4.5.2. (Proves Theorem 4.5.3) Let f(x) = E(x)E(−x).

(a) Calculate $f'(x)$. What can you conclude?
(b) Prove that E(x)E(−x) = 1.

4.5.3. (Proves Theorem 4.5.4) Fix b ∈ C, and for any x ∈ C, let $f(x) = \frac{E(x + b)}{E(x)}$. (Note that f is differentiable because of Theorem 4.5.3.)

(a) Calculate $f'(x)$.
(b) Prove that for all x ∈ C, f(x) = E(b). (Theorem 4.5.4 follows.)

4.5.4. (Proves Theorem 4.5.7) Assume x ∈ R.

(a) Prove that C(−x) = C(x) and S(−x) = −S(x).
(b) Prove that $|E(ix)|^2 = 1$ and $C(x)^2 + S(x)^2 = 1$.
(c) Prove that $C'(x) = -S(x)$ and $S'(x) = C(x)$.

Suggestion for all parts: Instead of using the power series for C(x) and S(x), use E(ix), and keep $\overline{E(ix)}$ in mind.

Suggestion for subsequent problems: Again, from here on out, you should be able to proceed using only the properties of C(x) and S(x) proved in this problem, without having to refer to their power series.

4.5.5. (Proves Lemma 4.5.9) Proceeding by contradiction in the proof of Lemma 4.5.9, assume for the entirety of this problem that C(x) > 0 for all x > 0.

(a) Let m = S(1). Prove that m = S(1) > 0.
(b) Prove that $C'(x) < -m$ for all x > 1.
(c) Prove that for x > 1, C(x) − C(1) < −m(x − 1). Obtain a contradiction with the assumption that C(x) > 0 for all x > 0.

4.5.6. (Proves Theorem 4.5.11)

(a) Prove that there exists a sequence $x_n$ in R such that $C(x_n) = 0$ for all n, $x_n \ge \pi/2$, and $\lim_{n\to\infty} x_n = \pi/2$. (Suggestion: Arbitrarily Close Criterion.)
(b) Prove that C(π/2) = 0.
(c) Prove that for 0 ≤ x < π/2, C(x) > 0. (Suggestion: Intermediate Value Theorem.)
(d) Prove that S(π/2) = 1. (Suggestion: Why is S(π/2) > 0?)
(e) Prove that E(2πi) = 1. (Suggestion: Use E(πi/2).)


4.6 More about exponential functions

In this section, building on Section 4.5, we establish some notation and some results that we will need later, as well as a few results we used earlier without proof.

Definition 4.6.1. For n ∈ Z, we define the function $e_n$ : R → C by

$$e_n(x) = e^{2\pi i n x}. \tag{4.6.1}$$

Note that by Theorem 4.5.2, $\overline{e_n(x)} = \overline{e^{2\pi i n x}} = e^{-2\pi i n x} = e_{-n}(x)$.

Remark 4.6.2. The reader may not realize this, but we have just made an important choice of conventions that will affect everything else we do. For example, by Theorem 4.5.7, we have that

$$e_n(x + 1) = e^{2\pi i n (x+1)} = e^{2\pi i n x + n(2\pi i)} = e^{2\pi i n x} = e_n(x), \tag{4.6.2}$$

or in other words, $e_n$ is periodic with period 1. On the other hand, (4.5.2) tells us that

$$e_n'(x) = (2\pi i n)\, e_n(x), \tag{4.6.3}$$

which means that the constant 2πi will appear in many of our derivative and integral formulas involving $e_n(x)$. In contrast, other authors and practitioners of Fourier analysis may prefer to use $e^{inx}$ or even $e^{\pi i n x}$, so be warned: In other sources, those 2π, π, and 2πi constants will appear in different places than they do here.

We have the following indefinite integrals of functions related to $e_n(x)$.

$$\int \overline{e_n(x)}\,dx = -\frac{e_{-n}(x)}{2\pi i n} + C \tag{4.6.4}$$

$$\int x\,\overline{e_n(x)}\,dx = -\frac{x\,e_{-n}(x)}{2\pi i n} - \frac{e_{-n}(x)}{(2\pi i n)^2} + C \tag{4.6.5}$$

$$\int x^2\,\overline{e_n(x)}\,dx = -\frac{x^2 e_{-n}(x)}{2\pi i n} - \frac{2x\,e_{-n}(x)}{(2\pi i n)^2} - \frac{2\,e_{-n}(x)}{(2\pi i n)^3} + C, \tag{4.6.6}$$

or in general,

$$\int x^k\,\overline{e_n(x)}\,dx = -\frac{k!}{k!}\,\frac{x^k e_{-n}(x)}{2\pi i n} - \frac{k!}{(k-1)!}\,\frac{x^{k-1} e_{-n}(x)}{(2\pi i n)^2} - \cdots - \frac{k!}{1!}\,\frac{x\,e_{-n}(x)}{(2\pi i n)^k} - \frac{k!}{0!}\,\frac{e_{-n}(x)}{(2\pi i n)^{k+1}} + C. \tag{4.6.7}$$

Note that we will often integrate functions of the form $f(x)\overline{e_n(x)} = f(x)e_{-n}(x)$, for reasons that will become clear. We also have the following definite integrals, which turn out to be part of the foundations of Fourier analysis.

$$\int_0^1 e_n(x)\,\overline{e_k(x)}\,dx = \begin{cases} 1 & \text{if } n = k, \\ 0 & \text{otherwise.} \end{cases} \tag{4.6.8}$$


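See Problems 4.6.1–4.6.2 for proofs of the above integration formulas. We also have the following useful special values of $e_n$ and $e_{-n}$, for any n, k ∈ Z:

$$e_n(k) = e_{-n}(k) = 1 \tag{4.6.9}$$

$$e_n\!\left(\tfrac{1}{2}\right) = e_{-n}\!\left(\tfrac{1}{2}\right) = (-1)^n \tag{4.6.10}$$

$$e_n\!\left(\tfrac{1}{4}\right) = e_{-n}\!\left(-\tfrac{1}{4}\right) = i^n \tag{4.6.11}$$

$$e_n\!\left(-\tfrac{1}{4}\right) = e_{-n}\!\left(\tfrac{1}{4}\right) = (-i)^n. \tag{4.6.12}$$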
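The orthogonality relation (4.6.8) and the special values above lend themselves to a quick numerical check. The Python sketch below approximates the integral over [0, 1] by a midpoint rule; the helper names `e` and `inner`, the number of subintervals, and the use of `cmath.exp` for the exponential are assumptions of this illustration rather than anything defined in the text.

```python
import cmath


def e(n: int, x: float) -> complex:
    """The function e_n evaluated at x, computed as exp(2 * pi * i * n * x)."""
    return cmath.exp(2j * cmath.pi * n * x)


def inner(n: int, k: int, steps: int = 1000) -> complex:
    """Midpoint-rule approximation of the integral over [0, 1] of e_n(x) * conj(e_k(x))."""
    h = 1.0 / steps
    return h * sum(e(n, (j + 0.5) * h) * e(k, (j + 0.5) * h).conjugate()
                   for j in range(steps))


if __name__ == "__main__":
    print(inner(3, 3))                 # close to 1
    print(abs(inner(3, 5)))            # close to 0
    print(e(3, 0.25), 1j ** 3)         # special value at x = 1/4
    print(e(3, 0.5), (-1) ** 3)        # special value at x = 1/2
```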

As the reader may recall from calculus or differential equations, linear differential equations with constant coefficients often have solutions expressed in terms of exponential and trig functions. To be specific:

Theorem 4.6.3. Consider the interval I = [0, b] or I = [0, +∞) and its interior $I_0$ = (0, b) or (0, +∞).

1. For α, C ∈ C, the differential equation $f'(x) = \alpha f(x)$ has exactly one solution that is continuous on I, differentiable on $I_0$, and satisfies f(0) = C, namely, $f(x) = Ce^{\alpha x}$.
2. For $C_0, C_1 \in$ C and α > 0, the differential equation $f''(x) = -\alpha^2 f(x)$ has exactly one solution that is continuously differentiable on I, twice differentiable on $I_0$, and satisfies $f(0) = C_0$ and $f'(0) = C_1$, namely,

$$f(x) = C_0 \cos(\alpha x) + \frac{C_1}{\alpha} \sin(\alpha x). \tag{4.6.13}$$

Proof. Problems 4.6.3 and 4.6.4.

Finally, for the completist, we recover the fundamental facts about the log, power, and exponential functions promised in Lemma 3.6.10. We begin with some observations about the exponential function restricted to the domain R.

Theorem 4.6.4. Let E : R → R be the restriction of $e^x$ to the real numbers. Then:

1. For all x ∈ R, $E'(x) = E(x) > 0$.
2. E is a strictly increasing function on R.
3. For all b > 0, E(b) ≥ 1 + b.
4. The image of E is precisely {y ∈ R | y > 0}.

Proof. Problem 4.6.5.

The main point, then, is the definition of ln x and $a^b$.


Definition 4.6.5. Let X = {x ∈ R | x > 0}. We define ln : X → R to be the inverse of the function E : R → X defined in Theorem 3.2.15. (Note that ln is well-defined by Theorems 3.2.15 and 4.6.4.) Also, for a > 0 and b ∈ R, we define

$$a^b = e^{b \ln a}. \tag{4.6.14}$$

In particular, for a, c > 0, the functions $x^a$ and $c^x$ are well-defined on the domains X and R, respectively. Lemma 3.6.10 then becomes a calculus problem (Problem 4.6.6).

Problems

4.6.1. Use induction on k and integration by parts to prove (4.6.7). (In particular, prove the base case (4.6.4).)

4.6.2. Prove (4.6.8). (This is primarily a calculation, but an interesting one.)

4.6.3. (Proves Theorem 4.6.3) Suppose α, C ∈ C, and suppose f is a function that is continuous on [0, b], differentiable on (0, b), and satisfies $f'(x) = \alpha f(x)$ on (0, b).

(a) Let $g(x) = f(x)e^{-\alpha x}$. Calculate $g'(x)$ for x ∈ (0, b) (with proof).
(b) Prove that $f(x) = f(0)e^{\alpha x}$ for all x ∈ [0, b].

4.6.4. (Proves Theorem 4.6.3) Fix $C_0, C_1 \in$ C and α > 0.

(a) Prove that

$$f(x) = C_0 \cos(\alpha x) + \frac{C_1}{\alpha} \sin(\alpha x) \tag{4.6.15}$$

satisfies $f''(x) = -\alpha^2 f(x)$, $f(0) = C_0$, and $f'(0) = C_1$.
(b) Suppose f is given by (4.6.15), and g is continuously differentiable on [0, b], twice differentiable on (0, b), and also satisfies $g''(x) = -\alpha^2 g(x)$, $g(0) = C_0$, and $g'(0) = C_1$. Let h(x) = f(x) − g(x). Compute h(0), $h'(0)$, and $h''(x)$ in terms of h(x).
(c) Let h be the function from part (b). By considering the function $k(x) = \alpha^2 (h(x))^2 + (h'(x))^2$, prove that h(x) = 0 for all x ∈ [0, b]. (Suggestion: Compute $k'(x)$.)

4.6.5. (Proves Theorem 4.6.4) Let E : R → R be the restriction of $e^x$ to the real numbers.

(a) Prove that for all x ∈ R, $E'(x) = E(x) > 0$.
(b) Prove that for all b > 0, E(b) ≥ 1 + b. (Suggestion: Mean Value Theorem.)
(c) Prove that the image of E is precisely {y ∈ R | y > 0}. (Suggestion: For y > 1, use the Intermediate Value Theorem; then for 0 < y < 1, use Problem 4.5.2.)

4.6.6. (Proves Lemma 3.6.10) Using Definition 4.6.5:

(a) Prove that $\frac{d}{dx}(\ln x) = \frac{1}{x}$. (Suggestion: Theorem 3.2.15.)
(b) Prove that $\frac{d}{dx}(x^a) = ax^{a-1}$.
(c) Prove that $\frac{d}{dx}(c^x) = (\ln c)c^x$.

4.7 The Schwartz space

In most of this book, we study functions on some closed and bounded interval in R. Sometimes, however, as in Sections 7.5.2 and 8.5.2, occasionally in Chapters 10 and 11, and heavily in Chapters 12 and 13, we wish to consider functions on all of R. Because functions need to decay as x → ±∞ in some sense in order to have well-defined integrals on R, the following class of functions turns out to be very useful.

Definition 4.7.1. To say that a continuous function f : R → C is rapidly decaying means that one of the following equivalent conditions (see Problem 4.7.1) holds:

1. For any n ≥ 0, $x^n f(x)$ is bounded on R.
2. For any n ≥ 0, $\lim_{x\to\pm\infty} x^n f(x) = 0$.

In the language of asymptotics (Definition 3.6.7), the latter condition can be written as f(x) …

Theorem 4.7.4. For a > 0, b ∈ R, and p(x) a polynomial, the function

$$f(x) = p(x)e^{-ax^2+bx} \tag{4.7.1}$$

is in S(R).


Proof. Problem 4.7.3.

Example 4.7.5. Two functions not in S(R) are $g(x) = e^{-|x|}$ and $h(x) = \frac{1}{1 + x^2}$. With g(x), even though g satisfies the limit condition in Definition 4.7.1, g ∉ S(R) because g is not differentiable at 0. As for h(x), even though $h^{(k)}(x)$ exists for every k ≥ 0 and x ∈ R, h ∉ S(R) because (for example) $\lim_{x\to\pm\infty} x^4 h(x) = +\infty$.
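The two failures in Example 4.7.5 can be observed numerically. In the Python sketch below, `g` and `h` are the functions from the example, while `gaussian` (the case a = π, b = 0, p = 1 of the form $p(x)e^{-ax^2+bx}$) is added here purely for comparison; the helper names and the step size are assumptions of this illustration.

```python
import math


def g(x: float) -> float:
    """exp(-|x|): decays rapidly but has a corner at 0."""
    return math.exp(-abs(x))


def h(x: float) -> float:
    """1 / (1 + x**2): smooth but decays too slowly."""
    return 1.0 / (1.0 + x * x)


def gaussian(x: float) -> float:
    """exp(-pi * x**2), a comparison function chosen for this sketch."""
    return math.exp(-math.pi * x * x)


def weighted(f, n: int, x: float) -> float:
    """The quantity x**n * f(x) appearing in the rapid-decay conditions."""
    return x ** n * f(x)


def one_sided_slopes(f, step: float = 1e-6):
    """Left and right difference quotients of f at 0."""
    return (f(0.0) - f(-step)) / step, (f(step) - f(0.0)) / step


if __name__ == "__main__":
    for x in (1.0, 10.0, 100.0):
        print(x, weighted(h, 4, x), weighted(g, 4, x), weighted(gaussian, 4, x))
    print(one_sided_slopes(g))         # roughly (+1, -1): no derivative at 0
```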

Problems

4.7.1. Suppose f : R → C is continuous. This problem proves the equivalence of two conditions in Definition 4.7.1 that define what it means for f to decay rapidly.

(a) Prove that if $|x^{n+1} f(x)| \le M$ for all x ∈ R, then $\lim_{x\to\pm\infty} x^n f(x) = 0$.
(b) Prove that if $\lim_{x\to\pm\infty} x^n f(x) = 0$, then there exists some M such that $|x^n f(x)| \le M$ for all x ∈ R. (Suggestion: Prove that there exists some a such that for |x| > a, $|x^n f(x)| < 1$, and then use the fact that f is continuous on [−a, a].)

4.7.2. (Proves Theorem 4.7.2) Suppose f, g ∈ S(R) and p is a polynomial.

(a) Prove that $f'(x) \in$ S(R).
(b) Prove that f(x) + g(x) ∈ S(R).
(c) Prove that f(x)g(x) ∈ S(R).
(d) Prove that p(x)f(x) ∈ S(R).

(Suggestion: For each part, we may regard the differentiability condition of Definition 4.7.1 as following immediately from the sum and product rules, so it suffices to prove that the (equivalent) “rapid decay” conditions of Definition 4.7.1 hold.)

4.7.3. (Proves Theorem 4.7.4) For the purposes of this problem only, we define a poly-Gaussian to be a function of the form $f(x) = p(x)e^{-ax^2+bx}$, where p(x) is a polynomial, a > 0, and b ∈ R. The following shows that every poly-Gaussian function is in S(R).

(a) Prove that the derivative of a poly-Gaussian function is poly-Gaussian.
(b) Prove that if f(x) is poly-Gaussian, then $\lim_{x\to\pm\infty} f(x) = 0$.

4.8 Integration on R

Starting occasionally in Chapters 10 and 11, and everywhere in Chapters 12 and 13, we will need to integrate functions over the entire real line. Therefore, in this section, we extend several results from Section 3.6 to (improper) integration over all of R. In order of increasing difficulty, these are:

• Integration by parts (Theorem 4.8.7);
• Differentiating under the integral sign (Theorem 4.8.8); and
• Fubini’s Theorem (Theorem 4.8.11).

We also establish the integral $\int_{-\infty}^{\infty} e^{-\pi x^2}\,dx = 1$ (Theorem 4.8.6).

As with Section 3.6, the first-time reader is encouraged to skip the proofs and absorb only Definition 4.8.2 and the statements of the main results; and as with Section 4.7, the reader only interested in Fourier series and not the Fourier transform is encouraged to skip this section altogether. In any case, we begin with the standard definitions, which the reader may recall from calculus.

Definition 4.8.1. To say that f : R → C is locally integrable means that f is integrable on any closed and bounded interval in R.

Definition 4.8.2. Let f : R → C be locally integrable. For a ∈ R, to say that the improper integral $\int_a^{\infty} f(x)\,dx$ converges (or exists) means that the limit

$$\lim_{b\to\infty} \int_a^b f(x)\,dx = \int_a^{\infty} f(x)\,dx \tag{4.8.1}$$

exists; to say that $\int_{-\infty}^{b} f(x)\,dx$ converges means that

$$\lim_{a\to-\infty} \int_a^b f(x)\,dx = \int_{-\infty}^{b} f(x)\,dx \tag{4.8.2}$$

exists; and to say that $\int_{-\infty}^{\infty} f(x)\,dx$ converges means that both (4.8.1) and (4.8.2) exist for some (and therefore, any) fixed a and b, in which case, we define

$$\int_{-\infty}^{\infty} f(x)\,dx = \int_{-\infty}^{c} f(x)\,dx + \int_{c}^{\infty} f(x)\,dx \tag{4.8.3}$$

for some (and therefore, any) c. (Note that it follows from the usual properties of integration that (4.8.3) is independent of c.) If (4.8.3) converges, we say that f is integrable on R; for clarity, we sometimes add a phrase like “in the sense of an improper Riemann integral.”

Note that it follows in a straightforward manner from Definition 4.8.2 that if f is integrable on R,

$$\int_{-\infty}^{\infty} f(x)\,dx = \lim_{N\to\infty} \int_{-N}^{N} f(x)\,dx. \tag{4.8.4}$$

Thanks to (4.8.4), we will later be able to apply our theory of sequences of functions to improper integrals.
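The limit in (4.8.4) can be watched in action on the Gaussian $e^{-\pi x^2}$, whose integral over R is stated in this section to equal 1. The Python sketch below approximates the truncated integrals over [−N, N] with a midpoint rule. The function names, the step density, and the comparison against the standard library's `math.erf` (using the closed form erf(√π N) for the truncated integral) are assumptions added for this illustration.

```python
import math


def gaussian(x: float) -> float:
    """The integrand exp(-pi * x**2)."""
    return math.exp(-math.pi * x * x)


def symmetric_integral(f, N: float, steps_per_unit: int = 1000) -> float:
    """Midpoint-rule approximation of the integral of f over [-N, N]."""
    steps = max(1, int(round(2 * N * steps_per_unit)))
    h = 2.0 * N / steps
    return h * sum(f(-N + (j + 0.5) * h) for j in range(steps))


if __name__ == "__main__":
    for N in (0.5, 1.0, 2.0, 4.0):
        approx = symmetric_integral(gaussian, N)
        exact = math.erf(math.sqrt(math.pi) * N)   # closed form of the truncated integral
        print(N, approx, exact)
```

The truncated values climb toward 1 very quickly, which reflects how fast the Gaussian decays; for a slowly decaying integrand the same loop would converge far more slowly.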


Returning to the fundamentals of improper integrals, by applying the limit laws to Corollary 3.4.3 and Theorem 3.4.8, we see that improper integrals inherit the linearity and absolute value properties of proper integrals. Moreover, improper integrals satisfy the following analogues of Corollaries 4.1.5 and 4.1.7.

Theorem 4.8.3 (Cauchy Criterion for Improper Integrals). Suppose f : R → C is locally integrable. Then f is integrable on R if and only if for any ε > 0, there exists N(ε) > 0 such that if b, c > N(ε) or b, c < −N(ε), then $\left| \int_b^c f(x)\,dx \right| < \epsilon$.

Proof. It suffices to prove the one-sided analogue for $\int_a^{\infty} f(x)\,dx$, which is done in Problem 4.8.1.

Theorem 4.8.4 (Comparison Test for Improper Integrals). Suppose f, g : R → C are locally integrable and |f(x)| ≤ g(x) for all x ∈ R. If g is integrable on R, then so is f, and

$$\left| \int_{-\infty}^{\infty} f(x)\,dx \right| \le \int_{-\infty}^{\infty} g(x)\,dx. \tag{4.8.5}$$

Proof. It again suffices to prove the one-sided version; see Problem 4.8.2.

Example 4.8.5. If f is locally integrable, it follows from Theorem 4.8.4 and standard results of calculus that if f is bounded and there exist constants C > 0 and k > 1 such that $|f(x)| \le \frac{C}{|x|^k}$ for all x ≠ 0, then f is integrable on R. In particular, this holds for all f ∈ S(R).

We are now ready to begin tackling our main results. First, the following integral is used at a few critical junctures in Chapters 11 and 12.

Theorem 4.8.6. We have that $\int_{-\infty}^{\infty} e^{-\pi x^2}\,dx = 1$.

Proof. Problem 4.8.3.

The next result is a relatively straightforward matter of taking limits.

Theorem 4.8.7 (Integration by Parts on R). Suppose that f, g : R → C are differentiable, $f'$ and $g'$ are continuous on R, $f(x)g'(x)$ is integrable on R, and both $\lim_{a\to-\infty} f(a)g(a)$ and $\lim_{b\to\infty} f(b)g(b)$ exist. Then $g(x)f'(x)$ is integrable on R, and

$$\int_{-\infty}^{\infty} f(x)g'(x)\,dx = \lim_{b\to\infty} f(b)g(b) - \lim_{a\to-\infty} f(a)g(a) - \int_{-\infty}^{\infty} g(x)f'(x)\,dx. \tag{4.8.6}$$

For brevity, we define

$$F(x)\Big|_{-\infty}^{\infty} = \lim_{b\to\infty} F(b) - \lim_{a\to-\infty} F(a), \tag{4.8.7}$$

so we may rewrite (4.8.6) as

$$\int_{-\infty}^{\infty} f(x)g'(x)\,dx = f(x)g(x)\Big|_{-\infty}^{\infty} - \int_{-\infty}^{\infty} g(x)f'(x)\,dx. \tag{4.8.8}$$

Proof. Again, the one-sided case suffices; see Problem 4.8.4.

Our next main result, the improper version of differentiating under the integral sign, comes from an application of our results on uniform convergence (Section 4.3).

Theorem 4.8.8. Let f : [a, b] × R → C be a continuous function such that the partial derivative of f in the first variable, $\frac{\partial f}{\partial x}$, is continuous on [a, b] × R (as a function of two variables), and for any fixed $x_0 \in [a, b]$, both $f(x_0, y)$ and $\frac{\partial f}{\partial x}(x_0, y)$ are integrable on R (as functions of y). Suppose also that the sequences

$$F_N(x) = \int_{-N}^{N} f(x, y)\,dy, \qquad g_N(x) = \int_{-N}^{N} \frac{\partial f}{\partial x}\,dy \tag{4.8.9}$$

of functions $F_N, g_N$ : [a, b] → C converge uniformly (i.e., independently of x) on [a, b] to

$$F(x) = \int_{-\infty}^{\infty} f(x, y)\,dy, \qquad g(x) = \int_{-\infty}^{\infty} \frac{\partial f}{\partial x}\,dy, \tag{4.8.10}$$

respectively. Then for all x ∈ [a, b],

$$F'(x) = \frac{\partial}{\partial x} \int_{-\infty}^{\infty} f(x, y)\,dy = \int_{-\infty}^{\infty} \frac{\partial f}{\partial x}\,dy = g(x). \tag{4.8.11}$$

Proof. Problem 4.8.5.

Our last main result is Fubini’s Theorem for integration on R (Theorem 4.8.11). In our version of Fubini, we will assume unnecessary hypotheses, like differentiability, to keep the proof both accessible and relatively brief. (For a statement of a more natural version, see the sketched proof of Theorem 12.5.8.) To be precise, we consider only the following kind of integrand.

Definition 4.8.9. To say that F : R × R → C is integrable by separation means that F is continuous (as a function of two variables) and there exists a set of nonnegative real bounded functions $f_i, g_i$ : R → R (1 ≤ i ≤ k), each integrable on R, such that

$$|F(x, y)| \le \sum_{i=1}^{k} f_i(x)g_i(y) \tag{4.8.12}$$

for all x, y ∈ R.


The following lemma then sums up some useful convergence properties of functions that are integrable by separation.

Lemma 4.8.10. Suppose F : R × R → C is a function such that F and $\frac{\partial F}{\partial x}$ are both integrable by separation. Then:

1. The integral

$$F^y(x) = \int_{-\infty}^{\infty} F(x, y)\,dy \tag{4.8.13}$$

converges for every x ∈ R.

2. The sequence of functions $F_N^y(x) = \int_{-N}^{N} F(x, y)\,dy$ converges uniformly (i.e., independently of x) on R to $F^y$.

3. The function $F^y(x)$ is differentiable in x and integrable on R. In particular, the double integral $\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} F(x, y)\,dy\,dx$ is well-defined.

4. We have that

$$\lim_{N\to\infty} \int_{-\infty}^{\infty} F_N^y(x)\,dx = \int_{-\infty}^{\infty} F^y(x)\,dx = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} F(x, y)\,dy\,dx. \tag{4.8.14}$$

Also, by symmetry, the same statements all hold with x and y switched.

Note that statement (4) does not follow from statement (2), as it is possible to find a sequence $f_n$ : R → R that converges uniformly to f on R such that the integrals of the $f_n$ do not converge to the integral of f (Problem 4.8.6).

Proof. By Definition 4.8.9, suppose F is continuous and that $f_i, g_i$ : R → R (1 ≤ i ≤ k) are continuous nonnegative real bounded functions, each integrable on R, such that

$$|F(x, y)| \le \sum_{i=1}^{k} f_i(x)g_i(y) \tag{4.8.15}$$

for all x, y ∈ R.

We first see that statement (1) follows from an application of the comparison test for improper integrals (Problem 4.8.7). Next, let $f(x) = \sum_{i=1}^{k} f_i(x)$. The key point here is to prove the following claim (Problem 4.8.8).

Claim: For every ε > 0, there exists K(ε) such that if N > K(ε), then $\left| F^y(x) - F_N^y(x) \right| < \epsilon f(x)$.


Once the Claim is established, since the $f_i$ are bounded, so is f, and the Claim implies statement (2). Repeating the proof so far for $\frac{\partial F}{\partial x}$ instead of F also shows that the sequence $\int_{-N}^{N} \frac{\partial F}{\partial x}(x, y)\,dy$ converges uniformly to the convergent integral $\int_{-\infty}^{\infty} \frac{\partial F}{\partial x}(x, y)\,dy$, so Theorem 4.8.8 implies the differentiability part of statement (3). Then, if we let

$$C_i = \int_{-\infty}^{\infty} g_i(y)\,dy, \tag{4.8.16}$$

since $F^y$ is continuous on R and

$$|F^y(x)| \le \sum_{i=1}^{k} C_i f_i(x), \tag{4.8.17}$$

the integrability part of statement (3) follows by Theorem 4.8.4. Finally, it remains to show that the Claim implies statement (4), and this is again Problem 4.8.8.

Theorem 4.8.11 (Fubini’s Theorem on R). Suppose F : R × R → C is a function such that F, $\frac{\partial F}{\partial x}$, and $\frac{\partial F}{\partial y}$ are all integrable by separation. Then both sides of

$$\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} F(x, y)\,dy\,dx = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} F(x, y)\,dx\,dy \tag{4.8.18}$$

converge and are equal.

Proof. Lemma 4.8.10(3) shows that both sides of (4.8.18) converge, so it remains to show that they are equal. By the bounded version of Fubini’s Theorem 3.6.21, for all K, N ∈ N,

$$\int_{-K}^{K} \int_{-N}^{N} F(x, y)\,dy\,dx = \int_{-N}^{N} \int_{-K}^{K} F(x, y)\,dx\,dy. \tag{4.8.19}$$

We now evaluate the double limit $\lim_{K\to\infty} \lim_{N\to\infty}$ of both sides.

On the one hand,

$$\begin{aligned} \lim_{K\to\infty} \lim_{N\to\infty} \int_{-K}^{K} \left( \int_{-N}^{N} F(x, y)\,dy \right) dx &= \lim_{K\to\infty} \int_{-K}^{K} \left( \int_{-\infty}^{\infty} F(x, y)\,dy \right) dx \qquad (*) \\ &= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} F(x, y)\,dy\,dx, \end{aligned} \tag{4.8.20}$$

where (*) follows by the uniform convergence of $\int_{-\infty}^{\infty} F(x, y)\,dy$ (Lemma 4.8.10(2)) and Theorem 4.3.10.

On the other hand,

$$\begin{aligned} \lim_{K\to\infty} \lim_{N\to\infty} \int_{-N}^{N} \left( \int_{-K}^{K} F(x, y)\,dx \right) dy &= \lim_{K\to\infty} \int_{-\infty}^{\infty} \left( \int_{-K}^{K} F(x, y)\,dx \right) dy \\ &= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} F(x, y)\,dx\,dy, \end{aligned} \tag{4.8.21}$$

where the last step follows by Lemma 4.8.10(4). The theorem follows.

Problems

4.8.1. (Proves Theorem 4.8.3) Suppose f : R → C is locally integrable.

(a) Prove that if $\int_a^{\infty} f(x)\,dx$ converges, then for any ε > 0, there exists N(ε) > 0 such that if b, c > N(ε), then $\left| \int_b^c f(x)\,dx \right| < \epsilon$. (Suggestion: Imitate the proof of Theorem 2.5.2.)
(b) Suppose that for any ε > 0, there exists N(ε) > 0 such that if b, c > N(ε), then $\left| \int_b^c f(x)\,dx \right| < \epsilon$. Prove that $\int_a^{\infty} f(x)\,dx$ converges. (Suggestion: Prove that the limit of the sequence $\int_a^n f(x)\,dx$ exists, then for b ∈ R, approximate $\int_a^b f(x)\,dx$ by some term in that sequence.)

4.8.2. (Proves Theorem 4.8.4) Suppose f, g : R → C are locally integrable, |f(x)| ≤ g(x) for all x ∈ R, and $\int_0^{\infty} g(x)\,dx$ converges. Prove that $\int_0^{\infty} f(x)\,dx$ converges. (Suggestion: Imitate the proof of Corollary 4.1.7.)

4.8.3. (Proves Theorem 4.8.6) In this problem, we cheat a little and use standard facts about change of variables from multivariable calculus.

(a) Prove that $\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{-\pi(x^2+y^2)}\,dx\,dy = 1$. (Suggestion: Convert to polar coordinates and take advantage of the change of variables factor $dx\,dy = r\,dr\,d\theta$.)
(b) Prove that $\int_{-\infty}^{\infty} e^{-\pi x^2}\,dx = 1$.

4.8.4. (Proves Theorem 4.8.7) Fix a ∈ R, and suppose that f, g ∈ $C^1$(R), $f(x)g'(x)$ is integrable on R, and $\lim_{b\to\infty} f(b)g(b)$ exists. Prove that

$$\int_a^{\infty} f(x)g'(x)\,dx = \lim_{b\to\infty} f(b)g(b) - f(a)g(a) - \int_a^{\infty} g(x)f'(x)\,dx. \tag{4.8.22}$$

In particular, prove that the improper integral on the right-hand side exists. (Suggestion: Carefully take limits in Theorem 3.5.6.)


4.8.5. (Proves Theorem 4.8.8) Let f : [a, b] × R → C be a continuous function such that $\frac{\partial f}{\partial x}$ is continuous on [a, b] × R (as a function of two variables), and for any fixed $x_0 \in [a, b]$, both $f(x_0, y)$ and $\frac{\partial f}{\partial x}(x_0, y)$ are integrable on R (as functions of y). Suppose also that the sequences

$$F_N(x) = \int_{-N}^{N} f(x, y)\,dy, \qquad g_N(x) = \int_{-N}^{N} \frac{\partial f}{\partial x}\,dy \tag{4.8.23}$$

of functions $F_N, g_N$ : [a, b] → C converge uniformly on [a, b] to

$$F(x) = \int_{-\infty}^{\infty} f(x, y)\,dy, \qquad g(x) = \int_{-\infty}^{\infty} \frac{\partial f}{\partial x}\,dy, \tag{4.8.24}$$

respectively. Prove that for all x ∈ [a, b],

$$F'(x) = \frac{\partial}{\partial x} \int_{-\infty}^{\infty} f(x, y)\,dy = \int_{-\infty}^{\infty} \frac{\partial f}{\partial x}\,dy = g(x). \tag{4.8.25}$$

(Suggestion: Theorem 3.6.23 and Section 4.3.)

4.8.6. Define $f_n$, f : R → C by

$$f_n(x) = \begin{cases} \dfrac{1}{n} & \text{for } n \le x \le 2n, \\ 0 & \text{otherwise,} \end{cases} \tag{4.8.26}$$

and f(x) = 0. Prove that $f_n$ converges uniformly to f on R, but

$$\lim_{n\to\infty} \int_{-\infty}^{\infty} f_n(x)\,dx \ne \int_{-\infty}^{\infty} f(x)\,dx. \tag{4.8.27}$$

(Suggestion: Draw graphs and calculate.)

For Problems 4.8.7 and 4.8.8, suppose F : R × R → C is continuous as a function of two variables and there exists a set of nonnegative real functions $f_i, g_i$ : R → R (1 ≤ i ≤ k), each integrable on R, such that

$$|F(x, y)| \le \sum_{i=1}^{k} f_i(x)g_i(y) \tag{4.8.28}$$

for all x, y ∈ R.

4.8.7. (Proves Lemma 4.8.10) Prove that the integral

$$F^y(x) = \int_{-\infty}^{\infty} F(x, y)\,dy \tag{4.8.29}$$

converges for every x ∈ R. (Suggestion: Theorem 4.8.4.)


4.8.8. (Proves Lemma 4.8.10) Let
\[ F_N^y(x) = \int_{-N}^{N} F(x, y)\,dy, \qquad F^y(x) = \int_{-\infty}^{\infty} F(x, y)\,dy. \tag{4.8.30} \]

(a) Let $f(x) = \sum_{i=1}^{k} f_i(x)$. Prove that for every $\epsilon > 0$, there exists $K(\epsilon)$ such that if $N > K(\epsilon)$, then $\left| F_N^y(x) - F^y(x) \right| < \epsilon f(x)$. (Suggestion: Carefully combine the convergence of the integrals $\int_{-\infty}^{\infty} g_i(y)\,dy$.)

(b) Assuming that $F_N^y, F^y : \mathbb{R} \to \mathbb{C}$ are integrable on $\mathbb{R}$, prove that
\[ \lim_{N\to\infty} \int_{-\infty}^{\infty} F_N^y(x)\,dx = \int_{-\infty}^{\infty} F^y(x)\,dx. \tag{4.8.31} \]
(Suggestion: Use part (a) and imitate Problem 4.3.5.)

Part II

Fourier series and Hilbert spaces


Chapter 5

The idea of a function space

. . . [T]here is a difference between numbers and numbers that matter. This is what separates data from metrics. You can’t pick your data, but you must pick your metrics.
Jeff Bladt and Bob Filbin, “Know the Difference Between Your Data and Your Metrics,” Harvard Business Review, March 4, 2013

In this brief chapter, we motivate some of the main ideas of Part II of this book. First, starting from an old conundrum due to Lewis Carroll, in Section 5.1, we introduce the question: What is a reasonable way to determine how close two functions are? More precisely, in Section 5.2, we argue that, when trying to find a “good” or “best” approximation to a given function $f$, we should choose a suitable function space $V$ and look at a reasonable metric (see Section 2.3) on $V$ to determine how good that approximation is. Finally, as it turns out, our favorite metric on function spaces is best described in terms of an abstract version of dot products, so in Section 5.3, we review the properties of ordinary Euclidean dot products, which the reader may have seen in multivariable calculus or physics.

5.1 Which clock keeps better time?

In “The Two Clocks,” Lewis Carroll asked:

Which is better, a clock that is right only once a year, or a clock that is right twice every day? “The latter,” you reply, “unquestionably.” Very good, now attend. I have two clocks: one doesn’t go at all, and the other loses a minute a day: which would you prefer?

Now the answer to the second question may seem obvious, but (as Mr. Carroll puts it) attend: If clock #2 loses a minute a day and no one realizes that or fixes clock #2, then after several months or years, clock #2 will be quite far off of the actual time. Of course, after several more months or years, clock #2 will be back to being nearly on time. So which clock is better? More precisely, what’s a reasonable way to quantify the question of which clock is better? One way to resolve this dilemma is to ask:


Question 5.1.1. Which clock is, on average, less wrong?

The reason this version of the clock question is convenient is that, as the reader may recall from calculus, the average value of an integrable $f : [a, b] \to \mathbb{R}$ is defined to be
\[ \frac{1}{b-a} \int_a^b f(t)\,dt. \tag{5.1.1} \]

To quantify this fully, let $t$ be time in days, and assuming 24-hour clocks, let $f(t)$ be the actual time, let $s(t) = 0$ be the time on the stopped clock, and let $\ell(t)$ be the time on the lagging clock. Then assuming both clocks are correct at midnight ($t = 0$) of some particular day, we see that the magnitudes (absolute values) of the errors of the stopped and lagging clocks, measured in hours, are
\[ |f(t) - s(t)| = |24t| \quad \text{for } -\tfrac{1}{2} \le t \le \tfrac{1}{2}, \tag{5.1.2} \]
\[ |f(t) - \ell(t)| = \left| \frac{t}{60} \right| \quad \text{for } -720 \le t \le 720, \tag{5.1.3} \]
respectively, where $|f(t) - s(t)|$ is periodic with period 1, and $|f(t) - \ell(t)|$ is periodic with period 1440. It therefore makes sense to take the average of $|f(t) - s(t)|$ on the interval $[-\frac{1}{2}, \frac{1}{2}]$ and the average of $|f(t) - \ell(t)|$ on the interval $[-720, 720]$. (The finicky reader may prefer to take both averages on $[-720, 720]$, but the periodicity of $|f(t) - s(t)|$ implies that we will get the same answer.) So we may now ask, precisely:

Question 5.1.2. Which average error is greater:
\[ \frac{1}{\frac{1}{2} - \left(-\frac{1}{2}\right)} \int_{-1/2}^{1/2} |f(t) - s(t)|\,dt \quad \text{or} \quad \frac{1}{720 - (-720)} \int_{-720}^{720} |f(t) - \ell(t)|\,dt? \tag{5.1.4} \]

Now, there are many other possible measures of average error. For example, suppose large errors concern us more than small ones; in that case we might, for example, replace the average of the error with the average of the squared error. More generally:

Question 5.1.3. For $p = 2$, which average error is greater:
\[ \frac{1}{\frac{1}{2} - \left(-\frac{1}{2}\right)} \int_{-1/2}^{1/2} |f(t) - s(t)|^p\,dt \quad \text{or} \quad \frac{1}{720 - (-720)} \int_{-720}^{720} |f(t) - \ell(t)|^p\,dt? \tag{5.1.5} \]
How about other $p > 1$, $p \ne 2$?

The answers to the above questions will be left as problems, but we hope that the reader has at least gotten a flavor of what it means to measure the distance between two functions.
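The averages in (5.1.4) and (5.1.5) can also be estimated numerically, which gives a way to check a hand computation after attempting Problems 5.1.1–5.1.3. The following is a minimal sketch, assuming a midpoint-rule approximation of the integrals; the helper names `average`, `stopped_error`, `lagging_error`, and `average_errors` are illustrative and do not come from the text.

```python
def average(g, a, b, n=20_000):
    """Approximate the average value (5.1.1) of g on [a, b] by the midpoint rule."""
    h = (b - a) / n
    total = sum(g(a + (k + 0.5) * h) for k in range(n))
    return total * h / (b - a)


def stopped_error(t):
    # |f(t) - s(t)| = |24 t| hours for -1/2 <= t <= 1/2 (t in days), as in (5.1.2)
    return abs(24 * t)


def lagging_error(t):
    # |f(t) - l(t)| = |t / 60| hours for -720 <= t <= 720 (t in days), as in (5.1.3)
    return abs(t / 60)


def average_errors(p):
    """Return the two averages in (5.1.5) for the exponent p (p = 1 gives (5.1.4))."""
    stopped = average(lambda t: stopped_error(t) ** p, -0.5, 0.5)
    lagging = average(lambda t: lagging_error(t) ** p, -720, 720)
    return stopped, lagging


if __name__ == "__main__":
    for p in (1, 2, 3):
        print(p, average_errors(p))
```

Comparing the printed pairs for several values of p suggests how the answer to Question 5.1.3 depends on p; the sketch is a numerical sanity check, not a substitute for the exact calculation.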


Problems

5.1.1. Calculate the two average errors in (5.1.4). Which clock is more accurate, on average? (Suggestion: By symmetry, we may take the integrals, or even the averages, on the positive halves $[0, \frac{1}{2}]$ and $[0, 720]$ of the respective intervals.)

5.1.2. For $p = 2$, calculate the two “average squared errors” in (5.1.5). Which clock is more accurate, on average?

5.1.3. For $p > 1$, $p \ne 2$, calculate the two average “errors-raised-to-the-$p$th-power” in (5.1.5). Which clock is more accurate, on average, and how does that depend on $p$?

5.2 Function spaces and metrics

As we saw in Section 5.1, one way to make the question “Which function is closer?” (or later, “Which approximation is better?”) precise is to define what will turn out to be a metric on functions (see Section 2.3). However, to ensure that these kinds of “metrics” actually satisfy the axioms of a metric (Definition 2.3.1), we need to do several things; most prominently, we must specify a function space on which such a metric is to be defined. We therefore come to the following definition.

Definition 5.2.1. Let $X$ be a set. We define a function space on $X$ to be a collection $V$ of (complex-valued) functions, all with the same domain $X$, such that the following properties hold:

1. (Nonempty) $V$ contains the zero function $0(x) = 0$.
2. (Closed under addition) For $f, g \in V$, $f + g \in V$.
3. (Closed under scalar multiplication) For $f \in V$ and $c \in \mathbb{C}$, $cf \in V$.

A subset of a function space $V$ that is itself a function space is called a function subspace, or simply a subspace of $V$. If $V$ is a function space, we sometimes call $f \in V$ a function, when $f$ is being considered in terms of analysis, and sometimes call $f$ a vector, when we think of $f$ as an element of an unspecified abstract function space.

If the reader has any familiarity with linear algebra, then the following example will show how Definition 5.2.1 encompasses the standard examples of a first course in linear algebra.

Example 5.2.2. Let $V$ be the set of all complex-valued functions on $X = \{1, \dots, n\}$. Then certainly $V$ satisfies the axioms of Definition 5.2.1; moreover, if we write the values of $f : X \to \mathbb{C}$ as the vector $(f(1), \dots, f(n))$, we see that $V$ can be identified with $\mathbb{C}^n$, the space of all complex row vectors of length $n$. In this context, the zero function $(0, \dots, 0)$ is also called the zero vector.


Remark 5.2.3. The reader familiar with axiom-based linear algebra will note that, by the Subspace Theorem (Theorem B.4), a function space on $X$ is precisely a subspace of $F(X, \mathbb{C})$; see Appendix B for details. In any case, the main point of Definition 5.2.1 is that functions in a function space can be manipulated algebraically like vectors in a vector space; for example, we can form arbitrary linear combinations of functions in a function space $V$ and still stay in $V$.

Some familiar results can be rephrased in terms of function spaces as follows.

Example 5.2.4. For $X \subseteq \mathbb{C}$, it follows immediately from Theorem 3.1.4 that the set of all continuous functions on $X$ is a function space on $X$. Similarly, if $X \subseteq \mathbb{C}$ is a set such that every point of $X$ is a limit point of $X$, then by Theorem 3.2.8, the set of all differentiable functions on $X$ is a function space on $X$; by Corollary 3.4.3, the set of all integrable functions on a closed interval $[a, b]$ is a function space on $[a, b]$; and by Theorem 4.7.2, $S(\mathbb{R})$ is a function space.

In the rest of this book, we will be particularly interested in spaces of functions defined by their degrees of smoothness (continuity and differentiability), bringing us to the following definition.

Definition 5.2.5. Let $X$ be a nonempty subset of $\mathbb{C}$ such that every point of $X$ is a limit point. We define $C^0(X)$ to be the set of all continuous $f : X \to \mathbb{C}$, which is a subspace by Theorem 3.1.4. Similarly, for any positive integer $r$, we define $C^r(X)$ to be the set of all $f : X \to \mathbb{C}$ with continuous $r$th derivatives, which is a subspace by Theorems 3.2.8 and 3.1.4. Finally, we define $C^\infty(X)$ to be the set of all $f : X \to \mathbb{C}$ with $r$th derivatives for every positive integer $r$, which is a subspace for analogous reasons. Note that by Corollary 3.2.7, we have that:
\[ C^0(X) \supset C^1(X) \supset C^2(X) \supset \cdots \supset C^\infty(X). \tag{5.2.1} \]

Definition 5.2.6. We will also occasionally use multivariable versions of $C^r(X)$. Specifically, for a function of two variables on a domain $X \subseteq \mathbb{R} \times \mathbb{R}$, to say that $f \in C^r_x(X)$ means that for any fixed $y_0 \in \mathbb{R}$, $f(x, y_0)$ is in $C^r(X)$ as a function of $x$, and similarly for $f \in C^r_y(X)$. Finally, to say that $f \in C^r(X)$ means that every $r$th partial derivative of $f$ exists and is continuous; in particular, by Theorem 3.6.18, if $f \in C^1(X)$, then $f$ is itself continuous.

Example 5.2.7. To give some key examples, Theorem 4.5.2 implies that $e^x \in C^\infty(\mathbb{R})$. More generally, the chain rule and other derivative laws then imply that for any $\alpha \in \mathbb{C}$, $k \in \mathbb{R}$, the functions $e^{\alpha x}$, $\cos(kx)$, and $\sin(kx)$ are all in $C^\infty(\mathbb{R})$.

Example 5.2.8. For the reader who has seen the Schwartz space $S(\mathbb{R})$ from Section 4.7, Theorem 4.7.2 implies that $S(\mathbb{R})$ is a function space on $\mathbb{R}$.

We also define a special case of function spaces that will be the source of most of our main examples, at least until Part IV of this book.


Definition 5.2.9. To say that the domain of a function $f$ is the circle $S^1$ means that:

• The domain of $f$ is $\mathbb{R}$; and
• For all $x \in \mathbb{R}$, $f(x + 1) = f(x)$, i.e., $f$ is periodic with period 1.

We think of such functions as being defined on the circle because they are determined by their values on $[0, 1]$, with $f(0) = f(1)$, and identifying the ends of a closed interval gives a circle. Continuity, limits, and derivatives are all defined as usual for functions on $S^1$, but integrals are defined differently: To say that $f : S^1 \to \mathbb{C}$ is integrable means that
\[ \int_{S^1} f(x)\,dx = \int_0^1 f(x)\,dx = \int_{-1/2}^{1/2} f(x)\,dx \tag{5.2.2} \]
exists. Note that if $f$ is integrable on either $[0, 1]$ or $[-\frac{1}{2}, \frac{1}{2}]$, (5.2.2) holds by periodicity and additivity of domain; in fact, the integral of $f$ may be computed on any interval in $\mathbb{R}$ of length 1.

Example 5.2.10. To give a key example, by Theorem 4.5.7, for any $n \in \mathbb{Z}$, the functions $e_n(x) = e^{2\pi i n x}$ (Definition 4.6.1) are all in $C^\infty(S^1)$.

Convention 5.2.11. Since every $x \in \mathbb{R}$ differs from some $x_0 \in [0, 1)$ by an integer, a function $f : S^1 \to \mathbb{C}$ is determined by its values on $[0, 1)$. We will therefore often describe such an $f$ by a formula that is only meant to apply when $0 \le x < 1$, or similarly, is only meant to apply when $-\frac{1}{2} \le x < \frac{1}{2}$, and so on.

To better connect the idea of a function space back to Section 5.1, we now give one example of a metric on a space of functions. (The reader interested in our most prominent examples of metrics on function spaces may want to glance ahead at Sections 7.2 and 7.6.)

Definition 5.2.12. For $X \subseteq \mathbb{C}$ and $f, g \in C^0(X)$, we define
\[ d(f, g) = \sup \{ |f(x) - g(x)| \mid x \in X \}. \tag{5.2.3} \]

Theorem 5.2.13. For $X \subseteq \mathbb{C}$, the function $d(f, g)$ defined in (5.2.3) defines a metric on $C^0(X)$. We call $d(f, g)$ the $L^\infty$ metric on $C^0(X)$.

Proof. First, $d(f, g) \ge 0$ because $d(f, g)$ is the supremum of a set of nonnegative numbers. It is also clear that $d(f, g) = d(g, f)$ and that $d(f, g) = 0$ if and only if $f = g$, so it remains only to verify the triangle inequality, which we do in Problem 5.2.1.

Note that the quantity $d_n$ that appears in the “alternate definition” of uniform convergence (Lemma 4.3.6) is, in the terms of (5.2.3), precisely $d_n = d(f_n, f)$. In fact, it follows from Lemmas 2.4.14 and 4.3.6 that $f_n$ converges to $f$ uniformly on $X$ if and only if $f_n$ converges to $f$ in the $L^\infty$ metric on $C^0(X)$. Along those lines, we have:


Theorem 5.2.14. For $X \subseteq \mathbb{C}$, we have that $C^0(X)$ is a complete metric space (Definition 2.5.4) under the $L^\infty$ metric.

Proof. First, we observe that if $f_n$ is a sequence in $C^0(X)$ that is Cauchy with respect to the $L^\infty$ metric, then $f_n$ is a uniformly Cauchy sequence of functions on $X$ (Problem 5.2.2). Therefore, by Theorem 4.3.5, $f_n$ must converge uniformly to some $f : X \to \mathbb{C}$. By Theorem 4.3.8, a sequence of continuous functions that converges uniformly must converge to a continuous function, and the theorem follows.

We also take this opportunity to introduce (or review) some terminology from linear algebra that we will use occasionally.

Definition 5.2.15. For a function space $V$, to say that a finite subset $\{f_1, \dots, f_n\} \subseteq V$ is linearly independent means that if $a_1 f_1 + \cdots + a_n f_n = 0$ for some coefficients $a_i \in \mathbb{C}$, then every coefficient $a_i = 0$.

Definition 5.2.16. For a function space $V$ and a finite subset $S = \{f_1, \dots, f_n\} \subseteq V$, the span of $S$ is defined to be the set $\{a_1 f_1 + \cdots + a_n f_n \mid a_i \in \mathbb{C}\} \subseteq V$.

For more on function spaces in the context of axiom-based linear algebra, see Appendix B.
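To make the $L^\infty$ metric of Definition 5.2.12 concrete, the supremum in (5.2.3) can be estimated by sampling. The sketch below assumes $X = [0, 1]$ and a uniform grid, so in general it only gives a lower bound for the true supremum; the names `sup_distance` and `f_n` and the example sequence $f_n(x) = x^n/n$ are illustrative choices, not taken from the text.

```python
def sup_distance(f, g, a=0.0, b=1.0, samples=10_001):
    """Estimate d(f, g) = sup |f(x) - g(x)| on [a, b] by sampling a uniform grid.

    Sampling can miss the true supremum, so this is a lower bound in general.
    """
    step = (b - a) / (samples - 1)
    return max(abs(f(a + k * step) - g(a + k * step)) for k in range(samples))


def f_n(n):
    # Illustrative sequence: f_n(x) = x^n / n, whose sup on [0, 1] is attained at x = 1.
    return lambda x: x ** n / n


def zero(x):
    return 0.0


# d(f_n, 0) shrinks as n grows, which is what uniform convergence to 0 means.
distances = [sup_distance(f_n(n), zero) for n in (1, 10, 100)]

if __name__ == "__main__":
    print(distances)
```

Since $d(f_n, 0)$ tends to 0 here, the sequence converges to the zero function in the $L^\infty$ metric, in line with the remark that convergence in this metric is the same as uniform convergence.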

Problems

5.2.1. (Proves Theorem 5.2.13) Let $X$ be a subset of $\mathbb{C}$, and for $f, g \in C^0(X)$, define $d(f, g)$ as in (5.2.3).

(a) Find real-valued $f, g, h \in C^0([0, 1])$ such that $f(x) \le g(x) \le h(x)$ for all $x \in [0, 1]$ and $d(f, h) \ne d(f, g) + d(g, h)$. (Suggestion: This does not work if $f$, $g$, and $h$ are all constant.)

(b) For $f, g, h \in C^0(X)$, prove that $d(f, h) \le d(f, g) + d(g, h)$. (Suggestion: Use the sup inequality trick (Theorem 2.1.5).)

5.2.2. (Proves Theorem 5.2.14) Let $X \subseteq \mathbb{C}$ and let $f_n$ be a sequence in $C^0(X)$ that is Cauchy with respect to the $L^\infty$ metric. Prove that $f_n$ is a uniformly Cauchy sequence of functions on $X$. (Suggestion: In other words, prove that for any $\epsilon > 0$, there exists some $N(\epsilon)$ not depending on $x \in X$ such that for all $x \in X$, etc.)

5.2.3. Let $X$ be an interval in $\mathbb{R}$, and for $f, g \in C^0(X)$, define $d(f, g)$ as in (5.2.3). Prove that for $a \in \mathbb{C}$, $d(af, 0) = |a|\, d(f, 0)$. (Suggestion: The case $a = 0$ can be handled separately, so for $a \ne 0$, prove $d(af, 0) \le |a|\, d(f, 0)$ and use symmetry.)

5.3 Dot products

The previous sections of this chapter introduced the ideas behind two of the key tools we will use to study Fourier series, namely, function spaces and metrics defined upon them. Our favorite metric on a function space will be what is known as the $L^2$-metric, as that metric allows us to introduce geometry through a generalization of the dot product known as an inner product. Therefore, in this section, we briefly review the geometry of dot products.

Recall that the dot product $\cdot : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ is defined to be
\[ (v_1, \dots, v_n) \cdot (w_1, \dots, w_n) = v_1 w_1 + \cdots + v_n w_n \tag{5.3.1} \]
for all $(v_1, \dots, v_n), (w_1, \dots, w_n) \in \mathbb{R}^n$. The dot product has the following algebraic properties, as the reader may recall (or check in a straightforward manner).

Theorem 5.3.1. For $v, w, x \in \mathbb{R}^n$ and $c \in \mathbb{R}$, we have the following properties:

1. $v \cdot w = w \cdot v$.
2. $(v + w) \cdot x = v \cdot x + w \cdot x$.
3. $(cv) \cdot w = c(v \cdot w)$.
4. If $v = (v_1, \dots, v_n)$, then $v \cdot v = v_1^2 + \cdots + v_n^2$.

Proof. Problem 5.3.1.

What may be less familiar to the reader is that many key features of Euclidean geometry can be expressed in terms of dot products. For example, Theorem 5.3.1(4) shows that the standard Euclidean length $\|v\|$ of a vector $v$ is $\sqrt{v \cdot v}$. It is also a fact from 2- and 3-dimensional geometry that if $\theta$ is the angle between vectors $v$ and $w$, then
\[ \cos\theta = \frac{v \cdot w}{\|v\|\,\|w\|}. \tag{5.3.2} \]
Generalizing to $\mathbb{R}^n$, we may instead define the angle $\theta$ between nonzero $v, w \in \mathbb{R}^n$ by $\theta = \cos^{-1}\!\left( \dfrac{v \cdot w}{\|v\|\,\|w\|} \right)$. In particular, if $v \cdot w = 0$, then $\theta = \pi/2$, and we say that $v$ and $w$ are orthogonal.

Orthogonality turns out to be useful for many reasons. For example, to say that $\{v_1, \dots, v_k\} \subseteq \mathbb{R}^n$ is orthonormal means that
\[ v_i \cdot v_j = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \ne j. \end{cases} \tag{5.3.3} \]
Orthonormal bases also give conveniently computed coordinates, in that if $\{v_1, \dots, v_n\}$ is an orthonormal set in $\mathbb{R}^n$ and
\[ w = a_1 v_1 + \cdots + a_n v_n \tag{5.3.4} \]


for some $w \in \mathbb{R}^n$, then $a_i = w \cdot v_i$ (Problem 5.3.2).

As we shall see in Section 7.1, the dot product also generalizes to $\mathbb{C}^n$ in a straightforward manner (Example 7.1.5). For now, however, we generalize the dot product in a different manner, so we can introduce an example that is quite important later.

Definition 5.3.2. Let $X = \mathbb{N}$ or $\mathbb{Z}$. As in Definitions 4.1.1 and 4.1.2, we write a function $a : X \to \mathbb{C}$ in sequence notation $a_n$, where if $X = \mathbb{N}$, $a_n$ is a sequence in the usual sense, and if $X = \mathbb{Z}$, $a_n$ is a “two-sided sequence”. In either case, we define $\ell^2(X)$ to be the set of all $a_n$ such that
\[ \|a_n\|^2 = \sum_{n \in X} |a_n|^2 \tag{5.3.5} \]
is finite. Note that since (5.3.5) is a series with nonnegative terms, it can be shown that the order of summation does not matter (see Appendix A), so the same definition actually works for any countable set $X$.

It turns out that we can think of $\ell^2(X)$ ($X = \mathbb{N}$ or $\mathbb{Z}$) as a function space with a dot product on it, in the following sense.

Theorem 5.3.3. For $X = \mathbb{N}$ or $\mathbb{Z}$, $\ell^2(X)$ is a function space, and for $a_n, b_n \in \ell^2(X)$,
\[ \langle a_n, b_n \rangle = \sum_{n \in X} a_n \overline{b_n} \tag{5.3.6} \]
converges absolutely.

Proof. For $a_n, b_n \in \ell^2(X)$ and $c \in \mathbb{C}$, standard properties of series imply that $\|c a_n\|^2 = |c|^2 \|a_n\|^2$, and the fact that $a_n + b_n \in \ell^2(X)$ is proved in Problem 5.3.3. It follows that $\ell^2(X)$ is a function space on $X$. As for the absolute convergence of (5.3.6), again, see Problem 5.3.3.

See Problem 7.1.9 for an alternate, slicker proof of Theorem 5.3.3.
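As a small numerical illustration of the coordinate formula $a_i = w \cdot v_i$ for an orthonormal set, discussed earlier in this section around (5.3.4), the following sketch works in $\mathbb{R}^2$. The particular vectors (a rotation of the standard basis by an arbitrary angle) and the helper name `dot` are assumptions made for the example only.

```python
import math


def dot(v, w):
    """Ordinary Euclidean dot product (5.3.1) of two vectors of equal length."""
    return sum(vi * wi for vi, wi in zip(v, w))


# An orthonormal set in R^2: the standard basis rotated by an arbitrary angle.
angle = math.pi / 6
v1 = (math.cos(angle), math.sin(angle))
v2 = (-math.sin(angle), math.cos(angle))

w = (2.0, -1.0)

# Coordinates of w with respect to {v1, v2}, computed as a_i = w . v_i.
a1, a2 = dot(w, v1), dot(w, v2)

# Rebuild w from its coordinates: a1 * v1 + a2 * v2.
reconstructed = tuple(a1 * x + a2 * y for x, y in zip(v1, v2))

if __name__ == "__main__":
    print(dot(v1, v1), dot(v2, v2), dot(v1, v2))  # orthonormality check (5.3.3)
    print(reconstructed)                          # should agree with w up to rounding
```

No linear system has to be solved to find the coordinates; each one is a single dot product, which is the feature that Fourier coefficients will later mimic.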

Problems

5.3.1. (Proves Theorem 5.3.1) Fix $v, w, x \in \mathbb{R}^n$ and $c \in \mathbb{R}$.

(a) Prove that $v \cdot w = w \cdot v$.
(b) Prove that $(v + w) \cdot x = v \cdot x + w \cdot x$.
(c) Prove that $(cv) \cdot w = c(v \cdot w)$.
(d) Prove that if $v = (v_1, \dots, v_n)$, then $v \cdot v = v_1^2 + \cdots + v_n^2$.

5.3.2. Suppose $\{v_1, \dots, v_n\}$ is an orthonormal set in $\mathbb{R}^n$ and $w = a_1 v_1 + \cdots + a_n v_n$ for some $w \in \mathbb{R}^n$. Prove that $a_i = v_i \cdot w$.

5.3.3. (Proves Theorem 5.3.3) Let $X = \mathbb{N}$ or $\mathbb{Z}$ and suppose that $a_n, b_n \in \ell^2(X)$.


(a) Prove that
\[ -|a_n|^2 - |b_n|^2 \le a_n \overline{b_n} + b_n \overline{a_n} \le |a_n|^2 + |b_n|^2, \tag{5.3.7} \]
\[ -|a_n|^2 - |b_n|^2 \le i a_n \overline{b_n} - i b_n \overline{a_n} \le |a_n|^2 + |b_n|^2. \tag{5.3.8} \]
(Suggestion: Consider the four quantities $|a_n \pm b_n|^2$, $|a_n \pm i b_n|^2$.)

(b) Prove that $a_n + b_n \in \ell^2(X)$. (Suggestion: Consider $|a_n + b_n|^2$ again.)

(c) Prove that (5.3.6) converges absolutely. (Suggestion: Separate (5.3.6) into its real and imaginary parts. How does $a_n \overline{b_n} \pm b_n \overline{a_n}$ relate to the real and imaginary parts of $a_n \overline{b_n}$?)


Chapter 6

Fourier series

The enigma which, about 2,500 years ago, Pythagoras proposed to science, which investigates the reasons of things, “Why is consonance determined by the ratios of small whole numbers?” has been solved. . . . The resolution into partial tones, mathematically expressed, is effected by Fourier’s law, which [shows] how any periodically variable magnitude, whatever be its nature, can be expressed by a sum of the simplest periodic magnitudes. . . . Ultimately, then, the reason of the rational numerical relations of Pythagoras is to be found in the theorem of Fourier, and in one sense this theorem may be considered as the prime source of the theory of harmony.
Hermann von Helmholtz, On the Sensations of Tone

In this chapter, we introduce Fourier series and prove some initial results. To begin with, we define Fourier polynomials (Section 6.1) and then Fourier series (Section 6.2). After examining the special case of real-valued functions (Section 6.3), we discuss what we can prove about pointwise convergence of Fourier series with relatively straightforward methods, and why it will be useful to have fancier and better methods available to us (Section 6.4).

6.1 Fourier polynomials

The goal of the rest of Part II is to understand how we may “best” (in a sense to be made precise later) approximate a given function $f$ with domain $S^1$ with functions of the following type.

Definition 6.1.1. A trigonometric polynomial of degree $N$ is a function $p : S^1 \to \mathbb{C}$ of the form
\[ p(x) = \sum_{n=-N}^{N} c_n e_n(x) \tag{6.1.1} \]
for some coefficients $c_n \in \mathbb{C}$.


We call the function in (6.1.1) a “polynomial” because if we let $q = e^{2\pi i x}$, then $e_n = q^n$, and the sum in (6.1.1) becomes $\sum_{n=-N}^{N} c_n q^n$, a Laurent polynomial (polynomial with negative integer power terms) in $q$. Note that we would usually only say that the degree of such a polynomial is $N$ if either $c_N \ne 0$ or $c_{-N} \ne 0$, but out of laziness, we allow the possibility of $c_N = c_{-N} = 0$, to avoid having to say “degree at most $N$” repeatedly.

In any case, we may now ask the imprecise question: Which trigonometric polynomial(s) best approximate a given $f : S^1 \to \mathbb{C}$? Better yet, keeping Chapter 5 in mind, we may ask (still imprecisely):

Question 6.1.2. Which trigonometric polynomials best approximate a given $f : S^1 \to \mathbb{C}$ on average?

As in Chapter 5, by “on average” we mean something like “having an absolute error function with the smallest possible integral on $S^1$.”

Now, we hope the reader finds it plausible that for the trigonometric polynomial $p(x)$ in (6.1.1) to approximate $f : S^1 \to \mathbb{C}$ well, it should at least have the same average behavior as $f(x)$. We therefore spend the rest of this section examining the average behavior of $p(x)$ on $S^1$. For example:

Theorem 6.1.3. Let $p(x)$ be a trigonometric polynomial given by (6.1.1). Then
\[ \int_{S^1} p(x)\,dx = \int_0^1 p(x)\,dx = c_0. \tag{6.1.2} \]

Proof. Problem 6.1.1.

Emboldened by the success of Theorem 6.1.3 in extracting the constant term of $p(x)$ based on its average behavior, we may ask: Can we do the same for the other coefficients of $p(x)$? The following theorem shows that the answer is yes.

Theorem 6.1.4. Let $p(x)$ be a trigonometric polynomial given by (6.1.1). Then for any $n$ such that $-N \le n \le N$, we have
\[ \int_0^1 p(x)\,\overline{e_n(x)}\,dx = c_n. \tag{6.1.3} \]

Proof. Problem 6.1.2.

We may therefore (a bit presumptuously) conclude: If a trigonometric polynomial $p(x)$ is to approximate some integrable $f : S^1 \to \mathbb{C}$ well on average, then we should have
\[ \int_0^1 f(x)\,\overline{e_n(x)}\,dx = \int_0^1 p(x)\,\overline{e_n(x)}\,dx = c_n. \tag{6.1.4} \]
This leads to the following definition.


Definition 6.1.5. Let $f : S^1 \to \mathbb{C}$ be integrable. For $n \in \mathbb{Z}$, we define
\[ \hat{f}(n) = \int_0^1 f(x)\,\overline{e_n(x)}\,dx \tag{6.1.5} \]
to be the $n$th Fourier coefficient of $f$. We define the $N$th Fourier polynomial $f_N$ of $f$ to be
\[ f_N(x) = \sum_{n=-N}^{N} \hat{f}(n)\, e_n(x). \tag{6.1.6} \]
In other words, $f_N(x)$ is the trigonometric polynomial of degree $N$ whose coefficients are the Fourier coefficients $\hat{f}(n)$.
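Definition 6.1.5 translates directly into a numerical recipe. The following is a minimal sketch, assuming the integral in (6.1.5) is approximated by a midpoint rule on $[0, 1]$; the function names `e`, `fourier_coefficient`, and `fourier_polynomial`, the sample count, and the test polynomial `p` are illustrative assumptions rather than anything defined in the text.

```python
import cmath


def e(n, x):
    """The function e_n(x) = exp(2 pi i n x)."""
    return cmath.exp(2j * cmath.pi * n * x)


def fourier_coefficient(f, n, samples=4096):
    """Midpoint-rule estimate of the integral over [0, 1] of f(x) * conj(e_n(x)).

    For real x, conj(e_n(x)) = e_{-n}(x), which is what is used below.
    """
    h = 1.0 / samples
    return h * sum(f((k + 0.5) * h) * e(-n, (k + 0.5) * h) for k in range(samples))


def fourier_polynomial(f, N, samples=4096):
    """Return the N-th Fourier polynomial f_N of (6.1.6) as a Python function."""
    coeffs = {n: fourier_coefficient(f, n, samples) for n in range(-N, N + 1)}
    return lambda x: sum(c * e(n, x) for n, c in coeffs.items())


def p(x):
    # Illustrative trigonometric polynomial with c_0 = 3, c_2 = 1 - 2i, c_{-1} = 1/2.
    return 3 + (1 - 2j) * e(2, x) + 0.5 * e(-1, x)


if __name__ == "__main__":
    for n in range(-2, 3):
        print(n, fourier_coefficient(p, n))
    print(fourier_polynomial(p, 2)(0.3), p(0.3))
```

Applied to a trigonometric polynomial, the computed coefficients should reproduce the $c_n$ up to rounding, in line with Theorems 6.1.3 and 6.1.4; for a general integrable $f$ the output is only an approximation whose quality depends on the sample count.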

Problems

6.1.1. (Proves Theorem 6.1.3) Let $p(x)$ be given by (6.1.1). Prove (6.1.2). (Suggestion: Use the integral formulas in Section 4.6.)

6.1.2. (Proves Theorem 6.1.4) Let $p(x)$ be given by (6.1.1). Prove (6.1.3), in the form
\[ \int_0^1 p(x)\,\overline{e_k(x)}\,dx = c_k, \tag{6.1.7} \]
where $-N \le k \le N$. (Note the change of subscript from $n$ in (6.1.3) to $k$ in (6.1.7), which does not change the meaning of the equation, but will help to avoid confusion between the constant subscript $k$ and the variable subscript $n$ appearing in the definition of $p(x)$.)

6.2 Fourier series

Continuing our chain of plausibilities from Section 6.1, we may reason that if the Fourier polynomials of $f : S^1 \to \mathbb{C}$ are good approximations of $f$, then their limit as $N \to \infty$ will converge to $f$. Put another way, we may make the idea of “good approximation” precise by saying that the $f_N$ are good approximations of $f$ if $\lim_{N\to\infty} f_N = f$ in some appropriate sense: pointwise, uniform, or “on average” (a term that, again, we will make precise later). For now, we will be content to introduce one of the two principal objects of study in Part II.

Definition 6.2.1. Let $f : S^1 \to \mathbb{C}$ be integrable. For $n \in \mathbb{Z}$, we define
\[ \hat{f}(n) = \int_0^1 f(x)\,\overline{e_n(x)}\,dx, \tag{6.2.1} \]
and we define the Fourier series of $f$ to be the limit of its Fourier polynomials as $N \to \infty$:
\[ f(x) \sim \lim_{N\to\infty} f_N(x) = \sum_{n=-\infty}^{\infty} \hat{f}(n)\, e_n(x) = \sum_{n \in \mathbb{Z}} \hat{f}(n)\, e_n(x). \tag{6.2.2} \]


Note that the (standard) notation $\sim$ indicates merely that what is on the right-hand side is the Fourier series of $f$, and need not have any implications in terms of convergence (uniform, pointwise, or otherwise).

Remark 6.2.2. Note that in (6.2.2), we again use our conventions on summing two-sided series, as described in Definition 4.1.2 and Remark 4.1.4. Note also that Fourier series provide natural examples where we need to be careful about how we sum two-sided series, in that there exist $\hat{f}(n) \in \mathbb{R}$ such that for fixed $x$ (say, $x = 0$), by allowing $N$ to go to $+\infty$ and $-\infty$ at different rates, we can get (6.2.2) to converge to any real number that we want, or even $+\infty$ or $-\infty$! See Appendix A, and Example A.5 in particular, for details.

Remark 6.2.3. We will later see a fancier version of Definition 6.2.1 in Definition 8.1.3, but rest assured, Definition 6.2.1 will always work for an integrable $f$.

We now present several examples, leaving computations to the reader (Problems 6.2.1–6.2.5). Note that the one tricky aspect of computing Fourier coefficients is that we often have to handle $\hat{f}(0)$ separately from $\hat{f}(n)$, $n \ne 0$.

Example 6.2.4 (Square wave). Let $f : S^1 \to \mathbb{C}$ be given by
\[ f(x) = \begin{cases} 1 & \text{if } 0 \le x \le \frac{1}{2}, \\ 0 & \text{if } \frac{1}{2} < x < 1. \end{cases} \tag{6.2.3} \]
Then
\[ \hat{f}(0) = \frac{1}{2}, \qquad \hat{f}(n) = \frac{1 - (-1)^n}{2\pi i n} \quad \text{for } n \ne 0. \tag{6.2.4} \]

Example 6.2.5 (Sawtooth wave). Let $f : S^1 \to \mathbb{C}$ be given by
\[ f(x) = x \quad \text{for } -\tfrac{1}{2} \le x < \tfrac{1}{2}. \tag{6.2.5} \]
Then
\[ \hat{f}(0) = 0, \qquad \hat{f}(n) = -\frac{(-1)^n}{2\pi i n} \quad \text{for } n \ne 0. \tag{6.2.6} \]

Example 6.2.6 (Triangle wave). Let $f : S^1 \to \mathbb{C}$ be given by
\[ f(x) = |x| \quad \text{for } -\tfrac{1}{2} \le x < \tfrac{1}{2}. \tag{6.2.7} \]
Then
\[ \hat{f}(0) = \frac{1}{4}, \qquad \hat{f}(n) = \frac{2 - 2(-1)^n}{(2\pi i n)^2} \quad \text{for } n \ne 0. \tag{6.2.8} \]


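Example 6.2.7 ($x^2$ periodized). Let $f : S^1 \to \mathbb{C}$ be given by
\[ f(x) = x^2 \quad \text{for } -\tfrac{1}{2} \le x < \tfrac{1}{2}. \tag{6.2.9} \]
Then
\[ \hat{f}(0) = \frac{1}{12}, \qquad \hat{f}(n) = \frac{-2(-1)^n}{(2\pi i n)^2} \quad \text{for } n \ne 0. \tag{6.2.10} \]

Example 6.2.8 ($x^3$ periodized). Let $f : S^1 \to \mathbb{C}$ be given by
\[ f(x) = x^3 \quad \text{for } -\tfrac{1}{2} \le x < \tfrac{1}{2}. \tag{6.2.11} \]
Then
\[ \hat{f}(0) = 0, \qquad \hat{f}(n) = -\frac{1}{4}\,\frac{(-1)^n}{2\pi i n} - \frac{6(-1)^n}{(2\pi i n)^3} \quad \text{for } n \ne 0. \tag{6.2.12} \]

We note one last set of formulas, proven in Problem 6.2.7: For integrable $f, g : S^1 \to \mathbb{C}$ and $a \in \mathbb{C}$, we have
\[ \widehat{(af)}(n) = a\hat{f}(n), \qquad \widehat{(f + g)}(n) = \hat{f}(n) + \hat{g}(n). \tag{6.2.13} \]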
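The closed forms in Examples 6.2.5 and 6.2.6, and the linearity formulas (6.2.13), can be spot-checked numerically before working Problems 6.2.2, 6.2.3, and 6.2.7. The sketch below assumes a midpoint-rule approximation of the defining integral over $[-\frac{1}{2}, \frac{1}{2})$; all function names and the sample count are illustrative choices.

```python
import cmath
import math


def fourier_coefficient(f, n, samples=20_000):
    """Midpoint-rule estimate of the n-th Fourier coefficient, integrating over [-1/2, 1/2)."""
    h = 1.0 / samples
    total = 0j
    for k in range(samples):
        x = -0.5 + (k + 0.5) * h
        total += f(x) * cmath.exp(-2j * math.pi * n * x)  # f(x) times conj(e_n(x))
    return total * h


def sawtooth(x):
    return x


def sawtooth_hat(n):
    # Closed form stated in (6.2.6).
    return 0 if n == 0 else -((-1) ** n) / (2j * math.pi * n)


def triangle(x):
    return abs(x)


def triangle_hat(n):
    # Closed form stated in (6.2.8).
    return 0.25 if n == 0 else (2 - 2 * (-1) ** n) / (2j * math.pi * n) ** 2


def combination(x):
    # 3 * sawtooth + triangle, used to illustrate linearity (6.2.13).
    return 3 * sawtooth(x) + triangle(x)


if __name__ == "__main__":
    for n in range(-3, 4):
        print(n,
              abs(fourier_coefficient(sawtooth, n) - sawtooth_hat(n)),
              abs(fourier_coefficient(triangle, n) - triangle_hat(n)))
```

Each printed difference should be close to zero; agreement at a handful of values of n is evidence for the formulas, not a proof of them.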

Problems

6.2.1. Show that the Fourier coefficients of
\[ f(x) = \begin{cases} 1 & \text{if } 0 \le x \le \frac{1}{2}, \\ 0 & \text{if } \frac{1}{2} < x < 1, \end{cases} \tag{6.2.14} \]
are as described in Example 6.2.4.

6.2.2. Show that the Fourier coefficients of
\[ f(x) = x \quad \text{for } -\tfrac{1}{2} \le x < \tfrac{1}{2} \tag{6.2.15} \]
are as described in Example 6.2.5.

6.2.3. Show that the Fourier coefficients of
\[ f(x) = |x| \quad \text{for } -\tfrac{1}{2} \le x < \tfrac{1}{2} \tag{6.2.16} \]
are as described in Example 6.2.6.


6.2.4. Show that the Fourier coefficients of
\[ f(x) = x^2 \quad \text{for } -\tfrac{1}{2} \le x < \tfrac{1}{2} \tag{6.2.17} \]
are as described in Example 6.2.7.

6.2.5. Show that the Fourier coefficients of
\[ f(x) = x^3 \quad \text{for } -\tfrac{1}{2} \le x < \tfrac{1}{2} \tag{6.2.18} \]
are as described in Example 6.2.8.

6.2.6. Prove that if $p(x) = \sum_{n=-N}^{N} c_n e_n(x)$, then $\hat{p}(n) = c_n$. (In other words, the Fourier series of a trigonometric polynomial $p(x)$ is $p(x)$ itself.)

6.2.7. For integrable $f, g : S^1 \to \mathbb{C}$ and $a, b \in \mathbb{C}$, prove that
\[ \widehat{(af)}(n) = a\hat{f}(n), \qquad \widehat{(f + g)}(n) = \hat{f}(n) + \hat{g}(n). \tag{6.2.19} \]

6.3 Real Fourier series

When we take the Fourier series of a real-valued function $f : S^1 \to \mathbb{R}$, it turns out that we get cancellation that allows us to express that series as an infinite sum of sines and cosines. In this section, we show that any Fourier series can be rewritten in terms of sines and cosines; we show that a real-valued function has a Fourier sine/cosine series with real coefficients; and we look at the real Fourier series of odd and even extensions of functions on $[0, \frac{1}{2}]$.

Note that this section is later used only in Section 11.4, and may otherwise be skipped. Nevertheless, the reader should be aware that many users of Fourier series express them in terms of sines and cosines, making this section useful for any reader interested in practical applications.

6.3.1 Fourier series in sines and cosines

First, we observe that since
\[ e_n(x) = \cos(2\pi n x) + i \sin(2\pi n x), \qquad e_{-n}(x) = \cos(2\pi n x) - i \sin(2\pi n x), \tag{6.3.1} \]
we have that
\[ \cos(2\pi n x) = \frac{1}{2}\left(e_n(x) + e_{-n}(x)\right), \qquad \sin(2\pi n x) = \frac{1}{2i}\left(e_n(x) - e_{-n}(x)\right). \tag{6.3.2} \]


It follows that the span of $\{e_n, e_{-n}\}$ is equal to the span of $\{\cos(2\pi n x), \sin(2\pi n x)\}$. More precisely (Problem 6.3.1), for any $c_n \in \mathbb{C}$, there exist $a_n, b_n \in \mathbb{C}$ such that
\[ \sum_{n=-N}^{N} c_n e_n(x) = \frac{a_0}{2} + \sum_{n=1}^{N} \left( a_n \cos(2\pi n x) + b_n \sin(2\pi n x) \right). \tag{6.3.3} \]
Note that letting $N \to \infty$ in (6.3.3) gives the standard order of summation of a two-sided series from Definition 4.1.2, thus perhaps belatedly justifying that definition.

6.3.2 Real Fourier series of real-valued functions

The rewriting in (6.3.3) is particularly interesting in the case of the Fourier series of a real-valued function.

Theorem 6.3.1. Let $f : S^1 \to \mathbb{R}$ be integrable and real-valued. If
\[ \sum_{n=-N}^{N} \hat{f}(n)\, e_n(x) = \frac{a_0}{2} + \sum_{n=1}^{N} \left( a_n \cos(2\pi n x) + b_n \sin(2\pi n x) \right), \tag{6.3.4} \]
then
\[ a_n = \hat{f}(n) + \overline{\hat{f}(n)}, \qquad b_n = i\left(\hat{f}(n) - \overline{\hat{f}(n)}\right), \tag{6.3.5} \]
and both $a_n$ and $b_n$ are real-valued.

Proof. Problem 6.3.2.

Note that the $\frac{1}{2}$ in the constant term of (6.3.4) is chosen so that (6.3.5) still holds for $n = 0$.

Definition 6.3.2. Let $f : S^1 \to \mathbb{R}$ be integrable and real-valued. We define the real Fourier series of $f$ to be
\[ \frac{a_0}{2} + \sum_{n=1}^{\infty} \left( a_n \cos(2\pi n x) + b_n \sin(2\pi n x) \right), \tag{6.3.6} \]
where $a_n, b_n \in \mathbb{R}$ are given by (6.3.5). Note that since (6.3.6) has partial sums that are equal to those of $\sum_{n \in \mathbb{Z}} \hat{f}(n)\, e_n(x)$, the real Fourier series of $f$ is equal to the (complex) Fourier series of $f$ by definition; (6.3.6) is just a different way to write the sum.

As the reader may find elsewhere, it is certainly possible to define real Fourier series without starting from complex Fourier series. To do so, we would begin with formulas for real trigonometric functions analogous to (4.6.8), namely:
\[ \int_0^1 \cos(2\pi n x) \cos(2\pi k x)\,dx = \begin{cases} 1/2 & \text{if } n = k, \\ 0 & \text{otherwise,} \end{cases} \tag{6.3.7} \]
\[ \int_0^1 \sin(2\pi n x) \sin(2\pi k x)\,dx = \begin{cases} 1/2 & \text{if } n = k, \\ 0 & \text{otherwise,} \end{cases} \tag{6.3.8} \]
\[ \int_0^1 \sin(2\pi n x) \cos(2\pi k x)\,dx = 0. \tag{6.3.9} \]


See Problem 6.3.3. Proceeding analogously to Sections 6.1–6.2, we would end up defining
\[ a_n = 2 \int_0^1 f(x) \cos(2\pi n x)\,dx, \qquad b_n = 2 \int_0^1 f(x) \sin(2\pi n x)\,dx, \tag{6.3.10} \]
a set of formulas that in our approach is a result of combining (6.2.1) and Theorem 6.3.1 (Problem 6.3.4).

Remark 6.3.3. Note that the aesthetically unappealing factors of 2 in (6.3.10) are nevertheless correct, and forced upon us by the 1/2’s appearing in (6.3.7) and (6.3.8). In treatments where functions on $S^1$ have period $2\pi$ and not 1, they can be better hidden by changing a factor of $\frac{1}{2\pi}$ to $\frac{1}{\pi}$.
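The agreement between (6.3.10) and the formulas of Theorem 6.3.1 can be checked numerically for a specific function. The sketch below uses the sawtooth wave of Example 6.2.5 and assumes midpoint-rule integration over $[-\frac{1}{2}, \frac{1}{2})$, which is a legitimate period by the remarks following (5.2.2); the helper names are illustrative and not part of the text.

```python
import cmath
import math


def integrate_period(g, samples=20_000):
    """Midpoint-rule estimate of the integral of g over one period [-1/2, 1/2)."""
    h = 1.0 / samples
    return h * sum(g(-0.5 + (k + 0.5) * h) for k in range(samples))


def f(x):
    # Sawtooth wave of Example 6.2.5, restricted to [-1/2, 1/2).
    return x


def f_hat(n):
    """Complex Fourier coefficient (6.2.1)."""
    return integrate_period(lambda x: f(x) * cmath.exp(-2j * math.pi * n * x))


def a(n):
    """Cosine coefficient from (6.3.10)."""
    return 2 * integrate_period(lambda x: f(x) * math.cos(2 * math.pi * n * x))


def b(n):
    """Sine coefficient from (6.3.10)."""
    return 2 * integrate_period(lambda x: f(x) * math.sin(2 * math.pi * n * x))


if __name__ == "__main__":
    for n in (1, 2, 3):
        c = f_hat(n)
        a_from_hat = c + c.conjugate()          # a_n as in (6.3.5)
        b_from_hat = 1j * (c - c.conjugate())   # b_n as in (6.3.5)
        print(n, a(n), a_from_hat, b(n), b_from_hat)
```

The two ways of computing each coefficient should agree up to quadrature error, and the values obtained from (6.3.5) should have negligible imaginary part, as Theorem 6.3.1 asserts for real-valued f.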

6.3.3 Real Fourier series of odd and even extensions

We will later find it useful to consider variations on real Fourier series that only use cosines or sines. Recall from calculus that a function $g : \mathbb{R} \to \mathbb{R}$ is even if $g(-x) = g(x)$ for all $x$ and odd if $g(-x) = -g(x)$ for all $x$. Recall also that:

• The product of even functions is even;
• The product of two odd functions is even;
• The product of an even and an odd function is odd;
• If $g$ is even, then $\int_{-b}^{b} g(x)\,dx = 2 \int_0^b g(x)\,dx$; and
• If $g$ is odd, then $\int_{-b}^{b} g(x)\,dx = 0$.

Definition 6.3.4. For $f : [0, \frac{1}{2}] \to \mathbb{R}$, define the even and odd extensions $f_{\text{even}}, f_{\text{odd}} : S^1 \to \mathbb{R}$ of $f$ by
\[ f_{\text{even}}(x) = \begin{cases} f(x) & \text{if } 0 \le x \le \frac{1}{2}, \\ f(-x) & \text{if } -\frac{1}{2} \le x < 0, \end{cases} \qquad f_{\text{odd}}(x) = \begin{cases} f(x) & \text{if } 0 < x < \frac{1}{2}, \\ 0 & \text{if } x = 0, \\ -f(-x) & \text{if } -\frac{1}{2} < x < 0, \\ 0 & \text{if } x = \pm\frac{1}{2}. \end{cases} \tag{6.3.11} \]
In other words, the even (resp. odd) extension of $f$ is the even (resp. odd) function on $S^1$ that agrees with $f$ on $[0, \frac{1}{2}]$, with the possible exceptions of $f_{\text{odd}}(0)$ and $f_{\text{odd}}(\frac{1}{2})$. More visually, as shown in Figure 6.3.1, we may think of the even and odd extensions of a given $f$ as “extending by reflection” and “extending by rotation around the origin”, respectively.

Figure 6.3.1: Even and odd extensions of the same function on $\left[0, \frac{1}{2}\right]$

Remark 6.3.5. We observe that if $f : \left[0, \frac{1}{2}\right] \to \mathbb{R}$ is in $C^1\left(\left[0, \frac{1}{2}\right]\right)$ and $f(0) = f(1/2) = 0$, then $f_{\mathrm{odd}}$ is in $C^1(S^1)$; and similarly, if $f$ is in $C^1\left(\left[0, \frac{1}{2}\right]\right)$ and $f'(0) = f'(1/2) = 0$, then $f_{\mathrm{even}}$ is in $C^1(S^1)$. See Problem 6.3.5.

Theorem 6.3.6. For integrable $f : \left[0, \frac{1}{2}\right] \to \mathbb{R}$, let $f_{\mathrm{even}}$ and $f_{\mathrm{odd}}$ be the even and odd extensions of $f$, respectively. Then the real Fourier series of $f_{\mathrm{even}}$ and $f_{\mathrm{odd}}$ have the form
\[ f_{\mathrm{even}}(x) \sim \frac{a_0}{2} + \sum_{n=1}^{\infty} a_n \cos(2\pi n x), \tag{6.3.12} \]
\[ f_{\mathrm{odd}}(x) \sim \sum_{n=1}^{\infty} b_n \sin(2\pi n x), \tag{6.3.13} \]
where
\[ a_n = 4 \int_0^{1/2} f(x) \cos(2\pi n x) \, dx, \qquad b_n = 4 \int_0^{1/2} f(x) \sin(2\pi n x) \, dx. \tag{6.3.14} \]
The series in (6.3.12) and (6.3.13) are called the Fourier cosine and Fourier sine series of $f$, respectively. Again, they are actually just equal to the Fourier series of $f_{\mathrm{even}}$ and $f_{\mathrm{odd}}$; the point is that cancellation allows us to write them in simpler form.

Proof. Problem 6.3.6.

Remark 6.3.7. We will later show that if $f$ is a function on $\left[0, \frac{1}{2}\right]$, under reasonable conditions, the Fourier sine and cosine series of $f$ converge to $f$ for all but finitely many values of $x \in \left[0, \frac{1}{2}\right]$. (For example, this holds if $f$ is “piecewise Lipschitz”; see Section 8.5.6.) As a result, if $f(0) \ne 0$, it may seem paradoxical for $f$ to be expressed as an infinite sum of sine functions, each of which is equal to 0 at $x = 0$; similarly, if $f(\frac{1}{2}) \ne 0$, we seem to obtain much the same paradox. The explanation for this phenomenon comes from the fact that the sine series of $f$ is really the Fourier series of $f_{\mathrm{odd}}$, and $f_{\mathrm{odd}}(0) = f_{\mathrm{odd}}(\frac{1}{2}) = 0$ no matter what the value of $f(0)$ is. An analogous explanation accounts for the apparent paradox of a function with $f'(0) \ne 0$ being expressed as an infinite sum of cosine functions; in that case, since $f_{\mathrm{even}}$ is not differentiable at $x = 0$, term-by-term differentiation at $x = 0$ must fail.
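The apparent paradox in Remark 6.3.7 can be seen concretely for the constant function $f(x) = 1$ on $[0, \frac{1}{2}]$, an example chosen here for illustration and not taken from the text. Evaluating (6.3.14) directly gives $b_n = \frac{2}{\pi n}\bigl(1 - (-1)^n\bigr)$, that is, $b_n = \frac{4}{\pi n}$ for odd $n$ and $b_n = 0$ for even $n$. The sketch below (hypothetical helper names, midpoint-rule integration) compares that closed form with a numerical value of (6.3.14) and evaluates partial sums of the sine series at $x = 0$ and $x = \frac{1}{4}$.

```python
import math

def sine_coeff_numeric(f, n, steps=4000):
    """b_n = 4 * integral over [0, 1/2] of f(x) sin(2 pi n x) dx, as in (6.3.14),
    approximated by the midpoint rule."""
    h = 0.5 / steps
    total = 0.0
    for j in range(steps):
        x = (j + 0.5) * h
        total += f(x) * math.sin(2 * math.pi * n * x)
    return 4 * h * total

def sine_coeff_exact(n):
    """Closed form of b_n for f = 1: 4/(pi n) for odd n, 0 for even n."""
    return 4 / (math.pi * n) if n % 2 == 1 else 0.0

def sine_partial_sum(x, N):
    """Partial sum of the Fourier sine series (6.3.13) of f = 1, up to n = N."""
    return sum(sine_coeff_exact(n) * math.sin(2 * math.pi * n * x)
               for n in range(1, N + 1))

if __name__ == "__main__":
    one = lambda x: 1.0
    print(sine_coeff_numeric(one, 1), sine_coeff_exact(1))
    print(sine_partial_sum(0.0, 401))   # every term vanishes at x = 0
    print(sine_partial_sum(0.25, 401))  # close to f(1/4) = 1
```

Every partial sum vanishes at $x = 0$, which matches $f_{\mathrm{odd}}(0) = 0$ and not $f(0) = 1$, while at the interior point $x = \frac{1}{4}$ the partial sums approach 1.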

Problems

6.3.1. (a) Given coefficients $c_n \in \mathbb{C}$, find formulas for $a_n, b_n \in \mathbb{C}$ such that
\[ \sum_{n=-N}^{N} c_n e_n(x) = \frac{a_0}{2} + \sum_{n=1}^{N} \bigl( a_n \cos(2\pi n x) + b_n \sin(2\pi n x) \bigr). \tag{6.3.15} \]
(b) Given $a_n, b_n \in \mathbb{C}$, find formulas for $c_n \in \mathbb{C}$ such that (6.3.15) holds.

6.3.2. (Proves Theorem 6.3.1) Let $f : S^1 \to \mathbb{R}$ be integrable and real-valued.
(a) For $n \in \mathbb{Z}$, prove that $\hat{f}(-n) = \overline{\hat{f}(n)}$.
(b) Prove that if $n \in \mathbb{Z}$, then $\hat{f}(-n) e_{-n}(x) + \hat{f}(n) e_n(x) = a_n \cos(2\pi n x) + b_n \sin(2\pi n x)$, where $a_n$ and $b_n$ satisfy (6.3.5) and $a_n, b_n \in \mathbb{R}$. (Suggestion: Decompose $\hat{f}(n) = c_n + d_n i$. Your proof should work equally well for the case $n = 0$.)

6.3.3. Prove the following formulas by writing everything in terms of complex exponentials.
(a) Prove that $\int_0^1 \cos(2\pi n x) \cos(2\pi k x) \, dx = \begin{cases} 1/2 & \text{if } n = k, \\ 0 & \text{otherwise.} \end{cases}$
(b) Prove that $\int_0^1 \sin(2\pi n x) \sin(2\pi k x) \, dx = \begin{cases} 1/2 & \text{if } n = k, \\ 0 & \text{otherwise.} \end{cases}$
(c) Prove that $\int_0^1 \sin(2\pi n x) \cos(2\pi k x) \, dx = 0$.

6.3.4. Assuming our usual definition of Fourier series and Theorem 6.3.1, prove
\[ a_n = 2 \int_0^1 f(x) \cos(2\pi n x) \, dx, \qquad b_n = 2 \int_0^1 f(x) \sin(2\pi n x) \, dx. \tag{6.3.16} \]
(Suggestion: Combine (6.2.1) and (6.3.5).)

6.3.5. Suppose $f : \left[0, \frac{1}{2}\right] \to \mathbb{R}$ has continuous $f' : \left[0, \frac{1}{2}\right] \to \mathbb{R}$.
(a) Prove that if $f(0) = f(1/2) = 0$, then $f_{\mathrm{odd}}$ is continuous on $S^1$. (Suggestion: You only need to check what happens at 0 and 1/2, or equivalently, 0 and $-1/2$.)
(b) Prove that if $f(0) = f(1/2) = 0$, then $f_{\mathrm{odd}}'$ is continuous on $S^1$. (Suggestion: Chain Rule.)
(c) Prove that $f_{\mathrm{even}}$ is continuous on $S^1$.
(d) Prove that if $f'(0) = f'(1/2) = 0$, then $f_{\mathrm{even}}'$ is continuous on $S^1$.

6.3.6. (Proves Theorem 6.3.6) Prove Theorem 6.3.6. (Suggestion: Use (6.3.10).)


6.4

Convergence of Fourier series of differentiable functions

To recap, so far, we have defined the Fourier series of an integrable $f : S^1 \to \mathbb{C}$, but we have discussed neither the question of when that series converges, nor the question of what it converges to. The reason is that these questions are extremely subtle and difficult! For example, there exist continuous $f : S^1 \to \mathbb{C}$ whose Fourier series diverge at uncountably many points (!) in $S^1$. (See Remark 8.5.20 for a discussion.) As it turns out, to get the convergence results we want, we will need both more sophisticated ideas (Chapters 7 and 8) and more honest-to-goodness hard work (Chapter 8). Instead, in this section, we will restrict ourselves to showing how close we can get to convergence with only a moderate amount of effort.

We begin with a seemingly unremarkable, but surprisingly critical, formula describing the Fourier coefficients of $f'$ in terms of the Fourier coefficients of $f$.

Theorem 6.4.1. For $f \in C^1(S^1)$ and $n \in \mathbb{Z}$, we have that
\[ \widehat{f'}(n) = (2\pi i n) \hat{f}(n). \tag{6.4.1} \]

Proof. Problem 6.4.1.

Note that Theorem 6.4.1 means that Fourier coefficients have a useful feature common to many similar transforms: Taking Fourier coefficients turns a differential operation (the derivative) into an algebraic operation (multiplication by $2\pi i n$).

Theorem 6.4.2. For $f : S^1 \to \mathbb{C}$, we have that:

1. If $f$ is continuous (i.e., $f \in C^0(S^1)$), then there exists some real constant $K_0$, independent of $n$, such that $\bigl|\hat{f}(n)\bigr| \le K_0$ for all $n \in \mathbb{Z}$.
2. If $f \in C^1(S^1)$, then there exists some real constant $K_1$, independent of $n$, such that $\bigl|\hat{f}(n)\bigr| \le \dfrac{K_1}{|n|}$ for all $n \in \mathbb{Z}$, $n \ne 0$.
3. If $f \in C^2(S^1)$, then there exists some real constant $K_2$, independent of $n$, such that $\bigl|\hat{f}(n)\bigr| \le \dfrac{K_2}{|n|^2}$ for all $n \in \mathbb{Z}$, $n \ne 0$.

Proof. Problem 6.4.2.

Theorem 6.4.2 also illustrates another feature of Fourier coefficients: Local behavior of $f(x)$ (here, continuity or differentiability) determines the global behavior of $\hat{f}(n)$. Taking this example to its logical extreme, we see that the Fourier coefficients of a function $f$ are rapidly decaying if $f \in C^\infty(S^1)$. To be precise:

Corollary 6.4.3. If $f \in C^\infty(S^1)$, then for any integer $k \ge 0$, we have that
\[ \lim_{n \to \infty} n^k \hat{f}(n) = 0. \tag{6.4.2} \]

Proof. Problem 6.4.3.

Remarkably, the converse to Corollary 6.4.3 is also true, and there is also a partial converse to Theorem 6.4.2. However, the proof requires us to understand convergence of Fourier series much better, and therefore must wait until Section 8.5.

Returning to our original question of the convergence of the Fourier series of $f$, we can use Theorem 6.4.2 to get the following result.

Theorem 6.4.4. If $f \in C^2(S^1)$, then the Fourier series of $f$ converges uniformly to some continuous function $g$ such that for all $n \in \mathbb{Z}$, $\hat{g}(n) = \hat{f}(n)$.

Proof. Recall that by definition, the Fourier series of $f$ is equal to
\[ \lim_{N \to \infty} \sum_{n=-N}^{N} \hat{f}(n) e_n(x). \tag{6.4.3} \]
Problem 6.4.4 shows that (6.4.3) converges uniformly to some $g$, which must be continuous by Theorem 4.3.8; and Problem 6.4.5 then shows that for all $n \in \mathbb{Z}$, $\hat{g}(n) = \hat{f}(n)$.

Now on the one hand, Theorem 6.4.4 gives a substantial result (the Fourier series of $f$ converges to something that seems similar to $f$) while remaining relatively close to the ground, as proofs go. Frustratingly, however, we are left with the highly unsatisfying situation that even though the Fourier series of $f$ converges to some function $g$ with the same Fourier coefficients, we cannot yet be sure that $f = g$. In fact, by the linearity of Fourier coefficients (6.2.13), this comes down to the question:

Question 6.4.5. For a continuous function $k : S^1 \to \mathbb{C}$, if $\hat{k}(n) = 0$ for all $n \in \mathbb{Z}$, is $k = 0$?

Question 6.4.5 turns out to be not so easy: could there be some magical continuous $k(x)$ that achieves cancellation, on average, with every function $e_n(x)$ on $[0, 1]$? As we will see, the answer is no, but it will take us two chapters of new ideas and results to get there (Chapters 7 and 8).
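Theorem 6.4.1 and the decay described in Theorem 6.4.2 can be observed numerically. The sketch below is illustrative only and rests on stated assumptions: it takes (6.2.1) to have the standard form $\hat{f}(n) = \int_0^1 f(x) e^{-2\pi i n x} \, dx$, it uses the sample function $f(x) = e^{\cos(2\pi x)}$ (chosen here, not taken from the text), and it approximates the integral by an equally spaced Riemann sum. The names `fourier_coeff`, `f`, and `f_prime` are hypothetical.

```python
import cmath
import math

def fourier_coeff(g, n, samples=256):
    """Approximate the integral over [0, 1] of g(x) e^{-2 pi i n x} dx
    by an equally spaced Riemann sum (assumed form of (6.2.1))."""
    total = 0j
    for j in range(samples):
        x = j / samples
        total += g(x) * cmath.exp(-2j * math.pi * n * x)
    return total / samples

def f(x):
    """A smooth function of period 1, chosen for illustration."""
    return math.exp(math.cos(2 * math.pi * x))

def f_prime(x):
    """Derivative of f, computed by hand with the Chain Rule."""
    return -2 * math.pi * math.sin(2 * math.pi * x) * f(x)

if __name__ == "__main__":
    for n in range(-3, 4):
        lhs = fourier_coeff(f_prime, n)                # coefficient of f'
        rhs = 2j * math.pi * n * fourier_coeff(f, n)   # (2 pi i n) times coefficient of f
        print(n, abs(lhs - rhs), abs(fourier_coeff(f, n)))
```

The first printed column after $n$ is the discrepancy between the two sides of (6.4.1); the second shows how quickly $|\hat{f}(n)|$ shrinks as $|n|$ grows for this smooth $f$.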

Problems

6.4.1. (Proves Theorem 6.4.1) For $f \in C^1(S^1)$ and $n \in \mathbb{Z}$, prove that
\[ \widehat{\frac{df}{dx}}(n) = (2\pi i n) \hat{f}(n). \]
(Suggestion: Integration by parts.)

6.4.2. (Proves Theorem 6.4.2)
(a) Let $f : S^1 \to \mathbb{C}$ be continuous. Prove that there exists some constant $K_0 \in \mathbb{R}$ such that $\bigl|\hat{f}(n)\bigr| \le K_0$ for all $n \in \mathbb{Z}$. (Suggestion: Use Corollary 3.1.14 to estimate (6.2.1).)
(b) For $f \in C^1(S^1)$, prove that there exists some constant $K_1 \in \mathbb{R}$ such that $\bigl|\hat{f}(n)\bigr| \le \dfrac{K_1}{|n|}$ for all $n \in \mathbb{Z}$, $n \ne 0$. (Suggestion: Theorem 6.4.1.)
(c) For $f \in C^2(S^1)$, prove that there exists some constant $K_2 \in \mathbb{R}$ such that $\bigl|\hat{f}(n)\bigr| \le \dfrac{K_2}{|n|^2}$ for all $n \in \mathbb{Z}$, $n \ne 0$.

6.4.3. (Proves Corollary 6.4.3) Assume $f \in C^\infty(S^1)$. Use induction on $k$ to prove that for any integer $k \ge 0$, $\lim_{n \to \infty} n^k \hat{f}(n) = 0$.

6.4.4. (Proves Theorem 6.4.4) If $f \in C^2(S^1)$, prove that the Fourier series of $f$ (see (6.4.3)) converges absolutely and uniformly to some function $g$. (Suggestion: Theorem 6.4.2 and Weierstrass M-test of Theorem 4.3.7.)

6.4.5. (Proves Theorem 6.4.4) Suppose $f : S^1 \to \mathbb{C}$ is continuous, and suppose that the Fourier series of $f$ (see (6.4.3)) converges absolutely and uniformly to some function $g \in C^0(S^1)$. Prove that for all $k \in \mathbb{Z}$, $\hat{g}(k) = \hat{f}(k)$. (Suggestion: Theorem 4.3.10.)


Chapter 7

Hilbert spaces

The method of “postulating” what we want has many advantages; they are the same as the advantages of theft over honest toil.
-- Bertrand Russell, Introduction to Mathematical Philosophy

In this chapter, to understand the convergence of Fourier series, we introduce two new ideas: An algebraic structure known as an inner product space (Sections 7.1–7.3) and an extension of the Riemann integral known as the Lebesgue integral (Sections 7.5–7.6). Combining inner product spaces with a notion of (metric) completeness fulfilled by the Lebesgue integral, we obtain one of the central ideas of this book, namely, a Hilbert space (Section 7.6).

The reader should be advised up front that our treatment of the Lebesgue integral is this book’s big cheat: To avoid spending half of the book talking about Lebesgue integration, we simply axiomatize the necessary properties of the Lebesgue integral and leave the proof of its existence to a later class. (See the Introduction for an explanation of our “eat dessert first” philosophy.)

7.1

Inner product spaces

We briefly pause our study of analysis for an excursion into linear algebra. Specifically, our first new idea is to define the following notion. (The reader new to these ideas may find it helpful to focus on the example $V = \mathbb{C}^n$ (Example 5.2.2), at least at first.)

Definition 7.1.1. Let $V$ be a function space (Definition 5.2.1). We define an inner product on $V$ to be a function $\langle \cdot, \cdot \rangle : V \times V \to \mathbb{C}$ that satisfies the following axioms:

1. (Linear in first variable) For any $f, g, h \in V$ and $a, b \in \mathbb{C}$, we have that $\langle af + bg, h \rangle = a \langle f, h \rangle + b \langle g, h \rangle$.
2. (Hermitian) For any $f, g \in V$, $\langle g, f \rangle = \overline{\langle f, g \rangle}$. Note that consequently, for any $f \in V$, $\langle f, f \rangle = \overline{\langle f, f \rangle}$ must be in $\mathbb{R}$.
3. (Positive definite) For any $f \in V$, $\langle f, f \rangle \ge 0$, and if $\langle f, f \rangle = 0$, then $f = 0$.

We also define an inner product space to be a function space $V$ along with a particular choice of inner product. For brevity, to say that $V$ is an inner product space means that its (unspecified) inner product will be denoted by $\langle \cdot, \cdot \rangle$. Note that since $V$ is assumed to be a function space in the above definition, we will generally use names like $f$ and $g$ for elements of $V$.

Definition 7.1.2. Let $V$ be an inner product space. For $f \in V$, we define the norm of $f$ to be $\|f\| = \sqrt{\langle f, f \rangle}$. When there is the possibility of other norms on $V$ (see Section 7.2), we call $\|f\| = \sqrt{\langle f, f \rangle}$ the inner product norm, or $L^2$ norm, on $V$.

As we shall see in Section 7.2, we can think of $\|f\|$ as the length of $f$, which will allow us to define a metric on $V$. We collect a few miscellaneous straightforward properties of inner products in the following theorem.

Theorem 7.1.3. Let $V$ be an inner product space.

1. (Antilinear in second variable) For $f, g, h \in V$ and $a, b \in \mathbb{C}$,
\[ \langle f, ag + bh \rangle = \overline{a} \langle f, g \rangle + \overline{b} \langle f, h \rangle. \tag{7.1.1} \]
2. For $f \in V$ and $a \in \mathbb{C}$,
\[ \|af\| = |a| \, \|f\|. \tag{7.1.2} \]

Proof. Problem 7.1.1.

Remark 7.1.4. The reader interested in physics should know that in contrast with Definition 7.1.1 and Theorem 7.1.3, the physics convention is usually that inner products are linear in the second variable and antilinear in the first.

It will be helpful for the reader to keep two examples in mind: First, the example that is most likely to be familiar, and second, the example that is of greatest importance to us (Example 7.1.6).

Example 7.1.5. For $V = \mathbb{C}^n$, the dot product
\[ \langle (v_1, \dots, v_n), (w_1, \dots, w_n) \rangle = v_1 \overline{w_1} + \cdots + v_n \overline{w_n} \tag{7.1.3} \]
is an inner product on $V$ (Problem 7.1.2). (Compare the real-valued version from Section 5.3.)

Example 7.1.6. Let $X = [a, b]$ or $S^1$, and let $V = C^0(X)$. Then for $f, g \in V$,
\[ \langle f, g \rangle = \int_X f(x) \overline{g(x)} \, dx \tag{7.1.4} \]
defines an inner product on $V$ (Problem 7.1.3), which we call the $L^2$ inner product on $C^0(X)$.

The benefit of having an inner product on a function space $V$ is that it allows us to define geometry on $V$. Primarily, we have the following crucial idea.

Definition 7.1.7. Let $V$ be an inner product space. For $f, g \in V$, to say that $f$ is orthogonal to $g$ means that $\langle f, g \rangle = 0$.

We begin with some straightforward properties of orthogonality.

Theorem 7.1.8. Let $V$ be an inner product space, $f, g, h \in V$, and $a, b \in \mathbb{C}$.

1. If $f$ is orthogonal to $g$, then $g$ is orthogonal to $f$. (We may therefore say simply that $f$ and $g$ are orthogonal.)
2. $f$ is orthogonal to the zero vector/zero function 0.
3. If each of $f$ and $g$ is orthogonal to $h$, then $af + bg$ is also orthogonal to $h$.

Proof. Problem 7.1.4.

If nonzero $f, g$ are orthogonal, it is helpful to picture $f$ and $g$ as “vectors at right angles.” For example:

Theorem 7.1.9 (Pythagorean Theorem). Let $V$ be an inner product space. If $f, g \in V$ are orthogonal, then $\|f + g\|^2 = \|f\|^2 + \|g\|^2$.

Proof. Problem 7.1.5.

A closely related idea is that of orthogonal projection:

Definition 7.1.10. Let $V$ be an inner product space, and let $g$ be a nonzero element of $V$. For $f \in V$, we define the projection of $f$ onto $g$ to be
\[ \operatorname{proj}_g(f) = \frac{\langle f, g \rangle}{\langle g, g \rangle} \, g. \tag{7.1.5} \]
(Note that $\langle g, g \rangle \ne 0$ by positive definiteness.)

The important features of projection are:

Theorem 7.1.11. Let $V$ be an inner product space, and let $g$ be a nonzero element of $V$. For $f \in V$, we have:
\[ \langle \operatorname{proj}_g(f), g \rangle = \langle f, g \rangle, \tag{7.1.6} \]
\[ \langle f - \operatorname{proj}_g(f), g \rangle = 0, \tag{7.1.7} \]
\[ \langle f - \operatorname{proj}_g(f), \operatorname{proj}_g(f) \rangle = 0, \tag{7.1.8} \]
\[ \| \operatorname{proj}_g(f) \| \le \|f\|. \tag{7.1.9} \]
Note that (7.1.6)–(7.1.7) say that $f$ is the sum of $\operatorname{proj}_g(f)$ and a vector $f - \operatorname{proj}_g(f)$ orthogonal to $\operatorname{proj}_g(f)$, and (7.1.9) says that $\operatorname{proj}_g(f)$ is never longer than $f$.

Proof. Problem 7.1.6.

We can now prove two of the fundamental properties of inner product spaces. (Compare Theorem 2.3.4.)

Theorem 7.1.12. Let $V$ be an inner product space. For $f, g \in V$, we have:

1. (Cauchy-Schwarz inequality) $|\langle f, g \rangle| \le \|f\| \, \|g\|$; and
2. (Triangle inequality) $\|f + g\| \le \|f\| + \|g\|$.

Proof. For Cauchy-Schwarz, note that when $g = 0$, both sides of the inequality are 0; otherwise, see Problem 7.1.7. For the triangle inequality, see Problem 7.1.8.

Recall that if $X = \mathbb{N}$ or $\mathbb{Z}$, then $a : X \to \mathbb{C}$, written $a_n$, is a sequence in the usual sense if $X = \mathbb{N}$, and a “two-sided sequence” if $X = \mathbb{Z}$; and recall also that $\ell^2(X)$ is the set of all such $a_n$ ($n \in X$) such that
\[ \|a_n\|^2 = \sum_{n \in X} |a_n|^2 \tag{7.1.10} \]
is finite (converges). In these terms, we may now use Cauchy-Schwarz to provide a slick proof of a fact previously mentioned in Section 5.3.

Theorem 7.1.13. For $X = \mathbb{N}$ or $\mathbb{Z}$, $\ell^2(X)$ is a function space on $X$, and for $a_n, b_n \in \ell^2(X)$,
\[ \langle a_n, b_n \rangle = \sum_{n \in X} a_n \overline{b_n} \tag{7.1.11} \]
converges absolutely.

Proof. Since two-sided series are theoretically equivalent to ordinary series (Remark 4.1.4), it suffices to consider the case $X = \mathbb{N}$, which is proved in Problem 7.1.9.

We therefore have the following very useful example of an inner product space.

Theorem 7.1.14. For $X = \mathbb{N}$ or $\mathbb{Z}$, (7.1.11) defines an inner product on $\ell^2(X)$.

Proof. Problem 7.1.10.
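The projection identities of Theorem 7.1.11 and the inequalities of Theorem 7.1.12 can be checked on concrete vectors in $\mathbb{C}^3$ with the dot product (7.1.3). This is a minimal sketch with hypothetical names (`inner`, `norm`, `proj`) and arbitrarily chosen sample vectors; it illustrates the statements and proves nothing.

```python
import math

def inner(f, g):
    """<f, g> = sum of f_j * conj(g_j): linear in f, antilinear in g, as in (7.1.3)."""
    return sum(a * complex(b).conjugate() for a, b in zip(f, g))

def norm(f):
    """Inner product norm ||f|| = sqrt(<f, f>)."""
    return math.sqrt(inner(f, f).real)

def proj(f, g):
    """Projection of f onto a nonzero vector g, as in (7.1.5)."""
    c = inner(f, g) / inner(g, g)
    return [c * b for b in g]

# Arbitrary sample vectors in C^3.
f = [1 + 2j, 3 - 1j, 0.5j]
g = [2 - 1j, 1j, 1 + 1j]
p = proj(f, g)
r = [a - b for a, b in zip(f, p)]   # the remainder f - proj_g(f)

if __name__ == "__main__":
    print(abs(inner(p, g) - inner(f, g)))        # (7.1.6): should be near 0
    print(abs(inner(r, g)), abs(inner(r, p)))    # (7.1.7), (7.1.8): near 0
    print(norm(p) <= norm(f))                    # (7.1.9)
    print(abs(inner(f, g)) <= norm(f) * norm(g)) # Cauchy-Schwarz
```

Note that the conjugate falls on the second argument, matching the convention of Definition 7.1.1; with the physics convention of Remark 7.1.4 it would fall on the first.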

Problems

7.1.1. (Proves Theorem 7.1.3) Let $V$ be an inner product space, $f, g, h \in V$, and $a, b \in \mathbb{C}$.
(a) Prove that $\langle f, ag + bh \rangle = \overline{a} \langle f, g \rangle + \overline{b} \langle f, h \rangle$.
(b) Prove that $\|af\| = |a| \, \|f\|$.

7.1.2. Verify that the dot product (7.1.3) satisfies the axioms of an inner product. (Suggestion: Theorem 2.2.3.)

7.1.3. Let $X = [a, b]$ or $S^1$. Verify that the $L^2$ inner product (7.1.4) satisfies the axioms of an inner product on $C^0(X)$. (Suggestion: Use Lemma 3.4.9.)

7.1.4. (Proves Theorem 7.1.8) Let $V$ be an inner product space, $f, g, h \in V$, and $a, b \in \mathbb{C}$.
(a) Prove that if $f$ is orthogonal to $g$, then $g$ is orthogonal to $f$.
(b) Prove that $f$ is orthogonal to the zero vector/zero function 0.
(c) Prove that if each of $f$ and $g$ is orthogonal to $h$, then $af + bg$ is also orthogonal to $h$.
(Suggestion: Use the definitions of orthogonal and inner product.)

7.1.5. (Proves Theorem 7.1.9) Let $V$ be an inner product space, and suppose $f, g \in V$ are orthogonal. Prove $\|f + g\|^2 = \|f\|^2 + \|g\|^2$.

7.1.6. (Proves Theorem 7.1.11) Let $V$ be an inner product space, let $g$ be a nonzero element of $V$, and let $f$ be in $V$.
(a) Prove $\langle \operatorname{proj}_g(f), g \rangle = \langle f, g \rangle$.
(b) Prove $\langle f - \operatorname{proj}_g(f), g \rangle = \langle f - \operatorname{proj}_g(f), \operatorname{proj}_g(f) \rangle = 0$.
(c) Prove $\| \operatorname{proj}_g(f) \| \le \|f\|$. (Suggestion: Pythagorean Theorem.)

7.1.7. (Proves Theorem 7.1.12) Let $V$ be an inner product space, let $f \in V$, and let $g$ be a nonzero element of $V$.
(a) For $a, b \in \mathbb{C}$, prove that $|\langle ag, bg \rangle| = \|ag\| \, \|bg\|$.
(b) Prove that $|\langle f, g \rangle| = \| \operatorname{proj}_g(f) \| \, \|g\|$. (Suggestion: Use Theorem 7.1.11.)
(c) Prove that $|\langle f, g \rangle| \le \|f\| \, \|g\|$.

7.1.8. (Proves Theorem 7.1.12) Let $V$ be an inner product space. Prove that if $f, g \in V$, then $\|f + g\| \le \|f\| + \|g\|$. (Suggestion: Compare the square of both sides.)

7.1.9. (Proves Theorem 7.1.13) Recall that $\ell^2(\mathbb{N})$ is the set of all such $a_n$ ($n \in \mathbb{N}$) such that $\|a_n\|^2 = \sum_{n \in \mathbb{N}} |a_n|^2$ is finite (converges). Suppose $a_n, b_n \in \ell^2(\mathbb{N})$.
(a) Prove that for all $N \in \mathbb{N}$,
\[ \sum_{n=1}^{N} \bigl| a_n \overline{b_n} \bigr| \le \|a_n\| \, \|b_n\|, \tag{7.1.12} \]
where again, $\|a_n\| = \sqrt{\sum_{n=1}^{\infty} |a_n|^2}$, and similarly for $\|b_n\|$. (Suggestion: Use Cauchy-Schwarz in $\mathbb{C}^N$.)
(b) Prove that the product $\langle a_n, b_n \rangle$ defined in (7.1.11) converges absolutely. (Suggestion: Theorem 2.4.12.)
(c) Prove that if $a_n, b_n \in \ell^2(X)$, then $a_n + b_n \in \ell^2(X)$.

7.1.10. (Proves Theorem 7.1.14) Let $X = \mathbb{N}$ or $\mathbb{Z}$. Verify that the product $\langle a_n, b_n \rangle$ defined in (7.1.11) satisfies the axioms of an inner product on $\ell^2(X)$.

7.2

Normed spaces

To talk about ideas like limits and continuity in an inner product space, it is helpful to consider the following more general idea.

Definition 7.2.1. Let $V$ be a function space. A norm on $V$ is a function $\|\cdot\| : V \to \mathbb{R}$ that satisfies the following axioms:

1. (Positive definite) For all $f \in V$, $\|f\| \ge 0$, and if $\|f\| = 0$, then $f = 0$.
2. (Absolute homogeneity) For all $f \in V$ and $a \in \mathbb{C}$, $\|af\| = |a| \, \|f\|$.
3. (Triangle inequality) For all $f, g \in V$, $\|f + g\| \le \|f\| + \|g\|$.

Analogously to Definition 7.1.1, a normed space is a function space with a (possibly unspecified) choice of norm. As before, we continue to use names like $f$ and $g$ for “vectors” in $V$.

Example 7.2.2. If $V$ is an inner product space (Definition 7.1.1), then the inner product norm on $V$ (Definition 7.1.2) is a norm in the sense of Definition 7.2.1: Positive definiteness follows by definition, homogeneity follows by Theorem 7.1.3, and the triangle inequality follows by Theorem 7.1.12. In particular, $\mathbb{C}$ is a normed space with $\|z\| = |z|$.

Example 7.2.3. Let $X = [a, b]$ or $S^1$, and consider the $L^\infty$ metric on $V = C^0(X)$ (Definition 5.2.12). If we define $\|f\| = d(f, 0)$, then $\|\cdot\|$ is a norm on $V$. Positive definiteness and the triangle inequality follow by Theorem 5.2.13, as they are essentially 75% of the definition of metric, and homogeneity follows by Problem 5.2.3. We call this norm the $L^\infty$ norm on $V$.

Example 7.2.4. Let $X = [a, b]$ or $S^1$, let $V = C^0(X)$, and define
\[ \|f\| = \int_X |f(x)| \, dx. \tag{7.2.1} \]
(Compare Section 5.1.) Positive definiteness follows by Lemma 3.4.9, homogeneity follows by the linearity of the integral, and the norm triangle inequality follows by the elementary triangle inequality $|f(x) + g(x)| \le |f(x)| + |g(x)|$ and Theorem 3.4.2. We call this norm the $L^1$ norm on $V$.

For us, the most interesting thing about a normed space $V$ is its associated metric, which will allow us to define convergence and continuity in $V$.

Definition 7.2.5. Let $V$ be a normed space. We define the norm metric on $V$ by $d(f, g) = \|f - g\|$.

Note that all of the axioms of a metric follow immediately from the definition of norm, except perhaps $d(f, g) = d(g, f)$, which follows because $\|f - g\| = |-1| \, \|g - f\|$ by homogeneity.

Recall that in Definition 2.4.13, we defined the limit of a sequence in any metric space, and normed spaces are just a special case of that definition. Nevertheless, the reader may find it helpful to see Definition 2.4.13 rewritten in our new setting, as follows.

Definition 7.2.6. For a sequence $f_n$ in a normed space $V$ and $f \in V$, to say that $\lim_{n \to \infty} f_n = f$ means that for every $\epsilon > 0$, there exists some $N(\epsilon) \in \mathbb{R}$ such that if $n > N(\epsilon)$, then $\|f_n - f\| < \epsilon$. The terms convergent, divergent, and so on, are again used as before.

Remark 7.2.7. Let $V = C^0([0, 1])$, and consider a sequence $f_n$ in $V$. Note that we have now defined $\lim_{n \to \infty} f_n = f$ in four different ways:

• Pointwise convergence: For every $x \in [0, 1]$, $\lim_{n \to \infty} f_n(x) = f(x)$.
• Uniform, or $L^\infty$ convergence: If $\|\cdot\|_\infty$ is the $L^\infty$ norm on $C^0([0, 1])$, then $\lim_{n \to \infty} \|f_n - f\|_\infty = 0$, or in other words (see Lemma 4.3.6), $f_n$ converges uniformly to $f$ on $[0, 1]$.
• $L^1$ convergence: $\lim_{n \to \infty} \int_0^1 |f_n(x) - f(x)| \, dx = 0$.
• $L^2$ convergence/inner product norm: $\lim_{n \to \infty} \sqrt{\int_0^1 |f_n(x) - f(x)|^2 \, dx} = 0$.

On the one hand, uniform convergence implies the other three senses of convergence, a fortiori in the case of pointwise convergence, and by Theorem 4.3.10 in the case of $L^1$ and $L^2$ convergence. However, the converses of those statements do not hold. Specifically, Example 4.2.5 gives an example where $f_n$ converges to 0 pointwise, but not in either $L^1$ or $L^2$; and a slight modification of Problem 7.2.1 gives an example of a sequence $f_n$ in $C^0([0, 1])$ that converges to 0 in both $L^1$ and $L^2$, but not pointwise at any $x \in [0, 1]$. The upshot is that when we say that $f_n$ converges to $f$, we must make sure that the sense in which we mean that statement is clear.

In fact, since normed spaces have the operations of addition and scalar multiplication, we may also prove the corresponding limit theorems for such sequences by taking our proofs in $\mathbb{C}$ and replacing $|\cdot|$ with $\|\cdot\|$. To be precise, in the following, let $V$ be a normed space.

Definition 7.2.8. To say that a nonempty subset $S$ of $V$ is bounded means that there exists some $M > 0$ such that for $f \in S$, $\|f\| < M$. To say that a sequence $f_n$ in $V$ is bounded means that $\{f_n\}$ (the set of its values) is bounded.

Theorem 7.2.9. If $f_n$ is a convergent sequence in $V$, then $f_n$ is bounded.

Proof. Problem 7.2.2.

Theorem 7.2.10. Let $f_n$ and $g_n$ be sequences in $V$, and suppose that $\lim_{n \to \infty} f_n = f$, $\lim_{n \to \infty} g_n = g$, and $c \in \mathbb{C}$. Then we have that:

1. $\lim_{n \to \infty} c f_n = c f$; and
2. $\lim_{n \to \infty} (f_n + g_n) = f + g$.

Proof. Problem 7.2.3.

We shall also occasionally use the idea of continuous functions between normed spaces. In principle, this has been defined in Definition 3.1.2, but again, the reader may find it convenient to see it repeated in this setting:

Definition 7.2.11. Let $T : V \to W$ be a function, where $V$ and $W$ are normed spaces (e.g., $W = \mathbb{C}$). For $g \in V$, to say that $T$ is continuous at $g$ means that one of the following conditions holds:

• (Sequential continuity) For every sequence $f_n$ in $V$ such that $\lim_{n \to \infty} f_n = g$, we have that $\lim_{n \to \infty} T(f_n) = T(g)$.
• ($\epsilon$-$\delta$ continuity) For every $\epsilon > 0$, there exists some $\delta(\epsilon) > 0$ such that if $\|f - g\| < \delta(\epsilon)$, then $\|T(f) - T(g)\| < \epsilon$.

To say that $T$ is continuous on $V$ means that $T$ is continuous at $f$ for all $f \in V$.

One immediate application for continuity is the following handy result.

Theorem 7.2.12. Let $V$ be an inner product space over $F = \mathbb{C}$ or $\mathbb{R}$, and fix $g \in V$. Then the function $T_g : V \to F$ defined by $T_g(f) = \langle f, g \rangle$ is continuous on $V$, and similarly for $T^g(f) = \langle g, f \rangle$. In other words, an inner product is continuous in each variable.

Proof. For $g = 0$, $T_g$ is the zero function, so it suffices to consider the $g \ne 0$ case; see Problem 7.2.4.

We shall usually apply Theorem 7.2.12 in the following form.

Corollary 7.2.13. Let $V$ be an inner product space, and suppose that $\sum_{n=1}^{\infty} f_n$ converges to $f$ in the inner product norm. Then
\[ \langle f, g \rangle = \sum_{n=1}^{\infty} \langle f_n, g \rangle, \qquad \langle g, f \rangle = \sum_{n=1}^{\infty} \langle g, f_n \rangle. \tag{7.2.2} \]
In particular, the series on the right-hand side of each equation of (7.2.2) converges.

Proof. Problem 7.2.5.

Finally, we remind the reader that the ideas of Cauchy sequences and Cauchy completeness hold in any metric space. Again, we repeat the definitions in our new setting.

Definition 7.2.14. Let $V$ be a normed space, and let $f_n$ be a sequence in $V$. To say that $f_n$ is Cauchy means that for every $\epsilon > 0$, there exists some $N(\epsilon) \in \mathbb{R}$ such that if $n, k > N(\epsilon)$, then $\|f_n - f_k\| < \epsilon$.

The analogue of Lemma 2.5.5 still holds in a normed space:

Lemma 7.2.15. If $f_n$ is a Cauchy sequence in a normed space $V$, then $f_n$ is bounded.

Proof. Problem 7.2.6.

Theorem 2.5.2 still implies that a convergent sequence in a normed space is Cauchy, but the converse need not be true, as shown by the following example.

Example 7.2.16. Consider the sequence $f_n(x) = |x|^{1 + (1/n)}$ in $V = C^1([-1, 1])$ from Problem 4.3.6 and $f(x) = |x|$ in $C^0([-1, 1])$, noting that $f \notin V$. As shown in Example 4.3.11, $f_n$ converges to $f$ uniformly on $[-1, 1]$, which means that $f_n$ converges to $f$ in the $L^\infty$ norm (Example 7.2.3). Therefore, by Theorem 2.5.2, $f_n$ is a Cauchy sequence in the $L^\infty$ norm, even in the smaller space $V$. However, since the $L^\infty$ limit of $f_n$ is not in $V$, $V$ is not complete.

We therefore come back to the following definition, which we again restate in our new setting.

Definition 7.2.17. To say that a normed space $V$ is complete means that any Cauchy sequence in $V$ converges to some limit in $V$.

Now, on the one hand, we shall see that (Cauchy) completeness is a useful quality for a function space to possess. Unfortunately, the following example shows that the function space in which we are most interested so far is not complete under the $L^2$ norm. (We omit some details for brevity.)

Example 7.2.18 (Example 4.2.2 revisited). Let $V = C^0([0, 2])$, and consider the following sequence in $V$:
\[ f_n(x) = \begin{cases} x^n & \text{if } 0 \le x \le 1, \\ 1 & \text{if } x > 1. \end{cases} \tag{7.2.3} \]
A calculation shows that
\begin{align*}
\|f_n - f_k\|^2 &= \int_0^1 (x^n - x^k)^2 \, dx + \int_1^2 0 \, dx \\
&= \int_0^1 \bigl( x^{2n} - 2x^{n+k} + x^{2k} \bigr) \, dx \\
&= \left[ \frac{x^{2n+1}}{2n+1} - \frac{2x^{n+k+1}}{n+k+1} + \frac{x^{2k+1}}{2k+1} \right]_0^1 \\
&= \frac{1}{2n+1} - \frac{2}{n+k+1} + \frac{1}{2k+1}. \tag{7.2.4}
\end{align*}
Therefore, for $\epsilon > 0$, if $n, k > N(\epsilon) = \dfrac{2}{\epsilon^2}$, (7.2.4) plus the triangle inequality shows that $\|f_n - f_k\|^2 < \epsilon^2$, or in other words, $f_n$ is Cauchy. However, if there were some $f \in C^0([0, 2])$ to which $f_n$ converged under the $L^2$ norm, one can show (for example, using ideas from Sections 7.4 and 7.5) that essentially, the only possibility is
\[ f(x) = \begin{cases} 0 & \text{if } 0 \le x < 1, \\ 1 & \text{if } x \ge 1, \end{cases} \tag{7.2.5} \]
which is not continuous. It follows that $V$ is not complete under the $L^2$ norm.

One might then try to fix the problem in Example 7.2.18 by looking at the space of all functions that are continuous on $[0, 1]$ except at finitely many points, or even the space of (Riemann) integrable functions on $[0, 1]$. However, this allows us to construct sequences whose limits are even more ill-behaved, such as Example 4.2.4. Taking this process to its logical extent, we are naturally led to expand the definition of integration to what is known as the Lebesgue integral; see Sections 7.4–7.6 for (much) more on this idea.

As a final aside, for the reader who has seen the fundamentals of the topology of metric spaces (Section 2.6), we present an example of a space, namely, $\ell^2(\mathbb{N})$ (Theorem 7.1.14), where the most natural generalization of Bolzano-Weierstrass (Corollary 2.6.7) in $\mathbb{C}$ does not hold. We first introduce some notation for sequences in $\ell^2(\mathbb{N})$.

Notation 7.2.19. Because the elements of $\ell^2(\mathbb{N})$ are themselves sequences, a sequence in $\ell^2(\mathbb{N})$ is, by definition, a sequence of sequences. We therefore use the following somewhat unorthodox notation when discussing sequences in $\ell^2(\mathbb{N})$: We write single elements of $\ell^2(\mathbb{N})$ in the form $a(x)$, where the variable $x$ ranges over $\mathbb{N}$, and we write sequences in $\ell^2(\mathbb{N})$ in the form $a_n(x)$, where each $a_n(x)$ for fixed $n$ is a sequence in the variable $x$.

Example 7.2.20. We can now describe a counterexample to the most natural generalization of Corollary 2.6.7, as promised back in Section 2.6. Let $V = \ell^2(\mathbb{N})$, and for $k \in \mathbb{N}$, following Notation 7.2.19, let $e_k(x)$ be the element of $\ell^2(\mathbb{N})$ defined by $e_k(k) = 1$ and $e_k(x) = 0$ for $x \ne k$; in other words, let $e_1 = (1, 0, 0, \dots)$, $e_2 = (0, 1, 0, \dots)$, and so on. Then $\overline{N_1}(0)$, the closed 1-neighborhood of 0 (Definition 2.6.1), is closed and bounded (Problem 2.6.4), but the sequence $e_k$ in $\overline{N_1}(0)$ has no convergent subsequence (Problem 7.2.7).
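Returning to Example 7.2.18, the closed form (7.2.4) and the Cauchy estimate can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, the integral is approximated by the midpoint rule, and the threshold $N(\epsilon) = 2/\epsilon^2$ is the one read off from the estimate in that example.

```python
def dist_sq_closed(n, k):
    """Right-hand side of (7.2.4): the squared L^2 distance between f_n and f_k."""
    return 1 / (2 * n + 1) - 2 / (n + k + 1) + 1 / (2 * k + 1)

def dist_sq_numeric(n, k, steps=20000):
    """Midpoint-rule approximation of the integral of (x^n - x^k)^2 over [0, 1].
    The contribution from [1, 2] is zero because f_n = f_k = 1 there."""
    h = 1.0 / steps
    total = 0.0
    for j in range(steps):
        x = (j + 0.5) * h
        total += (x ** n - x ** k) ** 2
    return h * total

def cauchy_threshold(eps):
    """N(eps) = 2 / eps^2, so that n, k > N(eps) gives squared distance below eps^2."""
    return 2 / eps ** 2

if __name__ == "__main__":
    print(dist_sq_closed(3, 7), dist_sq_numeric(3, 7))
    eps = 0.1
    n = int(cauchy_threshold(eps)) + 1
    # Squared distance between f_n and f_k for k far beyond the threshold, next to eps^2.
    print(dist_sq_closed(n, 50 * n), eps ** 2)
```

The same closed form with $k \to \infty$ leaves $\frac{1}{2n+1}$, the squared $L^2$ distance from $f_n$ to the discontinuous function in (7.2.5), which tends to 0 even though that function lies outside $V$.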

Problems

7.2.1. For $k \ge 0$ and $2^k \le n \le 2^{k+1} - 1$, define $f_n : [0, 1] \to \mathbb{C}$ by
\[ f_n(x) = \begin{cases} 1 & \text{if } \dfrac{n - 2^k}{2^k} \le x \le \dfrac{n + 1 - 2^k}{2^k}, \\ 0 & \text{otherwise.} \end{cases} \tag{7.2.6} \]
Note that for any positive integer $n$, $2^k \le n \le 2^{k+1} - 1$ exactly when $k = \lfloor \log_2(n) \rfloor$, so (7.2.6) gives a well-defined sequence of functions $f_n$.
(a) Draw the graph of $f_n$ for $1 \le n \le 7$.
(b) Prove that $\lim_{n \to \infty} \int_0^1 f_n(x) \, dx = 0$, or in other words, $f_n$ converges to 0 in the $L^1$ norm. (Suggestion: Calculate the integral as a function of $k = \lfloor \log_2(n) \rfloor$.)
(c) Prove that for any $x \in [0, 1]$, $\lim_{n \to \infty} f_n(x)$ does not exist. (Suggestion: It is probably better to give a qualitative description of why this happens than to use formulas.)
Note: The above example does not quite live up to its billing in Remark 7.2.7, as the functions $f_n$ are not continuous. We leave it to the reader to modify the $f_n$ slightly to get the same result with continuous functions, as suggested by Figure 7.2.1. (We also suggest the reader only try to understand the basic idea, and avoid formulas and other details.)

Figure 7.2.1: Adjusting (7.2.6) to be continuous

7.2.2. (Proves Theorem 7.2.9) Let $V$ be a normed space, and suppose that $\lim_{n \to \infty} f_n = f$ in $V$. Prove that $f_n$ is bounded in $V$. (Suggestion: See the proof of Theorem 2.4.5.)

7.2.3. (Proves Theorem 7.2.10) Let $V$ be a normed space, let $a \in \mathbb{C}$, and suppose that $\lim_{n \to \infty} f_n = f$ and $\lim_{n \to \infty} g_n = g$ in $V$.
(a) Prove that $\lim_{n \to \infty} a f_n = a f$ in $V$.
(b) Prove that $\lim_{n \to \infty} (f_n + g_n) = f + g$ in $V$.
(Suggestion: See Section 2.4.)

7.2.4. (Proves Theorem 7.2.12) Let $V$ be an inner product space, fix $g \ne 0$ in $V$, and let $T_g : V \to \mathbb{C}$ be defined by $T_g(f) = \langle f, g \rangle$. Prove that $T_g$ is continuous on $V$. (Suggestion: Cauchy-Schwarz.)

7.2.5. (Proves Corollary 7.2.13) Let $V$ be an inner product space, fix $g \ne 0$ in $V$, and let $T_g : V \to \mathbb{C}$ be defined by $T_g(f) = \langle f, g \rangle$. Suppose $\sum_{n=1}^{\infty} f_n$ converges to $f$ in the inner product norm.
(a) Prove that $T_g \left( \sum_{n=1}^{N} f_n \right) = \sum_{n=1}^{N} \langle f_n, g \rangle$.
(b) Prove that $T_g(f) = \sum_{n=1}^{\infty} \langle f_n, g \rangle$; in particular, the series on the right-hand side converges. (Suggestion: Sequential definition of continuity.)

7.2.6. (Proves Lemma 7.2.15) Let $V$ be a normed space, and suppose that $f_n$ is Cauchy with respect to the norm metric. Prove that $f_n$ is bounded in $V$. (Suggestion: See Problem 2.5.1.)

7.2.7. Let $V = \ell^2(\mathbb{N})$, and for $k \in \mathbb{N}$, following Notation 7.2.19, let $e_k(x)$ be the element of $\ell^2(\mathbb{N})$ defined by $e_k(k) = 1$ and $e_k(x) = 0$ for $x \ne k$ (i.e., $e_1 = (1, 0, 0, \dots)$, $e_2 = (0, 1, 0, \dots)$, etc.). Let $B = \{e_k\}$. Prove that any subsequence of the sequence $e_k$ is not Cauchy, and therefore, not convergent.

7.3 Orthogonal sets and bases

Returning to the setting of inner product spaces, recall that in Section 7.1, we saw how orthogonality in an inner product space $V$ allows us to decompose a given $f \in V$ into pieces that are parallel and orthogonal to some other given $g \in V$ (Theorem 7.1.11). In this section, we describe how to decompose an entire space relative to a certain type of set we call an orthogonal basis (Definition 7.3.12). We begin with the following idea.

Definition 7.3.1. Let $V$ be an inner product space, and let $I$ be an indexing set, e.g., $I = \{1, \dots, N\}$ or $I = \mathbb{Z}$. To say that $B = \{u_i \mid i \in I\} \subset V$ is an orthogonal set means that for $i \ne j$, $u_i$ and $u_j$ are orthogonal (i.e., $\langle u_i, u_j\rangle = 0$). To say that $B = \{e_i \mid i \in I\} \subset V$ is an orthonormal set means that $B$ is an orthogonal set and also, for every $i \in I$, $\langle e_i, e_i\rangle = 1$.

Example 7.3.2. In Example 7.1.5, we saw that $\mathbb{C}^N$ with the dot product (7.1.3) is an inner product space. Then if $e_n$ is the standard $n$th basis vector (i.e., $n$th coordinate equal to 1 and all other coordinates equal to 0), we see that $B = \{e_n \mid n \in \{1, \dots, N\}\}$ is an orthonormal set.

Example 7.3.3. In Example 7.1.6, we saw that $C^0(S^1)$ with the $L^2$ inner product (7.1.4) is an inner product space. Then if $e_n(x) = e^{2\pi i n x}$, as in (4.6.1), we see that (4.6.8) says precisely that $\{e_n \mid n \in \mathbb{Z}\}$ is an orthonormal set.

Example 7.3.4. We also see from (6.3.7)–(6.3.9) that
\[ B = \{1, \cos(2\pi n x), \sin(2\pi n x) \mid n \in \mathbb{Z},\ n > 0\} \tag{7.3.1} \]
is an orthogonal set, but not an orthonormal one, as $\|\cos(2\pi n x)\|^2 = \|\sin(2\pi n x)\|^2 = \frac{1}{2}$.

Many facts about orthogonality can be generalized from our discussion of orthogonality in $\mathbb{R}^n$ from Section 5.3. For example, we have Problems 7.3.1 and 7.3.2 and the following.

Theorem 7.3.5 (Generalized Pythagorean Theorem). If $\{f_1, \dots, f_n\}$ is an orthogonal set in an inner product space $V$, then
\[ \left\| \sum_{k=1}^{n} f_k \right\|^2 = \sum_{k=1}^{n} \|f_k\|^2. \tag{7.3.2} \]


Proof. Problem 7.3.3.

Our next step is to define the following geometric generalization of Fourier polynomials.

Definition 7.3.6. Let $V$ be an inner product space, and suppose $B = \{u_1, \dots, u_N\}$ is an orthogonal set of nonzero vectors in $V$. For $f \in V$ and $1 \le n \le N$, we define the $n$th generalized Fourier coefficient of $f$ with respect to $B$ to be
\[ \hat{f}(n) = \frac{\langle f, u_n\rangle}{\langle u_n, u_n\rangle} = \frac{\langle f, u_n\rangle}{\|u_n\|^2}. \tag{7.3.3} \]
We also define
\[ \operatorname{proj}_B f = \sum_{n=1}^{N} \hat{f}(n) u_n = \sum_{n=1}^{N} \frac{\langle f, u_n\rangle}{\langle u_n, u_n\rangle}\, u_n \tag{7.3.4} \]
to be the projection of $f$ onto the span of $B$.

Note that for an orthonormal set $B = \{e_1, \dots, e_N\}$, (7.3.3) becomes $\hat{f}(n) = \langle f, e_n\rangle$. (Compare Problem 7.3.1.) In any case, letting $N$ go to infinity, we also have the following generalization.

Definition 7.3.7. Let $V$ be an inner product space, and suppose $B = \{u_i \mid i \in \mathbb{N}\}$ is an orthogonal set of nonzero vectors in $V$. We define
\[ f \sim \lim_{N\to\infty} \sum_{n=1}^{N} \hat{f}(n) u_n = \sum_{n=1}^{\infty} \hat{f}(n) u_n \tag{7.3.5} \]
to be the generalized Fourier series of $f$ with respect to $B$. Note that as with ordinary Fourier series, the symbol $\sim$ does not indicate any assumptions about convergence.

Example 7.3.8. For $V = C^0(S^1)$ with the $L^2$ inner product and
\[ B_N = \{e_0, e_1, e_{-1}, e_2, e_{-2}, \dots, e_N, e_{-N}\}, \tag{7.3.6} \]
the $n$th Fourier coefficient $\hat{f}(n)$ is exactly as defined in (6.2.1) (adjusting our numbering appropriately), the projection of $f$ onto the span of $B_N$ is precisely the $N$th Fourier polynomial of $f$, and the generalized Fourier series with respect to the (infinite) orthonormal set $B = \{e_0, e_1, e_{-1}, e_2, e_{-2}, \dots\}$ is precisely the usual Fourier series of $f$.

Example 7.3.9. For $V = C^0(S^1)$ (real-valued) with the $L^2$ inner product and
\[ B_N = \{1, \cos(2\pi x), \sin(2\pi x), \cos(4\pi x), \sin(4\pi x), \dots, \cos(2N\pi x), \sin(2N\pi x)\}, \tag{7.3.7} \]
the generalized Fourier coefficients are $a_n$ and $b_n$ as defined in (6.3.10), and the projection of $f$ onto the span of $B_N$ is given in (6.3.4). Note that the factors of 2 found in the formulas for $a_n$ and $b_n$ in (6.3.10) can be thought of as coming from the $\langle u_n, u_n\rangle$ factor in (7.3.3) and the fact that $\langle \cos(2\pi n x), \cos(2\pi n x)\rangle = \frac{1}{2} = \langle \sin(2\pi n x), \sin(2\pi n x)\rangle$.
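As a small finite-dimensional illustration of (7.3.3) and (7.3.4), the following Python sketch computes generalized Fourier coefficients and a projection in $\mathbb{C}^3$. The vectors `u1`, `u2`, `f` and all function names are our own illustrative choices, not taken from the text, and we assume the dot product is linear in its first argument and conjugate-linear in its second.

```python
# Illustrative sketch only: names and sample vectors are hypothetical.

def inner(f, g):
    """Dot product on C^N (assumed linear in f, conjugate-linear in g)."""
    return sum(a * b.conjugate() for a, b in zip(f, g))

def fourier_coefficient(f, u):
    """Generalized Fourier coefficient <f, u> / <u, u>, as in (7.3.3)."""
    return inner(f, u) / inner(u, u)

def proj(f, basis):
    """Projection of f onto the span of an orthogonal set, as in (7.3.4)."""
    result = [0j] * len(f)
    for u in basis:
        c = fourier_coefficient(f, u)
        result = [r + c * x for r, x in zip(result, u)]
    return result

# An orthogonal (but not orthonormal) set in C^3: <u1, u2> = 0, <u1, u1> = 2.
u1 = [1 + 0j, 1 + 0j, 0j]
u2 = [1 + 0j, -1 + 0j, 0j]
B = [u1, u2]

f = [3 + 1j, 1 - 1j, 2 + 0j]
coeffs = [fourier_coefficient(f, u) for u in B]
p = proj(f, B)
residual = [a - b for a, b in zip(f, p)]  # f minus its projection
```

Here the division by $\langle u_n, u_n\rangle = 2$ plays the same role as the factors of 2 discussed in Example 7.3.9, and the residual is orthogonal to both `u1` and `u2`.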


The key feature of the projection $\operatorname{proj}_B f$ is that, in the linear-algebraic terminology of Definition 5.2.16, it is the vector in the span of $B$ that best approximates $f$, as measured by the $L^2$ norm. To be precise, we have the following crucial theorem.

Theorem 7.3.10 (Best Approximation Theorem). Let $V$ be an inner product space, let $B = \{u_1, \dots, u_N\}$ be an orthogonal set of nonzero vectors in $V$, and let $f$ be in $V$.

1. For $1 \le n \le N$, the vector $f - \operatorname{proj}_B f$ is orthogonal to $u_n$.

2. For any $c_1, \dots, c_N \in \mathbb{C}$, we have
\[ \left\| f - \sum_{n=1}^{N} c_n u_n \right\|^2 = \sum_{n=1}^{N} \left| \hat{f}(n) - c_n \right|^2 \langle u_n, u_n\rangle + \left\| f - \operatorname{proj}_B f \right\|^2. \tag{7.3.8} \]

3. The vector $\operatorname{proj}_B f$ is the unique element in the span of $B$ that is closest to $f$ in the $L^2$ metric.

4. (Bessel's inequality) We have that $\|\operatorname{proj}_B f\| \le \|f\|$, or in other words,
\[ \sum_{n=1}^{N} \left| \hat{f}(n) \right|^2 \langle u_n, u_n\rangle \le \|f\|^2. \tag{7.3.9} \]
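The identity (7.3.8) and Bessel's inequality (7.3.9) can be checked numerically in a small case. The sketch below does this in $\mathbb{C}^3$ for one arbitrary choice of coefficients $c_n$. The sample vectors, the coefficients `c`, and the helper names are hypothetical, and a numerical check of one example is of course not a proof.

```python
# Illustrative numerical check of (7.3.8) and (7.3.9); all names are ours.

def inner(f, g):
    """Dot product on C^N (assumed conjugate-linear in the second slot)."""
    return sum(a * b.conjugate() for a, b in zip(f, g))

def norm_sq(f):
    """Squared inner product norm ||f||^2."""
    return inner(f, f).real

def combo(coeffs, basis):
    """The linear combination sum_n coeffs[n] * basis[n]."""
    size = len(basis[0])
    return [sum(c * u[k] for c, u in zip(coeffs, basis)) for k in range(size)]

def sub(f, g):
    """Componentwise difference f - g."""
    return [a - b for a, b in zip(f, g)]

B = [[1 + 0j, 1 + 0j, 0j], [1 + 0j, -1 + 0j, 0j]]  # orthogonal, nonzero
f = [3 + 1j, 1 - 1j, 2 + 0j]

fhat = [inner(f, u) / inner(u, u) for u in B]
proj_f = combo(fhat, B)

c = [0.5 - 2j, 1j]  # an arbitrary competitor in the span of B

lhs = norm_sq(sub(f, combo(c, B)))
rhs = (sum(abs(fh - cn) ** 2 * norm_sq(u) for fh, cn, u in zip(fhat, c, B))
       + norm_sq(sub(f, proj_f)))

bessel_lhs = sum(abs(fh) ** 2 * norm_sq(u) for fh, u in zip(fhat, B))
```

In this example both sides of (7.3.8) equal 18.5, while the choice $c_n = \hat{f}(n)$ would give the smaller value $\|f - \operatorname{proj}_B f\|^2 = 4$, which is consistent with claim (3).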

In particular, claim (3) means that the $N$th Fourier polynomial of $f$ is the unique trigonometric polynomial of degree at most $N$ that is closest to $f$ in the $L^2$ metric, providing a complete answer to Question 6.1.2. Compare also Theorem 7.1.11.

Proof. Claims (1) and (2) are proven in Problem 7.3.4. As for claim (3), since the right-hand side of (7.3.8) is a sum of nonnegative terms, and $\|f - \operatorname{proj}_B f\|^2$ is independent of the $c_n$, we see that $\left\| f - \sum_{n=1}^{N} c_n u_n \right\|$ is minimized if and only if $c_n = \hat{f}(n)$ for all $n$. Bessel's inequality is then a special case of (7.3.8) (Problem 7.3.5).

We pause to note a strange fact about convergence of orthogonal series in the inner product metric that will come in handy later: Unlike the usual situation of convergence, where a series can get closer to and then farther away from its limit, the partial sums of an orthogonal series never move farther away. To be precise:

Corollary 7.3.11 (Always Better Theorem). Let $V$ be an inner product space, and let $B = \{u_n \mid n \in \mathbb{N}\}$ be an orthogonal set of nonzero vectors in $V$. Then for $f \in V$ and $1 \le K \le N$, we have that
\[ \left\| f - \sum_{n=1}^{N} \hat{f}(n) u_n \right\| \le \left\| f - \sum_{n=1}^{K} \hat{f}(n) u_n \right\|. \tag{7.3.10} \]


In other words, as $N$ increases in the (generalized) Fourier series (7.3.5) (Definition 7.3.7), the $L^2$ approximation to $f$ never gets worse, only better.

Proof. Problem 7.3.6.

We may now finally define the key idea of this section. Note that we continue to adhere to our summation conventions from Remark 4.1.4.

Definition 7.3.12. Let $V$ be an inner product space. To say that $B = \{u_n \mid n \in \mathbb{N}\} \subset V$ is an orthogonal basis means that $B$ is an orthogonal set of nonzero vectors and for any $f \in V$, the generalized Fourier series of $f$ converges to $f$ in the inner product metric. In other words, the latter condition says that for any $f \in V$,
\[ \sum_{n=1}^{\infty} \hat{f}(n) u_n = \lim_{N\to\infty} \sum_{n=1}^{N} \hat{f}(n) u_n = f, \tag{7.3.11} \]
where convergence is in the inner product norm ($L^2$ norm). Similarly, to say that $B = \{u_n \mid n \in \mathbb{Z}\} \subset V$ is an orthogonal basis means that $B$ is an orthogonal set of nonzero vectors, and, for any $f \in V$,
\[ \sum_{n\in\mathbb{Z}} \hat{f}(n) u_n = \lim_{N\to\infty} \sum_{n=-N}^{N} \hat{f}(n) u_n = f. \tag{7.3.12} \]
Finally, orthonormal bases are defined analogously, replacing “orthogonal set of nonzero vectors” with “orthonormal set.”

Remark 7.3.13. Note that Definition 7.3.12 is given for the general abstract setting of an inner product space, which means that the only way to make sense of convergence in (7.3.11) and (7.3.12) is in the inner product norm. Nevertheless, since there will be other kinds of convergence possible in examples (see Remark 7.2.7), we carefully (if somewhat redundantly) specify convergence in the inner product norm in Definition 7.3.12.

Example 7.3.14. Let $V = \mathbb{C}^N$ with the dot product, and let $e_n$ be the $n$th standard basis vector ($1 \le n \le N$). Then $B = \{e_n \mid 1 \le n \le N\}$ is an orthonormal basis for $V$ (Problem 7.3.7).

Remark 7.3.15. More generally, the reader may recall from linear algebra that if $B$ is a finite subset of an inner product space $V$, then to say that $B$ is a basis for $V$ means that $V$ equals the span of $B$ and $B$ is linearly independent (Definition 5.2.15). However, since orthogonal sets of nonzero vectors are linearly independent (Problem 7.3.2), and for finite $B$, the “series span” of $B$ defined by (7.3.11) is the same as the algebraic span from Definition 5.2.16, the two definitions are equivalent. See Appendix B, especially Remark B.10, for more about the relationship between linear-algebraic bases and orthogonal bases in general.


Example 7.3.16. Let $X = \mathbb{N}$ or $\mathbb{Z}$, let $V = \ell^2(X)$ (Definition 5.3.2), and let $e_n \in V$ be the sequence defined by
\[ e_n(x) = \begin{cases} 1 & \text{if } x = n, \\ 0 & \text{otherwise.} \end{cases} \tag{7.3.13} \]
In other words, let $e_n$ be the analogue in $V$ of the $n$th standard basis vector. Then $B = \{e_n \mid n \in X\}$ is an orthonormal basis for $V$ (Problem 7.3.8).

We will soon have much more to say about orthogonal bases. However, to take full advantage of them, we will need to consider not just inner product spaces, but inner product spaces with all of their “holes” filled in (see Example 7.2.18). This means that we can no longer put off discussing the Lebesgue integral, and we do so in the next section.
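To see Definition 7.3.12 and the Always Better Theorem at work in $\ell^2(\mathbb{N})$, consider the sample element $f(n) = 1/n$ (our choice, not the text's). With respect to the orthonormal set $\{e_n\}$, the $N$th partial sum of its generalized Fourier series just keeps the first $N$ terms, so the squared error is the tail $\sum_{n > N} 1/n^2$. The sketch below evaluates that tail using the classical value $\sum_{n=1}^{\infty} 1/n^2 = \pi^2/6$, which is assumed here rather than derived.

```python
import math

def tail_error_sq(N):
    """Squared l^2 distance between f(n) = 1/n and its N-th partial sum,
    i.e. sum_{n > N} 1/n^2, computed as pi^2/6 minus the first N terms."""
    return math.pi ** 2 / 6 - sum(1.0 / n ** 2 for n in range(1, N + 1))

# Errors ||f - sum_{n <= N} f(n) e_n|| for a few values of N.
errors = [math.sqrt(tail_error_sq(N)) for N in (1, 10, 100, 1000)]
```

The errors decrease as $N$ grows, as (7.3.10) requires, and tend to 0, as required of an orthonormal basis.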

Problems

7.3.1. Let $V$ be an inner product space, and let $B = \{u_n \mid n \in \mathbb{N}\}$ be an orthonormal set in $V$. Suppose that $f = \sum_{n=1}^{\infty} a_n u_n$, where convergence is in the inner product norm on $V$. Prove that $a_k = \langle f, u_k\rangle$. (Suggestion: Continuity of the inner product; compare Problem 5.3.2.)

7.3.2. Let $V$ be an inner product space and let $B = \{u_1, \dots, u_N\}$ be an orthogonal set of nonzero vectors in $V$. Prove that $B$ is linearly independent (Definition 5.2.15). (Suggestion: What is $\langle c_1 u_1 + \cdots + c_N u_N, u_i\rangle$?)

7.3.3. (Proves Theorem 7.3.5) Prove that if $\{f_1, \dots, f_n\}$ is an orthogonal set in an inner product space $V$, then
\[ \left\| \sum_{k=1}^{n} f_k \right\|^2 = \sum_{k=1}^{n} \|f_k\|^2. \tag{7.3.14} \]
(Suggestion: Theorem 7.1.9 and induction.)

7.3.4. (Proves Theorem 7.3.10) Let $V$ be an inner product space, let $B = \{u_1, \dots, u_N\}$ be an orthogonal set of nonzero vectors in $V$, and let $f$ be in $V$.

(a) Prove that for $1 \le n \le N$, the vector $f - \operatorname{proj}_B f$ is orthogonal to $u_n$.

(b) For $c_1, \dots, c_N \in \mathbb{C}$, prove (7.3.8). (Suggestion: Pythagorean Theorem.)

7.3.5. (Proves Theorem 7.3.10) Let $V$ be an inner product space, let $B = \{u_1, \dots, u_N\}$ be an orthogonal set of nonzero vectors in $V$, and let $f$ be in $V$. Prove Bessel's inequality $\|\operatorname{proj}_B f\| \le \|f\|$. (Suggestion: Calculate $\|\operatorname{proj}_B f\|^2$.)

7.3.6. (Proves Corollary 7.3.11) Let $V$ be an inner product space and let $B = \{u_n\}$ be an orthogonal set of nonzero vectors in $V$. Prove that for any $f \in V$ and $1 \le K \le N$,
\[ \left\| f - \sum_{n=1}^{N} \hat{f}(n) u_n \right\| \le \left\| f - \sum_{n=1}^{K} \hat{f}(n) u_n \right\|. \tag{7.3.15} \]


(Suggestion: Best Approximation Theorem.)

7.3.7. Let $V = \mathbb{C}^N$ with the dot product, and let $e_n$ be the $n$th standard basis vector ($1 \le n \le N$). Prove that $B = \{e_n \mid 1 \le n \le N\}$ is an orthonormal basis for $V$. (Suggestion: The main point is to prove that $f = \sum_{n=1}^{N} \hat{f}(n) e_n$, and since $B$ is finite, this question is purely algebraic in nature.)

7.3.8. Let $X = \mathbb{N}$ or $\mathbb{Z}$, let $V = \ell^2(X)$ (Definition 5.3.2), and let $e_n \in V$ be the sequence defined by
\[ e_n(x) = \begin{cases} 1 & \text{if } x = n, \\ 0 & \text{otherwise.} \end{cases} \tag{7.3.16} \]
Prove that $B = \{e_n \mid n \in X\}$ is an orthonormal basis for $V$. (Suggestion: What does it mean to say $a_n \in \ell^2(X)$?)

7.4 The Lebesgue integral: Measure zero

Recall from Analysis I that the real numbers $\mathbb{R}$ are what we get when we start with the rational numbers $\mathbb{Q}$ and fill in “holes” like $\sqrt{2}$ or $\pi$ using the completeness axiom (supremum property). Equivalently, but more abstractly, one can fill in the holes of $\mathbb{Q}$ by forcing Cauchy completeness (Definition 2.5.4) to hold; see, for example, Hewitt and Stromberg [HS75, Sec. 5]. As we have seen, like $\mathbb{Q}$, the space of Riemann integrable functions also has “holes”:

• It is possible to have a sequence of Riemann integrable functions whose pointwise limit is not Riemann integrable (Example 4.2.4).

• If we look at the space $V = C^0([a, b])$ of continuous functions on a closed and bounded interval under the $L^2$ metric, we see that $V$ is not complete as a metric space, just like $\mathbb{Q}$ (Example 7.2.18).

We can fill in those “holes” of the Riemann integral by defining what is known as the Lebesgue integral. However, to cover the Lebesgue integral properly requires a book of its own; see, for example, Royden and Fitzpatrick [RF10], or at a more sophisticated level, Rudin [Rud86]. We therefore come to our great cheat: Instead of developing the Lebesgue integral in detail, we will simply assume the properties of the Lebesgue integral that we will require as axioms, and leave the details of its definition and the proof of its properties to be learned by the reader elsewhere. (Again: “Dessert first.”)

Now, truth be told, the reader does need to know a few details to understand the value of the Lebesgue integral. In particular, while the reader does not need to understand the measure of a subset of $\mathbb{R}$, which would be the first step in most full developments of the Lebesgue integral, we do need to discuss the idea of a set of measure zero, that is, a subset of the domain of a function $f$ that “doesn't matter” when we take the integral of $f$.

We begin with the following definitions.

Definition 7.4.1. We define the length of an open interval $(a, b)$ to be $\ell((a, b)) = b - a$. For $E \subseteq \mathbb{R}$, we define a countable open cover of $E$ to be a countable collection $\{U_i\}$ of open intervals whose union contains $E$ (i.e., $E \subseteq \bigcup_{i\in\mathbb{N}} U_i$).

Definition 7.4.2. To say that $E \subseteq \mathbb{R}$ has measure zero means that for any $\epsilon > 0$, there exists some countable open cover $\{U_i\}$ of $E$ such that $\sum_{i=1}^{\infty} \ell(U_i) < \epsilon$. (Note that since $\sum_{i=1}^{\infty} \ell(U_i)$ is a sum with nonnegative terms, the order of summation does not matter; see Appendix A.)

Definition 7.4.3. For $X \subseteq \mathbb{R}$, to say that a statement is true almost everywhere, or a.e., in $X$, means that the set of points in $X$ where the statement does not hold has measure 0. Phrases like almost all points in $X$ are defined similarly. For example, for $f, g : X \to \mathbb{C}$, to say that $f(x) = g(x)$ a.e. means that the set $\{x \in X \mid f(x) \ne g(x)\}$ has measure zero.

The reason the reader will need some idea of what a set of measure zero is like is that if we use Lebesgue integration to prove a result about functions with domain $X$, then by the very nature of Lebesgue integration, those results hold only “up to sets of measure zero” in $X$. Moreover, with respect to Lebesgue integration, functions themselves are more formally considered as equivalence classes under the relation of being equal almost everywhere. We therefore present several results (Theorems 7.4.5 and 7.4.8, Corollary 7.4.9) designed to convey some intuition about sets of measure zero. We begin with a few concrete examples.

Example 7.4.4. A set consisting of a single point of $\mathbb{R}$ has measure zero (Problem 7.4.1). More generally, any finite subset of $\mathbb{R}$ has measure zero (Problem 7.4.2).

Theorem 7.4.5. Let $E = \{x_i \in \mathbb{R} \mid i \in \mathbb{N}\}$ be a countably infinite subset of $\mathbb{R}$. Then $E$ has measure zero.

Proof. Suppose $\epsilon > 0$. For $i \in \mathbb{N}$, let $U_i$ be an interval of length $\dfrac{\epsilon}{2^{i+1}}$ containing $x_i$, i.e., $U_i = \left( x_i - \dfrac{\epsilon}{2^{i+2}},\ x_i + \dfrac{\epsilon}{2^{i+2}} \right)$. Then $\{U_i\}$ is a countable open cover of $E$ with
\[ \sum_{i=1}^{\infty} \ell(U_i) = \sum_{i=1}^{\infty} \frac{\epsilon}{2^{i+1}} = \frac{\epsilon}{2} < \epsilon. \tag{7.4.1} \]

The theorem follows. So for example, any subset of the rationals has measure zero. The following example, however, shows that a set of measure zero need not be countable. Example 7.4.6 (Cantor “middle thirds” set). Let E0 = [0, 1], E1 = [0, 1/3] ∪ [2/3, 1], E2 = [0, 1/9] ∪ [2/9, 1/3] ∪ [2/3, 7/9] ∪ [8/9, 1], and in general, given En , let En+1 be the set obtained from En by deleting the middle third of each closed interval in En . Then for any n ∈ N, En can be covered by a collection of open intervals, each slightly larger than


a corresponding closed interval in $E_n$, whose lengths total at most $2(2/3)^n$ (say). Taking the limit as $n \to \infty$, we see that $E = \bigcap_{n\in\mathbb{N}} E_n$ has measure zero. Nevertheless, it can be shown that $E$ is uncountable, for example, by mapping the set of infinite binary strings injectively into $E$.

In addition, an argument similar to the proof of Theorem 7.4.5 shows that a countable union of (not necessarily countable) sets of measure zero still has measure zero (Problem 7.4.3). It follows that sets of measure zero can be spread everywhere in $\mathbb{R}$, and at the same time, be quite large in cardinality.

Nevertheless, it is still true that any set of measure zero amounts to “almost nothing”. In a course on measure theory, this is shown by defining the measure, or generalized length, of a large class of subsets of $\mathbb{R}$, and proving that the measure of any set is unaffected by removing a set of measure zero. We will instead settle for a much weaker statement (Theorem 7.4.8), for which we need the following “obvious” but not so conveniently proven lemma.

Lemma 7.4.7. Let $(A, B)$ be an open interval in $\mathbb{R}$. If $\{U_i\}$ is a countable collection of open intervals, each contained in $(A, B)$, then there exists a countable collection $\{V_j\}$ of bounded open intervals such that:

1. The $V_j$ are pairwise disjoint, or in other words, for $j \ne k$, $V_j \cap V_k = \emptyset$;

2. $\displaystyle \bigcup_{j=1}^{\infty} V_j = \bigcup_{i=1}^{\infty} U_i$; and

3. $\displaystyle \sum_{j=1}^{\infty} \ell(V_j) \le \sum_{i=1}^{\infty} \ell(U_i)$.
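For a finite collection of intervals, the conclusion of Lemma 7.4.7 can be produced by a simple sort-and-sweep procedure. The following sketch is only an illustration of the finite case, under the assumption that every interval is given as a pair `(a, b)` with `a < b`; it is not the argument used in the proof of the lemma, which must handle countably many intervals. The function names and the sample collection `U` are hypothetical.

```python
from fractions import Fraction

def merge_open_intervals(intervals):
    """Merge finitely many open intervals (a, b) into pairwise disjoint open
    intervals with the same union. Intervals are merged only when they truly
    overlap: (0, 1) and (1, 2) stay separate, since 1 lies in neither."""
    merged = []
    for a, b in sorted(intervals):
        if merged and a < merged[-1][1]:
            # (a, b) overlaps the last merged interval, so extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], b))
        else:
            merged.append((a, b))
    return merged

def total_length(intervals):
    """Sum of the lengths b - a of the given intervals."""
    return sum(b - a for a, b in intervals)

U = [(Fraction(0), Fraction(1, 2)), (Fraction(1, 4), Fraction(1)),
     (Fraction(1), Fraction(2)), (Fraction(3, 2), Fraction(5, 2)),
     (Fraction(3), Fraction(4))]
V = merge_open_intervals(U)  # disjoint intervals (0,1), (1,5/2), (3,4)
```

For this sample, the total length drops from 17/4 to 7/2, in line with statement (3), while the union is unchanged.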

We hope that the idea of Lemma 7.4.7 seems “obvious”: By merging intervals of $\{U_i\}$ with nonempty intersection, we may assume that the intervals in a countable open cover are disjoint without increasing the total length. And indeed, the reader should feel comfortable skipping the proof on a first reading. However, we include the proof here to illustrate one of the difficulties of measure theory: It can be tricky to prove “obvious” statements about things like countable collections of intervals for several reasons, not least of which is that occasionally those “obvious” statements are not true.

Proof. Fix a countable collection of open intervals $\{U_i\}$. For the purposes of this proof only, we define an interval chain of length $k \ge 0$ to be a finite sequence of open intervals $\{W_0, \dots, W_k\} \subseteq \{U_i\}$ such that for $0 \le i \le k - 1$, $W_i \cap W_{i+1} \ne \emptyset$. To say that open intervals $U$ and $U'$ are connected by an interval chain means that there exists an interval chain $\{W_0, \dots, W_k\}$ such that $U = W_0$ and $W_k = U'$. Define a relation $\sim$ on $\{U_i\}$ by saying that $U \sim U'$ exactly when $U$ and $U'$ are connected by an interval chain. We claim that:

Claim 1: $\sim$ is an equivalence relation on $\{U_i\}$. Furthermore, if $C$ and $C'$ are different equivalence classes under $\sim$, then
\[ \left( \bigcup_{U \in C} U \right) \cap \left( \bigcup_{U' \in C'} U' \right) = \emptyset. \tag{7.4.2} \]

Claim 1 is proven in Problem 7.4.4. We next claim:

Claim 2: If $\{W_0, \dots, W_k\}$ is an interval chain, where $W_i = (a_i, b_i)$, $a = \min(a_0, \dots, a_k)$, and $b = \max(b_0, \dots, b_k)$, then $\bigcup_{i=0}^{k} W_i$ is precisely the open interval $(a, b)$, and
\[ \sum_{i=0}^{k} \ell(W_i) \ge b - a, \tag{7.4.3} \]
the length of the single interval $(a, b)$.

Claim 2 is proven in Problem 7.4.5. Our final claim is:

Claim 3: Suppose $\{U_i\}$ is a countable collection of open intervals $U_i = (a_i, b_i)$, all of which are equivalent under $\sim$. Let $a = \inf\{a_i\}$ and $b = \sup\{b_i\}$, both of which are finite because $A$ and $B$ are lower and upper bounds for $\{a_i\}$ and $\{b_i\}$, respectively. Then
\[ \bigcup_{n=1}^{\infty} U_n = (a, b), \tag{7.4.4} \]
\[ \sum_{n=1}^{\infty} \ell(U_n) \ge b - a. \tag{7.4.5} \]

To prove Claim 3, on the one hand, since $a \le a_n < b_n \le b$ for all $n \in \mathbb{N}$,
\[ \bigcup_{n=1}^{\infty} U_n \subseteq (a, b). \]
On the other hand, suppose $a < x < y < b$. By the Arbitrarily Close Criterion 2.1.4, there exist $k, n \in \mathbb{N}$ such that $a \le a_k < x < y < b_n \le b$. Since all of the $U_i$ are equivalent, let $\{W_0, \dots, W_m\}$ be an interval chain from $W_0 = (a_k, b_k)$ to $W_m = (a_n, b_n)$, and by Claim 2, let $\bigcup_{i=0}^{m} W_i = (a', b')$, where $a \le a' \le a_k < x < y < b_n \le b' \le b$. Then
\[ (x, y) \subseteq (a_k, b_n) \subseteq \bigcup_{i=0}^{m} W_i \subseteq \bigcup_{n=1}^{\infty} U_n, \tag{7.4.6} \]
and Claim 2 also implies that
\[ \sum_{n=1}^{\infty} \ell(U_n) \ge \sum_{i=0}^{m} \ell(W_i) \ge b' - a'. \tag{7.4.7} \]
Therefore, since we may choose $x$ and $y$ arbitrarily close to $a$ and $b$, (7.4.4) follows from (7.4.6). Furthermore, choosing $x$ and $y$ arbitrarily close to $a$ and $b$ forces $b' - a'$ to be arbitrarily close to $b - a$, so (7.4.7) implies that $\sum_{n=1}^{\infty} \ell(U_n) \ge (b - a) - \epsilon$ for all $\epsilon > 0$, and (7.4.5) follows.

Turning to the full Lemma, let the $V_j$ be the unions of the (countably many) equivalence classes of $\{U_i\}$. Since each $V_j$ is a bounded open interval by (7.4.4) in Claim 3, it remains to verify the various statements in the conclusion of the lemma. However, statement (1) of the lemma follows because unions of different equivalence classes are disjoint (Claim 1), statement (2) follows by construction of the $V_j$, and statement (3) follows by (7.4.5) in Claim 3 and rearranging the order of summation in $\sum_{i=1}^{\infty} \ell(U_i)$. The Lemma follows.

Theorem 7.4.8. If E is a set of measure zero, and (a, b) is any open interval in R, then (a, b) is not contained in E. Proof. Proceeding by contradiction, suppose (a, b) ⊆ E, and suppose {Ui } is an open cover of E (and therefore, of (a, b)) with total length less than (b − a)/2. Without either changing the fact that {Ui } is an open cover of (a, b) or increasing the length of any interval Ui , we may assume that no left endpoint of an interval Ui is less than a and no right endpoint of an interval Ui is greater than b. Therefore, applying Lemma 7.4.7, we may also assume that the elements of {Ui } are pairwise disjoint. Then on the one hand, if we have some Ui = (ai , bi ) with bi < b, then the fact that the Ui are pairwise disjoint means that the point bi cannot be contained in any other Uj , and similarly, if a < ai , then the point ai cannot be contained in any other Uj . On the other hand, if some Ui = (a, b), that contradicts our assumption of total length less than (b − a)/2. Either way, we have a contradiction, and the theorem follows. Corollary 7.4.9. Suppose X = [a, b] or R and for some f, g : X → C, we have that f (x) = g(x) for almost all x ∈ X. Then for c ∈ X, if f and g are continuous at c, then f (c) = g(c). In particular, if f and g are both continuous on X and equal a.e. on X, then they are equal at every point of X. Note that Corollary 7.4.9 also applies to functions on X = S 1 , as such functions are really defined on R. Proof. Considering h(x) = f (x)−g(x), it suffices to show that if h(x) = 0 a.e. on X and h is continuous at c, then h(c) = 0. However, in that case, by Theorem 7.4.8, for every n, there

is a point $c_n \in \left( c - \dfrac{1}{n},\ c + \dfrac{1}{n} \right)$ such that $h(c_n) = 0$, and so by the sequential definition of continuity,
\[ h(c) = \lim_{n\to\infty} h(c_n) = 0. \tag{7.4.8} \]
The corollary follows.
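To make the covers used earlier in this section concrete, the following sketch tabulates two quantities with exact rational arithmetic: the total length of the first $K$ intervals in a cover of a countable set when the $i$th interval has length $\epsilon/2^{i+1}$ (the kind of cover used for Theorem 7.4.5), and the total length $(2/3)^n$ of the closed intervals making up the stage $E_n$ of the Cantor set in Example 7.4.6. The function names and the sample value of `eps` are our own.

```python
from fractions import Fraction

def countable_cover_length(eps, K):
    """Total length of the first K intervals when the i-th has length eps / 2**(i+1)."""
    return sum(eps / 2 ** (i + 1) for i in range(1, K + 1))

def cantor_stage_length(n):
    """Total length of the 2**n closed intervals, each of length 3**(-n), in E_n."""
    return Fraction(2, 3) ** n

eps = Fraction(1, 100)
partial = [countable_cover_length(eps, K) for K in (1, 5, 20)]
stages = [cantor_stage_length(n) for n in (0, 1, 2, 10)]
```

Every partial sum stays below $\epsilon/2$, and the stage lengths shrink geometrically, which is why slightly enlarged open covers of $E_n$ can be made as short as we like.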

Problems

7.4.1. For $E = \{x_0\} \subset \mathbb{R}$, prove that $E$ has measure zero.

7.4.2. For $E = \{x_1, \dots, x_n\} \subset \mathbb{R}$, prove that $E$ has measure zero.

7.4.3. Suppose that $E_i \subset \mathbb{R}$ has measure zero for all $i \in \mathbb{N}$. Prove that $\bigcup_{i\in\mathbb{N}} E_i$ has measure zero (i.e., a countable union of sets of measure zero still has measure zero). (Suggestion: Recall that a countable union of countable collections is still a countable collection.)

7.4.4. (Proves Lemma 7.4.7) Let $\{U_i\}$ be a collection of open intervals, and define a relation $\sim$ on $\{U_i\}$ as in the proof of Lemma 7.4.7.

(a) Prove that $\sim$ is an equivalence relation. (Recall that this means proving that $\sim$ is reflexive, symmetric, and transitive.)

(b) Prove that if $C$ and $C'$ are equivalence classes under $\sim$ and $\left( \bigcup_{U \in C} U \right) \cap \left( \bigcup_{U' \in C'} U' \right)$ is nonempty, then $C = C'$. (Suggestion: Recall that by the general theory of equivalence classes, if some $U \in C$ is equivalent to some $U' \in C'$, then $C = C'$.)

7.4.5. (Proves Lemma 7.4.7) Let $W_i = (a_i, b_i)$ for $0 \le i \le k$, and suppose that $W_i \cap W_{i+1} \ne \emptyset$ for $0 \le i \le k - 1$.

(a) Prove that if $W' = (a', b')$ and $W'' = (a'', b'')$ are open intervals such that $W' \cap W'' \ne \emptyset$, then $W' \cup W'' = (a, b)$, where $a = \min(a', a'')$ and $b = \max(b', b'')$, and
\[ b - a \le (b' - a') + (b'' - a''). \tag{7.4.9} \]
(Suggestion: By symmetry, we may assume $a' \le a''$, and then we have two cases.)

(b) Now let $a = \min(a_0, \dots, a_k)$ and $b = \max(b_0, \dots, b_k)$. Prove that $\bigcup_{i=0}^{k} W_i$ is precisely the open interval $(a, b)$ and
\[ \sum_{i=0}^{k} \ell(W_i) \ge b - a. \tag{7.4.10} \]
(Suggestion: Induction.)

7.5 The Lebesgue integral: Axioms

With some intuition about sets of measure zero in hand, we now state the axiomatic properties of Lebesgue integration that we require. Throughout, let $X = [a, b]$ or $\mathbb{R}$.

7.5.1 Axioms about integration

Our first axiom describes the class of functions we can consider.

Lebesgue Axiom 1 (Measurable functions). There exists a function space $M(X)$ on $X$, called the space of measurable functions on $X$, with the following properties.

1. (Riemann integrable implies measurable) If $f : X \to \mathbb{R}$ is Riemann integrable on every closed and bounded subinterval of $X$, then $f$ is measurable. In particular, if $f$ is continuous, then $f$ is measurable, and if $X = [a, b]$, then any Riemann integrable function is measurable.

2. (Closed under operations) If $f, g \in M(X)$, then $f(x)g(x)$ and $|f(x)|$ are measurable. Note that by definition of function space, the same holds for $f(x) + g(x)$ and $cf(x)$ ($c \in \mathbb{C}$).

3. (Closed under limits) If $f(x) = \lim_{n\to\infty} f_n(x)$ and each $f_n \in M(X)$, then $f$ is measurable.

4. (Nonnegative integral) If $f \in M(X)$ and $f$ is real-valued and nonnegative, then there exists an extended nonnegative real number (i.e., either a nonnegative real number or the symbol $+\infty$) $\int_X f$, called the Lebesgue integral of $f$ on $X$. In particular, for any measurable function $f \in M(X)$, $\int_X |f|$ is well-defined.

5. (Monotonicity) For real-valued nonnegative $f, g \in M(X)$, if $f(x) \le g(x)$ for all $x \in X$, then $\int_X f \le \int_X g$.

Note that one of the main advantages of considering measurable functions, and the Lebesgue integral in general, is that property (3) of Axiom 1 does not hold for differentiable functions, continuous functions, or Riemann integrable functions. (See the five NO's in Section 4.2.) In any case, we may now define the analogue of (Riemann) integrability for the Lebesgue integral.

Definition 7.5.1. To say that $f : X \to \mathbb{C}$ is Lebesgue integrable, or simply integrable, means that $f \in M(X)$ and $\int_X |f|$ is finite. We also define $L^1(X)$ to be the set of all Lebesgue integrable functions on $X$, or more generally, for $p \ge 1$, we define $L^p(X)$ to be the set of all $f \in M(X)$ such that $\int_X |f|^p$ is finite.

Our next main axiom is that for functions in $L^1(X)$, the Lebesgue integral has many of the same properties as the Riemann integral. To be precise:

Lebesgue Axiom 2 (Integral properties). The set $L^1(X)$ (Definition 7.5.1) is a subspace of $M(X)$ with the following properties.

1. (Extends Riemann integral) If $f : X \to \mathbb{C}$ is Riemann integrable on some $[a, b] \subseteq X$ and $f(x) = 0$ for all $x \notin [a, b]$, then $f \in L^1(X)$ and
\[ \int_X f = \int_a^b f(x)\,dx, \tag{7.5.1} \]
where the right-hand side is the Riemann integral. In particular, if $X = [a, b]$, (7.5.1) holds for all Riemann integrable $f$.

2. (Linearity) If $f, g \in L^1(X)$ and $a, b \in \mathbb{C}$, then $\int_X (af + bg) = a \int_X f + b \int_X g$.

3. (Additivity of domain) If $X = [a, b]$, $Y = [b, c]$, and $Z = [a, c]$, and $f \in L^1(X)$ and $f \in L^1(Y)$ when restricted to those domains, then $\int_Z f = \int_X f + \int_Y f$.

4. (Conjugates and absolute value) If $f \in L^1(X)$, then $\int_X \overline{f} = \overline{\int_X f}$ and $\left| \int_X f \right| \le \int_X |f|$.

5. (Monotonicity) For real-valued $f, g \in L^1(X)$, if $f(x) \le g(x)$ for all $x \in X$, then $\int_X f \le \int_X g$.

We also assume that the Lebesgue integral ignores values on any set of measure zero in the domain, or more precisely:

Lebesgue Axiom 3 (Measure zero). For any $f \in M(X)$, we have the following properties of $\int_X f$, either in the complex sense or the nonnegative sense.

1. (Up to measure zero) If $f = g$ almost everywhere in $X$, then $g$ is also measurable; and if we also have that $f \in L^1(X)$, then $g \in L^1(X)$ and $\int_X f = \int_X g$. In other words, $\int_X f$ is “only defined up to sets of measure zero”.

2. (Zero integral of nonnegative implies zero a.e.) If $f$ is real-valued and nonnegative and $\int_X f = 0$, then $f = 0$ almost everywhere in $X$.

We further assume that the Lebesgue integral has the following convergence properties.

Lebesgue Axiom 4 (Convergence properties). Let $f_n : X \to \mathbb{C}$ be a sequence in $M(X)$, and let $f : X \to \mathbb{C}$ be a function such that $\lim_{n\to\infty} f_n(x) = f(x)$ a.e. in $X$.

1. (Monotone convergence) If the $f_n$ are nonnegative real measurable and $f_n(x) \le f_{n+1}(x)$ for all $x \in X$, then $\int_X f = \lim_{n\to\infty} \int_X f_n$.

2. (Dominated convergence) If there exists some Lebesgue integrable $g : X \to \mathbb{R}$ such that $|f_n(x)| \le g(x)$ for all $n \in \mathbb{N}$ and $x \in X$, then $\int_X f = \lim_{n\to\infty} \int_X f_n$. (This is called “dominated convergence” because $g(x)$ dominates each $f_n(x)$.)

At this point, we pause in our axiomatic description of the Lebesgue integral to observe the following computationally useful consequence of our axioms so far: Lebesgue integrals on $\mathbb{R}$ can often be computed as improper Riemann integrals (Section 4.8).

Theorem 7.5.2. Suppose $f : \mathbb{R} \to \mathbb{C}$ is locally integrable (Definition 4.8.1) and $f \in L^1(\mathbb{R})$. Then the Lebesgue integral of $f$ on $\mathbb{R}$ may be computed by an improper Riemann integral:
\[ \int_{\mathbb{R}} f = \int_{-\infty}^{\infty} f(x)\,dx, \tag{7.5.2} \]

in the sense of Definition 4.8.2.

Again, as with Lebesgue Axiom 2, the main point is that the Lebesgue integral on $\mathbb{R}$ is often the same as the Riemann integral on $\mathbb{R}$, in terms of practical computation; the advantage of the Lebesgue integral is its superior theoretical properties.

Proof. Fix $a \in \mathbb{R}$. Since $\int_{[a,+\infty)} |f| \le \int_{\mathbb{R}} |f|$, by nonnegativity and additivity of domain (Lebesgue Axioms 1 and 2), we see that $f \in L^1([a, +\infty))$. Let
\[ f_n(x) = \begin{cases} f(x) & \text{if } a \le x \le n, \\ 0 & \text{if } x > n. \end{cases} \tag{7.5.3} \]
Since $|f_n(x)|$ converges pointwise to $|f(x)|$, and $|f_n(x)| \le |f(x)|$, Lebesgue Axiom 4 implies that
\[ \int_{[a,+\infty)} |f| = \lim_{n\to\infty} \int_{[a,+\infty)} |f_n|. \tag{7.5.4} \]
Therefore, given $\epsilon > 0$, there exists some $N$ such that
\[ \epsilon > \int_{[a,+\infty)} \left( |f| - |f_N| \right) = \int_{[N,+\infty)} |f|. \tag{7.5.5} \]
For $b > N$, by Lebesgue Axiom 2, we then have
\[ \epsilon > \int_{[N,+\infty)} |f| \ge \int_{[b,+\infty)} |f| \ge \left| \int_{[b,+\infty)} f \right| = \left| \int_{[a,+\infty)} f - \int_a^b f(x)\,dx \right|. \tag{7.5.6} \]
It follows by the definition of limit that $\int_{[a,+\infty)} f = \lim_{b\to+\infty} \int_a^b f$. The theorem follows by applying a similar argument with $a \to -\infty$.


The reader may also find it helpful to see the following examples (Examples 7.5.3–7.5.5) of functions that are not Riemann integrable on $X$ but, because of our axioms, must be in $L^1(X)$.

Example 7.5.3. Let $X = [0, 1]$, and recall the following non-Riemann integrable function $f : X \to \mathbb{R}$ from Problem 3.3.5:
\[ f(x) = \begin{cases} 1 & \text{if } x \text{ is rational,} \\ 0 & \text{if } x \text{ is irrational.} \end{cases} \tag{7.5.7} \]
Since any subset of the rationals has measure zero, $f = 0$ almost everywhere, so by Axiom 3, with respect to Lebesgue integration, $f$ is essentially the same as 0. Therefore, by Axiom 2, $\int_X f = 0$.

Example 7.5.4. Let $X = [0, 1]$, and consider $f : X \to \mathbb{R}$ defined by $f(x) = 1/\sqrt{x}$. Note that $f$ is not actually defined at $x = 0$, but since a single point has measure zero, that makes no difference. Since $f$ is not bounded, it is not Riemann integrable, but as shown in Problem 7.5.1, by our axioms, the Lebesgue integral of $f$ on $X$ is
\[ \int_X f = \lim_{a\to 0^+} \int_a^1 x^{-1/2}\,dx = 2. \tag{7.5.8} \]

Example 7.5.5. Similarly, let $X = [1, +\infty)$, and consider $f : X \to \mathbb{R}$ defined by $f(x) = 1/x^2$. Since $X$ is not a finite interval, the Riemann integral of $f$ on $X$ is not defined, but as shown in Problem 7.5.2, by our axioms, the Lebesgue integral of $f$ on $X$ is
\[ \int_X f = \lim_{b\to+\infty} \int_1^b x^{-2}\,dx = 1. \tag{7.5.9} \]
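The limits in (7.5.8) and (7.5.9) can be watched numerically. The sketch below evaluates the truncated Riemann integrals through their antiderivatives, $\int_a^1 x^{-1/2}\,dx = 2 - 2\sqrt{a}$ and $\int_1^b x^{-2}\,dx = 1 - 1/b$, and uses a midpoint Riemann sum as a rough independent check of one value. The function names, sample cutoffs, and step count are illustrative assumptions, and the computation is not a substitute for Problems 7.5.1 and 7.5.2.

```python
import math

def truncated_inverse_sqrt(a):
    """Riemann integral of x**(-1/2) over [a, 1], via the antiderivative 2*sqrt(x)."""
    return 2.0 - 2.0 * math.sqrt(a)

def truncated_inverse_square(b):
    """Riemann integral of x**(-2) over [1, b], via the antiderivative -1/x."""
    return 1.0 - 1.0 / b

def midpoint_rule(g, lo, hi, steps=100_000):
    """Midpoint Riemann sum, used only as an independent numerical check."""
    h = (hi - lo) / steps
    return h * sum(g(lo + (k + 0.5) * h) for k in range(steps))

near_zero = [truncated_inverse_sqrt(a) for a in (1e-2, 1e-4, 1e-8)]   # tends to 2
far_out = [truncated_inverse_square(b) for b in (1e1, 1e3, 1e6)]      # tends to 1
check = midpoint_rule(lambda x: x ** -0.5, 1e-2, 1.0)  # compare with near_zero[0]
```

In both cases the truncated integrals increase toward the limit, which matches the monotone convergence property applied to the truncations of a nonnegative function.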

We hope Examples 7.5.4 and 7.5.5 give the reader an idea of the kinds of singularities allowed in Lebesgue integrable functions. For another result along the same lines, see Theorem 7.5.12 below.

Remark 7.5.6. In conclusion, the reader seeing this material for the first time should take away two main ideas from this section:

1. If a function is Riemann integrable on $X$, then its Lebesgue integral on $X$ exists and has the same value.

2. The main advantage of Lebesgue integration that we have so far is that the monotone convergence and dominated convergence properties give conditions under which $\int_X$ commutes with $\lim_{n\to\infty}$.

Remark 7.5.7. Interestingly, using the ideas developed in this section, we can now state a precise description of the relationship between Riemann integrability and continuity:

7.5. THE LEBESGUE INTEGRAL: AXIOMS


Let f : [a, b] → C be bounded. Then f is Riemann integrable if and only if f is continuous almost everywhere on [a, b]. The proof relies heavily on the details of Lebesgue integration, so we refer the reader to Rudin [Rud76, Thm. 11.33].

7.5.2 Axioms about $L^2$

Our primary motivation for introducing Lebesgue integration is to be able to use the function space $L^2(X)$ (Definition 7.5.1). Note that for $X = [a, b]$ or $S^1$, $L^2(X)$ includes all Riemann integrable functions (Lebesgue Axiom 2), and for $X = \mathbb{R}$, $L^2(X)$ includes all functions such that the improper (Riemann) integral $\int_{-\infty}^{\infty} |f(x)|^2\,dx$ is finite (Theorem 7.5.2). In other words, $L^2(X)$ includes nearly every specific example of a function that we have discussed so far, including every function in the Schwartz space $\mathcal{S}(\mathbb{R})$ (Problem 7.5.3). The trick is that $L^2(X)$ also includes other functions that fill in the “holes” in (for example) $C^0(X)$.

Remark 7.5.8. We again remind the reader that functions in $L^2(X)$ are only defined “up to values on a set of measure zero.” More precisely, when we discuss $f \in L^2(X)$, we actually consider an equivalence class of functions that are equal a.e. The algebraically inclined reader may wish to think of $L^2(X)$ as the quotient space of square-integrable functions mod the subspace of functions equal to 0 a.e.; the point is that by the “up to measure zero” axiom (Lebesgue Axiom 3), $\int_X f$ is still well-defined on an equivalence class, or element of the quotient space.

Definition 7.5.9. We take this opportunity to introduce some notation for $L^2$ spaces of two-variable functions: Specifically, to say that $f(x, y) \in L^2_x(X)$ means that for fixed $y_0$, $f(x, y_0)$ is in $L^2(X)$ as a function of $x$, and $L^2_y(X)$ is defined analogously.

The main feature that makes $L^2(X)$ special is that $L^2(X)$ has a natural structure as an inner product space. More precisely:

Theorem 7.5.10. Let $X = [a, b]$, $S^1$, or $\mathbb{R}$. Then $L^2(X)$ is a function space, and

\[
\langle f, g \rangle = \int_X f(x)\overline{g(x)}
\tag{7.5.10}
\]

is an inner product on $L^2(X)$.

Proof. First, we must show that (7.5.10) is well-defined. For any $f, g \in L^2(X)$, since $(|f(x)| - |g(x)|)^2 \ge 0$, we see that

\[
\left|f(x)\overline{g(x)}\right| = |f(x)g(x)| \le \frac{1}{2}|f(x)|^2 + \frac{1}{2}|g(x)|^2.
\tag{7.5.11}
\]


Therefore, if $f, g \in L^2(X)$ and $a \in \mathbb{C}$, then by linearity and monotonicity (Lebesgue Axiom 2) and the fact that

\[
|af(x)|^2 = |a|^2 |f(x)|^2,
\qquad
|f(x) + g(x)|^2 \le |f(x)|^2 + 2|f(x)g(x)| + |g(x)|^2,
\tag{7.5.12}
\]

we see that $af \in L^2(X)$ and $f + g \in L^2(X)$. It also follows that $f(x)\overline{g(x)} \in L^1(X)$, which means that (7.5.10) is well-defined on $L^2(X)$.

It remains only to verify the axioms of an inner product, and this works exactly as in Problem 7.1.3, with the exception of positive definiteness. There, instead of Lemma 3.4.9, we use Lebesgue Axiom 3 to conclude that if $\int_X |f|^2 = 0$, then $f(x) = 0$ a.e.; otherwise, the proof stays the same.

Now, since $L^2(X)$ is an inner product space, it is also a normed space (Definition 7.2.1) under the inner product norm (Definition 7.1.2), and therefore, also a metric space. In these terms, we can now describe the two remaining Lebesgue Axioms, each of which is a key property of $L^2(X)$ as a metric space. The first is:

Lebesgue Axiom 5 (Completeness of $L^2$). Let $X = [a, b]$, $S^1$, or $\mathbb{R}$. Then $L^2(X)$ is complete in the $L^2$ metric.

To review what Lebesgue Axiom 5 means, suppose that $f_n : X \to \mathbb{C}$ is a sequence of $L^2$ functions that is Cauchy in the $L^2$ (norm) metric, or to be precise, that for every $\varepsilon > 0$, there exists some $N(\varepsilon)$ such that if $n, k \in \mathbb{Z}$, $n, k > N(\varepsilon)$, then

\[
\|f_n - f_k\| = \left( \int_X |f_n(x) - f_k(x)|^2 \right)^{1/2} < \varepsilon.
\tag{7.5.13}
\]

In other words, suppose $f_n$ and $f_k$ are eventually always very close on average. Then Lebesgue Axiom 5 posits that there exists some $f \in L^2(X)$ such that

\[
\lim_{n\to\infty} \|f_n - f\| = \lim_{n\to\infty} \left( \int_X |f_n(x) - f(x)|^2 \right)^{1/2} = 0.
\tag{7.5.14}
\]

The final metric property of $L^2(X)$ that we will assume is:

Lebesgue Axiom 6 (Continuous functions are dense in $L^2$). If $X = [a, b]$ or $S^1$, then $C^0(X)$ is a dense (Definition 2.4.16) subset of $L^2(X)$. In other words, for every $f \in L^2(X)$ and every $\varepsilon > 0$, there exists some $g \in C^0(X)$ with $\|f - g\| < \varepsilon$.

We next show that Lebesgue Axiom 6 implies the analogous fact for $L^2(\mathbb{R})$ (Theorem 7.5.13). However, the precise statement of this result is complicated by the fact that continuous functions (such as the constant function 1) need not be in $L^2(\mathbb{R})$. We therefore come to the following definition.

Definition 7.5.11. The support of a function $f : X \to \mathbb{C}$ is defined to be the set of all $x \in X$ such that $f(x) \neq 0$. The space $C^0_c(\mathbb{R})$ of continuous functions with compact support is therefore defined to be the set of all $f \in C^0(\mathbb{R})$ such that there exists some closed and bounded interval $[a, b]$ such that for any $x \notin [a, b]$, $f(x) = 0$.


Before we can prove Theorem 7.5.13, we need the following result, which also sheds some more light on the nature of integrable functions on $\mathbb{R}$.

Theorem 7.5.12. Suppose $f \in L^2(\mathbb{R})$, and for $n \in \mathbb{N}$, let

\[
f_n(x) = \begin{cases} f(x) & \text{if } -n \le x \le n, \\ 0 & \text{otherwise.} \end{cases}
\tag{7.5.15}
\]

Then $f_n$ converges to $f$ both pointwise and in the $L^2$ metric.

We include the proof of Theorem 7.5.12 here as an example of how to use dominated convergence (Lebesgue Axiom 4).

Proof. First, given $x \in \mathbb{R}$ and $\varepsilon > 0$, let $N(\varepsilon) = |x|$. Then $f_n(x) = f(x)$ for $n > N(\varepsilon)$, so $f_n$ converges to $f$ pointwise. As for convergence in the $L^2$ norm, since $|f_n(x) - f(x)|^2 \le |f(x)|^2$ and $|f(x)|^2$ is Lebesgue integrable, by dominated convergence, we see that

\[
\lim_{n\to\infty} \int_{\mathbb{R}} |f_n(x) - f(x)|^2 = \int_{\mathbb{R}} \lim_{n\to\infty} |f_n(x) - f(x)|^2 = \int_{\mathbb{R}} 0 = 0.
\tag{7.5.16}
\]

The theorem follows.

Note that if we think of $f_n$, as given by (7.5.15), as a “horizontal truncation” of $f$, then one might imagine a “vertical” analogue of Theorem 7.5.12; see Problem 7.5.4 for a statement and proof. In any case, we now come to the analogue of Lebesgue Axiom 6 for $L^2(\mathbb{R})$.

Theorem 7.5.13. The space of continuous functions with compact support is a dense subspace of $L^2(\mathbb{R})$.

Again, comparing Definition 2.4.16, this means that for every $f \in L^2(\mathbb{R})$ and every $\varepsilon > 0$, there exists some $g \in C^0_c(\mathbb{R})$ with $\|f - g\| < \varepsilon$.

Proof. Problem 7.5.5.

We also have the following facts about $L^1$ and $L^2$ for later use.

Theorem 7.5.14. For a closed interval $X \subseteq \mathbb{R}$ (including the possibility $X = \mathbb{R}$), the set $L^1(X)$ is a function space on $X$.

Proof. Problem 7.5.6.

Theorem 7.5.15. If $f \in L^2([a, b])$, then $f \in L^1([a, b])$. Moreover, if $\lim_{n\to\infty} f_n = f$ in $L^2([a, b])$, then

\[
\lim_{n\to\infty} \int_a^b |f_n(x) - f(x)|\,dx = 0.
\tag{7.5.17}
\]

Proof. Problem 7.5.7.
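To make the horizontal truncation of Theorem 7.5.12 concrete, here is a small numerical sketch. It assumes the sample function $f(x) = e^{-|x|}$ (chosen here only for illustration, since then $\|f - f_n\| = e^{-n}$ can be checked by hand), and it replaces the integral over an unbounded tail by a midpoint rule on a long finite interval, which is an approximation rather than a Lebesgue integral. The function name `tail_norm_squared` and its parameters are hypothetical.

```python
import math

def tail_norm_squared(n, cutoff=40.0, steps=100_000):
    """Estimate the integral of |f - f_n|^2 over R for f(x) = exp(-|x|).

    f - f_n vanishes on [-n, n] and equals f elsewhere, so by symmetry the
    integral is 2 * (integral of exp(-2x) over x > n). The part of the tail
    beyond n + cutoff is negligible and is dropped (an approximation).
    """
    h = cutoff / steps
    total = sum(math.exp(-2.0 * (n + (k + 0.5) * h)) for k in range(steps))
    return 2.0 * h * total

# L^2 distances ||f - f_n|| for n = 1, ..., 5; by hand these equal exp(-n).
distances = [math.sqrt(tail_norm_squared(n)) for n in range(1, 6)]

for n, d in enumerate(distances, start=1):
    print(f"n = {n}: ||f - f_n|| is about {d:.6f}")
```

The distances shrink geometrically, in line with the conclusion of the theorem for this particular $f$.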


We conclude our introduction of the Lebesgue integral on the domain $X$ by briefly recapitulating our axioms in paraphrase.

• Lebesgue Axiom 1: The function space $\mathcal{M}(X)$ contains almost all examples encountered in practice. For any $f \in \mathcal{M}(X)$, $\int_X |f|$ is a well-defined nonnegative extended real number.

• Lebesgue Axiom 2: The Lebesgue integral $\int_X f$ is well-defined on the space $L^1(X)$ of all $f \in \mathcal{M}(X)$ such that $\int_X |f| < \infty$. It extends the Riemann integral and has similar formal properties.

• Lebesgue Axiom 3: The Lebesgue integral $\int_X f$ is unaffected by changing the values of $f$ on a set of measure zero.

• Lebesgue Axiom 4: Unlike the Riemann integral, the Lebesgue integral satisfies the monotone and dominated convergence properties.

• Lebesgue Axiom 5: The function space $L^2(X)$ is an inner product space that is complete in the $L^2$ metric.

• Lebesgue Axiom 6: Continuous functions (or continuous functions with compact support, for $X = \mathbb{R}$) are dense in $L^2(X)$.

Problems

7.5.1. Let $X = [0, 1]$, and let $f, f_n : X \to \mathbb{R}$ ($n \in \mathbb{N}$) be defined by

\[
f(x) = \frac{1}{\sqrt{x}},
\qquad
f_n(x) = \begin{cases} 0 & \text{if } x < \frac{1}{n}, \\ \dfrac{1}{\sqrt{x}} & \text{if } x \ge \frac{1}{n}. \end{cases}
\tag{7.5.18}
\]

(a) Prove that for all $x \in [0, 1]$, $x \neq 0$, $\lim_{n\to\infty} f_n(x) = f(x)$.

(b) Use our axioms for the Lebesgue integral to prove that $f$ is measurable, $f$ is Lebesgue integrable on $X$, and $\int_X f = 2$. (Suggestion: Monotone convergence. Note that dominated convergence is not helpful here because it requires comparing $f$ to a known Lebesgue integrable function.)

7.5.2. Let $X = [1, +\infty)$, and let $f, f_n : X \to \mathbb{R}$ ($n \in \mathbb{N}$) be defined by

\[
f(x) = \frac{1}{x^2},
\qquad
f_n(x) = \begin{cases} \dfrac{1}{x^2} & \text{if } x \le n, \\ 0 & \text{if } x > n. \end{cases}
\tag{7.5.19}
\]

(a) Prove that for all $x \in [1, +\infty)$, $\lim_{n\to\infty} f_n(x) = f(x)$.

(b) Use our axioms for the Lebesgue integral to prove that $f$ is measurable, $f$ is Lebesgue integrable on $X$, and $\int_X f = 1$. (Suggestion: See Problem 7.5.1.)

7.5.3. Prove that if $f \in \mathcal{S}(\mathbb{R})$ (Section 4.7), then $f \in L^2(\mathbb{R})$. (Suggestion: To prove that the improper integral $\int_1^{\infty} |f(x)|^2\,dx$ is finite, compare $f(x)$ to a function of the form $K|x|^{-1}$.)

7.5.4. Let $X = [a, b]$, $S^1$, or $\mathbb{R}$, suppose $f \in L^2(X)$, and for $n \in \mathbb{N}$, let

\[
f_n(x) = \begin{cases} f(x) & \text{if } |f(x)| \le n, \\ 0 & \text{otherwise.} \end{cases}
\tag{7.5.20}
\]

Prove that $f_n$ converges to $f$ both pointwise and in the $L^2$ metric. (Suggestion: Imitate the proof of Theorem 7.5.12.)

7.5.5. (Proves Theorem 7.5.13) Suppose $f \in L^2(\mathbb{R})$ and $\varepsilon > 0$. Prove that there exists some $g \in C^0_c(\mathbb{R})$ with $\|f - g\| < \varepsilon$. (Suggestion: First truncate (Theorem 7.5.12), then approximate (Lebesgue Axiom 6), then “make sure the ends are continuous”.)

7.5.6. (Proves Theorem 7.5.14) Suppose $X$ is a closed interval in $\mathbb{R}$, $f, g \in L^1(X)$, and $c \in \mathbb{C}$. Prove that $f + g \in L^1(X)$ and $cf \in L^1(X)$. (Suggestion: Lebesgue Axiom 2.)

7.5.7. (Proves Theorem 7.5.15) Let $\|f\|$ denote the $L^2([a, b])$ norm.

(a) Prove that $\int_a^b |f|\,dx \le \sqrt{b - a}\,\|f\|$. (Suggestion: Consider $\langle |f|, 1 \rangle$.)

(b) Prove that if $f \in L^2([a, b])$, then $f \in L^1([a, b])$.

(c) Prove that if $\lim_{n\to\infty} f_n = f$ in $L^2([a, b])$, then

\[
\lim_{n\to\infty} \int_a^b |f_n(x) - f(x)|\,dx = 0.
\tag{7.5.21}
\]

7.6 Hilbert spaces

We can now finally define the main idea of this chapter.

Definition 7.6.1. A Hilbert space is an inner product space that is complete in the inner product metric.

As the reader may have guessed, the Hilbert spaces in which we are most interested are of the form $L^2(X)$, where $X = [a, b]$, $S^1$, or $\mathbb{R}$; these spaces are Hilbert spaces by Theorem 7.5.10 and Lebesgue Axiom 5. To give another immediate example:

Theorem 7.6.2. For $X = \mathbb{N}$ or $\mathbb{Z}$, the inner product space $\ell^2(X)$ is a Hilbert space.


We use Notation 7.2.19 throughout the proof of Theorem 7.6.2. By our summation conventions, we also see that it suffices to consider the case $X = \mathbb{N}$, though we continue to use $X$ to distinguish between $x \in X$ and the indices $k, n \in \mathbb{N}$ of a sequence in $\ell^2(X)$.

Proof. Suppose $a_n(x)$ is a Cauchy sequence in $\ell^2(X)$, or in other words, suppose that for every $\varepsilon_0 > 0$, there exists some $N_0(\varepsilon_0)$ such that for $k, n \in \mathbb{N}$, $k, n > N_0(\varepsilon_0)$, we have that $\|a_k - a_n\| < \varepsilon_0$. First, we observe that for fixed $x_0 \in X$,

\[
|a_k(x_0) - a_n(x_0)|^2 \le \sum_{x \in X} |a_k(x) - a_n(x)|^2 = \|a_k - a_n\|^2,
\tag{7.6.1}
\]

which implies that $a_k(x_0)$ is a Cauchy sequence (in the variable $k$). By the completeness of $\mathbb{C}$, we see that $\lim_{k\to\infty} a_k(x_0) = a(x_0)$ for some $a(x_0) \in \mathbb{C}$; in other words, there exists some $a : X \to \mathbb{C}$ such that $\lim_{k\to\infty} a_k(x) = a(x)$ pointwise. It remains to show that $a \in \ell^2(X)$ and that $a_n$ converges to $a$ in the inner product metric.

So for $\varepsilon > 0$, let $N(\varepsilon) = N_0(\varepsilon/2)$. Fix $M \in X = \mathbb{N}$. By the definition of Cauchy sequence, for $k, n > N(\varepsilon)$, we have

\[
\sum_{x=1}^{M} |a_k(x) - a_n(x)|^2 \le \sum_{x=1}^{\infty} |a_k(x) - a_n(x)|^2 < \varepsilon^2/4.
\tag{7.6.2}
\]

Taking $\lim_{k\to\infty}$ on both sides of (7.6.2), and applying the limit laws for sequences in $k$ and Theorem 2.4.9, we see that

\[
\sum_{x=1}^{M} |a(x) - a_n(x)|^2 = \lim_{k\to\infty} \sum_{x=1}^{M} |a_k(x) - a_n(x)|^2 \le \varepsilon^2/4.
\tag{7.6.3}
\]

Therefore, since (7.6.3) holds for all $M \in \mathbb{N}$, we see that

\[
\|a - a_n\| = \left( \sum_{x=1}^{\infty} |a(x) - a_n(x)|^2 \right)^{1/2} \le \varepsilon/2 < \varepsilon
\tag{7.6.4}
\]

for all $n > N(\varepsilon)$. Therefore, if $a \in \ell^2(X)$, then $a_n$ converges to $a$ in the inner product norm, so it remains only to show that $\|a\|$ is finite.

Note that if we already knew that $a \in \ell^2(X)$, it would follow that $\|a\| \le \|a - a_n\| + \|a_n\|$, but to avoid circular logic, we need to be more careful. So fix some $n > N(1)$. For fixed $M \in X = \mathbb{N}$, applying the triangle inequality to the inner product norm on $\mathbb{C}^M$, we see that

\[
\left( \sum_{x=1}^{M} |a(x)|^2 \right)^{1/2}
\le \left( \sum_{x=1}^{M} |a(x) - a_n(x)|^2 \right)^{1/2} + \left( \sum_{x=1}^{M} |a_n(x)|^2 \right)^{1/2}
\le \|a - a_n\| + \|a_n\| < 1 + \|a_n\|.
\tag{7.6.5}
\]

Then, since the right-hand side of (7.6.5) is finite by hypothesis and independent of $M$, we see that $\|a\|$ must be finite as well. The theorem follows.


Remark 7.6.3. While $\ell^2(X)$ may not obviously relate to our main problem of the convergence of Fourier series, we will see in Theorem 7.6.7 that $\ell^2(X)$ is actually a canonical Hilbert space in exactly the same way that $\mathbb{R}^n$ is a canonical finite-dimensional (real) vector space. See Remark 7.6.8 for a precise statement. Also, while the proof of Theorem 7.6.2 is somewhat tangential to our main story, we include it because it is relatively brief, but still includes many of the main ingredients in proving that any function space $V$ is complete: Given a Cauchy sequence $f_n$ in $V$, we first find some $f$ to which $f_n$ converges pointwise, and then verify both that $f$ is in $V$ and also that $f_n$ converges to $f$ in the relevant norm.

We now return to our discussion of orthogonal bases from Section 7.3 with the notion of Hilbert space at our disposal. For notational convenience, in the rest of this section, we always take our indexing set for orthogonal sets to be $\mathbb{N}$, but the analogous results hold for bases indexed by $\mathbb{Z}$ or $\{1, \dots, n\}$. In any case, the point is that the advantage of studying orthogonality on a Hilbert space $H$ is that the completeness of $H$ allows us to get more information out of the existence of an orthogonal basis, mainly via the following useful result.

Theorem 7.6.4 (Hilbert Space Absolute Convergence Theorem). Let $H$ be a Hilbert space, let $\{u_n \mid n \in \mathbb{N}\}$ be an orthogonal set of nonzero vectors in $H$, and let $c_n \in \mathbb{C}$ be coefficients. Then the following are equivalent.

1. The infinite series $\sum_{n=1}^{\infty} c_n u_n$ converges to some element of $H$.
2. The infinite series $\sum_{n=1}^{\infty} |c_n|^2 \|u_n\|^2$ converges in $\mathbb{R}$.

We can think of this as saying that any infinite sum of orthogonal vectors converges in a Hilbert space if and only if it converges absolutely. (Compare Corollary 4.1.8 and Definition 4.1.9.)

Proof. Problem 7.6.2.

Absolute convergence in Hilbert spaces has several immediate applications. For example, we can show that a generalized Fourier series (Definition 7.3.7) of some $f$ in a Hilbert space always converges, though not necessarily to $f$.

Corollary 7.6.5. Let $H$ be a Hilbert space, let $B = \{u_n \mid n \in \mathbb{N}\}$ be an orthogonal set of nonzero vectors in $H$, and for $f \in H$, let $\hat{f}(n) = \dfrac{\langle f, u_n \rangle}{\langle u_n, u_n \rangle}$ be the $n$th generalized Fourier coefficient of $f$ relative to $B$. Then $\sum_{n=1}^{\infty} \hat{f}(n) u_n$ converges to some $g \in H$.


Proof. By Bessel’s inequality (7.3.9), $\sum_{n=1}^{N} |\hat{f}(n)|^2 \|u_n\|^2 \le \|f\|^2$ for all $N \in \mathbb{N}$. It follows that $\sum_{n=1}^{N} |\hat{f}(n)|^2 \|u_n\|^2$ is a monotone sequence (in the variable $N$) that is bounded above by $\|f\|^2$, which means that $\sum_{n=1}^{\infty} |\hat{f}(n)|^2 \|u_n\|^2$ converges by the convergence of monotone sequences (Theorem 2.4.12). Then by Theorem 7.6.4, $\sum_{n=1}^{\infty} \hat{f}(n) u_n$ converges to some $g \in H$.
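As an illustration of condition (2) in Theorem 7.6.4, the following sketch compares two hypothetical coefficient sequences relative to an orthonormal set (so $\|u_n\| = 1$). By the Pythagorean theorem for orthogonal vectors, the squared distance between two partial sums is the sum of the $|c_k|^2$ in between, so no actual Hilbert space needs to be modeled; the function names below are invented for this example.

```python
import math

def tail_norm_squared(coeff, m, n):
    """||s_n - s_m||^2 for partial sums s_N = sum_{k <= N} coeff(k) * u_k.

    Assumes the u_k are orthonormal, so the squared norm of the difference
    is the sum of |coeff(k)|^2 for m < k <= n.
    """
    return sum(abs(coeff(k)) ** 2 for k in range(m + 1, n + 1))

def square_summable(k):
    return 1.0 / k             # sum of |c_k|^2 = sum of 1/k^2 converges

def not_square_summable(k):
    return 1.0 / math.sqrt(k)  # sum of |c_k|^2 = sum of 1/k diverges

small_tail = tail_norm_squared(square_summable, 1000, 2000)
large_tail = tail_norm_squared(not_square_summable, 1000, 2000)

print(f"c_k = 1/k:       ||s_2000 - s_1000||^2 is about {small_tail:.6f}")
print(f"c_k = 1/sqrt(k): ||s_2000 - s_1000||^2 is about {large_tail:.6f}")
```

For $c_k = 1/k$ the partial sums are Cauchy, so the series converges in $H$ even though $\sum |c_k|$ diverges; for $c_k = 1/\sqrt{k}$ the block from 1001 to 2000 alone contributes roughly $\ln 2$, and the same holds for every block from $m+1$ to $2m$, so the partial sums are not Cauchy.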

We also have the following corollary of, and analogue of, the comparison test for series (Corollary 4.1.7).

Corollary 7.6.6 (Hilbert Space Comparison Test). Let $H$ be a Hilbert space, and let $B = \{u_n \mid n \in \mathbb{N}\}$ be an orthogonal set of nonzero vectors in $H$. If $b_n, c_n \in \mathbb{C}$, $\sum_{n=1}^{\infty} c_n u_n$ converges in $H$, and

\[
|b_n| \le |c_n| \quad \text{for all } n \in \mathbb{N},
\tag{7.6.6}
\]

then $\sum_{n=1}^{\infty} b_n u_n$ also converges in $H$.

Proof. Problem 7.6.3.

For another application of Theorem 7.6.4, see Problem 7.6.4.

The full power of the existence of an orthogonal basis can now be demonstrated in the following result, which we call the Isomorphism Theorem for Fourier Series.

Theorem 7.6.7 (Isomorphism Theorem for Fourier Series). Let $H$ be a Hilbert space, and let $B = \{u_n \mid n \in \mathbb{N}\} \subset H$ be an orthogonal set of nonzero vectors. Then the following are equivalent.

1. $B$ is an orthogonal basis for $H$.

2. (Parseval identity) For any $f, g \in H$,

\[
\langle f, g \rangle = \sum_{n=1}^{\infty} \hat{f}(n)\overline{\hat{g}(n)} \langle u_n, u_n \rangle.
\tag{7.6.7}
\]

3. (Parseval identity) For any $f \in H$,

\[
\|f\|^2 = \sum_{n=1}^{\infty} |\hat{f}(n)|^2 \langle u_n, u_n \rangle.
\tag{7.6.8}
\]


4. For any $f \in H$, if $\langle f, u_n \rangle = 0$ for all $n \in \mathbb{N}$, then $f = 0$.

Proof. (1) implies (2): Problem 7.6.5. (2) implies (3): Take $g = f$. (3) implies (4): If $\langle f, u_n \rangle = 0$ for all $n \in \mathbb{N}$, then

\[
\|f\|^2 = \sum_{n=1}^{\infty} |\hat{f}(n)|^2 \langle u_n, u_n \rangle = \sum_{n=1}^{\infty} 0 \cdot \langle u_n, u_n \rangle = 0.
\tag{7.6.9}
\]

By the axioms of an inner product, $f = 0$. (4) implies (1): Problem 7.6.6.

Remark 7.6.8. If we can find an orthogonal basis for a Hilbert space $H$, Theorem 7.6.7 tells us that not only does the Fourier series of any $f \in H$ converge to $f$ in the inner product metric, but also, we can compute inner products and norms purely in terms of Fourier coefficients ((7.6.7) and (7.6.8)). Put another way, if $\{u_n \mid n \in \mathbb{N}\}$ is an orthonormal basis for $H$, then the mapping $f(x) \mapsto \hat{f}(n)$ is a linear transformation from $H$ to $\ell^2(\mathbb{N})$ that preserves inner products and norms ((7.6.7) and (7.6.8)), which we might naturally call an isomorphism of Hilbert spaces. As promised in Remark 7.6.3, this is exactly the sense in which $\ell^2(\mathbb{N})$ is the canonical Hilbert space with a countably infinite orthogonal basis.

Remark 7.6.9. The reader may also have noticed that we have yet to demonstrate an example of an orthogonal basis for any $L^2(X)$ arising “in nature”! This is where abstract nonsense reaches its limit and hard work comes in, and so proving that $\{e_n \mid n \in \mathbb{Z}\}$ is an orthonormal basis for $L^2(S^1)$ becomes the main topic of the next chapter.
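The Parseval identity (7.6.8) can be checked numerically in the one case where nothing beyond orthonormality is needed, namely a finite combination of the functions $e_n(x) = e^{2\pi i n x}$. The sketch below is an illustration under that assumption only; the coefficient values are made up, and the integral defining the inner product is replaced by an equally spaced Riemann sum, which happens to be exact (up to rounding) for trigonometric polynomials of low degree.

```python
import cmath

def e(n, x):
    """Basis function e_n(x) = exp(2*pi*i*n*x) on S^1."""
    return cmath.exp(2j * cmath.pi * n * x)

# Hypothetical coefficients of a trigonometric polynomial f = sum of c_n * e_n.
coeffs = {-2: 1 - 2j, 0: 0.5, 1: 3.0, 3: -1j}

def f(x):
    return sum(c * e(n, x) for n, c in coeffs.items())

def inner(g, h, samples=64):
    """Riemann sum for <g, h> = integral over [0, 1] of g(x) * conj(h(x)) dx."""
    return sum(g(k / samples) * h(k / samples).conjugate() for k in range(samples)) / samples

norm_squared = inner(f, f).real
coefficient_sum = sum(abs(c) ** 2 for c in coeffs.values())

# Recover each coefficient as <f, e_n>, as in the definition of f^(n).
recovered = {n: inner(f, lambda x, n=n: e(n, x)) for n in coeffs}

print(f"||f||^2 from the integral:     {norm_squared:.10f}")
print(f"sum of |c_n|^2 (Parseval sum): {coefficient_sum:.10f}")
```

Both printed numbers agree, and `recovered` reproduces the entries of `coeffs`; whether the same identity holds for every element of $L^2(S^1)$ is exactly the question that the basis property settles.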

Problems

7.6.1. Let $V$ be an inner product space and let $\{u_n \mid n \in \mathbb{N}\}$ be an orthogonal set of nonzero vectors in $V$.

(a) Prove that for $c_n \in \mathbb{C}$, if $\sum_{n=1}^{\infty} c_n u_n = f \in V$, then the generalized Fourier coefficient $\hat{f}(k) = c_k$.

(b) Prove that for $b_n, c_n \in \mathbb{C}$, if $\sum_{n=1}^{\infty} b_n u_n = \sum_{n=1}^{\infty} c_n u_n$, then $b_n = c_n$ for all $n \in \mathbb{N}$.

7.6.2. (Proves Theorem 7.6.4) Let $H$ be a Hilbert space, let $\{u_n \mid n \in \mathbb{N}\}$ be an orthogonal set of nonzero vectors in $H$, and let $c_n \in \mathbb{C}$ be coefficients.

(a) Prove that if $\sum_{n=1}^{\infty} c_n u_n = f \in H$, then $\sum_{n=1}^{\infty} |c_n|^2 \|u_n\|^2$ converges. (Suggestion: Bessel’s inequality and monotone convergence for sequences/series, along with Problem 7.6.1.)

(b) Prove that if $\sum_{n=1}^{\infty} |c_n|^2 \|u_n\|^2$ converges, then $\sum_{n=1}^{\infty} c_n u_n$ converges to some $f \in H$. (Suggestion: Imitate the proof of the Comparison Test (Corollary 4.1.7).)

7.6.3. (Proves Corollary 7.6.6) Let $H$ be a Hilbert space, and let $B = \{u_n \mid n \in \mathbb{N}\}$ be an orthogonal set of nonzero vectors in $H$. Prove that if $b_n, c_n \in \mathbb{C}$, $\sum_{n=1}^{\infty} c_n u_n$ converges in $H$, and $|b_n| \le |c_n|$ for all $n \in \mathbb{N}$, then $\sum_{n=1}^{\infty} b_n u_n$ also converges in $H$. (Suggestion: Absolute convergence.)

7.6.4. Let $H$ be a Hilbert space, and let $B = \{u_n \mid n \in \mathbb{N}\}$ be an orthonormal set of vectors in $H$. Prove that if there exists some $K \in \mathbb{R}$ such that $|c_n| \le \dfrac{K}{n}$ for all $n \in \mathbb{N}$, then $\sum_{n=1}^{\infty} c_n u_n$ converges in $H$. (Suggestion: Absolute convergence.)

7.6.5. (Proves Theorem 7.6.7) Let $V$ be an inner product space, and let $B = \{u_n \mid n \in \mathbb{N}\} \subset V$ be an orthogonal basis for $V$, which, we recall, means that for any $f \in V$,

\[
\sum_{n=1}^{\infty} \hat{f}(n) u_n = \lim_{N\to\infty} \sum_{n=1}^{N} \hat{f}(n) u_n = f
\tag{7.6.10}
\]

in the inner product metric.

(a) Prove that if $f, g \in V$ and $N, K \in \mathbb{N}$, then

\[
\left\langle \sum_{n=1}^{N} \hat{f}(n) u_n, \sum_{k=1}^{K} \hat{g}(k) u_k \right\rangle = \sum_{n=1}^{\min(N,K)} \hat{f}(n)\overline{\hat{g}(n)} \langle u_n, u_n \rangle.
\tag{7.6.11}
\]

(b) Prove Parseval’s identity (7.6.7). (Suggestion: Use the continuity of the inner product.)

7.6.6. (Proves Theorem 7.6.7) Let $H$ be a Hilbert space, let $\{u_n \mid n \in \mathbb{N}\}$ be an orthogonal set of nonzero vectors in $H$, and suppose we know that if $\langle f, u_n \rangle = 0$ for all $n \in \mathbb{N}$, then $f = 0$. Prove that $\sum_{n=1}^{\infty} \hat{f}(n) u_n = f$. (Suggestion: The left-hand side must converge to some $g \in H$ (why?). What is $\langle f - g, u_k \rangle$?)

Chapter 8

Convergence of Fourier series

. . . [I]n Analytic Theory of Heat Fourier supplies the first modern definition of convergence, as well as introduces the vital idea of convergence in an interval. But. . . Fourier fails to give a rigorous proof, or even to spell out convergence criteria that would make such a proof possible. So the thing is now we’re talking about convergence.
-- David Foster Wallace, Everything and More

In this chapter, we complete the task we set for ourselves in Part II of this book, namely, to establish natural conditions under which the Fourier series of $f : S^1 \to \mathbb{C}$ converges to $f$. After a brief review of our story so far (Section 8.1), we introduce the ideas we will need to prove our main theorems (Theorems 8.1.1 and 8.1.2): Convolutions (Section 8.2) and Dirac kernels (Section 8.3). We then prove the main results of the chapter (Section 8.4) and discuss some consequences (Section 8.5).

8.1 Fourier series in $L^2(S^1)$

In this section we review what happens when we apply Sections 7.3 and 7.6 in the case of the Hilbert space $L^2(S^1)$. Nothing new will be presented, but it may help the reader to have it all in one place.

Let $B$ be the orthonormal set $\{e_n \mid n \in \mathbb{Z}\}$ in $L^2(S^1)$, where $e_n(x) = e^{2\pi i n x}$. For $f \in L^2(S^1)$ and $n \in \mathbb{Z}$, since $e_n \in L^2(S^1)$, we define

\[
\hat{f}(n) = \langle f, e_n \rangle = \int_0^1 f(x)\overline{e_n(x)}\,dx.
\tag{8.1.1}
\]

We define the $N$th Fourier polynomial of $f$ to be the projection of $f$ onto $\{e_{-N}, \dots, e_N\}$, or

\[
f_N(x) = \sum_{n=-N}^{N} \hat{f}(n) e_n(x),
\tag{8.1.2}
\]

and we define the Fourier series of $f$ to be

\[
f(x) \sim \lim_{N\to\infty} f_N(x) = \sum_{n \in \mathbb{Z}} \hat{f}(n) e_n(x).
\tag{8.1.3}
\]

As before (Definition 6.2.1), $\sim$ indicates merely that what is on the right-hand side is the Fourier series of $f$, and so far as we have proven until now, need not have any implications in terms of convergence. We have proven a few things about Fourier polynomials and series, however.

• Comparing (8.1.1), (8.1.2), and the Best Approximation Theorem 7.3.10, we see that for any $f \in L^2(S^1)$, the Fourier polynomial of $f$ of degree $N$ is the trigonometric polynomial of degree $N$ that is closest to $f$ in the $L^2$ metric. Furthermore, we always have $\|f_N\| \le \|f\|$ (Bessel’s inequality).

• Suppose we can prove that $B = \{e_n \mid n \in \mathbb{Z}\}$ is an orthonormal basis for $L^2(S^1)$, or in other words, that for any $f \in L^2(S^1)$, the Fourier series of $f$ converges to $f$ in the $L^2$ metric. If we can manage to do that, we will know that all of the equivalent conditions of the Isomorphism Theorem 7.6.7 hold; for example, we will know that $\langle f, g \rangle$ can be computed by taking the “infinite series dot product” (7.6.7) of $\hat{f}(n)$ and $\hat{g}(n)$.

Therefore, the principal remaining problem in establishing a satisfactory theory of Fourier series in $L^2$ is to prove the following theorem.

Theorem 8.1.1 (Inversion Theorem for Fourier Series). If $f \in L^2(S^1)$, then the Fourier series of $f$ converges to $f$ in the $L^2$ metric.

The Inversion Theorem is so named because we can also think of it as indicating that any $f \in L^2(S^1)$ can be recovered from its Fourier coefficients $\hat{f}(n)$, or in other words, the transformation $f(x) \mapsto \hat{f}(n)$ has a well-defined inverse.

Now, a sequence of functions may converge in $L^2$ without converging pointwise anywhere (see Problem 7.2.1). Nevertheless, we can use convergence in the $L^2$ metric to obtain results about pointwise, or even uniform, convergence for sufficiently smooth functions.

For example, recall that in Theorem 6.4.4, we reached the frustrating conclusion that if $f \in C^2(S^1)$, then the Fourier series of $f$ converges uniformly to some continuous function $g$ with the same Fourier coefficients as $f$. However, thanks to Theorem 8.1.1 and what we call the Extra Derivative Lemma (Lemma 8.4.7), we will show that:

Theorem 8.1.2. If $f \in C^1(S^1)$, then the Fourier series of $f$ converges absolutely and uniformly to $f$.

In fact, we will even show that if $f \in C^0(S^1)$, then the Fourier series of $f$ converges uniformly to $f$, if we sum it in a different way; see Corollary 8.4.4.
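To see Bessel’s inequality $\|f_N\| \le \|f\|$ at work, here is a short sketch for one sample function chosen for this illustration: the sawtooth $f(x) = x$ on $[0, 1)$, extended periodically. It assumes the coefficients $\hat{f}(0) = 1/2$ and $\hat{f}(n) = i/(2\pi n)$ for $n \neq 0$, which follow from (8.1.1) by integration by parts, and it uses orthonormality of $\{e_n\}$ to write $\|f_N\|^2$ as a finite sum. The function names are hypothetical.

```python
import math

def sawtooth_coefficient(n):
    """Fourier coefficient of f(x) = x on [0, 1): 1/2 if n = 0, else i/(2*pi*n)."""
    return 0.5 if n == 0 else 1j / (2 * math.pi * n)

def fourier_polynomial_norm_squared(N):
    """||f_N||^2 as the sum of |f^(n)|^2 over |n| <= N (orthonormality of e_n)."""
    return sum(abs(sawtooth_coefficient(n)) ** 2 for n in range(-N, N + 1))

f_norm_squared = 1.0 / 3.0  # integral of x^2 over [0, 1]
degrees = (1, 10, 100, 1000)
norms = [fourier_polynomial_norm_squared(N) for N in degrees]

for N, value in zip(degrees, norms):
    print(f"N = {N:4d}: ||f_N||^2 is about {value:.6f}, gap to ||f||^2 about {f_norm_squared - value:.6f}")
```

The values increase with $N$ and stay below $\|f\|^2 = 1/3$. That the gap appears to shrink to 0 is consistent with Theorem 8.1.1, but of course a table of numbers proves nothing about the limit.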

8.2 Convolutions

We now switch back, for the moment, from the fancy-pants world of $L^2(S^1)$ to the world of continuous functions and ordinary calculus. We begin by observing that integrals on $S^1$ are translation invariant, or to be precise:

Lemma 8.2.1. For $f \in C^0(S^1)$ and $a \in \mathbb{R}$, we have that

\[
\int_0^1 f(x + a)\,dx = \int_0^1 f(x)\,dx.
\tag{8.2.1}
\]

Proof. Problem 8.2.1.

To prove our main results (Theorems 8.1.1 and 8.1.2), we will use two new ideas. The first of these ideas, convolution, turns out to be quite useful in Fourier analysis and other situations. While convolutions can be defined on many different domains, we will only use two cases, one of which is the following.

Definition 8.2.2. For $f, g \in C^0(S^1)$, the convolution $f * g : S^1 \to \mathbb{C}$ is defined by the formula

\[
(f * g)(x) = \int_0^1 f(x - t) g(t)\,dt.
\tag{8.2.2}
\]

Remark 8.2.3. Definition 8.2.2 actually works for $f, g \in L^2(S^1)$, or even (much less obviously) $f, g \in L^1(S^1)$, but for simplicity, we stick to the case of continuous $f$ and $g$. The reader should also check that the integrand $f(x - t)g(t)$ is periodic of period 1 in the variable $t$, making it a genuine function on $S^1$.

Theorem 8.2.4. Convolution has the following properties:

1. For $f, g \in C^0(S^1)$, $(f * g)(x) = (g * f)(x)$.
2. For $f, g, h \in C^0(S^1)$, $((f * g) * h)(x) = (f * (g * h))(x)$.
3. For $f \in C^1(S^1)$ and $g \in C^0(S^1)$, we have $\dfrac{d}{dx}\bigl((f * g)(x)\bigr) = \left(\dfrac{df}{dx} * g\right)(x)$.
4. For $f, g \in C^0(S^1)$, $\widehat{f * g}(n) = \hat{f}(n)\hat{g}(n)$.

Proof. Problems 8.2.2–8.2.5.

Remark 8.2.5. Property (4) of Theorem 8.2.4 is perhaps the signature property of convolution, in that it states that convolution of functions corresponds with multiplication of the Fourier coefficients. For the reader interested in acoustics (Section 1.2) or signal processing (Section 14.2), this means that if $f, g \in C^0(S^1)$ represent tones (idealized periodic sound waves), then $f * g$ is the tone $f$ enhanced in the strong harmonics of $g$ and diminished in the weak harmonics of $g$. Somewhat surprisingly, by Theorem 8.2.4, the opposite is true as well: $f * g = g * f$ is $g$ enhanced in the strong harmonics of $f$ and diminished in the weak harmonics of $f$. See Sections 12.2, 13.3, 14.2, and 14.6 for other interpretations of convolution and further related discussion.
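Property (4) of Theorem 8.2.4 can be tested numerically on trigonometric polynomials. The sketch below is an illustration rather than a proof: the two sets of coefficients are invented for the example, and every integral over $[0, 1]$ is replaced by an equally spaced Riemann sum with `SAMPLES` points, which is exact up to rounding for the low frequencies involved here.

```python
import cmath

SAMPLES = 32  # enough sample points for the low-degree polynomials below

def e(n, x):
    """Basis function e_n(x) = exp(2*pi*i*n*x)."""
    return cmath.exp(2j * cmath.pi * n * x)

def trig_poly(coeffs):
    """Return the function x -> sum of c_n * e_n(x)."""
    return lambda x: sum(c * e(n, x) for n, c in coeffs.items())

def integrate(h):
    """Riemann sum for the integral of h over [0, 1]."""
    return sum(h(k / SAMPLES) for k in range(SAMPLES)) / SAMPLES

def convolve(f, g):
    """(f * g)(x) = integral over [0, 1] of f(x - t) g(t) dt, as in (8.2.2)."""
    return lambda x: integrate(lambda t: f(x - t) * g(t))

def coefficient(h, n):
    """Fourier coefficient <h, e_n>, using conj(e_n) = e_{-n}."""
    return integrate(lambda x: h(x) * e(-n, x))

# Hypothetical sample functions.
f = trig_poly({-1: 2.0, 0: 1.0, 2: 1j})
g = trig_poly({-1: 0.5, 1: 4.0, 2: -3.0})
fg = convolve(f, g)

frequencies = range(-3, 4)
products = {n: coefficient(f, n) * coefficient(g, n) for n in frequencies}
direct = {n: coefficient(fg, n) for n in frequencies}

for n in frequencies:
    print(f"n = {n:2d}: coefficient of f*g = {direct[n]:.6f}, product = {products[n]:.6f}")
```

Only the frequencies present in both $f$ and $g$ (here $n = -1$ and $n = 2$) survive in $f * g$, which is the "enhanced in the strong harmonics, diminished in the weak harmonics" behavior described in Remark 8.2.5.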


Problems

8.2.1. (Proves Lemma 8.2.1) For $f \in C^0(S^1)$ and $a \in \mathbb{R}$, prove that $\int_0^1 f(x + a)\,dx = \int_0^1 f(x)\,dx$. (Suggestion: Substitution and the periodicity of $f$. Be careful with the limits of integration!)

8.2.2. (Proves Theorem 8.2.4) For $f, g \in C^0(S^1)$, prove that $(f * g)(x) = (g * f)(x)$. (Suggestion: Substitution and translation invariance.)

8.2.3. (Proves Theorem 8.2.4) For $f, g, h \in C^0(S^1)$, prove that

\[
((f * g) * h)(x) = (f * (g * h))(x).
\tag{8.2.3}
\]

(Suggestion: Substitution, translation invariance, Fubini’s Theorem (Theorem 3.6.21).)

8.2.4. (Proves Theorem 8.2.4) For $f \in C^1(S^1)$ and $g \in C^0(S^1)$, prove that

\[
\frac{d}{dx}\bigl((f * g)(x)\bigr) = \left(\frac{df}{dx} * g\right)(x).
\tag{8.2.4}
\]

(Suggestion: Theorem 3.6.23.)

8.2.5. (Proves Theorem 8.2.4) For $f, g \in C^0(S^1)$, prove that $\widehat{f * g}(n) = \hat{f}(n)\hat{g}(n)$. (Suggestion: Fubini, translation invariance.)

8.3 Dirac kernels

The other main new idea we will use to prove our convergence theorems is motivated by the following wishful thinking.

ENTERING THE LAND OF WISHFUL THINKING

Suppose we have a function $\delta(x)$, called the Dirac delta function, with the property that for $f \in C^0(S^1)$, we have

\[
\int_{-1/2}^{1/2} \delta(x) f(x)\,dx = f(0).
\tag{8.3.1}
\]

If we could imagine the graph of $\delta$, it would be at 0 for all nonzero values of $x$, with a “spike to infinity” at $x = 0$, as shown in Figure 8.3.1. Now, you may object to this spike to infinity; moreover, you may object that since an integral (in either the Riemann or Lebesgue sense) is not affected by changing the value of the integrand at one value of $x$, no such function $\delta$ could exist. To which we reply: Give us a break, we’re in the land of wishful thinking!

Continuing our wishful thinking, given $f \in C^0(S^1)$, we now compute the convolution $f * \delta$:

\[
(f * \delta)(x) = \int_{-1/2}^{1/2} f(x - t)\delta(t)\,dt = f(x - 0) = f(x).
\tag{8.3.2}
\]


[Figure 8.3.1: Graph of the Dirac delta “function”]

In other words, if we think of convolution as a sort of multiplication, then $\delta(x)$ is the identity element with respect to convolution.

Suppose that now we have a sequence of functions $f_n$, and we want to prove that $\lim_{n\to\infty} f_n = f$. If we just so happen to come across a sequence of functions $K_n$ such that $f * K_n = f_n$ and $\lim_{n\to\infty} K_n = \delta$, then it follows that

\[
\lim_{n\to\infty} f_n = \lim_{n\to\infty} (f * K_n) = f * \Bigl(\lim_{n\to\infty} K_n\Bigr) = f * \delta = f.
\tag{8.3.3}
\]

(The second equality is explained by the fact that in the land of wishful thinking, limits commute with every operation.)

EXITING THE LAND OF WISHFUL THINKING

While the above discussion is truly a matter of wishful thinking, remarkably, the following definition (Definition 8.3.1) actually gives a way to make that discussion rigorous.

Definition 8.3.1. A Dirac kernel on $S^1$ is a sequence of continuous functions $K_n : \left[-\frac{1}{2}, \frac{1}{2}\right] \to \mathbb{R}$ such that

1. For all $n$ and all $x \in \left[-\frac{1}{2}, \frac{1}{2}\right]$, $K_n(x) \ge 0$;

2. For all $n$, $\int_{-1/2}^{1/2} K_n(x)\,dx = 1$; and

3. For any fixed $\delta > 0$, we have

\[
\lim_{n\to\infty} \int_{\delta \le |x| \le \frac{1}{2}} K_n(x)\,dx = 0.
\tag{8.3.4}
\]

In other words, for any $\delta > 0$ and $\varepsilon > 0$, there exists some $N(\varepsilon)$ such that for $n > N(\varepsilon)$, we have

\[
1 - \varepsilon < \int_{-\delta}^{\delta} K_n(x)\,dx \le 1.
\tag{8.3.5}
\]

Again remarkably, we have the following useful examples. (Only the second is a Dirac kernel, but the first is close to being one, as we shall see in Section 8.5.6.)


Example 8.3.2. The Dirichlet kernel $\{D_N\}$ is

\[
D_N(x) = \sum_{n=-N}^{N} e_n(x).
\tag{8.3.6}
\]

Example 8.3.3. The Fejér kernel $\{F_N\}$ is

\[
F_N(x) = \frac{D_0(x) + \cdots + D_{N-1}(x)}{N}.
\tag{8.3.7}
\]

[Figure 8.3.2: Dirichlet kernel, N = 6, 21, 36]

[Figure 8.3.3: Fejér kernel, N = 6, 21, 36]


The Dirichlet and Fejér kernels (Figures 8.3.2 and 8.3.3, respectively) are useful to us because of two remarkable facts. First, convolution with each of those sequences gives a partial sum of a Fourier series, in the following sense:

Theorem 8.3.4. For $f \in C^0(S^1)$, we have that $f * D_N = f_N$, the $N$th Fourier polynomial of $f$, and

\[
(f * F_N)(x) = \frac{f_0(x) + \cdots + f_{N-1}(x)}{N},
\tag{8.3.8}
\]

the average of the Fourier polynomials $f_0, \dots, f_{N-1}$.

Proof. Problem 8.3.1.

Definition 8.3.5. The sum on the right-hand side of (8.3.8) is called the $N$th Cesàro sum of the Fourier series of $f$.

Another remarkable fact is that the (finite) series defining $D_N$ and $F_N$ can be summed usefully:

Lemma 8.3.6. For $x \in S^1$, we have that

\[
D_n(x) = \begin{cases} \dfrac{\sin((2n + 1)\pi x)}{\sin(\pi x)} & \text{if } x \neq 0, \\ 2n + 1 & \text{if } x = 0, \end{cases}
\tag{8.3.9}
\]

\[
F_N(x) = \begin{cases} \dfrac{1}{N} \dfrac{\sin^2(N\pi x)}{\sin^2(\pi x)} & \text{if } x \neq 0, \\ N & \text{if } x = 0. \end{cases}
\tag{8.3.10}
\]

Proof. Let $q = e^{\pi i x}$. By the formula for a finite geometric series,

\[
D_n(x) = \sum_{k=-n}^{n} q^{2k} = \begin{cases} \dfrac{q^{-2n} - q^{2n+2}}{1 - q^2} & \text{if } x \neq 0, \\ 2n + 1 & \text{if } x = 0. \end{cases}
\tag{8.3.11}
\]

However, since $\dfrac{q^{-2n} - q^{2n+2}}{1 - q^2} = \dfrac{q^{2n+1} - q^{-2n-1}}{q - q^{-1}}$ and $q^k - q^{-k} = 2i \sin(k\pi x)$, we obtain (8.3.9). Next, averaging (8.3.11) as $n$ goes from 0 to $N - 1$, we first see that

\[
F_N(0) = \frac{1}{N} \cdot \frac{N(1 + 2N - 1)}{2} = N.
\tag{8.3.12}
\]


For x 6= 0, averaging (8.3.11) and applying geometric series again, we get ! NX −1 N −1 X 1 1 q −2n − q 2n+2 FN (x) = N 1 − q2 n=0 n=0 −2N 1 1 1−q q 2 − q 2N +2 = − N 1 − q2 1 − q −2 1 − q2 1 1 1 − q −2N 1 − q 2N = + N 1 − q2 1 − q −2 1 − q −2 −2N + 2 − q 2N 1 −q . = N −q 2 + 2 − q −2

(8.3.13)

However, since sin2 (N πx) (q N − q −N )2 q 2N − 2 + q −2N = = , (q − q −1 )2 q 2 − 2 + q −2 sin2 (πx)

(8.3.14)

(8.3.10) follows.

As promised, we now prove that we have at least one example of our theory.

Theorem 8.3.7. The Fejér kernel $F_N$ is a Dirac kernel.

Proof. First, the fact that $F_N(x) \ge 0$ follows from (8.3.10). Problem 8.3.2 shows that for all $N$, $\int_{-1/2}^{1/2} F_N(x)\,dx = 1$. Finally, Problem 8.3.3 shows that for any fixed $\delta > 0$,
\[ \lim_{N\to\infty} \int_{\delta \le |x| \le \frac12} F_N(x)\,dx = 0. \tag{8.3.15} \]
The theorem follows.

Remark 8.3.8. The reader should be aware that by an unfortunate linguistic coincidence, kernels in analysis share their name with an unrelated idea in linear algebra and abstract algebra. Alas, it is too late to change the names in either subject now.
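The closed forms in Lemma 8.3.6 and the Dirac kernel properties used in Theorem 8.3.7 can be sanity-checked numerically. The following is only a sketch, not part of the proof: the function names and the midpoint-rule integrator are illustrative assumptions, and $F_N$ is computed as the average of $D_0, \dots, D_{N-1}$ as in the proof above.

```python
import cmath
import math

def e(n, x):
    """Character e_n(x) = exp(2*pi*i*n*x) on S^1 = R/Z."""
    return cmath.exp(2j * math.pi * n * x)

def dirichlet_sum(n, x):
    """D_n(x) as the finite sum of e_k(x), k = -n..n (the sum is real)."""
    return sum(e(k, x) for k in range(-n, n + 1)).real

def dirichlet_closed(n, x):
    """Closed form (8.3.9)."""
    if x == 0:
        return 2 * n + 1
    return math.sin((2 * n + 1) * math.pi * x) / math.sin(math.pi * x)

def fejer_avg(N, x):
    """F_N(x) as the average of D_0, ..., D_{N-1}."""
    return sum(dirichlet_sum(n, x) for n in range(N)) / N

def fejer_closed(N, x):
    """Closed form (8.3.10)."""
    if x == 0:
        return N
    return (math.sin(N * math.pi * x) / math.sin(math.pi * x)) ** 2 / N

def midpoint_integral(g, lo, hi, steps=4000):
    """Midpoint rule; an illustrative integrator, not a library routine."""
    h = (hi - lo) / steps
    return h * sum(g(lo + (k + 0.5) * h) for k in range(steps))

def fejer_tail(N, delta):
    """Integral of F_N over delta <= |x| <= 1/2, using that F_N is even."""
    return 2 * midpoint_integral(lambda x: fejer_closed(N, x), delta, 0.5)

if __name__ == "__main__":
    print(dirichlet_sum(4, 0.3), dirichlet_closed(4, 0.3))
    print(fejer_avg(6, 0.17), fejer_closed(6, 0.17))
    print(midpoint_integral(lambda x: fejer_closed(8, x), -0.5, 0.5))
    print([fejer_tail(N, 0.1) for N in (5, 50)])
```

The last line prints the tail integrals from (8.3.15) for two values of $N$; by (8.3.16) they are bounded by $(1 - 2\delta)/(N \sin^2(\pi\delta))$.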

Problems

8.3.1. (Proves Theorem 8.3.4) Suppose $f \in C^0(S^1)$.
(a) Prove that $D_N * f = f_N$.
(b) Prove that $F_N * f = \dfrac{f_0(x) + \cdots + f_{N-1}(x)}{N}$.

8.3.2. (Proves Theorem 8.3.7)
(a) Prove that for all $n$, $\int_{-1/2}^{1/2} D_n(x)\,dx = 1$. (Suggestion: Compute $\langle D_n, 1\rangle$.)
(b) Prove that for all $N$, $\int_{-1/2}^{1/2} F_N(x)\,dx = 1$. (Suggestion: (8.3.7).)

8.3.3. (Proves Theorem 8.3.7) Fix some $\delta$ such that $0 < \delta < 1/2$.
(a) Prove that for $\delta \le |x| \le \frac12$,
\[ F_N(x) \le \frac{1}{N}\,\frac{1}{\sin^2(\pi\delta)}. \tag{8.3.16} \]
(Suggestion: Use well-known properties of sin.)
(b) Prove that $\lim_{N\to\infty} \int_{\delta \le |x| \le \frac12} F_N(x)\,dx = 0$. (Suggestion: Use (8.3.16) and then compare with an integral on $[-\frac12, \frac12]$.)

8.4 Convergence of Fourier series

The main results of this chapter now stem from the following general result, which makes the wishful thinking of Section 8.3 come true.

Theorem 8.4.1. If $\{K_N\}$ is a Dirac kernel, and $f \in C^0(S^1)$, then
\[ \lim_{N\to\infty} (f * K_N)(x) = f(x), \tag{8.4.1} \]
where convergence is uniform on $S^1$.

The proof of Theorem 8.4.1 is somewhat intricate, so we break it down into several lemmas, always assuming that $\{K_N\}$ is a Dirac kernel and $f : S^1 \to \mathbb{C}$ is continuous. The first lemma states that by keeping $t$ near 0, we can force the integral of $|f(x-t) - f(x)|\,|K_n(t)|$ with respect to $t$ to be as small as we like, independent of $x$ and $n$.

Lemma 8.4.2. For any $\epsilon_1 > 0$, there exists some $\delta_1(\epsilon_1) < 1/2$ such that for any $\delta < \delta_1(\epsilon_1)$, any $x \in S^1$, and any $n \in \mathbb{N}$,
\[ \int_{-\delta}^{\delta} |f(x-t) - f(x)|\,|K_n(t)|\,dt < \epsilon_1. \]

Proof. Problem 8.4.1.

The second lemma states that by keeping $t$ away from 0 and letting $n \to +\infty$, we can also force the integral of $|f(x-t) - f(x)|\,|K_n(t)|$ to be as small as we like, independent of $x$.

Lemma 8.4.3. For any fixed $\delta > 0$ and $\epsilon_2 > 0$, there exists some $N_2(\delta, \epsilon_2)$ such that for $n > N_2(\delta, \epsilon_2)$ and any $x \in S^1$, we have
\[ \int_{\delta \le |t| \le \frac12} |f(x-t) - f(x)|\,|K_n(t)|\,dt < \epsilon_2. \tag{8.4.2} \]


Proof. Problem 8.4.2.

Combining Lemmas 8.4.2 and 8.4.3 carefully, we get:

Proof of Theorem 8.4.1. Problem 8.4.3.

For the case where $K_N$ is the Fejér kernel $F_N$, combining Theorem 8.4.1 with Theorem 8.3.4, we have the following result.

Corollary 8.4.4. For $f \in C^0(S^1)$, let $S_N(x)$ be the $N$th Cesàro sum of the Fourier series of $f$. Then $S_N$ converges uniformly to $f$ on $S^1$.
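To see Corollary 8.4.4 at work, one can watch the sup-norm error of the Cesàro sums shrink for a concrete continuous function. The sketch below uses $f(x) = |x|$ on $[-\frac12, \frac12]$, which is an illustrative choice and not an example taken from the text. Its coefficients, $\hat f(0) = \frac14$ and $\hat f(n) = ((-1)^n - 1)/(2\pi^2 n^2)$ for $n \neq 0$, were worked out by hand, and the error is measured on a finite grid only.

```python
import math

def coeff(n):
    """Fourier coefficient of f(x) = |x| on [-1/2, 1/2] (hand-derived)."""
    if n == 0:
        return 0.25
    return -1.0 / (math.pi * n) ** 2 if n % 2 else 0.0

def fourier_poly(N, x):
    """N-th Fourier polynomial f_N(x); the coefficients are real and even in n."""
    return coeff(0) + sum(2 * coeff(n) * math.cos(2 * math.pi * n * x)
                          for n in range(1, N + 1))

def cesaro_sum(N, x):
    """N-th Cesaro sum: the average of f_0, ..., f_{N-1}, as in (8.3.8)."""
    return sum(fourier_poly(M, x) for M in range(N)) / N

def sup_error(approx, N, grid=201):
    """Largest error |approx(N, x) - |x|| over an evenly spaced grid."""
    xs = [-0.5 + k / (grid - 1) for k in range(grid)]
    return max(abs(approx(N, x) - abs(x)) for x in xs)

if __name__ == "__main__":
    for N in (8, 64):
        print(N, sup_error(cesaro_sum, N), sup_error(fourier_poly, N))
```

For this particular $f$, the ordinary Fourier polynomials also converge uniformly, and faster than the Cesàro sums; the corollary's value is that the Cesàro sums converge uniformly for every continuous $f$.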

Figure 8.4.1: Standard and Cesàro sums, N = 5, 15, 30

Remark 8.4.5. It is interesting to compare the virtues and vices of Cesàro summation versus ordinary summation of Fourier series. The differences can perhaps be seen more clearly in a discontinuous example, such as Figure 8.4.1, where we see that the Cesàro sums of the Fourier series of $f$ converge more slowly to $f$, overall, but have less dramatic errors, whereas the usual Fourier sums converge more quickly to $f$, on average, but for a small number of values of $x$, have more dramatic errors. Indeed, this is not just some tactical error, but actually a natural consequence of trying to obtain the best approximation “on average”; see Davidson and Donsig [DD10, 14.4] for a discussion.

We are now finally ready to prove the Inversion Theorem 8.1.1. We have done enough preparation to make the proof relatively short, though it still involves some subtle points. (The skeptical reader who slogged patiently through the abstract nonsense of Chapter 7 may regard this proof as their reward.)

Proof of the Inversion Theorem for Fourier Series 8.1.1. Suppose $f \in L^2(S^1)$ and $\epsilon > 0$. By Lebesgue Axiom 6, there exists some $g \in C^0(S^1)$ such that $\|f - g\| < \frac{\epsilon}{2}$. By Corollary 8.4.4 and Theorem 4.3.8, since the Cesàro sums $s_N$ of the Fourier series of $g$ converge uniformly to $g$, there exists some Cesàro sum $s_M$ such that $\|g - s_M\| < \frac{\epsilon}{2}$. Applying the triangle inequality as usual, it follows that there exists some trigonometric polynomial $s_M$ of degree $M = M(\epsilon)$ such that
\[ \|f - s_M\| \le \|f - g\| + \|g - s_M\| < \epsilon. \tag{8.4.3} \]
However, by the Best Approximation Theorem 7.3.10, the Fourier polynomial of degree $M$ is the best approximation of $f$ among trigonometric polynomials of degree $M$, which means that $\|f - f_M\| \le \|f - s_M\| < \epsilon$. Therefore, since Fourier polynomials always give better approximations as $N \to \infty$ (the Always Better Theorem 7.3.11), for $N > M(\epsilon)$, we have that $\|f - f_N\| \le \|f - f_M\| < \epsilon$. The theorem follows.

Corollary 8.4.6. For $f, g \in L^2(S^1)$, the following are equivalent:
• $\hat f(n) = \hat g(n)$ for all $n \in \mathbb{Z}$.
• $f = g$ almost everywhere on $S^1$.

Proof. Problem 8.4.4.

Interestingly, we can now use Corollary 8.4.6, a “soft” convergence result, to complete and improve upon the “hard” convergence results we first attempted to prove back in Section 6.4. We begin with a technical result that will be useful to us several times.

Lemma 8.4.7 (Extra Derivative Lemma). If $g \in L^2(S^1)$, then the two-sided series
\[ \sum_{\substack{n \in \mathbb{Z} \\ n \neq 0}} \frac{1}{2\pi n}\,\hat g(n) \tag{8.4.4} \]
converges absolutely (as a series of complex numbers).

The name of Lemma 8.4.7 comes from the fact that, as we shall see shortly, we can sometimes use it to show that having an extra degree of differentiability allows us to obtain uniform convergence of a Fourier series. For example, Theorem 8.1.2 says that the Fourier series of some $f \in C^1(S^1)$ converges uniformly to $f$, though the same need not hold for $f'$.

Proof. Problem 8.4.5.

We now prove our principal “hard” convergence result (Theorem 8.1.2). (See Section 8.5.6 for a discussion of other pointwise convergence results, especially Theorem 8.5.17.)

Proof of Theorem 8.1.2. Suppose $f \in C^1(S^1)$. We claim:

Claim: The Fourier series of $f$ converges absolutely and uniformly to some $g \in C^0(S^1)$.


The proof is in Problem 8.4.6. Problem 6.4.5 then shows that for all $n \in \mathbb{Z}$, $\hat g(n) = \hat f(n)$. However, Corollary 8.4.6 then implies that $f = g$ a.e., and since both $f$ and $g$ are continuous, Corollary 7.4.9 implies that $f = g$ everywhere.

Theorem 8.1.2 also implies the following term-by-term differentiation property for Fourier series.

Corollary 8.4.8. For $k \ge 1$, if $f \in C^k(S^1)$, then for $0 \le j \le k-1$, the Fourier series of $f^{(j)}$,
\[ f^{(j)}(x) = \sum_{n \in \mathbb{Z}} (2\pi i n)^j \hat f(n) e_n(x), \tag{8.4.5} \]
converges absolutely and uniformly to $f^{(j)}$. In particular, for $f \in C^\infty(S^1)$, we may take arbitrary term-by-term derivatives of the Fourier series of $f$.

In other words, if $f$ has a continuous $k$th derivative, then we may take up to $k-1$ term-by-term derivatives of the Fourier series of $f$, as per (8.4.5).

Proof of Corollary 8.4.8. Because $f^{(j)} \in C^1(S^1)$, by Theorem 8.1.2, the Fourier series of $f^{(j)}$ converges absolutely and uniformly to $f^{(j)}$. Applying Theorem 6.4.1 $j$ times, we get (8.4.5).

Note that because there exist $f \in C^0(S^1)$ such that the Fourier series of $f$ diverges at uncountably many points (see Section 8.5.6), Corollary 8.4.8 is, in some sense, an optimal result for term-by-term differentiation, as we cannot expect to recover the 0th derivative of a $C^0$ function. (Though again, see Theorem 8.5.17 for one class of continuous functions where we can guarantee pointwise convergence.)

Remark 8.4.9. As we saw previously in Theorem 6.4.2, the smoother (more differentiable) a function $f \in L^2(S^1)$ is, the faster its coefficients must decay, or converge to 0 as $n \to \infty$. Since Theorem 8.1.2 and Corollary 8.4.8 ultimately depend on facts about the decay rate of Fourier coefficients of a $C^k$ function, they can also be seen as a sort of converse to Theorem 6.4.2 and Corollary 6.4.3. For converses to Theorem 6.4.2 and Corollary 6.4.3 that use the decay rate more explicitly, see Section 8.5.1.

Problems

For problems 8.4.1–8.4.3, assume that $\{K_N\}$ is a Dirac kernel (Definition 8.3.1) and $f : S^1 \to \mathbb{C}$ is continuous.

8.4.1. (Proves Lemma 8.4.2) Prove that for any $\epsilon_1 > 0$, there exists some $\delta_1(\epsilon_1) < 1/2$ such that for any $\delta < \delta_1(\epsilon_1)$, any $x \in S^1$, and any $n \in \mathbb{N}$,
\[ \int_{-\delta}^{\delta} |f(x-t) - f(x)|\,|K_n(t)|\,dt < \epsilon_1. \tag{8.4.6} \]
(Suggestion: Use the fact that $f$ is uniformly continuous on $S^1$ (why?).)


8.4.2. (Proves Lemma 8.4.3) Prove that for any fixed $\delta > 0$ and $\epsilon_2 > 0$, there exists some $N_2(\delta, \epsilon_2)$ such that for $n > N_2(\delta, \epsilon_2)$ and any $x \in S^1$, we have
\[ \int_{\delta \le |t| \le \frac12} |f(x-t) - f(x)|\,|K_n(t)|\,dt < \epsilon_2. \tag{8.4.7} \]
(Suggestion: Note that the region of integration is $[-\frac12, -\delta] \cup [\delta, \frac12]$. Use the fact that $f$ is bounded on $S^1$ (why?).)

8.4.3. (Proves Theorem 8.4.1) Prove that for $\epsilon > 0$, there exists some $N(f, \epsilon)$, not depending on $x \in S^1$, such that for all $x \in S^1$ and all $n \in \mathbb{Z}$, if $n > N(f, \epsilon)$, then $|(f * K_n)(x) - f(x)| < \epsilon$. In other words, prove that $f * K_n$ converges uniformly to $f$ on $S^1$. (Suggestion: Use the fact that $f(x) = \int_{-1/2}^{1/2} f(x) K_n(t)\,dt$ (why?) and compare with $(f * K_n)(x)$.)

8.4.4. (Proves Corollary 8.4.6) Suppose $f, g \in L^2(S^1)$ and $\hat f(n) = \hat g(n)$ for all $n \in \mathbb{Z}$. Prove that $f = g$ a.e. (Suggestion: Use Theorem 7.6.7, and keep in mind the definition of “equals” in $L^2(S^1)$.)

8.4.5. (Proves Lemma 8.4.7) Suppose $g \in L^2(S^1)$. Let the double-sided sequence $a_n$ be defined by
\[ a_n = \begin{cases} \dfrac{1}{|2\pi n|} & \text{if } n \neq 0, \\[1ex] 0 & \text{if } n = 0. \end{cases} \tag{8.4.8} \]
(a) Prove that $a_n \in \ell^2(\mathbb{Z})$. (Suggestion: Keep in mind that $a_n$ is two-sided.)
(b) Prove that the two-sided series $\displaystyle\sum_{\substack{n \in \mathbb{Z} \\ n \neq 0}} \left|\frac{1}{2\pi n}\,\hat g(n)\right|$ converges. (Suggestion: use the inner product in $\ell^2(\mathbb{Z})$.)

8.4.6. (Proves Theorem 8.1.2) Suppose $f \in C^1(S^1)$. Prove that the Fourier series of $f$ converges absolutely and uniformly to some $g \in C^0(S^1)$. (Suggestion: Theorem 6.4.1 and the Extra Derivative Lemma 8.4.7.)

8.5 Applications of Fourier series

Before we get to our main application in Part III, we conclude this chapter by considering several miscellaneous applications and other results about Fourier series. Note that of these results, Section 8.5.1 is used in Chapter 11, Theorem 8.5.6 is used in Chapter 12, and certainly on a first reading, it will be enough just to quote those results. All other material is optional for what follows.

8.5.1 Decay of Fourier coefficients and smoothness

Recall that Theorem 6.4.2 and Corollary 6.4.3 state that the more differentiable a function is, the faster its Fourier coefficients must decay as $n \to \infty$. As a sort of converse, Theorem 8.1.2 and Corollary 8.4.8 tell us that we can take $(k-1)$ term-by-term derivatives of the Fourier series of a $C^k$ function, with absolute and uniform convergence. However, the term-by-term results in Theorem 8.1.2 and Corollary 8.4.8 rely on knowing somewhat more information about the coefficients of a Fourier series than their decay rate. The following is a similar result that assumes only knowledge of the decay rate.

Theorem 8.5.1. For $j \ge 1$, let $f \in L^2(S^1)$ have the property that for some constants $K$ and $p > j + 1$ and for all $n \neq 0$, $|\hat f(n)| \le \dfrac{K}{|n|^p}$. Then the Fourier series of $f$ converges absolutely and uniformly to some $g \in C^j(S^1)$ such that
\[ g^{(j)}(x) = \sum_{n \in \mathbb{Z}} (2\pi i n)^j \hat f(n) e_n(x) \tag{8.5.1} \]
and $f = g$ a.e.

Proof. Problem 8.5.1.

Applying Theorem 8.5.1 for all $j \ge 1$, we obtain the following.

Corollary 8.5.2. Let $f \in L^2(S^1)$ have the property that for all $p > 2$, there exists some constant $K_p$ such that for all $n \neq 0$, $|\hat f(n)| \le \dfrac{K_p}{|n|^p}$. Then the Fourier series of $f$ converges uniformly to some $g \in C^\infty(S^1)$ such that $f = g$ a.e.

Proof. Problem 8.5.2.
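The hypothesis $p > j + 1$ in Theorem 8.5.1 is what makes the $j$-times differentiated series summable in absolute value: each term of (8.5.1) is bounded by $(2\pi|n|)^j K/|n|^p$, and $\sum_{n > N} n^{-(p-j)} \le \int_N^\infty x^{-(p-j)}\,dx$ is finite when $p - j > 1$. The following sketch compares a partial tail of that bound with the integral estimate. It only illustrates the standard M-test idea, is not necessarily the route intended in Problem 8.5.1, and uses hypothetical function names.

```python
import math

def derivative_series_tail(K, p, j, N, M):
    """Sum over N < |n| <= M of the termwise bound (2*pi*|n|)^j * K / |n|^p."""
    return 2 * sum((2 * math.pi * n) ** j * K / n ** p
                   for n in range(N + 1, M + 1))

def integral_bound(K, p, j, N):
    """Upper bound for the whole tail |n| > N, valid only when p > j + 1."""
    s = p - j  # effective decay exponent of the j-th derivative series
    if s <= 1:
        raise ValueError("need p > j + 1 for the tail to converge")
    return 2 * K * (2 * math.pi) ** j * N ** (1 - s) / (s - 1)

if __name__ == "__main__":
    K, p, j = 1.0, 3.5, 2
    for N in (10, 100, 1000):
        print(N, derivative_series_tail(K, p, j, N, 50 * N), integral_bound(K, p, j, N))
```

Because the bound does not depend on $x$ and tends to 0 as $N \to \infty$, the tails of the differentiated series are uniformly small, which is the mechanism behind the uniform convergence claimed in the theorem.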

8.5.2 Approximation by smooth functions

Since all trigonometric polynomials are infinitely differentiable, Theorem 8.1.1 immediately gives the following corollary.

Corollary 8.5.3. The space $C^\infty(S^1)$ is dense in $L^2(S^1)$. (In fact, the space of trigonometric polynomials is dense in $L^2(S^1)$.)

Similarly, Corollary 8.4.4 shows that trigonometric polynomials are dense in $C^0(S^1)$ with respect to the $L^\infty$ metric. We may also use Corollary 8.4.4 to show:

Theorem 8.5.4 (Weierstrass Approximation Theorem). For every $f \in C^0([0, \frac12])$ and $\epsilon > 0$, there exists some polynomial $p(x)$ such that $|f(x) - p(x)| < \epsilon$ for all $x \in [0, 1/2]$.

Proof. Problem 8.5.3.


When studying functions on $\mathbb{R}$, in Chapter 10 and afterwards, we will also use a refinement of Theorem 7.5.13 that states that the space of Schwartz functions (Section 4.7) is dense in $L^2(\mathbb{R})$. In fact, we will prove the even stronger Theorem 8.5.6, which states that not only are continuous functions with compact support dense in $L^2(\mathbb{R})$, but also smooth functions with compact support. The first-time reader may effectively treat Theorem 8.5.6, or its main consequence Corollary 8.5.7, as another axiom, but the proof involves some interesting new ideas, so we include part of it here, and more details in an appendix.

Perhaps the most remarkable thing about smooth functions with compact support is that there are any such functions at all! The key ingredient is the following result.

Theorem 8.5.5. For $a < b$ and $\delta > 0$, there exists some $\varphi : \mathbb{R} \to \mathbb{R}$ such that:
1. $\varphi \in C^\infty(\mathbb{R})$;
2. For $a \le x \le b$, $\varphi(x) = 1$;
3. For $a - \delta \le x \le a$ and $b \le x \le b + \delta$, we have $0 \le \varphi(x) \le 1$; and
4. For $x \le a - \delta$ and $b + \delta \le x$, $\varphi(x) = 0$.

A function $\varphi(x)$ satisfying the conditions of Theorem 8.5.5 is known as a bump function; one such function is shown in Figure 8.5.1. The proof that bump functions exist is not deep, but it does take some effort, and is somewhat of a tangent to our main story, so we relegate it to Appendix C.

Figure 8.5.1: Graph of a bump function $\varphi(x)$

In any case, we may now combine Lebesgue Axiom 6, Corollary 8.5.3, and bump functions to prove the following theorem.

Theorem 8.5.6. The space $C_c^\infty(\mathbb{R})$ of smooth functions with compact support is dense in $L^2(\mathbb{R})$.

Proof. Problem 8.5.4.

Moreover, since functions with compact support certainly have a limit of 0 as $x \to \pm\infty$, we see that $C_c^\infty(\mathbb{R}) \subseteq \mathcal{S}(\mathbb{R}) \subseteq L^2(\mathbb{R})$, and therefore:

Corollary 8.5.7. The Schwartz space $\mathcal{S}(\mathbb{R})$ is dense in $L^2(\mathbb{R})$.
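For readers who want to experiment with the bump functions of Theorem 8.5.5 before reading Appendix C, here is a minimal sketch of one standard construction, built from $h(x) = e^{-1/x}$ for $x > 0$ and $h(x) = 0$ otherwise. It is an assumption that this matches the appendix's construction; the helper names `h`, `smooth_step`, and `bump` are hypothetical, and the code only evaluates the function (smoothness is a separate calculus argument).

```python
import math

def h(x):
    """Zero for x <= 0 and exp(-1/x) for x > 0."""
    return math.exp(-1.0 / x) if x > 0 else 0.0

def smooth_step(x):
    """0 for x <= 0, 1 for x >= 1, strictly between 0 and 1 otherwise."""
    # The denominator never vanishes: x > 0 or 1 - x > 0 always holds.
    return h(x) / (h(x) + h(1.0 - x))

def bump(x, a, b, delta):
    """Candidate for phi in Theorem 8.5.5: 1 on [a, b], 0 outside (a - delta, b + delta)."""
    rise = smooth_step((x - (a - delta)) / delta)  # climbs from 0 to 1 on [a - delta, a]
    fall = smooth_step(((b + delta) - x) / delta)  # drops from 1 to 0 on [b, b + delta]
    return rise * fall

if __name__ == "__main__":
    for x in (-0.3, -0.25, -0.1, 0.0, 0.5, 1.0, 1.1, 1.25, 1.3):
        print(x, bump(x, a=0.0, b=1.0, delta=0.25))
```

Multiplying a smooth function on an interval by such a `bump` is the "truncation" step suggested in Problem 8.5.4.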

8.5.3 The Riemann zeta function

For real $s > 1$, the Riemann zeta function is defined by
\[ \zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s}. \tag{8.5.2} \]

This can be extended in a relatively straightforward manner to $s \in \mathbb{C}$ with $\operatorname{Re}(s) > 1$, and with more effort, to all complex $s \neq 1$. The conjectured location of the zeros of the zeta function, famously known as the Riemann hypothesis, contains a wealth of information about the distribution of prime numbers, and has therefore been one of the fundamental problems in number theory for the last 150 years (as of this writing).

Similarly, finding the value of $\zeta(s)$ for particular values of $s$ has long been of interest in number theory, and it turns out that one of the main methods for investigating the zeta function and related ideas is Fourier analysis; see, for example, Edwards [Edw01, Ch. 10]. To give just one example, by applying Parseval’s identity (Theorem 7.6.7) to carefully chosen examples of $f \in L^2(S^1)$, we obtain the following result.

Theorem 8.5.8. We have that
\[ \zeta(2) = \frac{\pi^2}{6}, \qquad \zeta(4) = \frac{\pi^4}{90}. \tag{8.5.3} \]

Proof. Problem 8.5.5.

See Section 14.5 for more about the Riemann Hypothesis, including some recent developments related to the material in this book.
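The values in Theorem 8.5.8 are easy to check numerically, and the Parseval mechanism can be seen on a small example. The sketch below is hedged in two ways. First, the partial sums only approximate $\zeta(s)$. Second, the test function $f(x) = x$ on $[-\frac12, \frac12)$ is an illustrative assumption, and may or may not be the function of Example 6.2.5. For it, $\|f\|^2 = \frac{1}{12}$, $\hat f(0) = 0$, and $|\hat f(n)|^2 = 1/(4\pi^2 n^2)$ for $n \neq 0$, so Parseval gives $\frac{1}{12} = 2\zeta(2)/(4\pi^2)$.

```python
import math

def zeta_partial(s, terms):
    """Partial sum of the series (8.5.2), accumulated with math.fsum."""
    return math.fsum(1.0 / n ** s for n in range(terms, 0, -1))

def zeta2_from_parseval():
    """Solve 1/12 = 2 * zeta(2) / (4 * pi^2) for zeta(2).

    Uses the hypothetical test function f(x) = x on [-1/2, 1/2):
    ||f||^2 = 1/12 and |f^(n)|^2 = 1 / (4 pi^2 n^2) for n != 0.
    """
    norm_squared = 1.0 / 12.0
    return 2.0 * math.pi ** 2 * norm_squared

if __name__ == "__main__":
    print(zeta_partial(2, 200_000), math.pi ** 2 / 6, zeta2_from_parseval())
    print(zeta_partial(4, 2_000), math.pi ** 4 / 90)
```

The partial sums of $\zeta(2)$ converge slowly (the tail after $N$ terms is about $1/N$), which is one reason an exact evaluation via Parseval is more satisfying than brute-force summation.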

8.5.4 Discrete time series and the Wiener-Khinchin theorem

As mentioned in slightly different language in Section 1.2, physically we can think of Fourier series as characterizing a periodic signal $f \in L^2(S^1)$ in terms of its “frequency response” $\hat f(n)$ at all (discrete) frequencies $n \in \mathbb{Z}$. The Inversion Theorem for Fourier Series then tells us that, as long as we only require convergence in $L^2$, we can recover $f$ from $\hat f(n)$.

In some fields, such as statistics, we sometimes consider the reverse of that process. That is, we want to understand a discrete-time signal, or time series, $x(t)$, which is by definition a function $x : \mathbb{Z} \to \mathbb{C}$. We can think of such an $x(t)$ as a complex signal sampled at regular time intervals, which after scaling we may take to be the integers. In this case, we reverse our usual procedure and define the discrete-time Fourier transform (DTFT) of $x(t)$ to be its frequency response at any (continuous) frequency $\gamma \in \mathbb{R}$. In other words, for $x \in \ell^2(\mathbb{Z})$ and $\gamma \in \mathbb{R}$, we define
\[ \hat x(\gamma) = \langle x(t), e_t(\gamma) \rangle = \sum_{t \in \mathbb{Z}} x(t)\,\overline{e_t(\gamma)}, \tag{8.5.4} \]


where $\langle x(t), e_t(\gamma) \rangle$ denotes the inner product of $x(t)$ and $e_t(\gamma)$ (as functions of $t \in \mathbb{Z}$) in $\ell^2(\mathbb{Z})$. Note that by definition, $\hat x$ naturally has domain $S^1$, or in other words, the frequency response of $x(t)$ is periodic with period 1. Note also that in contrast with Fourier series, where coefficients are defined by an integral (8.1.1) and series are inverted by summation (8.1.3), for the discrete-time Fourier transform, coefficients are defined by summation (8.5.4) and, as we shall see in Theorem 8.5.9, series are inverted by integration.

To be precise, since the DTFT is precisely Fourier series inversion with a sign change, the Inversion Theorem for Fourier Series also shows that for $x(t) \in \ell^2(\mathbb{Z})$, we may recover $x(t)$ from $\hat x(\gamma)$ as follows.

Theorem 8.5.9. For $x(t) \in \ell^2(\mathbb{Z})$, we may recover $x(t)$ by
\[ x(t) = \int_0^1 \hat x(\gamma)\,e_t(\gamma)\,d\gamma. \tag{8.5.5} \]
Furthermore,
\[ \sum_{t \in \mathbb{Z}} |x(t)|^2 = \|\hat x(\gamma)\|^2 = \int_0^1 |\hat x(\gamma)|^2\,d\gamma. \tag{8.5.6} \]

Proof. Problem 8.5.6.

Definition 8.5.10. Because (8.5.6) can be interpreted as saying that taking the DTFT preserves the total power of the signal $x(t)$, we can think of the function
\[ S_x(\gamma) = |\hat x(\gamma)|^2 \tag{8.5.7} \]
as describing the (continuous) distribution of signal power among all frequencies. We therefore call $S_x(\gamma)$ the power spectrum of $x(t)$.

Definition 8.5.11. In statistics, we may interpret the $\ell^2(\mathbb{Z})$ inner product $\langle x(t), y(t) \rangle = \sum_{t \in \mathbb{Z}} x(t)\,\overline{y(t)}$ as describing the extent to which $x$ and $y$ are “pointed in the same direction”, or correlated. For $\tau \in \mathbb{Z}$, it is therefore reasonable to define
\[ r_x(\tau) = \langle x(t), x(t - \tau) \rangle = \sum_{t \in \mathbb{Z}} x(t)\,\overline{x(t - \tau)} \tag{8.5.8} \]
to be the autocorrelation function of $x(t)$ with time lag $\tau$, since $r_x(\tau)$ describes how $x(t)$ is correlated with $x(t - \tau)$ (i.e., $x(t)$ shifted by time $\tau$).

In the above terms, the following theorem describes the relationship between the autocorrelation function and the power spectrum of a time series $x(t)$.

Theorem 8.5.12 (Discrete-time Wiener-Khinchin). For $x(t) \in \ell^2(\mathbb{Z})$, we have that
\[ r_x(\tau) = \int_0^1 S_x(\gamma)\,e_\tau(\gamma)\,d\gamma. \tag{8.5.9} \]


In other words, the autocorrelation function value $r_x(\tau)$ is precisely the $(-\tau)$th Fourier coefficient of the power spectrum $S_x(\gamma)$.

Proof. Problem 8.5.7.

Remark 8.5.13. Theorem 8.5.12 is actually the “easy” case of the discrete-time Wiener-Khinchin theorem, as the point of Wiener’s theorem is to extend (8.5.9) to a situation where Fourier inversion (Theorem 8.5.9) may not actually hold, and Khinchin’s contribution is to extend Wiener’s result to the case where $x(t)$ is not a fixed signal, but a random variable. Nevertheless, we hope that Theorem 8.5.12 gives some idea of what Wiener-Khinchin says.
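For a finitely supported signal, the identity (8.5.9) can be checked directly. The sketch below represents $x$ as a dictionary from integer times to complex values, which is an illustrative choice, and approximates the integral by an evenly spaced Riemann sum. For a signal supported on a few points, $S_x(\gamma)e_\tau(\gamma)$ is a trigonometric polynomial of low degree, so the Riemann sum is exact up to rounding once the number of samples exceeds that degree. The convention $\hat x(\gamma) = \sum_t x(t)\overline{e_t(\gamma)}$ is assumed.

```python
import cmath
import math

def e(t, gamma):
    """e_t(gamma) = exp(2*pi*i*t*gamma)."""
    return cmath.exp(2j * math.pi * t * gamma)

def dtft(x, gamma):
    """x_hat(gamma) = sum over t of x(t) * conj(e_t(gamma))."""
    return sum(v * e(t, gamma).conjugate() for t, v in x.items())

def autocorrelation(x, tau):
    """r_x(tau) = sum over t of x(t) * conj(x(t - tau)), as in (8.5.8)."""
    return sum(v * x.get(t - tau, 0).conjugate() for t, v in x.items())

def spectrum_coefficient(x, tau, samples=64):
    """Riemann sum for the integral of S_x(gamma) * e_tau(gamma) over [0, 1]."""
    total = 0
    for k in range(samples):
        gamma = k / samples
        total += abs(dtft(x, gamma)) ** 2 * e(tau, gamma)
    return total / samples

if __name__ == "__main__":
    signal = {0: 1 + 2j, 1: -1, 2: 0.5j, 3: 2}  # hypothetical sample data
    for tau in range(-3, 4):
        print(tau, autocorrelation(signal, tau), spectrum_coefficient(signal, tau))
```

Taking $\tau = 0$ recovers the power identity (8.5.6), since $r_x(0) = \sum_t |x(t)|^2$.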

8.5.5 A nowhere differentiable function

In this section, we present a version of Weierstrass’ construction of a nondifferentiable function (Theorem 8.5.15), using lacunary Fourier series, or in other words, Fourier series with large “gaps” between nonzero coefficients. Our presentation is based on Davidson and Donsig [DD10, Ex. 8.4.9]; see also Stein and Shakarchi [SS03, Ch. 4, Sect. 3] for a more conceptual approach.

We begin with a convenient “two-sided” characterization of differentiability.

Lemma 8.5.14. If $f : \mathbb{R} \to \mathbb{C}$ is differentiable at $c \in \mathbb{R}$, and $x_n^-$ and $x_n^+$ are sequences such that $x_n^- \le c \le x_n^+$ and $x_n^- < x_n^+$ for all $n \in \mathbb{N}$ and $\lim_{n\to\infty} x_n^- = \lim_{n\to\infty} x_n^+ = c$, then
\[ \lim_{n\to\infty} \frac{f(x_n^+) - f(x_n^-)}{x_n^+ - x_n^-} = f'(c). \tag{8.5.10} \]

Proof. Problem 8.5.8.

Theorem 8.5.15. Let $a \ge 4$, let $b$ be an even integer such that $b \ge a(a+1)$, and let $f_k(x) = a^{-k}\cos(b^k \pi x)$. Then
\[ g(x) = \sum_{k=1}^{\infty} f_k(x) = \sum_{k=1}^{\infty} \frac{\cos(b^k \pi x)}{a^k} \tag{8.5.11} \]
converges uniformly to a continuous function $g : S^1 \to \mathbb{R}$ that is not differentiable at any $c \in \mathbb{R}$.

The function $g$ defined in (8.5.11) is pictured in Figure 8.5.2 for the case $a = 4$, $b = 20$. Note that the constraints on $a$ and $b$ in Theorem 8.5.15 are certainly not optimal; however, they do simplify the proof.

Proof. The uniform convergence of (8.5.11) to some function $g$ is proven in Problem 8.5.9. By Theorem 4.3.8, $g$ is continuous.


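Figure 8.5.2: The function from (8.5.11), $a = 4$, $b = 20$

By Lemma 8.5.14, given $c \in \mathbb{R}$, to prove that $g$ is not differentiable at $c$, it suffices to construct sequences $x_n^-$ and $x_n^+$ such that $x_n^- \le c \le x_n^+$ and $x_n^- < x_n^+$ for all $n \in \mathbb{N}$ and $\lim_{n\to\infty} x_n^- = \lim_{n\to\infty} x_n^+ = c$, but
\[ \lim_{n\to\infty} \frac{g(x_n^+) - g(x_n^-)}{x_n^+ - x_n^-} \tag{8.5.12} \]
does not exist. So for $n \in \mathbb{N}$, let $d_n$ be the greatest integer less than or equal to $b^n c$, which means that
\[ \frac{d_n}{b^n} \le c \le \frac{d_n + 1}{b^n}, \tag{8.5.13} \]
and let
\[ x_n^- = \frac{d_n}{b^n}, \qquad x_n^+ = \frac{d_n + 1}{b^n}. \tag{8.5.14} \]
Since $x_n^+ - x_n^- = 1/b^n$, we see that $x_n^- \le c \le x_n^+$, $x_n^- < x_n^+$, and $\lim_{n\to\infty} x_n^- = \lim_{n\to\infty} x_n^+ = c$. It remains to show that the limit (8.5.12) does not exist.

The basic idea is to show that in
\[ g(x_n^+) - g(x_n^-) = \sum_{k=1}^{\infty} \bigl(f_k(x_n^+) - f_k(x_n^-)\bigr), \tag{8.5.15} \]
the $k = n$ summand
\[ \bigl|f_n(x_n^+) - f_n(x_n^-)\bigr| = a^{-n}\,\bigl|\cos((d_n + 1)\pi) - \cos(d_n \pi)\bigr| = \frac{2}{a^n} \tag{8.5.16} \]
dominates all of the others put together. (Here we have used $b^n x_n^\pm \in \mathbb{Z}$, so that the two cosines are $\pm 1$ with opposite signs.)

Starting with the $k < n$ terms, since
\[ \bigl|f_k'(x)\bigr| = \left|\frac{\pi b^k}{a^k}\sin(b^k \pi x)\right| \le \frac{\pi b^k}{a^k}, \tag{8.5.17} \]
for $k < n$, the Mean Value Theorem implies that
\[ \bigl|f_k(x_n^+) - f_k(x_n^-)\bigr| \le \frac{\pi b^k}{a^k}\,\bigl|x_n^+ - x_n^-\bigr| = \frac{\pi}{b^n}\left(\frac{b}{a}\right)^{k}. \tag{8.5.18} \]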


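Therefore, if $r = b/a \ge a + 1 \ge 5$, we see that
\[ \begin{aligned} \sum_{k=1}^{n-1} \bigl|f_k(x_n^+) - f_k(x_n^-)\bigr| &\le \frac{\pi}{b^n} \sum_{k=1}^{n-1} r^k = \frac{\pi}{b^n}\,\frac{r}{r-1}\,(r^{n-1} - 1) \\ &\le \frac{\pi}{b}\cdot\frac{5}{4}\left(\frac{1}{a^{n-1}} - \frac{1}{b^{n-1}}\right) < \frac{\pi}{a+1}\cdot\frac{5}{4}\cdot\frac{1}{a^{n}} \le \frac{1}{a^n}, \end{aligned} \tag{8.5.19} \]
where the next-to-last inequality discards the $\dfrac{1}{b^{n-1}}$ term and uses the fact that $b \ge a(a+1)$.

For $k > n$, by the triangle inequality,
\[ \bigl|f_k(x_n^+) - f_k(x_n^-)\bigr| \le \bigl|f_k(x_n^+)\bigr| + \bigl|f_k(x_n^-)\bigr| \le 2a^{-k}, \tag{8.5.20} \]
so, since $a \ge 4$,
\[ \sum_{k=n+1}^{\infty} \bigl|f_k(x_n^+) - f_k(x_n^-)\bigr| \le \sum_{k=n+1}^{\infty} \frac{2}{a^k} = \frac{2}{a^{n+1}}\cdot\frac{1}{1 - (1/a)} = \frac{2}{a-1}\cdot\frac{1}{a^n} \le \frac{2}{3}\cdot\frac{1}{a^n}. \tag{8.5.21} \]
It follows by the triangle inequality, (8.5.16), (8.5.19), and (8.5.21) that
\[ \bigl|g(x_n^+) - g(x_n^-)\bigr| \ge \frac{2}{a^n} - \frac{1}{a^n} - \frac{2}{3a^n} = \frac{1}{3a^n}. \tag{8.5.22} \]
Therefore,
\[ \left|\frac{g(x_n^+) - g(x_n^-)}{x_n^+ - x_n^-}\right| \ge \frac{b^n}{3a^n}. \tag{8.5.23} \]
However, since the right-hand side of (8.5.23) goes to $+\infty$ as $n \to \infty$, the limit (8.5.12) cannot exist, and the theorem follows.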
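The growth of the difference quotients in (8.5.23) can be observed numerically for the parameters $a = 4$, $b = 20$ of Figure 8.5.2. The sketch below is an illustration under stated assumptions rather than part of the proof. It takes $c$ to be rational so that the angles $b^k \pi x_n^\pm$ can be reduced exactly with `fractions.Fraction` before any floating-point cosine is taken. It also uses the fact that, because $b$ is even, the terms with $k > n$ cancel exactly, so the infinite sum (8.5.15) reduces to a finite one.

```python
import math
from fractions import Fraction

A, B = 4, 20  # the parameters a and b used in Figure 8.5.2

def f_k(k, x):
    """f_k(x) = A^(-k) * cos(B^k * pi * x) for rational x, with exact angle reduction."""
    turns = (B ** k * x) % 2  # cos(pi * u) has period 2 in u
    return math.cos(math.pi * float(turns)) / A ** k

def difference_quotient(c, n):
    """|g(x_n^+) - g(x_n^-)| / (x_n^+ - x_n^-) for the sequences in (8.5.14)."""
    d = math.floor(B ** n * c)
    lo, hi = Fraction(d, B ** n), Fraction(d + 1, B ** n)
    # For k > n, B^k * lo and B^k * hi are even integers (B is even), so
    # f_k(hi) - f_k(lo) = 0 and the sum stops at k = n.
    diff = sum(f_k(k, hi) - f_k(k, lo) for k in range(1, n + 1))
    return abs(diff) * B ** n

if __name__ == "__main__":
    c = Fraction(1, 3)  # an arbitrary rational test point
    for n in range(1, 7):
        print(n, difference_quotient(c, n), (B / A) ** n / 3)
```

Each printed quotient should be at least as large as the lower bound $b^n/(3a^n)$ beside it, and both columns grow geometrically in $n$.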

8.5.6 More on pointwise convergence

While Theorem 8.1.2 and related results in Sections 8.4 and 8.5.1 are the results on pointwise convergence of Fourier series that we will use most often, in this section, we describe a pointwise convergence result (Theorem 8.5.17) that holds for many examples occurring in practice, including all of our examples from Section 6.2. We also briefly survey what else is known about pointwise convergence in general (Remark 8.5.20).

To set the reader’s expectations properly, we first note that there are examples of continuous $f : S^1 \to \mathbb{C}$ whose Fourier series diverge at uncountably many points on $S^1$ (see Remark 8.5.20, below), so we will need to make a stronger assumption than continuity. Conversely, many natural examples (see Section 6.2) have a finite number of jump discontinuities and other points of non-differentiability. We therefore arrive at the following class of functions.


Definition 8.5.16. For an interval $I$ in $\mathbb{R}$, to say that $f : I \to \mathbb{C}$ is Lipschitz means that there exists some $L > 0$ such that for all $x, y \in I$,
\[ |f(x) - f(y)| \le L\,|x - y|. \tag{8.5.24} \]
Piecewise Lipschitz functions are then defined analogously to piecewise continuous functions (see Definition 3.1.22).

For example, if $f$ is piecewise differentiable with bounded $f'$, then $f$ is piecewise Lipschitz (Problem 8.5.10).

To lower expectations one more time, recall that if a sequence of continuous functions converges uniformly to some $f$, then $f$ must be continuous (Theorem 4.3.8). Therefore, for discontinuous $f$, we can, at best, only hope for pointwise convergence. Consequently, the following theorem (Theorem 8.5.17) is about as good a result as one could hope for, as it gives not only convergence at points of continuity, but also “convergence to the average limit” at jump discontinuities. Note that for brevity in the subsequent discussion, we define
\[ f(a^-) = \lim_{x \to a^-} f(x), \qquad f(a^+) = \lim_{x \to a^+} f(x). \tag{8.5.25} \]

Theorem 8.5.17. If $f : S^1 \to \mathbb{C}$ is piecewise Lipschitz, then for any $a \in S^1$, we have
\[ \lim_{N\to\infty} f_N(a) = \frac{f(a^+) + f(a^-)}{2}. \tag{8.5.26} \]
In particular, if $f$ is also continuous at $a$, then $\lim_{N\to\infty} f_N(a) = f(a)$.

For an example of what (8.5.26) looks like at jump discontinuities of $f$, see our very first example (!) back in Section 1.1. See also the examples in Remark 11.4.5.

Before we can prove Theorem 8.5.17, we will need two more lemmas; the first is the Riemann-Lebesgue lemma.

Lemma 8.5.18. For $f \in L^2(S^1)$,
\[ \lim_{n\to\infty} \hat f(n) = \lim_{n\to\infty} \hat f(-n) = 0. \tag{8.5.27} \]
Consequently, for any $\alpha \in \mathbb{R}$ and $[a, b] \subseteq [0, 1]$, we have
\[ \lim_{n\to\infty} \int_a^b f(x)\sin((2n\pi + \alpha)x)\,dx = \lim_{n\to\infty} \int_a^b f(x)\cos((2n\pi + \alpha)x)\,dx = 0. \tag{8.5.28} \]

Proof. For (8.5.28), we may assume $[a, b] = [0, 1]$ because we can extend $f : [a, b] \to \mathbb{C}$ to the domain $[0, 1]$ by defining $f$ to be 0 outside $[a, b]$. The rest is proved in Problem 8.5.11.

We also need the following Riemann integrability criterion.

Lemma 8.5.19. Suppose $f : [a, b] \to \mathbb{C}$ is bounded, and suppose that for any $\delta$ such that $0 < \delta < b - a$, $f$ is (Riemann) integrable on $[a + \delta, b]$. Then $f$ is integrable on $[a, b]$.


Proof. Problem 8.5.12.

Proof of Theorem 8.5.17. Fix $a \in S^1$. Recall from Example 8.3.2, Theorem 8.3.4, and Lemma 8.3.6 that the Dirichlet kernel
\[ D_N(x) = \sum_{n=-N}^{N} e_n(x) = \begin{cases} \dfrac{\sin((2N+1)\pi x)}{\sin(\pi x)} & \text{if } x \neq 0, \\[1ex] 2N+1 & \text{if } x = 0, \end{cases} \tag{8.5.29} \]
has the property that
\[ f_N(a) = (f * D_N)(a) = \int_{-\frac12}^{\frac12} f(a - t) D_N(t)\,dt. \tag{8.5.30} \]
Applying the substitution $u = -t$ and the fact that $D_N$ is an even function shows that
\[ \begin{aligned} f_N(a) &= \int_0^{\frac12} f(a - t) D_N(t)\,dt + \int_{-\frac12}^{0} f(a - t) D_N(t)\,dt \\ &= \int_0^{\frac12} f(a - t) D_N(t)\,dt - \int_{\frac12}^{0} f(a + u) D_N(-u)\,du \\ &= \int_0^{\frac12} \bigl(f(a + t) + f(a - t)\bigr) D_N(t)\,dt. \end{aligned} \tag{8.5.31} \]
Therefore, since $D_N$ is even and $\int_{-1/2}^{1/2} D_N(t)\,dt = 1$ (Problem 8.3.2), so that
\[ \frac12 f(a^\pm) = \int_0^{\frac12} f(a^\pm) D_N(t)\,dt, \tag{8.5.32} \]
we see that
\[ f_N(a) - \frac{f(a^+) + f(a^-)}{2} = \int_0^{\frac12} \bigl(f(a + t) - f(a^+)\bigr) D_N(t)\,dt + \int_0^{\frac12} \bigl(f(a - t) - f(a^-)\bigr) D_N(t)\,dt. \tag{8.5.33} \]
It will therefore suffice to show that
\[ \lim_{N\to\infty} \int_0^{\frac12} \bigl(f(a + t) - f(a^+)\bigr) D_N(t)\,dt = 0, \tag{8.5.34} \]
as an analogous argument proves the same fact for $f(a - t) - f(a^-)$.

For $0 < t \le \frac12$, let
\[ F(t) = \frac{f(a + t) - f(a^+)}{\sin(\pi t)} = \left(\frac{f(a + t) - f(a^+)}{t}\right)\left(\frac{t}{\sin(\pi t)}\right). \tag{8.5.35} \]


Since the value of an integral on $[0, \frac12]$ is not affected by changing the value of the integrand only at $t = 0$, we see that
\[ \int_0^{\frac12} \bigl(f(a + t) - f(a^+)\bigr) D_N(t)\,dt = \int_0^{\frac12} F(t)\sin((2N+1)\pi t)\,dt. \tag{8.5.36} \]
By the Riemann-Lebesgue Lemma 8.5.18, the desired limit (8.5.34) will follow if we can show that $F(t)$ is integrable on $[0, \frac12]$.

Note that for $0 < \delta < \frac12$, $F(t)$ is piecewise continuous (and therefore Riemann integrable) on $[\delta, \frac12]$. Therefore, by Lemma 8.5.19, it suffices to show that $F(t)$ is bounded on $[0, \frac12]$ (again ignoring $F(0)$). Furthermore, suppose $[0, b]$ is the interval of (Lipschitz) continuity containing $t = 0$ for the piecewise Lipschitz function $f(a + t)$. Then since $F(t)$ is bounded on any interval of continuity of $f(a + t)$, except possibly for $[0, b]$, it suffices to show that $F(t)$ is bounded on $[0, b]$.

However, by the definition of piecewise Lipschitz (Definitions 3.1.22 and 8.5.16), there exists some $L > 0$ such that for $t \in [0, b]$,
\[ \bigl|f(a + t) - f(a^+)\bigr| \le L\,|t|. \tag{8.5.37} \]
Furthermore, if $g(t) = \sin(\pi t)$, then because $\dfrac{t}{g(t)}$ is continuous for $0 < t \le \frac12$ and
\[ \lim_{t \to 0^+} \frac{t}{g(t)} = \frac{1}{g'(0)} = \frac{1}{\pi}, \tag{8.5.38} \]
we see that $\dfrac{t}{\sin(\pi t)}$ is also bounded on $[0, b]$. By (8.5.35), we then see that $F(t)$ is the product of two bounded functions on $[0, b]$, and therefore bounded. The theorem follows.
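Theorem 8.5.17 can be watched in action at a jump where the average of the one-sided limits is not obvious from symmetry. The sketch below uses the 1-periodic extension of $f(x) = e^x$ on $[0, 1)$, an illustrative choice that is not taken from the text. Its coefficients are $\hat f(n) = (e - 1)/(1 - 2\pi i n)$, computed by hand, and at $a = 0$ we have $f(0^+) = 1$ and $f(0^-) = e$, so (8.5.26) predicts $f_N(0) \to (1 + e)/2$.

```python
import math

def fourier_poly_at_zero(N):
    """f_N(0) for the periodic extension of f(x) = e^x on [0, 1).

    Pairing n with -n gives f^(n) + f^(-n) = 2 (e - 1) / (1 + 4 pi^2 n^2),
    and f^(0) = e - 1, so the partial sum is real.
    """
    pairs = math.fsum(2.0 / (1.0 + (2.0 * math.pi * n) ** 2)
                      for n in range(1, N + 1))
    return (math.e - 1.0) * (1.0 + pairs)

AVERAGE_OF_LIMITS = (1.0 + math.e) / 2.0  # (f(0+) + f(0-)) / 2 with f(0+) = 1, f(0-) = e

if __name__ == "__main__":
    for N in (20, 200, 2000):
        print(N, fourier_poly_at_zero(N), AVERAGE_OF_LIMITS)
```

The partial sums approach the midpoint of the jump rather than either one-sided value, with an error that shrinks roughly like $1/N$.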

Remark 8.5.20. In general, the pointwise convergence of Fourier series is subtle and difficult, so we will be content to list a few important results.
• On the one hand, Carleson [Car66] showed that if $f \in L^2(S^1)$, then the Fourier series of $f$ converges almost everywhere. In particular, this holds if $f$ is continuous.
• On the other hand, Katznelson [Kat66] showed that given any set $A \subseteq S^1$ of measure zero, even an uncountable one, there exists a continuous $f$ whose Fourier series diverges everywhere on $A$ (and possibly other points).
• At the other extreme from our $L^2$ convergence results, Kolmogorov [Kol26] showed that there even exist $f \in L^1(S^1)$ whose Fourier series diverge everywhere on $S^1$.

Problems

8.5.1. (Proves Theorem 8.5.1) Suppose $f \in L^2(S^1)$, $j \ge 1$, and $K$ and $p$ are real constants such that $p > j + 1$ and for all $n \neq 0$, $|\hat f(n)| < \dfrac{K}{|n|^p}$.
(a) Prove that for $0 \le \ell \le j$, the series
\[ g_\ell(x) = \sum_{n \in \mathbb{Z}} (2\pi i n)^\ell \hat f(n) e_n(x) \tag{8.5.39} \]
converges absolutely and uniformly to $g_\ell \in C^0(S^1)$, and prove that $g_0 = f$ a.e. (Suggestion: Imitate Problems 6.4.4 and 6.4.5, and use Section 8.4.)
(b) Let $g = g_0$. Prove that for $0 \le \ell \le j$, we have that
\[ g^{(\ell)}(x) = \sum_{n \in \mathbb{Z}} (2\pi i n)^\ell \hat f(n) e_n(x). \tag{8.5.40} \]
(Suggestion: Induction on $\ell$ and term-by-term differentiation.)

8.5.2. (Proves Corollary 8.5.2) Suppose $f \in L^2(S^1)$, and for all $p > 2$, there exists some constant $K_p$ such that for all $n \neq 0$, $|\hat f(n)| \le \dfrac{K_p}{|n|^p}$. Prove that the Fourier series of $f$ converges uniformly to some $g \in C^\infty(S^1)$ such that $f = g$ a.e. (Suggestion: Apply Theorem 8.5.1.)

8.5.3. (Proves Theorem 8.5.4) Suppose $f \in C^0([0, 1/2])$ and $\epsilon > 0$.
(a) Explain why $f_{\mathrm{even}}$, the even extension of $f$, is continuous on $S^1$ (thinking of $S^1$ as the interval $[-1/2, 1/2]$).
(b) Prove that if $g_N(x) = \sum_{n=-N}^{N} a_n e_n(x)$ is a trigonometric polynomial of degree $N$ and $\epsilon_1 > 0$, then there exists some polynomial function $p(x)$ such that $|g_N(x) - p(x)| < \epsilon_1$ for all $x \in S^1$. (Suggestion: Combine the definition of $e_n(x)$, facts about power series, and Theorem 4.3.2.)
(c) Prove that if $g \in C^0(S^1)$, then there exists some polynomial $p(x)$ such that for all $x \in S^1$, we have $|g(x) - p(x)| < \epsilon$. In particular, if $g = f_{\mathrm{even}}$, then $|f(x) - p(x)| < \epsilon$ for all $x \in [0, 1/2]$.

8.5.4. (Proves Theorem 8.5.6) Suppose $f \in L^2(\mathbb{R})$ and $\epsilon > 0$. Prove that there exists some $g \in C^\infty(\mathbb{R})$ with compact support such that $\|f - g\| < \epsilon$. (Suggestion: Theorem 7.5.13 and Corollary 8.5.3; then use a bump function to “truncate” a function that is smooth on an interval.)

8.5.5. (Proves Theorem 8.5.8)
(a) For $f$ as defined in Example 6.2.5, compute the value of $\zeta(2)$ by computing $\|f\|^2$ in two different ways.
(b) For $f$ as defined in Example 6.2.7, compute the value of $\zeta(4)$ by computing $\|f\|^2$ in two different ways.


8.5.6. (Proves Theorem 8.5.9) Suppose $x \in \ell^2(\mathbb{Z})$.
(a) Prove that the discrete-time Fourier transform
\[ \hat x(\gamma) = \sum_{t \in \mathbb{Z}} x(t)\,\overline{e_t(\gamma)} \tag{8.5.41} \]
converges in $L^2(S^1)$. (Suggestion: See Section 7.6.)
(b) Prove that
\[ x(t) = \int_0^1 \hat x(\gamma)\,e_t(\gamma)\,d\gamma. \tag{8.5.42} \]
(Suggestion: Continuity of the inner product; be careful about complex conjugates.)
(c) Prove that
\[ \sum_{t \in \mathbb{Z}} |x(t)|^2 = \int_0^1 |\hat x(\gamma)|^2\,d\gamma. \tag{8.5.43} \]
(Suggestion: See Section 7.6.)

8.5.7. (Proves Theorem 8.5.12) Suppose $x \in \ell^2(\mathbb{Z})$. Define
\[ S_x(\gamma) = |\hat x(\gamma)|^2, \tag{8.5.44} \]
\[ r_x(\tau) = \langle x(t), x(t - \tau) \rangle = \sum_{t \in \mathbb{Z}} x(t)\,\overline{x(t - \tau)}. \tag{8.5.45} \]
Prove that
\[ r_x(\tau) = \int_0^1 S_x(\gamma)\,e_\tau(\gamma)\,d\gamma. \tag{8.5.46} \]
(Suggestion: Rewrite the right-hand side of (8.5.46) and apply what we know about inner products.)

8.5.8. (Proves Lemma 8.5.14) Suppose $f : \mathbb{R} \to \mathbb{C}$ is differentiable at $c \in \mathbb{R}$.
(a) Prove that for any $\epsilon > 0$, there exists some $\delta(\epsilon) > 0$ such that for all $x \le c \le y$, $x < y$ such that $|x - c| < \delta(\epsilon)$ and $|y - c| < \delta(\epsilon)$, we have that
\[ \left|\frac{f(y) - f(x)}{y - x} - f'(c)\right| < \epsilon. \tag{8.5.47} \]
(Suggestion: Use local linearity (Corollary 3.2.6).)
(b) Prove that if $x_n^-$ and $x_n^+$ are sequences such that $x_n^- \le c \le x_n^+$ and $x_n^- < x_n^+$ for all $n \in \mathbb{N}$ and $\lim_{n\to\infty} x_n^- = \lim_{n\to\infty} x_n^+ = c$, then
\[ \lim_{n\to\infty} \frac{f(x_n^+) - f(x_n^-)}{x_n^+ - x_n^-} = f'(c). \tag{8.5.48} \]
(Suggestion: See the proof of Theorem 3.1.3.)

8.5.9. (Proves Theorem 8.5.15) For $a > 1$, prove that
\[ g(x) = \sum_{k=1}^{\infty} \frac{\cos(b^k \pi x)}{a^k} \tag{8.5.49} \]
converges uniformly on $\mathbb{R}$. (Suggestion: Weierstrass M-test.)

8.5.10. Let $I$ be an interval in $\mathbb{R}$. Prove that if $f$ is differentiable on $I$ and $f'$ is bounded, then $f$ is Lipschitz. (Suggestion: Mean Value Theorem.)

8.5.11. (Proves Lemma 8.5.18) Suppose $f \in L^2(S^1)$.
(a) Prove that $\lim_{n\to\infty} \hat{f}(n) = \lim_{n\to\infty} \hat{f}(-n) = 0$. (Suggestion: nth term test.)
(b) Prove that
\[ \lim_{n\to\infty} \int_0^1 f(x) \sin((2n\pi + \alpha)x)\,dx = \lim_{n\to\infty} \int_0^1 f(x) \cos((2n\pi + \alpha)x)\,dx = 0. \tag{8.5.50} \]
(Suggestion: Rewrite everything in terms of exponentials.)

8.5.12. (Proves Lemma 8.5.19) Suppose $f : [a, b] \to \mathbb{C}$ is bounded, and suppose that for any $\delta > 0$, $f$ is (Riemann) integrable on $[a + \delta, b]$. Prove that $f$ is integrable on $[a, b]$. (Suggestion: In the terminology of Lemma 3.3.12, choose a partition $P$ so the contribution $\mu(f; P, 1)(\Delta x)_1$ to $E(f; P)$ from the first subinterval of $P$ is less than $\varepsilon/2$.)

Part III

Operators and differential equations

Chapter 9

PDE's and diagonalization

The same expression whose abstract properties geometers had considered, and which in this respect belongs to general analysis, represents as well the motion of light in the atmosphere, as it determines the laws of diffusion of heat in solid matter, and enters into all the chief problems of the theory of probability.
-- Joseph Fourier, The Analytical Theory of Heat

[Improbability] generators were often used to break the ice at parties by making all the molecules in the hostess's undergarments leap simultaneously one foot to the left, in accordance with the Theory of Indeterminacy. Many respectable physicists said that they weren't going to stand for this, partly because it was a debasement of science, but mostly because they didn't get invited to those sorts of parties.
-- Douglas Adams, The Hitchhiker's Guide to the Galaxy

In this chapter, we give a brief introduction to the partial differential equations (PDE's) that we will solve as our main application of Fourier series, coming from both classical physics (Section 9.1) and quantum mechanics (Section 9.2). We also survey some finite-dimensional linear algebra (Section 9.3) that we hope will give the reader some context for the infinite-dimensional linear algebra in Chapter 10.

9.1 Some PDE's from classical physics

Joseph Fourier invented what is now known as Fourier analysis in order to solve certain partial differential equations, or PDE's, arising from physics. In this section, we introduce and give brief derivations of the heat and wave equations from classical physics, and in the next section, we discuss a PDE coming from quantum mechanics.

For reasons that will become clear (see Section 10.2), the key examples all involve the second derivative in some way, so we first discuss how to approximate $f''(x)$ for a given $x$. Now, by Definition 3.2.2, if $f$ is differentiable at $x$, for small $\Delta x$, we have
\[ f'(x) \approx \frac{f(x + \Delta x) - f(x)}{\Delta x}, \tag{9.1.1} \]
with equality as $\Delta x \to 0$. So if we only know the value of $f(x)$ at evenly spaced intervals of length $\Delta x > 0$, then the right-hand side of (9.1.1), which we hereby call $m_+$, gives a reasonable approximation of $f'(x)$. More symmetrically, since $x + \Delta x/2$ is the midpoint of $x$ and $x + \Delta x$, we can think of $m_+$ as approximating $f'(x + \Delta x/2)$; and similarly, the backwards secant slope
\[ m_- = \frac{f(x) - f(x - \Delta x)}{\Delta x} \tag{9.1.2} \]
gives a reasonable approximation of $f'(x - \Delta x/2)$. We might therefore reasonably approximate
\[ f''(x) \approx \frac{m_+ - m_-}{\Delta x} = \frac{f(x + \Delta x) - 2f(x) + f(x - \Delta x)}{(\Delta x)^2}, \tag{9.1.3} \]
and in fact, one can prove that we get equality as $\Delta x \to 0$ (Problem 9.1.1).

Turning now to the problem that first inspired Fourier, we consider the following initial value problem:

Question 9.1.1. Suppose we have a 1-dimensional wire, possibly circular, such that the temperature at position $x$ and time $t = 0$ is a given function $u(x, 0) = f(x)$. Solve for $u(x, t)$, the temperature at position $x$ and time $t > 0$.
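As a quick numerical sanity check of (9.1.3), the following Python sketch forms the secant slopes $m_+$ and $m_-$ and compares their difference quotient with a known second derivative. The test function (sine), the sample point $x = 1$, the step sizes, and the function names are illustrative assumptions, not part of the text.

```python
import math

def forward_slope(f, x, dx):
    """The secant slope m_+ from (9.1.1)."""
    return (f(x + dx) - f(x)) / dx

def backward_slope(f, x, dx):
    """The backwards secant slope m_- from (9.1.2)."""
    return (f(x) - f(x - dx)) / dx

def second_difference(f, x, dx):
    """The approximation (m_+ - m_-) / dx of f''(x) from (9.1.3)."""
    return (forward_slope(f, x, dx) - backward_slope(f, x, dx)) / dx

if __name__ == "__main__":
    exact = -math.sin(1.0)  # the second derivative of sin at x = 1
    for dx in (0.1, 0.01, 0.001):
        approx = second_difference(math.sin, 1.0, dx)
        print(f"dx = {dx}: approx = {approx:.8f}, error = {abs(approx - exact):.2e}")
```

For a smooth function such as sine, the printed error should shrink roughly like $(\Delta x)^2$, in line with the limit statement of Problem 9.1.1.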

Figure 9.1.1: Model of a heated wire. (Diagram labels: pieces of length $\Delta x$ centered at $x - \Delta x$, $x$, and $x + \Delta x$; heat flow in.)

We can model Question 9.1.1 using Fourier's Law of heat conduction, which states that the rate of heat flow in/out of an object in a given direction is proportional to the temperature gradient (rate of temperature change) in that direction. For example, as shown in Figure 9.1.1, the rate of heat flowing from a piece of the wire of length $\Delta x$ centered at $x + \Delta x$ to a piece of the wire of length $\Delta x$ centered at $x$ is
\[ k_1 \frac{u(x + \Delta x, t) - u(x, t)}{\Delta x} \tag{9.1.4} \]

for some constant $k_1 > 0$. (The reader should check that the signs in (9.1.4) make sense, e.g., if the piece at $x + \Delta x$ is warmer than the piece at $x$, then the piece at $x$ gets warmer over time.)

Let $Q$ be the total heat contained in a piece of the wire of length $\Delta x$ centered at $x$. Assuming that heat is only transferred along the wire, and not through the surrounding material, by summing the heat flow from the pieces at $x + \Delta x$ and $x - \Delta x$ to the piece at $x$, we see that (Figure 9.1.1)
\[
\begin{aligned}
\frac{\Delta Q}{\Delta t} &= k_1 \left( \frac{u(x + \Delta x, t) - u(x, t)}{\Delta x} + \frac{u(x - \Delta x, t) - u(x, t)}{\Delta x} \right) \\
&= k_1\, \frac{u(x + \Delta x, t) - 2u(x, t) + u(x - \Delta x, t)}{\Delta x}.
\end{aligned}
\tag{9.1.5}
\]
Furthermore, since temperature is defined to be (up to a constant depending only on the material in the wire) the average amount of heat per unit mass, we have that $\Delta Q = k_2 (\rho \Delta x) \Delta u$ for some constant $k_2 > 0$, where $\rho$ is the (uniform) density of the wire in mass per unit length. Therefore,
\[ \frac{\Delta u}{\Delta t} = \frac{k_1}{k_2 \rho}\, \frac{u(x + \Delta x, t) - 2u(x, t) + u(x - \Delta x, t)}{(\Delta x)^2}. \tag{9.1.6} \]
Taking the limit as $\Delta t, \Delta x \to 0$, applying (9.1.3), and changing units to eliminate constants, we get the heat equation:
\[ \frac{\partial^2 u}{\partial x^2} = \frac{\partial u}{\partial t}. \tag{9.1.7} \]

The heat equation (9.1.7) has several other physical interpretations. For example, if $u(x, t)$ describes the concentration of (say) a gas along a linear pipe, a mathematically similar derivation shows that under simplified conditions, (9.1.7) describes the diffusion of that gas over time. Less straightforwardly, the Black-Scholes equation is a model in mathematical finance that determines fair pricing for what is known as a European (stock) option:
\[ \frac{\partial V}{\partial t} + rs \frac{\partial V}{\partial s} + \frac{\sigma^2 s^2}{2} \frac{\partial^2 V}{\partial s^2} - rV = 0. \tag{9.1.8} \]
It turns out that (9.1.8) is equivalent to the heat equation under a change of variables; see, for example, Stein and Shakarchi [SS03, p. 170].

Another of our motivating PDE's comes from the following situation.

Question 9.1.2. Suppose we have a 1-dimensional wire, held taut at both ends so that it vibrates a small amount vertically, as compared to its length. (Think of a string on a stringed instrument.) Suppose also that we know the initial height $u(x, 0) = f(x)$ and the initial (vertical) velocity $u_t(x, 0) = g(x)$ of the wire at position $x$. Solve for $u(x, t)$, the height of the string at position $x$ and time $t > 0$.

We can model Question 9.1.2 with the following "ball-and-spring" approximation.
Imagine that our one-dimensional wire is made of individual particles (“balls”), linked to each other by springs representing the tension of the wire, and assume that the wire is pulled tight enough that the balls are effectively constrained to move only vertically, as shown in Figure 9.1.2. A real-life version of the ball-and-spring model from San Francisco’s Exploratorium museum can also be seen below (Figure 9.1.3); here, the rods are attached to a central axle so that the “balls” on the ends of the rods only move vertically.

Figure 9.1.2: Ball-and-spring model of a vibrating wire

Figure 9.1.3: A real-life ball-and-spring model, operated by the author’s daughter

The ball-and-spring approximation can be turned into a PDE as follows. Suppose the wire is being pulled with a uniform tension of magnitude $\tau$. Then by action-reaction (Newton's Third Law), at time $t$, each piece of the wire of length $\Delta x$ is pulled at both ends by a force with magnitude $\tau$ and direction determined by (roughly speaking) the slope of $u(x, t)$ in the $x$ direction. Specifically, the vertical force on the piece of wire of length $\Delta x$ centered at $x$ coming from the tension pulling in the positive $x$ direction is
\[ \tau \left( \frac{\Delta u_+}{\sqrt{(\Delta x)^2 + (\Delta u_+)^2}} \right) \approx \tau\, \frac{u(x + \Delta x, t) - u(x, t)}{\Delta x}, \tag{9.1.9} \]
where the approximation in (9.1.9) again comes from the assumption that the wire is pulled tight enough that vertical motion is much smaller than horizontal motion. Adding the analogous force coming from the negative $x$ direction, and applying $F = ma$ (Newton's Second Law), we have
\[ \tau \left( \frac{u(x + \Delta x, t) - u(x, t)}{\Delta x} + \frac{u(x - \Delta x, t) - u(x, t)}{\Delta x} \right) = \rho \Delta x\, \frac{\partial^2 u}{\partial t^2}, \tag{9.1.10} \]
where $\rho$ is again the uniform density of the wire in mass per unit length. Dividing by $\Delta x$, taking limits, and rescaling units to eliminate constants, we obtain the wave equation:
\[ \frac{\partial^2 u}{\partial x^2} = \frac{\partial^2 u}{\partial t^2}. \tag{9.1.11} \]

Like the heat equation, the wave equation (9.1.11) has other useful interpretations. One important application comes from Maxwell's equations for the electrical field $\mathbf{E}(x, y, z)$ and the magnetic field $\mathbf{B}(x, y, z)$ in a region of empty space:
\[
\begin{aligned}
\nabla \times \mathbf{E} &= -\frac{1}{c} \frac{\partial \mathbf{B}}{\partial t}, & \nabla \times \mathbf{B} &= \frac{1}{c} \frac{\partial \mathbf{E}}{\partial t}, \\
\nabla \cdot \mathbf{E} &= 0, & \nabla \cdot \mathbf{B} &= 0.
\end{aligned}
\tag{9.1.12}
\]
One can use multivariable calculus to show that up to constants, (9.1.12) reduces to a system of three 3-dimensional wave equations (Problem 9.1.2). We may therefore think of (9.1.11) as modelling electromagnetic waves in (1-dimensional) space over time, giving, for example, plane wave solutions of the form $e^{ik(x-t)}$ (Problem 9.1.3).

Now, both the heat and wave equations can be stated for $u : S^1 \to \mathbb{C}$, though for the wave equation, $S^1$ only makes physical sense as a domain if we think of $u$ as, for example, a periodic electromagnetic wave. For applications, however, it is perhaps more interesting to consider the following boundary value problems:

Question 9.1.3. Solve the heat and wave equations on the domain $[a, b]$ under the following boundary conditions:

1. Dirichlet boundary conditions: We require that $u(a, t) = u(b, t) = 0$ for all $t$. For the heat equation, this means holding the temperature of the wire at 0 at both ends; for the wave equation, this means holding the wire fixed at height 0 at both ends.

2. Neumann boundary conditions: We require that ux (a, t) = ux (b, t) = 0 for all t. For the heat equation, by Fourier’s Law, this means that there is no heat flow in or out of the wire at its ends (i.e., perfect insulation); for the wave equation, this is perhaps somewhat less natural, but one might imagine the ends of the wire sliding up and down fixed vertical rods.
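The discrete relations (9.1.6) and (9.1.10) can be stepped forward in time directly, which gives a rough feel for the Dirichlet case of Question 9.1.3. The sketch below is a minimal illustration on $[0, 1]$ with all physical constants set to 1, as in (9.1.7) and (9.1.11). The grid sizes, time steps, initial data $\sin(\pi x)$, and function names are assumptions made for this example, and the time-stepping schemes (forward Euler for heat, leapfrog for waves) are standard numerical choices rather than methods taken from the text.

```python
import math

def second_difference(u, j, dx):
    """Centered second difference from (9.1.3) at interior grid index j."""
    return (u[j + 1] - 2.0 * u[j] + u[j - 1]) / dx**2

def heat_step(u, dx, dt):
    """Advance u_t = u_xx by one time step; endpoints are held at 0 (Dirichlet)."""
    new = [0.0] * len(u)
    for j in range(1, len(u) - 1):
        new[j] = u[j] + dt * second_difference(u, j, dx)
    return new

def wave_step(u_prev, u_curr, dx, dt):
    """Advance u_tt = u_xx by one time step; endpoints are held at 0 (Dirichlet)."""
    new = [0.0] * len(u_curr)
    for j in range(1, len(u_curr) - 1):
        accel = second_difference(u_curr, j, dx)
        new[j] = 2.0 * u_curr[j] - u_prev[j] + dt**2 * accel
    return new

def solve_heat(n_intervals=20, dt=0.001, n_steps=100):
    """Heat flow from u(x, 0) = sin(pi x); this scheme needs dt <= dx**2 / 2."""
    dx = 1.0 / n_intervals
    u = [math.sin(math.pi * j * dx) for j in range(n_intervals + 1)]
    u[0] = u[-1] = 0.0
    for _ in range(n_steps):
        u = heat_step(u, dx, dt)
    return u

def solve_wave(n_intervals=20, dt=0.025, n_steps=40):
    """Vibration from u(x, 0) = sin(pi x), u_t(x, 0) = 0; this scheme needs dt <= dx."""
    dx = 1.0 / n_intervals
    u_prev = [math.sin(math.pi * j * dx) for j in range(n_intervals + 1)]
    u_prev[0] = u_prev[-1] = 0.0
    # First step from a Taylor expansion in t, using zero initial velocity.
    u_curr = [0.0] * len(u_prev)
    for j in range(1, n_intervals):
        u_curr[j] = u_prev[j] + 0.5 * dt**2 * second_difference(u_prev, j, dx)
    for _ in range(n_steps - 1):
        u_prev, u_curr = u_curr, wave_step(u_prev, u_curr, dx, dt)
    return u_curr

if __name__ == "__main__":
    mid = 10  # grid index of x = 1/2 when n_intervals = 20
    print("heat at x = 1/2, t = 0.1:", solve_heat()[mid],
          "vs exp(-pi^2 t):", math.exp(-math.pi**2 * 0.1))
    print("wave at x = 1/2, t = 1.0:", solve_wave()[mid],
          "vs cos(pi t):", math.cos(math.pi))
```

The comparison values in the final lines come from the separated solutions $e^{-\pi^2 t} \sin(\pi x)$ and $\cos(\pi t) \sin(\pi x)$, which one can check by direct differentiation satisfy (9.1.7) and (9.1.11) with $u(0, t) = u(1, t) = 0$.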

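Problems

9.1.1. Prove that if $f''(x)$ exists, then
\[ \lim_{h \to 0} \frac{f(x + h) - 2f(x) + f(x - h)}{h^2} = f''(x). \tag{9.1.13} \]
(Suggestion: L'Hôpital's Rule (Theorem 3.6.5).)

9.1.2. For a vector field $F(x, y, z) = (F_1(x, y, z), F_2(x, y, z), F_3(x, y, z))$ and a scalar function $f(x, y, z)$ we define the grad of $f$ and the curl and div of $F$ by
\[ \nabla f = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z} \right), \tag{9.1.14} \]
\[ \nabla \times F = \left( \frac{\partial F_3}{\partial y} - \frac{\partial F_2}{\partial z},\ \frac{\partial F_1}{\partial z} - \frac{\partial F_3}{\partial x},\ \frac{\partial F_2}{\partial x} - \frac{\partial F_1}{\partial y} \right), \tag{9.1.15} \]
\[ \nabla \cdot F = \frac{\partial F_1}{\partial x} + \frac{\partial F_2}{\partial y} + \frac{\partial F_3}{\partial z}, \tag{9.1.16} \]
respectively. We also define $\nabla^2 f = \dfrac{\partial^2 f}{\partial x^2} + \dfrac{\partial^2 f}{\partial y^2} + \dfrac{\partial^2 f}{\partial z^2}$, and we define $\nabla^2 F$ componentwise:
\[ \nabla^2 F = (\nabla^2 F_1, \nabla^2 F_2, \nabla^2 F_3). \tag{9.1.17} \]
(a) For a vector field $F(x, y, z)$, as above, prove that
\[ \nabla \times (\nabla \times F) = \nabla(\nabla \cdot F) - \nabla^2 F. \tag{9.1.18} \]
(You may assume that partial derivatives commute, e.g., $\dfrac{\partial}{\partial x} \dfrac{\partial}{\partial y} = \dfrac{\partial}{\partial y} \dfrac{\partial}{\partial x}$.)
(b) Prove that if $\mathbf{E}$ and $\mathbf{B}$ satisfy Maxwell's equations
\[
\begin{aligned}
\nabla \times \mathbf{E} &= -\frac{1}{c} \frac{\partial \mathbf{B}}{\partial t}, & \nabla \times \mathbf{B} &= \frac{1}{c} \frac{\partial \mathbf{E}}{\partial t}, \\
\nabla \cdot \mathbf{E} &= 0, & \nabla \cdot \mathbf{B} &= 0
\end{aligned}
\tag{9.1.19}
\]
in empty space, then
\[ \nabla^2 \mathbf{E} = \frac{1}{c^2} \frac{\partial^2 \mathbf{E}}{\partial t^2}, \tag{9.1.20} \]
a system of three 3-dimensional wave equations. (Again, assume partial derivatives commute.)

9.1.3. Verify by computation that for any $k \in \mathbb{C}$, $e^{ik(x-t)}$ is a solution to the wave equation $\dfrac{\partial^2 u}{\partial x^2} = \dfrac{\partial^2 u}{\partial t^2}$.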

9.2 Schrödinger's equation

Our third and final motivating PDE comes from quantum mechanics, and addresses the following question.

Question 9.2.1. Explain why, when the energy levels of oxygen ($O_2$) molecules are measured, those energy levels are distributed not continuously, but discretely, or in other words, in a quantized manner.

The phenomenon from Question 9.2.1 is illustrated in Figure 9.2.1, which shows the energy levels (bright lines) of photons coming from a sample of oxygen molecules that are excited (e.g., heated) and then transition to a lower energy state by emitting a photon. Note that these energy levels (or actually, differences between energy levels) appear to be concentrated in narrow bands, as opposed to, say, some kind of more even or continuous distribution.

Figure 9.2.1: Emission spectrum of oxygen molecules (Image created by Teravolt, released to public domain via wikimedia.org)

Now, the answer to Question 9.2.1 cannot be obtained from ordinary mechanics in the straightforward manner that we obtained the heat and wave equations. Indeed, no less an authority than Feynman [Fey11, III.16] calls Schrödinger's equation a matter of inspired guesswork. Nevertheless, following a number of sources (such as Eisberg and Resnick [ER85, Ch. 5] and Holland [Hol07, pp. 284–285]), with hindsight, we can approximate Schrödinger's heuristic reasoning as follows.

First off, we note that the bond of a diatomic molecule like $O_2$ can be modelled as a "spring", as shown in Figure 9.2.2. In that model, if $x$ is the displacement from equilibrium bond length (distance between the two oxygen nuclei), then the corresponding force $F(x)$ applied to the system is
\[ F(x) = -k_1 x \tag{9.2.1} \]
for some constant $k_1 > 0$. Note that if the bond length is longer than equilibrium ($x > 0$), then (9.2.1) indicates that attracting forces shorten it, and vice versa.

Figure 9.2.2: A diatomic molecule is like a spring

Applying $F = ma$ to rewrite (9.2.1) as $ma + k_1 x = 0$, multiplying by $v = \dfrac{dx}{dt}$, and integrating with respect to $t$ gives a potential energy function $V(x) = \dfrac{1}{2} k_1 x^2$ that satisfies the conservation of energy equation
\[ \frac{1}{2} m v^2 + \frac{1}{2} k_1 x^2 = E, \tag{9.2.2} \]
where $\dfrac{1}{2} m v^2$ represents the kinetic energy of the electron, and $E$ is the (constant) total energy in the system. Rewriting (9.2.2) in terms of momentum $p = mv$, we then get
\[ k_2 p^2 + \frac{1}{2} k_1 x^2 = E. \tag{9.2.3} \]

Following Schrödinger, with the benefit of hindsight, we can now turn (9.2.3) into an operator equation using de Broglie's theory of "matter waves". In that theory, de Broglie posited that a particle expressing wave-like properties in one spatial dimension can be modeled by a plane wave
\[ u(x, t) = e^{i(px - Et)/k_3}, \tag{9.2.4} \]
where $p$ and $E$ are the (constant) momentum and energy of the particle, respectively. (Compare Problem 9.1.3.) Differentiating (9.2.4) and rearranging terms, we see that
\[ -ik_3 \frac{\partial u}{\partial x} = pu, \qquad ik_3 \frac{\partial u}{\partial t} = Eu. \tag{9.2.5} \]
If we then think of (9.2.5) as equations among operators on $u$, we have that
\[ -ik_3 \frac{\partial}{\partial x} = p, \qquad ik_3 \frac{\partial}{\partial t} = E. \tag{9.2.6} \]
Substituting into (9.2.3), we get the operator equation
\[ -k_4 \frac{\partial^2}{\partial x^2} + \frac{1}{2} k_1 x^2 = ik_3 \frac{\partial}{\partial t}. \tag{9.2.7} \]

So what do the two sides of (9.2.7) operate on? Well, the final quantum leap (pun intended) is to suppose that our particle is represented by a state function $\Psi(x, t)$ whose interpretation will be discussed later. For now, we merely note that after adjusting constants, (9.2.7) gives
\[ -\frac{\partial^2 \Psi}{\partial x^2} + 4\pi^2 x^2 \Psi = i \frac{\partial \Psi}{\partial t}. \tag{9.2.8} \]
This is Schrödinger's equation for the quantum harmonic oscillator, and is our final motivating PDE problem:

Question 9.2.2. Solve Schrödinger's equation for the quantum harmonic oscillator (9.2.8), given some initial complex-valued state $\Psi(x, 0)$.

Remark 9.2.3. Note that in (9.2.8), the specific forces governing the quantum harmonic oscillator only appear via the potential energy function $V(x) = \dfrac{1}{2} k_1 x^2$. We may therefore obtain Schrödinger's equation for any number of other physical situations by changing the potential $V(x)$. For many examples, and much more about the general theory of Schrödinger's equation, see Teschl [Tes09].
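To get a concrete feel for (9.2.8), one can test a candidate state numerically. The sketch below uses the trial function $\Psi(x, t) = e^{-\pi x^2} e^{-2\pi i t}$, which is an assumption chosen for illustration and is not taken from the text above; the function names and the step size $h$ are likewise hypothetical. Derivatives are replaced by centered difference quotients as in (9.1.3), so the residual is only expected to be small, not exactly zero.

```python
import cmath
import math

def psi(x, t):
    """Trial state exp(-pi x^2) * exp(-2 pi i t) (an assumed example)."""
    return math.exp(-math.pi * x * x) * cmath.exp(-2j * math.pi * t)

def residual(x, t, h=1e-4):
    """Left side minus right side of (9.2.8), using centered differences."""
    psi_xx = (psi(x + h, t) - 2.0 * psi(x, t) + psi(x - h, t)) / h**2
    psi_t = (psi(x, t + h) - psi(x, t - h)) / (2.0 * h)
    return -psi_xx + 4.0 * math.pi**2 * x**2 * psi(x, t) - 1j * psi_t

if __name__ == "__main__":
    for x in (-1.0, 0.0, 0.3, 1.5):
        print(x, abs(residual(x, 0.2)))
```

A direct calculation supports the small residuals: for $\psi(x) = e^{-\pi x^2}$ we have $\psi''(x) = (4\pi^2 x^2 - 2\pi)\psi(x)$, so $-\psi'' + 4\pi^2 x^2 \psi = 2\pi \psi$, which matches $i\,\partial\Psi/\partial t = 2\pi \Psi$ for the trial state.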

9.3 Diagonalization

In this section, we review some relevant material from linear algebra, to give the reader some motivation for the material on operators in Chapter 10 and beyond.

In some sense, the point of linear algebra is to be able to solve matrix equations of the form $Ax = b$. Here, we restrict our attention to the case where $A$ is an $n \times n$ matrix, and $x$ and $b$ are unknown and known $n \times 1$ column vectors, respectively. As the reader may recall, solving $Ax = b$ in general takes some effort (e.g., Gaussian reduction), but one easy case happens when $A$ is diagonal, i.e., of the form
\[ \begin{pmatrix} a_{11} & & \\ & \ddots & \\ & & a_{nn} \end{pmatrix} \]
(blank entries are 0), for then $Ax = b$ just becomes $n$ independent one-variable equations.

It would therefore certainly be convenient to be able to reduce the general case of $Ax = b$ to the diagonal case, in the sense that if $P^{-1}AP = D$, where $D$ is diagonal, then a solution $y$ of $Dy = P^{-1}b$ gives a solution $x = Py$ to $Ax = b$ (Problem 9.3.1). Perhaps more importantly, we can obtain much information, both quantitative and qualitative, about solutions to $Ax = b$ from the entries of $D$. (See below for some examples.)

To get more traction on this problem of diagonalization, it is helpful to have the following abstraction available.

Definition 9.3.1. A linear operator on $\mathbb{C}^n$ is a linear transformation $T : \mathbb{C}^n \to \mathbb{C}^n$ (i.e., a map such that for all $x, y \in \mathbb{C}^n$ and $a, b \in \mathbb{C}$, $T(ax + by) = aT(x) + bT(y)$).

For example, if $A$ is an $n \times n$ matrix with entries in $\mathbb{C}$, then $T(x) = Ax$ defines a linear operator on $\mathbb{C}^n$; indeed, for the rest of this section, we assume that $T(x) = Ax$. We may also therefore restate the problem of solving $Ax = b$ as solving $T(x) = b$. In these terms, the key to diagonalization is the following circle of ideas.

Definition 9.3.2. Let $T$ be a linear operator on $\mathbb{C}^n$. To say that $v \in \mathbb{C}^n$ is an eigenvector of $T$ means that $v \neq 0$ and $T(v) = \lambda v$ for some $\lambda \in \mathbb{C}$. To say that $\lambda \in \mathbb{C}$ is an eigenvalue of $T$ means that $T(v) = \lambda v$ for some eigenvector $v$ of $T$. Jointly, if $v \neq 0$ and $T(v) = \lambda v$, we say that $v$ is an eigenvector of $T$ with eigenvalue $\lambda$.

The following term is nonstandard, but we introduce it here because we will find it to be quite useful.

Definition 9.3.3. Let $T$ be a linear operator on $\mathbb{C}^n$. An eigenbasis for $T$ is a basis $\{u_1, \dots, u_n\}$ for $\mathbb{C}^n$ such that each $u_i$ is an eigenvector of $T$.

Suppose $P$ is an $n \times n$ matrix whose columns are $\{u_1, \dots, u_n\}$. We recall from linear algebra that $\{u_1, \dots, u_n\}$ is a basis for $\mathbb{C}^n$ if and only if $P$ is invertible. The following is then a necessary and sufficient condition for diagonalization.

Theorem 9.3.4. Let $A$ be an $n \times n$ matrix, and let $T(x) = Ax$. If $P$ is an invertible $n \times n$ matrix with columns $\{u_1, \dots, u_n\}$, then the following are equivalent:

1. $P^{-1}AP$ is diagonal.
2. $\{u_1, \dots, u_n\}$ is an eigenbasis for $T$.

Furthermore, if those conditions hold, then
\[ D = P^{-1}AP = \begin{pmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{pmatrix}, \]
where $\lambda_i$ is the eigenvalue of $u_i$.

Proof. Problem 9.3.2.

If $A$ (or equivalently, $T$) can be diagonalized, we obtain a number of consequences. For example:

• The eigenvalues $\{\lambda_1, \dots, \lambda_n\}$ are the only eigenvalues of $T$ (Problem 9.3.4).
• $T$ is invertible if and only if all of the $\lambda_i \neq 0$ (Problem 9.3.5).

In the above light, the reader can think of Chapter 10 as analogous to the finite-dimensional diagonalization theory we just described. More specifically, our basic schtick in studying operators is:

1. We look at a linear operator $T$ on a function space. Often $T$ is defined in terms of derivatives, like $T(f) = -f''$. (The precise definition of operator is more subtle than one might think; see Section 10.1.)
2. We find an orthogonal eigenbasis for $T$ that we use to define (generalized) Fourier series. For $T(f) = -f''$, one such eigenbasis is $\{e_n\}$, where $e_n(x) = e^{2\pi i n x}$. (See Section 10.4.)
3. As a consequence, we see (Theorem 10.4.3) that
\[ T\left( \sum_{n=1}^{\infty} c_n e_n \right) = \sum_{n=1}^{\infty} \lambda_n c_n e_n, \tag{9.3.1} \]
where $\lambda_n$ is the eigenvalue of $e_n$. Note that in the finite-dimensional case, (9.3.1) follows from the linearity of $T$, but in the infinite-dimensional case, (9.3.1) is far from obvious.

The main idea of Chapter 11 can then be described as applying diagonalization to solve problems like those from Sections 9.1 and 9.2.
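The reduction of $Ax = b$ to the diagonal case can be traced on a small example. The following Python sketch uses exact rational arithmetic on a $2 \times 2$ matrix; the particular matrix $A$, the eigenvectors placed in the columns of $P$, and the helper names are assumptions chosen for illustration, not examples from the text.

```python
from fractions import Fraction

def mat_vec(M, v):
    """Matrix-vector product Mv."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def mat_mul(M, N):
    """Matrix product MN."""
    return [[sum(M[i][k] * N[k][j] for k in range(len(N)))
             for j in range(len(N[0]))] for i in range(len(M))]

def inverse_2x2(M):
    """Inverse of a 2 x 2 matrix, computed exactly."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]

# Assumed example: A has eigenvectors (1, 1) and (1, -1) with eigenvalues 3 and 1.
A = [[2, 1], [1, 2]]
P = [[1, 1], [1, -1]]  # columns form an eigenbasis for T(x) = Ax
P_inv = inverse_2x2(P)
D = mat_mul(P_inv, mat_mul(A, P))  # diagonal, by Theorem 9.3.4

def solve_by_diagonalization(b):
    """Solve Ax = b by solving Dy = P^{-1} b entrywise and setting x = Py."""
    rhs = mat_vec(P_inv, b)
    y = [rhs[i] / D[i][i] for i in range(len(rhs))]  # n one-variable equations
    return mat_vec(P, y)

if __name__ == "__main__":
    x = solve_by_diagonalization([3, 0])
    print("D =", D)
    print("x =", x, " Ax =", mat_vec(A, x))
```

The entrywise division in `solve_by_diagonalization` is only possible because every diagonal entry of $D$ is nonzero here, which is the invertibility criterion of Problem 9.3.5 in this small case.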

Problems

9.3.1. Suppose $A$, $D$, and $P$ are $n \times n$ matrices, $P$ is invertible, and $P^{-1}AP = D$. Prove that for column vectors $y, b \in \mathbb{C}^n$, $Dy = P^{-1}b$ if and only if $A(Py) = b$.

9.3.2. (Proves Theorem 9.3.4) Let $A$ be an $n \times n$ matrix, let $T(x) = Ax$, and let $P$ be an invertible $n \times n$ matrix with columns $\{u_1, \dots, u_n\}$. Let $D = P^{-1}AP$.
(a) Prove that $D$ is diagonal if and only if $\{u_1, \dots, u_n\}$ is an eigenbasis for $T$. (Suggestion: Consider $AP = PD$.)
(b) Prove that if $D$ is diagonal, then $D = \begin{pmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{pmatrix}$, where $\lambda_i$ is the eigenvalue of $u_i$.

9.3.3. Let $A = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}$. Prove that if $P$ is an invertible $2 \times 2$ matrix, then $P^{-1}AP$ is not a diagonal matrix. (Suggestion: First prove that if $D = P^{-1}AP$ is diagonal, then $D$ must be the zero matrix.)

9.3.4. Suppose $T$ is a linear operator on $\mathbb{C}^n$ and $\{u_1, \dots, u_n\}$ is an eigenbasis for $T$ with corresponding eigenvalues $\{\lambda_1, \dots, \lambda_n\}$. Prove that if $\lambda \in \mathbb{C}$ is not equal to any of the $\lambda_i$ and $T(v) = \lambda v$, then $v = 0$. (Suggestion: Write out $v$ as a linear combination of $\{u_1, \dots, u_n\}$, and use the fact that $\{u_1, \dots, u_n\}$ is linearly independent.)

9.3.5. Suppose $T$ is a linear operator on $\mathbb{C}^n$ and $B = \{e_1, \dots, e_n\}$ is an eigenbasis for $T$ with corresponding eigenvalues $\{\lambda_1, \dots, \lambda_n\}$.
(a) Prove that if $\lambda_i = 0$ for some $i$, then $T$ is not invertible.
(b) Prove that if $\lambda_i \neq 0$ for all $i$, then $T$ is invertible. (Suggestion: Write down a formula for $T^{-1}$.)

Chapter 10

Operators on Hilbert spaces

Operator, this is an emergency
Operator, baby, burning up on me
Operator, this is an emergency
Operator, operator
-- Midnight Star, "Operator"

In mathematics you don't understand things. You just get used to them.
-- John von Neumann

In this chapter, we develop the Hilbert space analogue of the finite-dimensional theory of operator diagonalization described in Section 9.3. Beginning with the (somewhat subtle) definition of operator (Section 10.1), we define the geometric properties of being Hermitian and positive (Section 10.2) and extend the definitions of eigenvectors and eigenvalues to the Hilbert space setting (Section 10.3). We then conclude the chapter by extending the finite-dimensional theory of diagonalization to Hilbert spaces (Section 10.4).

10.1 Operators on Hilbert spaces

The first step in generalizing Section 9.3 is to define what a linear operator is. One surprise is that our very first definition is, by necessity, trickier than Definition 9.3.1.

Definition 10.1.1. Let $H$ be a Hilbert space, or more generally, a function space. We define a linear operator, or simply operator, in $H$ to be a linear map $T : D(T) \to H$ such that $D(T)$ is a subspace of $H$. In other words, we require the domain $D(T)$ of $T$ to contain the zero function and be closed under addition and scalar multiplication in $H$ (Definition 5.2.1); and for all $f, g \in D(T)$ and $c \in \mathbb{C}$, we require
\[ T(cf) = cT(f), \qquad T(f + g) = T(f) + T(g). \tag{10.1.1} \]
(Compare Definition B.11.)

As we shall see momentarily, in contrast with the finite-dimensional definition (Definition 9.3.1), we do not require $T$ to be defined on all of $H$ in Definition 10.1.1 because some of our most important examples cannot be extended to all of $H$ in a natural manner. We reinforce this distinction by a careful use of prepositions: An operator on $H$ is an operator whose domain is all of $H$, whereas in the general case, where the domain may be strictly smaller than $H$, we refer to an operator in $H$.

Example 10.1.2. Let $X = S^1$, $[a, b]$, or $\mathbb{R}$, and let $H = L^2(X)$. For $\lambda \in \mathbb{C}$, the map $\lambda I : H \to H$ defined by $\lambda I(f) = \lambda f$ is a linear operator on $H$ with $D(\lambda I) = H$. (The notation $\lambda I$ is meant to suggest a generalization of $\lambda$ times the identity matrix.)

Example 10.1.3. Let $X = S^1$ or $[a, b]$, and let $H = L^2(X)$. The map $D : C^1(X) \to H$ defined by $D(f) = -if'$ is a linear operator in $H$ with domain $D(D) = C^1(X)$. (Note the mysterious scalar factor $-i$, which we will explain later; see Remarks 10.2.3 and 10.3.3.)

Example 10.1.4. Similarly, for $H = L^2(\mathbb{R})$, the map $D : C_c^1(\mathbb{R}) \to H$ defined by $D(f) = -if'$ is a linear operator in $H$, with domain $D(D) = C_c^1(\mathbb{R})$, the space of continuously differentiable functions with compact support. (See Section 8.5.2.) Another useful domain for $D$ is the Schwartz space $S(\mathbb{R})$ of smooth functions that "decay rapidly at infinity" (Section 4.7); note that $D$ actually maps $S(\mathbb{R})$ into itself.

Note that for $f \in C^1(S^1)$, since $f' \in C^0(S^1)$, by Theorem 6.4.1 and the Fundamental Theorem of Fourier Series (Theorem 8.1.1), we see that
\[ D(f) = \sum_{n \in \mathbb{Z}} 2\pi n \hat{f}(n) e_n(x), \tag{10.1.2} \]
with convergence in $L^2(S^1)$, though not necessarily pointwise (see Remark 8.5.20). In other words, $D$ essentially multiplies the nth Fourier coefficient of $f$ by $2\pi n$.

Example 10.1.5. If $H = L^2([a, b])$, then the map $M_x : H \to H$ defined by $M_x(f) = xf(x)$ is a linear operator with domain $D(M_x) = H$. More generally, for $X = [a, b]$ or $S^1$, $H = L^2(X)$, and a piecewise continuous function $g : X \to \mathbb{C}$, the map $M_g : H \to H$ defined by $M_g(f) = g(x)f(x)$ is a linear operator with domain $D(M_g) = H$. (This domain works because $g(x)$ is bounded, which means that $g(x)f(x)$ is bounded by a scalar multiple of $|f(x)|$, and therefore, is in $L^2(X)$; in fact, the same holds if we only require $g(x)$ to be bounded and measurable.)

Example 10.1.6. Let $H = L^2(\mathbb{R})$. The map $M_x : C_c^0(\mathbb{R}) \to H$ defined by $M_x(f) = xf(x)$, or more generally, for a piecewise continuous function $g : \mathbb{R} \to \mathbb{C}$, the map $M_g : C_c^0(\mathbb{R}) \to H$ defined by $M_g(f) = g(x)f(x)$, is a linear operator with domain $D(M_g) = C_c^0(\mathbb{R})$, the space of all continuous functions with compact support (see Definition 7.5.11), by reasoning similar to that of Example 10.1.5.

We next consider some more abstract examples.

Example 10.1.7. Let $H$ be a Hilbert space with orthonormal basis $B = \{e_n \mid n \in \mathbb{N}\}$, and let
\[ H_0 = \left\{ \sum_{n=1}^{\infty} c_n e_n \in H \;\middle|\; \text{all but finitely many } c_n = 0 \right\}. \tag{10.1.3} \]
Then we may define linear operators $\mu$ and $\iota$ in $H$ by the formulas
\[ \mu\left( \sum_{n=1}^{\infty} c_n e_n \right) = \sum_{n=1}^{\infty} n c_n e_n, \tag{10.1.4} \]
\[ \iota\left( \sum_{n=1}^{\infty} c_n e_n \right) = \sum_{n=1}^{\infty} \frac{c_n}{n} e_n, \tag{10.1.5} \]
where we take the domains of $\mu$ and $\iota$ to be $D(\mu) = H_0$ and $D(\iota) = H$. Of course, we need to prove that (10.1.4) and (10.1.5) actually produce convergent elements of $H$ on the corresponding domains. For $\iota$, this is Problem 10.1.2, and for $\mu$, this is a special case of the following theorem:

Theorem 10.1.8. Let $H$ be a Hilbert space with orthonormal basis $B = \{e_n \mid n \in \mathbb{N}\}$, let $H_0$ be as defined in (10.1.3), and let $a(n)$ be any function $a : \mathbb{N} \to \mathbb{C}$. Then
\[ \alpha\left( \sum_{n=1}^{\infty} c_n e_n \right) = \sum_{n=1}^{\infty} a(n) c_n e_n \tag{10.1.6} \]
defines a linear operator in $H$ with domain $H_0$.

Proof. Problem 10.1.3.

We call an operator of the form described in Theorem 10.1.8 (possibly with a domain larger than $H_0$) a diagonal operator with respect to the basis $\{e_n\}$, as $\alpha$ is an infinite-dimensional version of multiplication by a diagonal matrix. Equivalently, we say that the basis $\{e_n\}$ diagonalizes the operator $\alpha$. For example, (10.1.2) shows that the operator $D$ is a diagonal operator with respect to the usual basis $\{e_n \mid n \in \mathbb{Z}\}$ for $H = L^2(S^1)$; equivalently, $\{e_n \mid n \in \mathbb{Z}\}$ diagonalizes $D$. (See Section 10.4 for much more on diagonalization.)

Example 10.1.9. Let $H$ be a Hilbert space with orthonormal basis $B = \{e_n \mid n \in \mathbb{N}\}$. We define the shift operator $\sigma : H \to H$ by
\[ \sigma\left( \sum_{n=1}^{\infty} c_n e_n \right) = \sum_{n=1}^{\infty} c_n e_{n+1}. \tag{10.1.7} \]
In other words, $\sigma$ is the "shift" of each basis vector $e_n$ to $e_{n+1}$, extended by (series) linearity.

One surprise in the basic definitions of operator theory is that, in contrast to the usual situation of calculus, where continuous functions form the principal class of examples, some of our most important examples of operators are not continuous. To be specific, if $T$ is an operator in the Hilbert space $H$, then Definition 7.2.11 defines what it means for $T$ to be continuous, and we also have the following idea.

Definition 10.1.10. Let $T$ be an operator in the Hilbert space $H$. To say that $T$ is bounded means that there exists some $M > 0$ such that for all $f \in D(T)$, we have $\|T(f)\| \le M \|f\|$.

Note that a bounded operator $T$ is not bounded in the sense of a bounded function on $\mathbb{R}$; rather, $T$ is relatively bounded, in that elements of $D(T)$ are magnified by a factor of at most $M$. For example, the operator $\iota$ of Example 10.1.7 is bounded (Problem 10.1.4), but the differentiation operator $D$ of Example 10.1.3 is not bounded (Problem 10.1.5).

The reader new to operators may be surprised to discover that the ideas of continuity and boundedness turn out to be equivalent for operators in a Hilbert space.

Theorem 10.1.11. If $T$ is a linear operator in a Hilbert space $H$ with domain $D(T)$, then the following conditions are equivalent:

(UC) $T$ is uniformly continuous over all of $D(T)$, or in other words, for any $\varepsilon > 0$, there exists some $\delta(\varepsilon) > 0$ such that if $\|f - g\| < \delta(\varepsilon)$, then $\|T(f) - T(g)\| < \varepsilon$.
(C0) $T$ is continuous at $0 \in D(T)$.
(B) $T$ is bounded.

Proof. Problem 10.1.6.

Therefore, since the differentiation operator $D$ and other differential operators are our most important examples of an operator in a Hilbert space, and $D$ and similar operators are not continuous, we must be very careful not to assume that operators are continuous. Put another way, we cannot assume that all operators commute with $\lim_{n\to\infty}$.

Finally, we describe how to take linear combinations and products of operators. Again, the key point is to be careful with definitions, especially about domains.

Definition 10.1.12. Suppose $S$ and $T$ are operators in a Hilbert space $H$. For $a, b \in \mathbb{C}$, we define a function $(aS + bT)$ with domain $D(S) \cap D(T)$ by
\[ (aS + bT)(f) = aS(f) + bT(f). \tag{10.1.8} \]
If we also have that $T(D(T)) \subseteq D(S)$, then we define a function $ST$ with domain $D(T)$ by
\[ (ST)(f) = S(T(f)). \tag{10.1.9} \]
In other words, we write composition as a product $ST$.

Of course, we need to verify that $aS + bT$ and $ST$ are indeed operators:

Theorem 10.1.13. Let $H$ be a Hilbert space, and let $S$ and $T$ be operators in $H$.

1. If $a, b \in \mathbb{C}$, then $aS + bT$ is an operator in $H$ with domain $D(S) \cap D(T)$.
2. If $T(D(T)) \subseteq D(S)$, then the composition $ST$ is an operator in $H$ with domain $D(T)$.

Proof. Problem 10.1.7.
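The diagonal operators of this section are easy to experiment with on finitely supported coefficient sequences, that is, on elements of $H_0$. The Python sketch below stores $\sum c_n e_n$ as a dictionary of its nonzero coefficients; this representation and all function names are assumptions made for illustration. It applies $\mu$, $\iota$, and the coefficient action $2\pi n$ of $D$ from (10.1.2) to basis vectors and reports the magnification factor $\|T(f)\| / \|f\|$ that appears in Definition 10.1.10.

```python
import math

def diagonal_operator(a):
    """Build the map sum c_n e_n -> sum a(n) c_n e_n of (10.1.6).

    An element of H_0 is stored as a dict {n: c_n} of its nonzero coefficients.
    """
    def apply(coeffs):
        return {n: a(n) * c for n, c in coeffs.items()}
    return apply

def norm(coeffs):
    """Norm of sum c_n e_n for an orthonormal basis: sqrt(sum |c_n|^2)."""
    return math.sqrt(sum(abs(c) ** 2 for c in coeffs.values()))

def compose(S, T):
    """The product ST of Definition 10.1.12: apply T first, then S."""
    return lambda coeffs: S(T(coeffs))

def magnification(op, coeffs):
    """The ratio ||op(f)|| / ||f|| for a nonzero f."""
    return norm(op(coeffs)) / norm(coeffs)

mu = diagonal_operator(lambda n: n)                      # (10.1.4)
iota = diagonal_operator(lambda n: 1.0 / n)              # (10.1.5), n in N only
D_hat = diagonal_operator(lambda n: 2.0 * math.pi * n)   # coefficient action in (10.1.2)

if __name__ == "__main__":
    for n in (1, 10, 100):
        e_n = {n: 1.0}
        print(n, magnification(mu, e_n), magnification(iota, e_n),
              magnification(D_hat, e_n))
```

On basis vectors the ratios for $\mu$ and for the coefficient action of $D$ grow with $n$, while the ratio for $\iota$ never exceeds 1, which is consistent with the bounded and unbounded examples named after Definition 10.1.10. Since $\mu(H_0) \subseteq D(\iota) = H$, the product $\iota\mu$ is defined on $H_0$, and `compose(iota, mu)` returns its input up to rounding.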

Problems

10.1.1. (a) Prove that the operator $D(f) = -if'$ (Example 10.1.3) is linear (i.e., satisfies (10.1.1)).
(b) Prove that the operator $M_x(f) = xf(x)$ (Example 10.1.5) is linear (i.e., satisfies (10.1.1)).

10.1.2. Let $H$ be a Hilbert space with orthonormal basis $B = \{e_n \mid n \in \mathbb{N}\}$. Recall that if $f \in H$, then $\hat{f}(n) = \langle f, e_n \rangle$ is the nth generalized Fourier coefficient.
(a) Prove that if $f \in H$, then $\displaystyle \sum_{n=1}^{\infty} \left( \frac{\hat{f}(n)}{n} \right) e_n$ converges in $H$ (under the $L^2$ metric).
(b) Prove that
\[ \iota\left( \sum_{n=1}^{\infty} c_n e_n \right) = \sum_{n=1}^{\infty} \frac{c_n}{n} e_n \tag{10.1.10} \]
is an operator on $H$ that is well-defined on all of $H$.

10.1.3. (Proves Theorem 10.1.8) Let $H$ be a Hilbert space with orthonormal basis $B = \{e_n \mid n \in \mathbb{N}\}$, let $H_0$ be as defined in (10.1.3), and let $a : \mathbb{N} \to \mathbb{C}$ be a function. Prove that the formula
\[ \alpha\left( \sum_{n=1}^{\infty} c_n e_n \right) = \sum_{n=1}^{\infty} a(n) c_n e_n \tag{10.1.11} \]
gives a well-defined linear map with domain $H_0$.

10.1.4. Continuing Problem 10.1.2, prove that $\iota$ is a bounded operator.

10.1.5. Prove that the operator $D$ (Example 10.1.3), with domain $D(D) = C^1(X)$, is unbounded. (Suggestion: Find a sequence of vectors $u_n \in D(D)$ such that $\dfrac{\|D(u_n)\|}{\|u_n\|} \to \infty$.)

10.1.6. (Proves Theorem 10.1.11) Let $T$ be a linear operator in a Hilbert space $H$. Consider the following properties:

(UC) $T$ is uniformly continuous over all of $D(T)$, or in other words, for any $\varepsilon > 0$, there exists some $\delta(\varepsilon) > 0$ such that if $\|f - g\| < \delta(\varepsilon)$, then $\|T(f) - T(g)\| < \varepsilon$.
(C0) $T$ is continuous at $0 \in D(T)$.
(B) $T$ is bounded.

Since (UC) implies (C0) a fortiori, the following completes the proof of Theorem 10.1.11.
(a) Prove that (C0) implies (B). (Suggestion: Take $\varepsilon = 1$, and for any $f \in D(T)$, scale $f$ to ensure it is near 0.)
(b) Prove that (B) implies (UC).


10.1.7. (Proves Theorem 10.1.13 ) Let H be a Hilbert space, let S and T be operators in H and a, b ∈ C, and define aS + bT and ST as in Definition 10.1.12. (a) Prove that aS + bT is an operator in H with domain D(S) ∩ D(T ). (b) Prove that if T (D(T )) ⊆ D(S), then ST is an operator in H with domain D(T ).

10.2 Hermitian and positive operators

In this section, we examine two useful geometric properties that many of our favorite examples possess: namely, the properties of being Hermitian and positive. We begin with the first property.

Definition 10.2.1. Let T be a linear operator in a Hilbert space H with domain D(T). To say that T is Hermitian means that for all f, g ∈ D(T), we have that ⟨T(f), g⟩ = ⟨f, T(g)⟩.

As it turns out, most, but not all, of the examples discussed in Section 10.1 are Hermitian.

Example 10.2.2. Let D be the operator D(f) = −if′ on H = L²(S¹) defined in Example 10.1.3, where D(D) = C¹(S¹). For f, g ∈ D(D), we have:

\[
\begin{aligned}
\langle D(f), g \rangle &= \int_0^1 (-i) f'(x)\, \overline{g(x)} \, dx \\
&= (-i) f(x)\, \overline{g(x)} \,\Big|_0^1 - \int_0^1 (-i) f(x)\, \overline{g'(x)} \, dx && (*) \\
&= (-i)\left( f(1)\overline{g(1)} - f(0)\overline{g(0)} \right) + \int_0^1 i f(x)\, \overline{g'(x)} \, dx \\
&= \int_0^1 f(x)\, \overline{(-i) g'(x)} \, dx && (**) \\
&= \langle f, D(g) \rangle,
\end{aligned}
\tag{10.2.1}
\]

where step (*) follows by integration by parts, and the f(x)\overline{g(x)} term cancels in (**) because f(0) = f(1) and g(0) = g(1). It follows that D is Hermitian.

Remark 10.2.3. Note that the factor of −i in D(f) = −if′ makes the signs in step (**) of (10.2.1) work out correctly, though replacing −i with any imaginary number would work just as well. See Remark 10.3.3 for a justification of our particular choice.

Example 10.2.4. Let D be the operator D(f) = −if′ on H = L²(R) defined in Example 10.1.4, where D(D) is either C_c¹(R) or S(R). For f, g ∈ D(D), we have:

\[
\begin{aligned}
\langle D(f), g \rangle &= \int_{-\infty}^{\infty} (-i) f'(x)\, \overline{g(x)} \, dx \\
&= (-i) f(x)\, \overline{g(x)} \,\Big|_{-\infty}^{\infty} - \int_{-\infty}^{\infty} (-i) f(x)\, \overline{g'(x)} \, dx && (*) \\
&= \int_{-\infty}^{\infty} f(x)\, \overline{(-i) g'(x)} \, dx && (**) \\
&= \langle f, D(g) \rangle,
\end{aligned}
\tag{10.2.2}
\]

where (*) is again integration by parts (in the version from Theorem 4.8.7), and the f(x)\overline{g(x)} term cancels in (**) because

\[ \lim_{x \to \pm\infty} f(x) = 0 = \lim_{x \to \pm\infty} g(x), \tag{10.2.3} \]

which holds either because f and g have compact support or because f, g ∈ S(R). Again, in either case, D is Hermitian.

Example 10.2.5. Let X = [a, b], and consider the operator M_x(f) = xf(x) in H = L²(X) defined in Example 10.1.5. Then M_x is Hermitian (Problem 10.2.1).

Example 10.2.6. Similarly, let M_x be the operator in L²(R) with domain D(M_x) = S(R) (the Schwartz space) given by M_x(f) = xf(x). Then M_x is well-defined and Hermitian (Problem 10.2.2).

Example 10.2.7. Let H be a Hilbert space with orthonormal basis B = {e_n | n ∈ N}, and let α be the diagonal operator defined by

\[ \alpha\left( \sum_{n=1}^{\infty} c_n e_n \right) = \sum_{n=1}^{\infty} a(n) c_n e_n, \tag{10.2.4} \]

where a(n) is a real-valued function a : N → R, and D(α) = H_0 as defined in (10.1.3) (Theorem 10.1.8). Then α is Hermitian (Problem 10.2.3).

Example 10.2.8. Let H be a Hilbert space with orthonormal basis B = {e_n | n ∈ N}. The shift operator σ (Example 10.1.9) on H is not Hermitian (Problem 10.2.4).

To define what it means to be a positive operator, we begin with the following observation.

Theorem 10.2.9. Let T be a Hermitian operator in a Hilbert space H. Then for all f ∈ D(T), ⟨T(f), f⟩ is a real number.

Proof. Problem 10.2.5.

Because of Theorem 10.2.9, the inequality in the following definition makes sense.


Definition 10.2.10. Let T be an operator in a Hilbert space H with domain D(T). To say that T is positive means that T is Hermitian and for all f ∈ D(T), we have that ⟨T(f), f⟩ ≥ 0.

As the following examples show, some of the most natural examples of positive operators come from −d²/dx². (This also partly explains why the inequality in Definition 10.2.10 is not strict, as an operator T defined by derivatives will often have T(f) = 0 for some nonzero f.)

Example 10.2.11. Let H = L²(S¹), and let ∆(f) = −f″ be the operator in H with domain C²(S¹). We have:

\[
\begin{aligned}
\langle \Delta(f), g \rangle &= -\int_0^1 f''(x)\, \overline{g(x)} \, dx \\
&= -f'(x)\, \overline{g(x)} \,\Big|_0^1 + \int_0^1 f'(x)\, \overline{g'(x)} \, dx && (*) \\
&= -\left( f'(1)\overline{g(1)} - f'(0)\overline{g(0)} \right) + \int_0^1 f'(x)\, \overline{g'(x)} \, dx \\
&= \int_0^1 f'(x)\, \overline{g'(x)} \, dx && (**) \\
&= \langle f', g' \rangle,
\end{aligned}
\tag{10.2.5}
\]

where step (*) follows by integration by parts, and the f′(x)\overline{g(x)} term cancels in (**) because f′(0) = f′(1) and g(0) = g(1). The same idea shows that ⟨f, ∆(g)⟩ = ⟨f′, g′⟩, and therefore, ∆ is Hermitian; moreover, (10.2.5) also shows that ⟨∆(f), f⟩ = ⟨f′, f′⟩ ≥ 0, and therefore, ∆ is positive. The operator ∆ is called the Laplacian on S¹.

Example 10.2.12. Let H = L²(R), and let ∆(f) = −f″ be the operator in H with domain either C_c²(R), the space of twice-continuously-differentiable functions with compact support, or S(R) (see Example 10.1.4). Again, ∆ is positive (Problem 10.2.6).

Example 10.2.13. Let H = L²([a, b]), and let ∆(f) = −f″ be the operator in H with one of the following domains:

\[ D(\Delta)_{\mathrm{Dir}} = \{ f \in C^{\infty}([a,b]) \mid f(a) = 0 = f(b) \} \tag{10.2.6} \]

\[ D(\Delta)_{\mathrm{Neu}} = \{ f \in C^{\infty}([a,b]) \mid f'(a) = 0 = f'(b) \} \tag{10.2.7} \]

Note that (10.2.6) is the space of smooth functions satisfying the Dirichlet boundary conditions and (10.2.7) is the space of smooth functions satisfying the Neumann boundary conditions (see Question 9.1.3). The operator ∆, with either domain, is called the Laplacian on [a, b], and ∆ is once again positive (Problem 10.2.7).

One may also consider more abstract examples of positive operators.


Example 10.2.14. Let H be a Hilbert space with orthonormal basis B = {en | n ∈ N}, let a(n) be real-valued, and let α be the diagonal operator defined by (10.2.4) with domain D(α) = H0 as defined in (10.1.3) (Theorem 10.1.8). Then α is positive if and only if a(n) ≥ 0 for all n ∈ N (Problem 10.2.8). Recall from Definition 10.1.12 that under suitable circumstances, for a, b ∈ C, we may combine operators S and T in H to form new operators aS +bT and ST . As one might hope, suitable combinations of Hermitian operators are Hermitian, and suitable combinations of positive operators are positive. Theorem 10.2.15. Let H be a Hilbert space, and let S and T be Hermitian operators in H. 1. If a, b ∈ R, then aS + bT , with domain D(S) ∩ D(T ), is a Hermitian operator in H. 2. If D(S) = D(T ) = H0 , S(H0 ) ⊆ H0 , T (H0 ) ⊆ H0 , and ST = T S, then ST is a Hermitian operator in H with domain H0 . Proof. Problem 10.2.9. Theorem 10.2.16. Let H be a Hilbert space, and let S and T be positive operators in H. If a, b ∈ R and a, b ≥ 0, then aS + bT , with domain D(S) ∩ D(T ), is a positive operator in H. Proof. Problem 10.2.10. Remark 10.2.17. The question of when the product ST of positive operators S and T is positive is much more involved, and can be analyzed using the so-called square root of a positive operator; see, for example, Reed and Simon [RS80, VI.4].
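As a finite-dimensional sanity check of Theorem 10.2.15, the following sketch replaces operators in H by 2×2 complex matrices acting on C². This substitution, the helper names (`matmul`, `adjoint`, `is_hermitian`, `combine`), and the sample matrices are assumptions made for illustration and do not come from the text. The sketch only illustrates why real coefficients and the hypothesis ST = TS appear in the theorem; it does not prove it.

```python
def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def adjoint(A):
    """Conjugate transpose of a square matrix."""
    n = len(A)
    return [[complex(A[j][i]).conjugate() for j in range(n)] for i in range(n)]

def is_hermitian(A, tol=1e-12):
    """True if A equals its conjugate transpose up to tol."""
    B = adjoint(A)
    n = len(A)
    return all(abs(A[i][j] - B[i][j]) <= tol for i in range(n) for j in range(n))

def combine(a, S, b, T):
    """The linear combination aS + bT."""
    n = len(S)
    return [[a * S[i][j] + b * T[i][j] for j in range(n)] for i in range(n)]

# Hypothetical sample matrices, chosen only for illustration.
S = [[1, 1j], [-1j, 2]]    # Hermitian
T = [[0, 1], [1, 0]]       # Hermitian, does not commute with S
S2 = [[2, 1], [1, 2]]      # Hermitian, commutes with T

# Real coefficients keep the combination Hermitian; a non-real one need not.
print(is_hermitian(combine(3.0, S, -2.0, T)))
print(is_hermitian(combine(1j, S, 1.0, T)))

# Without ST = TS the product can fail to be Hermitian; with it, the product is Hermitian.
print(matmul(S, T) == matmul(T, S), is_hermitian(matmul(S, T)))
print(matmul(S2, T) == matmul(T, S2), is_hermitian(matmul(S2, T)))
```

Exact equality `==` is used for the commutation test only because the sample entries are small integers and exact imaginary units; for general floating-point data a tolerance would be needed.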

Problems

10.2.1. Let X = S¹ or [a, b], and let M_x be the operator M_x(f) = xf(x) in H = L²(X) defined in Example 10.1.5. Prove that M_x is Hermitian.

10.2.2. Let M_x be the operator M_x(f) = xf(x) in H = L²(R) with domain D(M_x) = S(R) (the Schwartz space), as defined in Example 10.2.6.

(a) Prove that if f ∈ S(R), then xf(x) ∈ S(R) ⊂ L²(R).

(b) Prove that M_x is Hermitian.

10.2.3. Let H be a Hilbert space with orthonormal basis B = {e_n | n ∈ N}, and let

\[ \alpha\left( \sum_{n=1}^{\infty} c_n e_n \right) = \sum_{n=1}^{\infty} a(n) c_n e_n \tag{10.2.8} \]

be the operator from Theorem 10.1.8, where a(n) is a complex-valued function on N. Prove that α is Hermitian if and only if a(n) ∈ R for all n ∈ N. (Suggestion: Parseval.)


10.2.4. Let H be a Hilbert space with orthonormal basis B = {e_n | n ∈ N}. Prove that the shift operator

\[ \sigma\left( \sum_{n=1}^{\infty} c_n e_n \right) = \sum_{n=1}^{\infty} c_n e_{n+1} \tag{10.2.9} \]

on H is not Hermitian. (Suggestion: Try a few linear combinations of the e_n.)

10.2.5. (Proves Theorem 10.2.9) Prove that if T is a Hermitian linear operator in a Hilbert space H, then for all f ∈ D(T), ⟨T(f), f⟩ is a real number.

10.2.6. Let H = L²(R), and let ∆(f) = −f″ be the operator in H with domain either C_c²(R) or S(R). Prove that ∆ is Hermitian and positive. (Suggestion: See Example 10.2.11, and justify vanishing terms carefully.)

10.2.7. Let H = L²([a, b]), and let ∆(f) = −f″.

(a) Prove that ∆, with the domain

\[ D(\Delta)_{\mathrm{Dir}} = \{ f \in C^{\infty}([a,b]) \mid f(a) = 0 = f(b) \}, \tag{10.2.10} \]

is Hermitian and positive.

(b) Prove that ∆, with the domain

\[ D(\Delta)_{\mathrm{Neu}} = \{ f \in C^{\infty}([a,b]) \mid f'(a) = 0 = f'(b) \}, \tag{10.2.11} \]

is Hermitian and positive.

(Suggestion: See Example 10.2.11, and justify vanishing terms carefully.)

10.2.8. Let H be a Hilbert space with orthonormal basis B = {e_n | n ∈ N}, and let α be the diagonal operator from Problem 10.2.3, where we now assume that a(n) is a real-valued function on N. Prove that α is positive if and only if a(n) ≥ 0 for all n ∈ N. (Suggestion: Parseval.)

10.2.9. (Proves Theorem 10.2.15) Let H be a Hilbert space, and let S and T be Hermitian operators in H.

(a) Suppose a, b ∈ R. Prove that aS + bT, with domain D(S) ∩ D(T), is a Hermitian operator in H.

(b) Suppose D(S) = D(T), S(D(S)) ⊆ D(S), T(D(T)) ⊆ D(T), and ST = TS. Prove that ST, with domain D(S) = D(T), is a Hermitian operator in H.

10.2.10. (Proves Theorem 10.2.16) Let H be a Hilbert space, let S and T be positive operators in H, and suppose a, b ∈ R and a, b ≥ 0. Prove that aS + bT, with domain D(S) ∩ D(T), is a positive operator in H.

10.3 Eigenvectors and eigenvalues

Continuing our Hilbert space generalization of Section 9.3, we next generalize Definition 9.3.2 in a relatively straightforward manner.

Definition 10.3.1. Let T be a linear operator in a Hilbert space H. An eigenvector of T is defined to be some f ≠ 0 in D(T) such that T(f) = λf for some λ ∈ C. To say that λ ∈ C is an eigenvalue of T means that T(f) = λf for some eigenvector f of T. Jointly, if f ≠ 0 in D(T) and T(f) = λf, we say that f is an eigenvector of T with eigenvalue λ. Since our Hilbert spaces are all function spaces, we also call eigenvectors of T eigenfunctions of T.

As befits a new idea, we immediately apply Definition 10.3.1 to our favorite examples.

Example 10.3.2. Consider the operator D(f) = −if′ in L²(S¹), with domain D(D) = C^∞(S¹) (Example 10.1.3). Then for any n ∈ Z, e_n = e^{2πinx} is an eigenfunction of D with eigenvalue 2πn, since e_n ∈ C^∞(S¹) and

\[ (-i) \frac{d}{dx}(e_n) = (-i)(2\pi i n) e_n = (2\pi n) e_n. \tag{10.3.1} \]
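The eigenvalue relation (10.3.1) can be spot-checked numerically. The sketch below approximates D(f) = −if′ by a central difference and compares it with 2πn·e_n at a few sample points. The function names (`e`, `D`, `eigen_residual`), the step size h, and the sample values of n and x are assumptions chosen for illustration.

```python
import cmath
import math

def e(n, x):
    """Basis function e_n(x) = exp(2*pi*i*n*x) on the circle S^1 = R/Z."""
    return cmath.exp(2j * math.pi * n * x)

def D(f, x, h=1e-6):
    """Approximate D(f)(x) = -i f'(x) with a central difference of step h."""
    return -1j * (f(x + h) - f(x - h)) / (2 * h)

def eigen_residual(n, x):
    """Return |D(e_n)(x) - 2*pi*n*e_n(x)|, which should be close to 0."""
    return abs(D(lambda y: e(n, y), x) - 2 * math.pi * n * e(n, x))

# Sample a few (hypothetical) values of n and x; each residual should be tiny.
residuals = [eigen_residual(n, x) for n in (-3, 0, 1, 5) for x in (0.0, 0.3, 0.77)]
print(max(residuals))
```

The residual is limited by the finite-difference error, so it is small but not exactly zero; the check is numerical evidence, not a substitute for the one-line computation in (10.3.1).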

Remark 10.3.3. Note that the choice of the factor −i in D(f) = −if′, as opposed to some other imaginary factor, makes 2πn the nth eigenvalue of D.

Example 10.3.4. Consider the operator D(f) = −if′ in L²(R), with domain D(D) = C_c¹(R) or S(R) (Example 10.1.4). Then D has no eigenvectors or eigenvalues, because if λ ∈ C and f ∈ C¹(R) satisfy f′(x) = λf(x) for all x ∈ R (here λ stands for i times a candidate eigenvalue of D), we must have f(x) = ae^{λx} for some a ∈ C (Theorem 4.6.3); and e^{λx} is not in L²(R), since

\[ \left| e^{\lambda x} \right|^2 = e^{\lambda x}\, \overline{e^{\lambda x}} = e^{bx}, \tag{10.3.2} \]

where b = λ + \overline{λ} ∈ R, and \int_{-\infty}^{\infty} e^{bx}\,dx diverges (even if b = 0).

Example 10.3.5. In contrast, consider D(f) = −if′ in L²([a, b]), with domain D(D) = C¹([a, b]). Then for any λ ∈ C, e^{iλx} is an eigenvector of D with eigenvalue λ, since

\[ D(e^{i\lambda x}) = (-i) \frac{d}{dx}\left( e^{i\lambda x} \right) = (-i)(i\lambda) e^{i\lambda x} = \lambda e^{i\lambda x}. \tag{10.3.3} \]

Example 10.3.6. Consider the operator ∆(f) = −f″ in L²([0, 1/2]) (Example 10.2.13).

• If we use the domain D(∆)_Dir = {f ∈ C^∞([0, 1/2]) | f(0) = 0 = f(1/2)}, then for every n ∈ N, 4π²n² is an eigenvalue of ∆ (Problem 10.3.1).

• If we use the domain D(∆)_Neu = {f ∈ C^∞([0, 1/2]) | f′(0) = 0 = f′(1/2)}, then for every integer n ≥ 0, 4π²n² is an eigenvalue of ∆ (Problem 10.3.1).
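A numerical sanity check of Example 10.3.6 is sketched below, using the candidates sin(2πnx) and cos(2πnx) suggested in Problem 10.3.1. It checks the boundary conditions at 0 and 1/2 and the relation −f″ = 4π²n²f at a few interior points by finite differences. The helper names, step sizes, and sample points are assumptions for illustration, and a numerical check does not replace the proof asked for in the problem.

```python
import math

def laplacian(f, x, h=1e-4):
    """Approximate Delta(f)(x) = -f''(x) by a central second difference."""
    return -(f(x + h) - 2 * f(x) + f(x - h)) / (h * h)

def derivative(f, x, h=1e-4):
    """Approximate f'(x) by a central difference."""
    return (f(x + h) - f(x - h)) / (2 * h)

def dirichlet_candidate(n):
    """Candidate eigenfunction sin(2*pi*n*x) for the Dirichlet domain."""
    return lambda x: math.sin(2 * math.pi * n * x)

def neumann_candidate(n):
    """Candidate eigenfunction cos(2*pi*n*x) for the Neumann domain."""
    return lambda x: math.cos(2 * math.pi * n * x)

def eigen_residual(f, n, x):
    """Return |Delta(f)(x) - 4*pi^2*n^2*f(x)| at an interior point x."""
    return abs(laplacian(f, x) - 4 * math.pi**2 * n**2 * f(x))

interior = (0.1, 0.25, 0.4)
for n in (1, 2, 3):
    s, c = dirichlet_candidate(n), neumann_candidate(n)
    # Boundary values: s should vanish at 0 and 1/2; c' should vanish there.
    print(n, abs(s(0.0)), abs(s(0.5)), abs(derivative(c, 0.0)), abs(derivative(c, 0.5)))
    # Eigenvalue residuals at interior points.
    print(n, max(eigen_residual(s, n, x) for x in interior),
          max(eigen_residual(c, n, x) for x in interior))
```

Note that n = 0 is allowed only in the Neumann case: cos(0) = 1 is a nonzero function with eigenvalue 0, whereas sin(0) is the zero function and so is not an eigenvector.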


Note also that even though e_n is an eigenvector of ∆ if we use the domain C^∞(S¹), e_n is not an eigenvector of ∆ using either the domain D(∆)_Dir or the domain D(∆)_Neu, as e_n satisfies neither the Dirichlet nor the Neumann boundary conditions.

Next, consider the operator M_x(f) = xf(x) in the Hilbert space H = L²(R) with domain D(M_x) = C_c⁰(R), the space of all continuous functions with compact support, as described in Example 10.1.6. Even though M_x is Hermitian (Example 10.2.5), we have that:

Theorem 10.3.7. The operator M_x has no eigenvalues.

Proof. Suppose λ ∈ C and M_x(f) = xf = λf for some f ∈ C_c⁰(R). Now, a priori, the equation xf(x) = λf(x) in L²(R) only holds a.e. in R, but since both xf(x) and λf(x) are continuous functions on R, by Corollary 7.4.9, we have that xf(x) = λf(x) for all x ∈ R, which means that (x − λ)f(x) = 0 for all x ∈ R. It follows that f(x) = 0 unless x = λ, and since a single value of x ∈ R is a set of measure zero, we see that f(x) = 0 in H = L²(R).

Note that Example 10.3.4 and Theorem 10.3.7 are in marked contrast with the finite-dimensional theory, in which every operator has at least one eigenvalue. (See, for example, Messer [Mes97, Ch. 8].)

Example 10.3.8. Let H be a Hilbert space with orthonormal basis B = {e_n | n ∈ N}, let H_0 be as defined in (10.1.3), let a(n) be any function a : N → C, and let α be the operator

\[ \alpha\left( \sum_{n=1}^{\infty} c_n e_n \right) = \sum_{n=1}^{\infty} a(n) c_n e_n, \tag{10.3.4} \]

with domain D(α) = H_0 from Theorem 10.1.8. Then each e_n is an eigenvector with eigenvalue a(n) (Problem 10.3.2).

Example 10.3.9. Let H be a Hilbert space with orthonormal basis B = {e_n | n ∈ N}. The shift operator σ (Example 10.1.9) on H has no eigenvectors or eigenvalues (Problem 10.3.3).

We conclude this section with several useful properties of eigenvectors and eigenvalues (Theorems 10.3.10–10.3.13).

Theorem 10.3.10. Let T be a Hermitian operator in a Hilbert space H, and let λ ∈ C be an eigenvalue of T. Then λ must actually be real; in other words, a Hermitian operator has only real eigenvalues.

Proof. Problem 10.3.4.

Theorem 10.3.11. Let T be a positive operator in a Hilbert space H, and let λ ∈ C be an eigenvalue of T. Then λ ≥ 0; in other words, a positive operator has only nonnegative eigenvalues.

Proof. Problem 10.3.5.


Theorem 10.3.12. Let T be a Hermitian operator in a Hilbert space H, and let {u_1, ..., u_n} be a set of eigenvectors of T with distinct eigenvalues λ_1, ..., λ_n (i.e., for i ≠ j, λ_i ≠ λ_j). Then {u_1, ..., u_n} is an orthogonal set.

Proof. Problem 10.3.6.

Theorem 10.3.13. Let T be a linear operator (not necessarily Hermitian) in a Hilbert space H, and let {u_1, ..., u_n} be a set of eigenvectors of T with distinct eigenvalues λ_1, ..., λ_n (i.e., for i ≠ j, λ_i ≠ λ_j). Then {u_1, ..., u_n} is linearly independent.

Proof. Problem 10.3.7.

The reader should compare Theorems 10.3.12 and 10.3.13 with our previously proven fact that any orthogonal set of nonzero vectors is linearly independent (Problem 7.3.2), and work out why none of these results is redundant.

Problems

10.3.1. Consider the operator ∆(f) = −f″ in L²([0, 1/2]) (Example 10.2.13).

• Suppose D(∆)_Dir = {f ∈ C^∞([0, 1/2]) | f(0) = 0 = f(1/2)}. Prove that for every n ∈ N, 4π²n² is an eigenvalue of ∆.

• Suppose D(∆)_Neu = {f ∈ C^∞([0, 1/2]) | f′(0) = 0 = f′(1/2)}. Prove that for every integer n ≥ 0, 4π²n² is an eigenvalue of ∆.

(Suggestion: Consider sines and cosines. Make sure your eigenfunctions satisfy the appropriate boundary conditions.)

10.3.2. Let H be a Hilbert space with orthonormal basis B = {e_n | n ∈ N}, let a(n) be any function a : N → C, and let

\[ \alpha\left( \sum_{n=1}^{\infty} c_n e_n \right) = \sum_{n=1}^{\infty} a(n) c_n e_n \tag{10.3.5} \]

be the operator with domain D(α) = H_0 from Theorem 10.1.8. For n ∈ N, prove that e_n is an eigenvector with eigenvalue a(n).

10.3.3. Let H be a Hilbert space with orthonormal basis B = {e_n | n ∈ N}, and let

\[ \sigma\left( \sum_{n=1}^{\infty} c_n e_n \right) = \sum_{n=1}^{\infty} c_n e_{n+1} \tag{10.3.6} \]

be the shift operator on H. Prove that if σ(f) = λf for some λ ∈ C, then f = 0. (Suggestion: Suppose f = \sum_{n=1}^{\infty} c_n e_n, and prove that each c_n = 0.)


10.3.4. (Proves Theorem 10.3.10) Let T be a Hermitian operator in a Hilbert space H, and let λ ∈ C be an eigenvalue of T. Prove that λ ∈ R. (Suggestion: Let f be an eigenvector with eigenvalue λ, and consider ⟨T(f), f⟩.)

10.3.5. (Proves Theorem 10.3.11) Let T be a positive operator in a Hilbert space H, and let λ ∈ R be an eigenvalue of T. Prove that λ ≥ 0. (Suggestion: See Problem 10.3.4.)

10.3.6. (Proves Theorem 10.3.12) Let T be a Hermitian operator in a Hilbert space H, and let {u_1, ..., u_n} be a set of eigenvectors of T with distinct eigenvalues λ_1, ..., λ_n (i.e., for i ≠ j, λ_i ≠ λ_j). Prove that {u_1, ..., u_n} is an orthogonal set. (Suggestion: Consider ⟨T(u_i), u_j⟩.)

10.3.7. (Proves Theorem 10.3.13) Let T be a linear operator (not necessarily Hermitian) in a Hilbert space H, and let {u_1, ..., u_n} be a set of eigenvectors of T with distinct eigenvalues λ_1, ..., λ_n (i.e., for i ≠ j, λ_i ≠ λ_j). Prove that {u_1, ..., u_n} is linearly independent. (Suggestion: Use induction on n, and consider (T − λ_n I) applied to c_1u_1 + · · · + c_nu_n.)

10.4 Eigenbases

In some sense, all of the theory in this chapter is aimed towards establishing the following concept, which generalizes Definition 9.3.3 to infinite-dimensional Hilbert spaces.

Definition 10.4.1. Let T be an operator in a Hilbert space H. An eigenbasis for T is an orthogonal basis {u_n} for H such that every u_n is an eigenvector of T. (In particular, each u_n must be contained in D(T).)

In other words, if T is an operator in a Hilbert space H, then an eigenbasis {u_n} for T has the following properties:

1. Each u_n ∈ D(T).

2. The set B = {u_n} is an orthogonal set of nonzero vectors.

3. For any f ∈ H, if \hat{f}(n) = \frac{\langle f, u_n \rangle}{\langle u_n, u_n \rangle}, then f = \sum_{n=1}^{\infty} \hat{f}(n) u_n, where convergence is in the norm metric on H.

4. For each u_n, T(u_n) = λ_n u_n for some λ_n ∈ C.

We sometimes summarize the above data by saying that {u_n} is an eigenbasis for T with associated eigenvalues {λ_n}.

Now, there are a few notable theorems that give sufficient conditions on an operator T that ensure the existence of an eigenbasis for T; see Section 11.8 for a statement of one such result. However, we will not seek such results; instead, for the most part, we will be content with merely producing examples of eigenbases of operators and describing what happens when such an eigenbasis exists. We begin with our canonical, and most important, example(s).


Example 10.4.2. Let H = L²(S¹), and let D(f) = −if′ and ∆(f) = D²(f) = −f″ be the operators in H described previously (Examples 10.1.3 and 10.2.11). Then {e_n | n ∈ Z}, where e_n(x) = e^{2πinx}, is an orthonormal basis for H (Theorem 8.1.1), and each e_n is an eigenfunction for both D and ∆, with eigenvalues 2πn and 4π²n², respectively.

It will be helpful to keep Example 10.4.2 in mind in the subsequent discussion of the consequences of the existence of an eigenbasis. The fundamental such consequence is the following theorem.

Theorem 10.4.3 (Diagonalization Theorem). Let T be a Hermitian operator in a Hilbert space H, let {u_n} be an eigenbasis for T, and let T(u_n) = λ_n u_n. Then for any f ∈ D(T), we have

\[ T(f) = T\left( \sum_{n=1}^{\infty} \hat{f}(n) u_n \right) = \sum_{n=1}^{\infty} \lambda_n \hat{f}(n) u_n. \tag{10.4.1} \]

In other words, relative to the eigenbasis {u_n}, T acts like a diagonal operator. (See Theorem 10.1.8; also compare Theorem 9.3.4.) Because of that fact, we also say that the eigenbasis {u_n} diagonalizes T.

Proof. Problem 10.4.1.

Note that (10.4.1) is a straightforward consequence of linearity in the finite-dimensional case; the interesting point is that T is linear with respect to an infinite sum. Put another way, as long as we only require convergence in L², the operator T can be applied term-by-term to a generalized Fourier series with respect to an eigenbasis for T. This is particularly notable if T is defined in terms of derivatives; compare Examples 4.2.2 and 4.2.3.

We close this chapter with a few other consequences of diagonalization. For example, we have the following result, which may seem obvious until you think about it.

Theorem 10.4.4. Let T be a Hermitian operator in a Hilbert space H, let {u_n} be an eigenbasis for T, and let T(u_n) = λ_n u_n. If f ∈ D(T) is an eigenvector of T with eigenvalue λ, then λ must be equal to λ_n for at least one n, and f is a linear combination (possibly an infinite one) of eigenvectors u_n with λ_n = λ.

Proof. Problem 10.4.2.

Building on Theorem 10.4.4, we also have the following result.

Theorem 10.4.5 (Simultaneous Diagonalization). Let T be a Hermitian operator in a Hilbert space H, let {u_n} be an eigenbasis for T, and let T(u_n) = λ_n u_n. Suppose S is a Hermitian operator in H with D(S) = D(T) = H_0, S(H_0) ⊆ H_0, T(H_0) ⊆ H_0, and ST = TS; and suppose also that the eigenvalues of T are distinct, i.e., λ_k ≠ λ_n for k ≠ n. Then {u_n} is also an eigenbasis for S.

Proof. Problem 10.4.3.
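The bookkeeping behind (10.4.1) can be sketched for the operators D and ∆ of Example 10.4.2: applying the operator amounts to multiplying the nth Fourier coefficient by the nth eigenvalue. The sketch below uses a finite trigonometric polynomial, for which this is just ordinary linearity; the real content of Theorem 10.4.3 (infinite sums) is not tested here. The function names and the sample coefficients are assumptions for illustration.

```python
import cmath
import math

def e(n, x):
    """Basis function e_n(x) = exp(2*pi*i*n*x)."""
    return cmath.exp(2j * math.pi * n * x)

def synthesize(coeffs, x):
    """Evaluate the finite sum of c_n e_n(x) for coeffs = {n: c_n}."""
    return sum(c * e(n, x) for n, c in coeffs.items())

def apply_diagonal(coeffs, eigenvalue):
    """Multiply each coefficient c_n by eigenvalue(n), as in (10.4.1)."""
    return {n: eigenvalue(n) * c for n, c in coeffs.items()}

# Hypothetical Fourier coefficients of a trigonometric polynomial f.
coeffs = {-2: 0.5 - 1j, 0: 3.0, 1: 2j, 4: -1.25}

# D has eigenvalues 2*pi*n and Delta has eigenvalues 4*pi^2*n^2 (Example 10.4.2).
D_coeffs = apply_diagonal(coeffs, lambda n: 2 * math.pi * n)
Delta_coeffs = apply_diagonal(coeffs, lambda n: 4 * math.pi**2 * n**2)

def D_direct(x, h=1e-6):
    """Approximate D(f)(x) = -i f'(x) directly by a central difference."""
    return -1j * (synthesize(coeffs, x + h) - synthesize(coeffs, x - h)) / (2 * h)

x0 = 0.37
err = abs(D_direct(x0) - synthesize(D_coeffs, x0))
print(err)  # small: the coefficient-side and function-side computations agree
```

Applying `apply_diagonal` twice with the eigenvalues of D reproduces `Delta_coeffs`, which mirrors the identity ∆ = D² on the coefficient side.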


Remark 10.4.6. It is possible to obtain results similar to Theorem 10.4.5 under weaker hypotheses. For example, if instead of requiring eigenvalues to be distinct, we only require that they repeat at most finitely many times, one can use the (finite-dimensional) Spectral Theorem (see Messer [Mes97, Ch. 8]) to show that there exists an orthogonal basis {vn }, not necessarily the same as {un }, that is an eigenbasis for both S and T , simultaneously.

Problems

10.4.1. (Proves Theorem 10.4.3) Let T be a Hermitian operator in a Hilbert space H, let {u_n} be an eigenbasis for T, and let T(u_n) = λ_n u_n. Prove that if f ∈ D(T), then

\[ T(f) = T\left( \sum_{n=1}^{\infty} \hat{f}(n) u_n \right) = \sum_{n=1}^{\infty} \lambda_n \hat{f}(n) u_n. \tag{10.4.2} \]

(Suggestion: Compute the generalized Fourier coefficients of T(f).)

10.4.2. (Proves Theorem 10.4.4) Let T be a Hermitian operator in a Hilbert space H, let {u_n} be an eigenbasis for T, and let T(u_n) = λ_n u_n. Suppose f ∈ D(T) and T(f) = λf (λ ∈ C).

(a) Prove that if λ_n ≠ λ, then f̂(n) = 0. (Suggestion: Compute the generalized Fourier coefficients of T(f).)

(b) Prove that if f ≠ 0 (i.e., f is an eigenvector and λ is an eigenvalue), then

\[ f = \sum_{k=1}^{N} \hat{f}(n_k) u_{n_k}, \tag{10.4.3} \]

where N is either a positive integer or ∞, and {n_k} is the set of all n such that λ_n = λ.

10.4.3. (Proves Theorem 10.4.5) Let T be a Hermitian operator in a Hilbert space H, let {u_n} be an eigenbasis for T, and let T(u_n) = λ_n u_n. Suppose S is a Hermitian operator in H with D(S) = D(T) = H_0, S(H_0) ⊆ H_0, T(H_0) ⊆ H_0, and ST = TS; and suppose also that λ_k ≠ λ_n for k ≠ n.

(a) Prove that S(u_n) is an eigenvector of T with eigenvalue λ_n.

(b) Prove that {u_n} is an eigenbasis for S. (Suggestion: Theorem 10.4.4.)

Chapter 11

Eigenbases and differential equations

The point is this: if from the outset we demand that our solutions be very regular, say k-times continuously differentiable, then we are usually going to have a really hard time finding them, as our proofs must then necessarily include possibly intricate demonstrations that the functions we are building are in fact smooth enough. A far more reasonable strategy is to consider as separate the existence and the smoothness (or regularity) problems. The idea is to define for a given PDE a reasonably wide notion of a weak solution, with the expectation that since we are not asking too much by way of smoothness of this weak solution, it may be easier to establish its existence, uniqueness, and continuous dependence on the given data. — Lawrence C. Evans, Partial Differential Equations [Eva10]

And now, a word from our sponsors. That is, as promised in Chapter 9, in this chapter, we return to one of the main reasons Fourier series were invented: solving differential equations. As described in the epigraph above, the basic idea is to use the theory of “derivatives-as-operators” from Chapter 10 to turn a PDE into an operator equation, find solutions in L², and then prove those solutions actually have the desired derivatives. We begin with Fourier’s original motivating example of the heat equation (Section 11.1), and then generalize our solution of the heat equation to what we call the eigenbasis method (Section 11.2). The rest of the chapter consists of applications of the eigenbasis method to the wave equation (Section 11.3), boundary value problems (Section 11.4), Legendre polynomials (Section 11.5), and quantum mechanics (Sections 11.6 and 11.7). We conclude by describing a class of differential equations whose solution spaces naturally produce corresponding eigenbases, namely, the class of Sturm-Liouville equations (Section 11.8).

11.1 The heat equation on the circle

Before getting to a general description of the eigenbasis method for solving PDE’s, we begin with a specific example, namely, the heat equation. To review, we recall that Question 9.1.1, together with (9.1.7), yields the following precise mathematical problem. (The reader may wish to review the notation from Definitions 5.2.6 and 7.5.9.)

Question 11.1.1. Given an initial value f(x) ∈ L²(S¹), find u(x, t) (t > 0) such that:

1. (Differentiable) For fixed t_0 > 0, u(x, t_0) ∈ C_x²(S¹), and for fixed x_0 ∈ S¹, u(x_0, t) ∈ C_t¹((0, +∞)).

2. (Initial value) For any x ∈ S¹, \lim_{t \to 0^+} u(x, t) = f(x).

3. (PDE) For all t > 0,

\[ -\frac{\partial^2 u}{\partial x^2} = -\frac{\partial u}{\partial t}. \tag{11.1.1} \]

Note that we have reversed the signs in (11.1.1) to make the operator on the left-hand side equal to the positive operator ∆ = −∂²/∂x² (Example 10.2.11). Note also that the initial value condition could be satisfied a fortiori by finding u(x, t) that is continuous on S¹ × [0, +∞) and satisfies u(x, 0) = f(x); however, we use the looser limit condition here to allow for the possibility that the initial value f(x) is not continuous. In any case, it is now time for another visit to:

THE LAND OF WISHFUL THINKING

We start by recalling that B = {e_n | n ∈ Z} is an eigenbasis for the Hermitian operator ∆ (Example 10.4.2). By the definition of basis, since f ∈ L²(S¹), f(x) = \sum_{n \in \mathbb{Z}} \hat{f}(n) e_n(x) in L². Similarly, since we want u(x, t_0) ∈ C_x²(S¹) ⊆ L_x²(S¹) for fixed t_0 > 0, if we let ψ_n(t) be the nth Fourier coefficient of u(x, t) at time t, then we must have

\[ u(x, t) = \sum_{n \in \mathbb{Z}} \psi_n(t) e_n(x) \tag{11.1.2} \]

for all t > 0. The idea of looking for solutions of the form (11.1.2) is called separation of variables.

Because {e_n} is an eigenbasis, we know that ∆ diagonalizes the expression in (11.1.2) (Theorem 10.4.3). We similarly hope that (11.1.2) is also diagonalized by −∂/∂t (unjustified leap #1). In that case, since ∆(e_n) = 4π²n²e_n, we get that

\[ \Delta(u)(x, t) = \sum_{n \in \mathbb{Z}} 4\pi^2 n^2 e_n(x) \psi_n(t), \tag{11.1.3} \]

\[ -\frac{\partial u}{\partial t}(x, t) = \sum_{n \in \mathbb{Z}} (-1) e_n(x) \psi_n'(t). \tag{11.1.4} \]


Comparing (11.1.3) and (11.1.4), we see that it suffices to find functions ψ_n(t) continuous at 0 such that

\[ -\psi_n'(t) = 4\pi^2 n^2 \psi_n(t) \quad \text{and} \quad \psi_n(0) = \hat{f}(n), \tag{11.1.5} \]

where the latter condition comes from the further assumption that \lim_{t \to 0^+} and \sum_{n \in \mathbb{Z}} commute (unjustified leap #2). By Theorem 4.6.3, we must then have ψ_n(t) = f̂(n)e^{−4π²n²t}, which means that, at least in the Land of Wishful Thinking, we have the solution

\[ u(x, t) = \sum_{n \in \mathbb{Z}} \hat{f}(n) e_n(x) e^{-4\pi^2 n^2 t} \tag{11.1.6} \]

to (11.1.1), satisfying the initial condition \lim_{t \to 0^+} u(x, t) = f(x).
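To make the educated guess (11.1.6) concrete, here is a minimal numerical sketch that evaluates a truncated version of the series and checks the heat equation by finite differences. The choice of initial value (f = 1 on [0, 1/2) and f = 0 on [1/2, 1)), the truncation level N, the step sizes, and all function names are assumptions made for illustration; none of this replaces the justification that the wishful-thinking argument still requires.

```python
import cmath
import math

def square_wave_coeff(n):
    """Fourier coefficient of f = 1 on [0, 1/2), 0 on [1/2, 1):
    the integral of exp(-2*pi*i*n*x) over [0, 1/2]."""
    if n == 0:
        return 0.5
    return (1 - cmath.exp(-1j * math.pi * n)) / (2j * math.pi * n)

def heat_solution(x, t, coeff, N=50):
    """Truncated version of (11.1.6): sum over |n| <= N."""
    total = 0j
    for n in range(-N, N + 1):
        damping = math.exp(-4 * math.pi**2 * n**2 * t)
        total += coeff(n) * cmath.exp(2j * math.pi * n * x) * damping
    return total

def pde_residual(x, t, coeff, hx=1e-4, ht=1e-6):
    """Finite-difference estimate of |u_t - u_xx|, which (11.1.1) says is 0."""
    u = lambda y, s: heat_solution(y, s, coeff)
    u_t = (u(x, t + ht) - u(x, t - ht)) / (2 * ht)
    u_xx = (u(x + hx, t) - 2 * u(x, t) + u(x - hx, t)) / (hx * hx)
    return abs(u_t - u_xx)

# Value of the truncated solution and its PDE residual at a sample point.
print(heat_solution(0.2, 0.01, square_wave_coeff))
print(pde_residual(0.2, 0.01, square_wave_coeff))

# For large t only the n = 0 term survives, so u approaches the mean value of f.
print(heat_solution(0.2, 1.0, square_wave_coeff))
```

Because of the factors e^{−4π²n²t}, the terms with large |n| are negligible once t > 0, which is why a modest truncation level already gives a small residual; for t very close to 0 a larger N is needed.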

EXITING THE LAND OF WISHFUL THINKING

Outside of the Land of Wishful Thinking, we may regard (11.1.6) as an educated guess of a solution u(x, t) of the heat equation. To prove that (11.1.6) is an actual solution of the heat equation on S¹, we need to justify not only the above formal manipulations, but also the convergence and differentiability of (11.1.6). We may also ask: Is (11.1.6) the only solution to the heat equation on S¹?

Regularity. First off, convergence to a genuine differentiable solution is relatively straightforward, and in fact, works unexpectedly well.

Theorem 11.1.2. Consider the heat equation with initial values u(x, 0) = f(x) ∈ L²(S¹), and suppose u(x, t) is given by (11.1.6).

1. For fixed t_0 > 0, the function u(x, t_0) given by (11.1.6) is in C_x^∞(S¹), and (11.1.3) holds.

2. For fixed x_0 ∈ S¹ and variable t > 0, the function u(x_0, t) is in C^∞((0, +∞)), and (11.1.4) holds.

Proof. Problems 11.1.1 and 11.1.2.

The remarkable thing about Theorem 11.1.2 is that even when the initial values f(x) are quite discontinuous, as functions in L²(S¹) can be, the dampening factors e^{−4π²n²t} force the corresponding solution u(x, t) to be (pointwise) smooth for t > 0. This smoothing effect is characteristic of what are called parabolic PDE’s, of which the heat equation is perhaps the most fundamental example. For more on parabolic PDE’s, see Friedman [Fri08].

Initial values. If we only require convergence in L², then the initial value condition of Question 11.1.1 always holds. More precisely:

Theorem 11.1.3. Suppose f(x) ∈ L²(S¹) and u(x, t) is defined by (11.1.6). Then

\[ \lim_{t \to 0^+} \| u(x, t) - f(x) \| = 0. \tag{11.1.7} \]


Proof. Problem 11.1.3.

If we make additional assumptions about the smoothness of f, we can actually show that u converges uniformly to f as t → 0, in the following sense.

Theorem 11.1.4. Consider Question 11.1.1, with the additional hypothesis that f ∈ C¹(S¹). Then u(x, t) converges to f(x) uniformly on S¹, or more precisely,

\[ \lim_{t \to 0^+} \| u(x, t) - f(x) \|_{\infty} = 0, \tag{11.1.8} \]

where for g ∈ C⁰(S¹),

\[ \| g(x) \|_{\infty} = \sup \left\{ |g(x)| \mid x \in S^1 \right\} \tag{11.1.9} \]

denotes the L^∞ norm of Example 7.2.3.

Proof. Since η(t) = ‖u(x, t) − f(x)‖_∞ is a nonnegative function of t, by the Squeeze Lemma 3.1.21, it suffices to show that η(t) is bounded above by a nonnegative continuous function h : [0, +∞) → R such that h(0) = 0. This is done in Problem 11.1.4.

We will later be able to relax the hypothesis of Theorem 11.1.4 to f ∈ C⁰(S¹); see Section 13.6.

Uniqueness. We can use a more careful version of our “wishful thinking” argument to show that (11.1.6) is the only solution to the heat equation on the circle. For simplicity, we assume continuous initial values f(x) and replace the limit condition of Question 11.1.1 with continuity of u(x, t) at t = 0.

Theorem 11.1.5. Suppose f ∈ C⁰(S¹) and u : S¹ × [0, +∞) → C is such that:

1. ∂²u/∂x² and ∂u/∂t exist and are continuous on S¹ × (0, +∞);

2. u(x, t) is continuous (including at t = 0) and u(x, 0) = f(x) for any x ∈ S¹; and

3. For all t > 0,

\[ -\frac{\partial^2 u}{\partial x^2} = -\frac{\partial u}{\partial t}. \tag{11.1.10} \]

Then

\[ u(x, t) = \sum_{n \in \mathbb{Z}} \hat{f}(n) e_n(x) e^{-4\pi^2 n^2 t}. \tag{11.1.11} \]

Proof. Since u(x, t_0) ∈ C²(S¹) for fixed t_0 > 0, by Theorem 8.1.2, the Fourier series (in x) of u(x, t_0) converges uniformly to u(x, t_0). Therefore, if we let ψ_n(t) be the nth Fourier coefficient (in x) of u(x, t), we see that for all t > 0,

\[ u(x, t) = \sum_{n \in \mathbb{Z}} \psi_n(t) e_n(x). \tag{11.1.12} \]


Problem 11.1.5 shows that for t > 0,

\[ \psi_n'(t) = -4\pi^2 n^2 \psi_n(t). \tag{11.1.13} \]

Since Lemma 3.6.20 implies that

\[ \psi_n(t) = \int_0^1 u(x, t)\, \overline{e_n(x)} \, dx \tag{11.1.14} \]

is continuous for t ∈ [0, +∞), and since ψ_n(0) = f̂(n) because u(x, 0) = f(x), Theorem 4.6.3 then shows that ψ_n(t) = f̂(n)e^{−4π²n²t}. The theorem follows.

Remark 11.1.6. The reader may find it interesting to know that the smoothing effect of Theorem 11.1.2 is not only mathematical, but also confirmed by physical experiment. For example, historically, the smoothing effect of heat transfer actually created difficulties when trying to transmit signals by wires across the Atlantic Ocean, as an initial discontinuous “pulse” will smooth out in space, slowing the possible transmission rate. See Körner [Kör89, Sects. 65–66] for an enlightening (and entertaining) analysis of the situation.

Problems

11.1.1. (Proves Theorem 11.1.2) Suppose f ∈ L²(S¹). Prove that for fixed t > 0, the function

\[ u(x, t) = \sum_{n \in \mathbb{Z}} \hat{f}(n) e_n(x) e^{-4\pi^2 n^2 t} \tag{11.1.15} \]

is in C_x^∞(S¹) and term-by-term application of ∆ = −∂²/∂x² is valid. (Suggestion: Apply Subsection 8.5.1.)

11.1.2. (Proves Theorem 11.1.2) Suppose f ∈ L²(S¹). Prove that for fixed x ∈ S¹ and variable t > 0, the function u(x, t) given by (11.1.15) is infinitely differentiable in the variable t, and term-by-term differentiation

\[ -\frac{\partial u}{\partial t}(x, t) = \sum_{n \in \mathbb{Z}} (4\pi^2 n^2) \hat{f}(n) e_n(x) e^{-4\pi^2 n^2 t} \tag{11.1.16} \]

is valid. (Suggestion: Imitate the proofs of Theorem 8.5.1 and Corollary 8.5.2.)

11.1.3. (Proves Theorem 11.1.3) Suppose f ∈ L²(S¹) and u(x, t) is defined by (11.1.15).

(a) Define g : [0, +∞) → R by

\[ g(t) = \sum_{n \in \mathbb{Z}} \left( 1 - e^{-4\pi^2 n^2 t} \right)^2 \left| \hat{f}(n) \right|^2, \tag{11.1.17} \]

which we think of as a function series in the variable t. Prove that g converges uniformly to a continuous function such that g(0) = 0.


(b) Now let $\eta(t) = \|u(x,t) - f(x)\|^2$. Prove that $\eta(t) = g(t)$. (Suggestion: Isomorphism Theorem for Fourier Series 7.6.7.)

11.1.4. (Proves Theorem 11.1.4) Consider Question 11.1.1, with the additional hypothesis that $f \in C^1(S^1)$. Define
$$\eta(t) = \|u(x,t) - f(x)\|_\infty, \qquad h(t) = \sum_{n \in \mathbb{Z}} \left|1 - e^{-4\pi^2 n^2 t}\right| \left|\hat{f}(n)\right|. \tag{11.1.18}$$

(a) Prove that $h(t)$ converges uniformly on $[0,+\infty)$ to a continuous function. (Suggestion: Theorem 6.4.1 and the Extra Derivative Lemma 8.4.7.)

(b) Prove that $\eta(t) \le h(t)$ for $t \in [0,+\infty)$.

11.1.5. (Proves Theorem 11.1.5) Suppose that $u : S^1 \times (0,+\infty) \to \mathbb{C}$ is such that $\dfrac{\partial^2 u}{\partial x^2} = \dfrac{\partial u}{\partial t}$ is continuous (and therefore, so is $u$). Let $\psi_n(t)$ be the $n$th Fourier coefficient (in $x$) of $u(x,t)$. Prove that $\psi_n'(t) = -4\pi^2 n^2\, \psi_n(t)$. (Suggestion: Compute $\psi_n(t)$ from the definition and apply Theorem 3.6.23 and integration by parts.)

11.2 The eigenbasis method

We may generalize the heuristic ("wishful thinking") discussion of Section 11.1 as follows. Suppose $L$ is an operator (e.g., some kind of derivative) in the variable $x$, and $T$ is an operator in the variable $t$. Suppose also that we want to find one particular solution to the partial differential equation $L(u) = T(u)$ for $u(x,t)$, given boundary conditions in $x$ and initial values $u(x,0)$, $\dfrac{\partial u}{\partial t}(x,0)$, etc. In many interesting cases, we can find such a solution $u(x,t)$ by separation of variables, just as we did in Section 11.1.

Specifically, we look for a solution of the form $u(x,t) = \sum \varphi_n(x)\psi_n(t)$ as follows:

1. The boundary conditions in $x$ define a domain $D(L)$ for $L$ as an operator in some Hilbert space $H$. We first prove that $L$ is Hermitian, which often relies on the fact that functions in $D(L)$ satisfy those boundary conditions (see Section 10.2).

2. The critical (and most difficult) step is to find an eigenbasis for $L$ (Definition 10.4.1). To review, this means that we want to find an orthogonal basis $\{\varphi_n(x)\}$ for $H$ such that, for each $n$,

(a) $\varphi_n \in D(L)$; and

(b) $L(\varphi_n) = \lambda_n \varphi_n$ for some $\lambda_n \in \mathbb{R}$ (since $L$ is Hermitian).

Note that again, satisfying the condition $\varphi_n \in D(L)$ often comes down to satisfying the boundary conditions in the definition of $D(L)$.


3. Next, for each $n$, we solve the ordinary differential equation $T(\psi) = \lambda_n \psi$. For example, if $T = a\dfrac{d}{dt}$ for some $a \in \mathbb{C}$, $a \neq 0$, then we get a solution space spanned by $e^{\lambda_n t/a}$ (Theorem 4.6.3). If $T = -\dfrac{d^2}{dt^2}$, then for $\lambda_n = \kappa_n^2 > 0$ (i.e., $\kappa_n = \sqrt{\lambda_n} > 0$), we get a solution space spanned by $\{\cos \kappa_n t, \sin \kappa_n t\}$ (Theorem 4.6.3); and for $\lambda_n = 0$, we get a solution space spanned by $\{1, t\}$ (a calculus exercise). Note that $\{e^{i\kappa_n t}, e^{-i\kappa_n t}\}$ has the same span as $\{\cos \kappa_n t, \sin \kappa_n t\}$, but sines and cosines turn out to be more useful for our purposes here.

4. To complete our solution, recall that by the Diagonalization Theorem 10.4.3, $L$ is diagonalizable with respect to the eigenbasis $\{\varphi_n\}$, or in other words, $L$ can be applied term-by-term (with convergence in $L^2$). We also temporarily assume that $T$ is diagonalizable with respect to the solutions of $T(\psi) = \lambda_n \psi$ described in the previous step, where $L(\varphi_n) = \lambda_n \varphi_n$. We then have two cases of particular interest.

(a) Suppose $T(u) = a\dfrac{\partial u}{\partial t}$ for $a \in \mathbb{C}$, $a \neq 0$. Then as with the heat equation,
$$u(x,t) = \sum_{n=1}^{\infty} A_n \varphi_n(x)\, e^{\lambda_n t/a} \tag{11.2.1}$$
is a solution to $L(u) = T(u)$, assuming convergence of the right-hand side of (11.2.1). Furthermore, if we express $f(x)$ in the eigenbasis $\{\varphi_n\}$ as
$$f(x) = \sum_{n=1}^{\infty} \hat{f}(n)\varphi_n, \tag{11.2.2}$$
then
$$u(x,t) = \sum_{n=1}^{\infty} \hat{f}(n)\varphi_n(x)\, e^{\lambda_n t/a} \tag{11.2.3}$$
is a solution to $L(u) = T(u)$ with the initial condition $u(x,0) = f(x)$. (See Problem 11.2.1.)

(b) Suppose $T(u) = -\dfrac{\partial^2 u}{\partial t^2}$, $\lambda_0 = 0$, and for $n > 0$, $\lambda_n = \kappa_n^2 > 0$ (i.e., $\kappa_n = \sqrt{\lambda_n}$). (Note that having all $\lambda_n \ge 0$ is equivalent to $L$ being positive.) Then
$$u(x,t) = A_0 + B_0 t + \sum_{n=1}^{\infty} \left(A_n \varphi_n(x) \cos \kappa_n t + B_n \varphi_n(x) \sin \kappa_n t\right) \tag{11.2.4}$$
is a solution to $L(u) = T(u)$, assuming convergence of the right-hand side of (11.2.4). Furthermore, if we express $f(x)$ and $g(x)$ in the eigenbasis $\{\varphi_n\}$ as
$$f(x) = \sum_{n=0}^{\infty} \hat{f}(n)\varphi_n, \qquad g(x) = \sum_{n=0}^{\infty} \hat{g}(n)\varphi_n, \tag{11.2.5}$$
then
$$u(x,t) = \hat{f}(0) + \hat{g}(0)t + \sum_{n=1}^{\infty} \left(\hat{f}(n)\varphi_n(x)\cos(\kappa_n t) + \frac{\hat{g}(n)}{\kappa_n}\varphi_n(x)\sin(\kappa_n t)\right) \tag{11.2.6}$$
is a solution to $L(u) = T(u)$ with the initial conditions $u(x,0) = f(x)$ and $\dfrac{\partial u}{\partial t}(x,0) = g(x)$. (See Problem 11.2.2.)

We define a formal solution to $L(u) = T(u)$ to be a solution that is valid if we ignore questions of convergence and diagonalizability, like (11.2.1) and (11.2.4). Note that while formal solutions are useful, if we want (for example) actual suitably differentiable solutions, we must still address questions like the convergence of the right-hand sides of (11.2.1), (11.2.3), (11.2.4), and (11.2.6), the validity of applying term-by-term operations, and the uniqueness (or lack thereof) of those solutions.
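The first-order case of the eigenbasis method can be sketched in a few lines of code. The function below builds a truncated version of the formal solution (11.2.3) from a finite table of coefficients. The name `formal_solution`, the sample eigenfunctions $\varphi_n(x) = \sin(2\pi n x)$ with $\lambda_n = 4\pi^2 n^2$, the choice $a = -1$, and the coefficient values are all assumptions made for illustration; convergence questions are ignored, as in any formal solution.

```python
import cmath
import math

def formal_solution(fhat, phi, lam, a):
    """Return u(x, t) = sum_n fhat[n] * phi(n, x) * exp(lam(n) * t / a).

    fhat: dict mapping n to a coefficient (finitely many, so no convergence issue)
    phi:  function (n, x) -> value of the n-th eigenfunction at x
    lam:  function n -> eigenvalue lambda_n
    a:    nonzero (possibly complex) constant in T(u) = a * du/dt
    """
    def u(x, t):
        return sum(c * phi(n, x) * cmath.exp(lam(n) * t / a)
                   for n, c in fhat.items())
    return u

# Illustrative (hypothetical) data, chosen so that a = -1 gives exponential decay.
phi = lambda n, x: math.sin(2 * math.pi * n * x)
lam = lambda n: 4 * math.pi**2 * n**2
fhat = {1: 1.0, 2: 0.5, 3: 0.25}

u = formal_solution(fhat, phi, lam, a=-1.0)
initial = u(0.125, 0.0)   # should agree with f(0.125) = sum of fhat[n] * phi(n, 0.125)
later = u(0.125, 1.0)     # every mode has decayed substantially by t = 1
```

Passing the eigenfunctions and eigenvalues as functions keeps the sketch independent of any particular operator $L$; only the pair $(\varphi_n, \lambda_n)$ changes from problem to problem.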

Problems

For Problems 11.2.1 and 11.2.2, let $\{\varphi_n\}$ be an eigenbasis for $L$ with associated eigenvalues $\{\lambda_n\}$, and assume diagonalizability and convergence freely, as for the moment, we are only interested in formal solutions.

11.2.1. Let $T(u) = a\dfrac{\partial u}{\partial t}$ for $a \in \mathbb{C}$, $a \neq 0$.

(a) Show that
$$u(x,t) = \sum_{n=1}^{\infty} A_n \varphi_n(x)\, e^{\lambda_n t/a} \tag{11.2.7}$$
is a solution to $L(u) = T(u)$. (Suggestion: Theorem 4.6.3.)

(b) Show that
$$u(x,t) = \sum_{n=1}^{\infty} \hat{f}(n)\varphi_n(x)\, e^{\lambda_n t/a} \tag{11.2.8}$$
satisfies the initial condition $u(x,0) = f(x)$.

11.2.2. Let $T(u) = -\dfrac{\partial^2 u}{\partial t^2}$ and $\lambda_n = \kappa_n^2 \ge 0$.

(a) Show that
$$u(x,t) = A_0 + B_0 t + \sum_{n=1}^{\infty} \left(A_n \varphi_n(x) \cos \kappa_n t + B_n \varphi_n(x) \sin \kappa_n t\right) \tag{11.2.9}$$
is a solution to $L(u) = T(u)$. (Suggestion: Theorem 4.6.3.)

(b) Show that
$$u(x,t) = \hat{f}(0) + \hat{g}(0)t + \sum_{n=1}^{\infty} \left(\hat{f}(n)\varphi_n(x)\cos(\kappa_n t) + \frac{\hat{g}(n)}{\kappa_n}\varphi_n(x)\sin(\kappa_n t)\right) \tag{11.2.10}$$
satisfies the initial conditions $u(x,0) = f(x)$ and $\dfrac{\partial u}{\partial t}(x,0) = g(x)$.

11.3 The wave equation on the circle

Recall that $\sum_{n \neq 0}$ denotes a sum over all $n \in \mathbb{Z}$ except $n = 0$ (Notation 4.1.3).

As with the heat equation, Question 9.1.2 and (9.1.11) together yield the following precise mathematical problem.

Question 11.3.1. Given initial values $f(x), g(x) \in L^2(S^1)$, find $u(x,t)$ such that:

1. (Differentiable) For fixed $t_0 > 0$, $u(x,t_0) \in C_x^2(S^1)$, and for fixed $x_0 \in S^1$, $u(x_0,t) \in C_t^2((0,+\infty))$.

2. (Initial value) For any $x \in S^1$,
$$\lim_{t \to 0^+} u(x,t) = f(x), \qquad \lim_{t \to 0^+} \frac{\partial u}{\partial t}(x,t) = g(x). \tag{11.3.1}$$

3. (PDE) For all $t > 0$,
$$-\frac{\partial^2 u}{\partial x^2} = -\frac{\partial^2 u}{\partial t^2}. \tag{11.3.2}$$

The reader who is (rightfully) concerned about the usefulness of a wave travelling in a circle should regard Question 11.3.1 as looking for solutions with a given period in the $x$ direction (here, period 1). We can apply the eigenbasis method of Section 11.2 to answer Question 11.3.1 as follows.

Eigenbasis and formal solution. By Example 10.4.2, $B = \{e_n \mid n \in \mathbb{Z}\}$ is an eigenbasis for $\Delta$. Furthermore, since $f, g \in L^2(S^1)$, $f(x) = \sum_{n \in \mathbb{Z}} \hat{f}(n)e_n(x)$ and $g(x) = \sum_{n \in \mathbb{Z}} \hat{g}(n)e_n(x)$ in $L^2$. So by (11.2.6), since $e_n$ is an eigenfunction for $\Delta$ with eigenvalue $4\pi^2 n^2$,
$$u(x,t) = \hat{f}(0) + \hat{g}(0)t + \sum_{n \neq 0} \left(\hat{f}(n)e_n(x)\cos(2\pi n t) + \frac{\hat{g}(n)}{2\pi n}e_n(x)\sin(2\pi n t)\right) \tag{11.3.3}$$
is a formal solution to (11.3.2) that satisfies the initial conditions $u(x,0) = f(x)$ and $\dfrac{\partial u}{\partial t}(x,0) = g(x)$. Again, we verify our guess (11.3.3) by checking convergence, initial values, and uniqueness.


"Drift" and domain. Note that our guess (11.3.3) is periodic in $t$ with period 1, except for the term $\hat{g}(0)t$, which is sometimes known as a "drift" term (imagine a wave descending at a constant rate). However, let $g_0(x) = g(x) - \hat{g}(0)$ (a drift-free initial velocity) and
$$u_0(x,t) = \sum_{n \in \mathbb{Z}} \hat{f}(n)e_n(x)\cos(2\pi n t) + \sum_{n \neq 0} \frac{\hat{g}(n)}{2\pi n}e_n(x)\sin(2\pi n t). \tag{11.3.4}$$

If $u_0(x,t)$ is sufficiently differentiable,
$$\lim_{t \to 0^+} u_0(x,t) = f(x), \qquad \lim_{t \to 0^+} \frac{\partial u_0}{\partial t}(x,t) = g_0(x), \tag{11.3.5}$$
and
$$-\frac{\partial^2 u_0}{\partial x^2} = -\frac{\partial^2 u_0}{\partial t^2} \tag{11.3.6}$$
for $t > 0$, then it is certainly reasonable to say that $u(x,t) = u_0(x,t) + \hat{g}(0)t$ is a solution to Question 11.3.1, with the same regularity properties as $u_0$. For the rest of this section, then, we assume that $\hat{g}(0) = 0$ and $u_0(x,t)$, as given by (11.3.4), is our guessed solution.

As a bonus, we note that in fact, $u_0(x,t)$ is periodic in both $x$ and $t$. (In fact, $u_0$ is in $L^2(S^1)$ in both variables; see Problem 11.3.1.) Physically, we may interpret this as saying that for a solution of the wave equation, periodicity in space forces periodicity in time; see Section 11.4 for more on this point. Furthermore, if we assume that $f \in C^1(S^1)$, we get the following term-by-term calculation of $\dfrac{\partial u_0}{\partial t}$.

Lemma 11.3.2. Suppose $f \in C^1(S^1)$ and $u_0$ is given by (11.3.4). Then
$$\frac{\partial u_0}{\partial t}(x,t) = -\sum_{n \in \mathbb{Z}} (2\pi n)\hat{f}(n)e_n(x)\sin(2\pi n t) + \sum_{n \neq 0} \hat{g}(n)e_n(x)\cos(2\pi n t), \tag{11.3.7}$$
with convergence in $L_t^2(S^1)$.

Proof. Problem 11.3.2.

Initial values. In terms of regularity of solutions, we have good news and bad news. The good news is that after we assume an extra degree of differentiability to account for one of the initial values being a derivative, the initial value properties of (11.3.4) hold in a manner analogous to the initial value properties of the solution to the heat equation. We begin with the analogue of Theorem 11.1.3.

Theorem 11.3.3. Suppose $f(x) \in C^1(S^1)$, $g(x) \in L^2(S^1)$, and $u_0(x,t)$ is defined by (11.3.4). Then
$$\lim_{t \to 0^+} \|u_0(x,t) - f(x)\| = 0, \qquad \lim_{t \to 0^+} \left\|\frac{\partial u_0}{\partial t}(x,t) - g(x)\right\| = 0. \tag{11.3.8}$$

Proof. Problem 11.3.3.


Theorem 11.3.4. Consider Question 11.3.1, with the additional hypotheses that $f \in C^2(S^1)$ and $g \in C^1(S^1)$. Then $u_0(x,t)$ and $\dfrac{\partial u_0}{\partial t}(x,t)$ converge to $f(x)$ and $g(x)$ uniformly on $S^1$, or more precisely,
$$\lim_{t \to 0^+} \|u_0(x,t) - f(x)\|_\infty = 0, \qquad \lim_{t \to 0^+} \left\|\frac{\partial u_0}{\partial t}(x,t) - g(x)\right\|_\infty = 0. \tag{11.3.9}$$

Proof. Define
$$\eta_0(t) = \|u_0(x,t) - f(x)\|_\infty, \qquad \eta_1(t) = \left\|\frac{\partial u_0}{\partial t}(x,t) - g(x)\right\|_\infty. \tag{11.3.10}$$
As in the proof of Theorem 11.1.4, by the Squeeze Lemma 3.1.21, it suffices to show that for $i = 0, 1$, $\eta_i(t)$ is bounded above by a nonnegative continuous function $h_i : [0,+\infty) \to \mathbb{R}$ such that $h_i(0) = 0$. See Problem 11.3.4.

Pointwise convergence. One significant qualitative difference between the heat and wave equations is that the solution (11.3.4) to the wave equation lacks the exponentially decaying factors found in the solution (11.1.6) to the heat equation, so there is no wave equation analogue to the smoothing results of Theorem 11.1.2. However, as long as we assume one extra degree of differentiability, we can use the Extra Derivative Lemma 8.4.7 to obtain pointwise (or actually, uniform) convergence properties. (As a bonus, convergence also extends to $t = 0$.)

Theorem 11.3.5. Consider Question 11.3.1, with the additional hypotheses on the initial values $f$ and $g$ that $f \in C^3(S^1)$ and $g \in C^2(S^1)$. Let $u_0(x,t)$ be given by (11.3.4). Then for fixed $t_0 \ge 0$, $u_0(x,t_0) \in C^2(S^1)$; for fixed $x_0 \in S^1$, $u_0(x_0,t) \in C^2(S^1)$; and (11.3.2) holds for all $t \ge 0$.

Proof. Problem 11.3.5.

$L^2$ convergence. If we are willing to settle for $L^2$ convergence of the solution, rather than pointwise/uniform convergence, we can weaken the hypotheses of Theorem 11.3.5 to $f \in C^2(S^1)$ and $g \in C^1(S^1)$. We begin by extending the domain of $\Delta$.

Definition 11.3.6. We define the operator $\Delta^+$ on $H = L^2(S^1)$ to be the operator given by
$$\Delta^+\left(\sum_{n \in \mathbb{Z}} \hat{f}(n)e_n(x)\right) = \sum_{n \in \mathbb{Z}} 4\pi^2 n^2 \hat{f}(n)e_n(x), \tag{11.3.11}$$
where the domain of $\Delta^+$ is defined by
$$D(\Delta^+) = \left\{ \sum_{n \in \mathbb{Z}} \hat{f}(n)e_n(x) \;\middle|\; \sum_{n \in \mathbb{Z}} 4\pi^2 n^2 \hat{f}(n)e_n(x) \text{ converges in } L^2 \right\}. \tag{11.3.12}$$

In other words, $D(\Delta^+)$ is literally the largest possible subspace of $L_x^2(S^1)$ on which $\Delta^+$ could possibly converge.
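To get a feel for Definition 11.3.6, it may help to test membership in $D(\Delta^+)$ on two sample coefficient sequences. The sketch below assumes the standard Hilbert space fact that a series $\sum c_n e_n(x)$ over the orthonormal basis $\{e_n\}$ converges in $L^2$ exactly when $\sum |c_n|^2 < \infty$. The functions $f$ and $h$ are illustrative choices, not examples from the text.

```latex
% Sample 1 (assumed): \hat f(0) = 0 and \hat f(n) = |n|^{-3} for n \neq 0.
\sum_{n \neq 0} \left| 4\pi^2 n^2 \hat f(n) \right|^2
  = 16\pi^4 \sum_{n \neq 0} \frac{1}{n^2}
  = 16\pi^4 \cdot \frac{\pi^2}{3} < \infty,
\qquad \text{so } f \in D(\Delta^+).

% Sample 2 (assumed): \hat h(0) = 0 and \hat h(n) = n^{-2} for n \neq 0.
\sum_{n \neq 0} \left| 4\pi^2 n^2 \hat h(n) \right|^2
  = 16\pi^4 \sum_{n \neq 0} 1 = \infty,
\qquad \text{so } h \notin D(\Delta^+), \text{ even though } h \in L^2(S^1).
```

Under that assumption, membership in $D(\Delta^+)$ amounts to square-summability of the sequence $n^2 \hat{f}(n)$.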


Theorem 11.3.7. Let $\Delta$ be the Laplacian on $S^1$, and let $\Delta^+$ be the operator from Definition 11.3.6.

1. The domain $D(\Delta^+)$ contains $C^2(S^1)$.

2. For every $f \in D(\Delta)$, $\Delta^+(f) = \Delta(f)$, or in other words, $\Delta^+$ extends $\Delta$.

Proof. Problem 11.3.6.

We may therefore rewrite the wave equation (11.3.6), in an extended sense, as
$$\Delta_x^+(u_0) = \Delta_t^+(u_0), \tag{11.3.13}$$
where $\Delta_x^+$ and $\Delta_t^+$ are the extended Laplacian defined above, in the variables $x$ and $t$, respectively. Note that if instead of the basis $\{e_n(t)\}$ for $L^2(S^1)$, we use $\{1, \cos(2\pi n t), \sin(2\pi n t)\}$, by grouping the $\pm n$ terms in (11.3.11), we see that for any $h \in L_t^2(S^1)$ with
$$h(t) = \frac{a_0}{2} + \sum_{n=1}^{\infty} \left(a_n \cos(2\pi n t) + b_n \sin(2\pi n t)\right), \tag{11.3.14}$$
we have
$$\Delta^+(h) = \sum_{n=1}^{\infty} 4\pi^2 n^2 \left(a_n \cos(2\pi n t) + b_n \sin(2\pi n t)\right). \tag{11.3.15}$$

Putting it all together, we have the following theorem.

Theorem 11.3.8. If $f \in C^2(S^1)$ and $g \in C^1(S^1)$, then for $t > 0$, (11.3.4) is a solution to (11.3.13) as an equation in either $L_x^2(S^1)$ or $L_t^2(S^1)$.

Proof. Problem 11.3.7.

Uniqueness. Uniqueness of the solution to the wave equation follows for much the same reasons that it does for the heat equation. We again make extra assumptions about continuity to simplify our discussion.

Theorem 11.3.9. Suppose $f \in C^0(S^1)$, $g \in C^0(S^1)$, and $u : S^1 \times [0,+\infty) \to \mathbb{C}$ is such that:

1. $\dfrac{\partial^2 u}{\partial x^2}$ and $\dfrac{\partial^2 u}{\partial t^2}$ exist and are continuous on $S^1 \times (0,+\infty)$;

2. $u$ and $\dfrac{\partial u}{\partial t}$ are continuous (including at $t = 0$) and for any $x \in S^1$, $u(x,0) = f(x)$ and $\dfrac{\partial u}{\partial t}(x,0) = g(x)$; and

3. For all $t > 0$,
$$-\frac{\partial^2 u}{\partial x^2} = -\frac{\partial^2 u}{\partial t^2}. \tag{11.3.16}$$


Then
$$u(x,t) = \hat{f}(0) + \hat{g}(0)t + \sum_{n \neq 0} \left(\hat{f}(n)e_n(x)\cos(2\pi n t) + \frac{\hat{g}(n)}{2\pi n}e_n(x)\sin(2\pi n t)\right). \tag{11.3.17}$$

Proof. As in the proof of Theorem 11.1.5, if we let $\psi_n(t)$ be the $n$th Fourier coefficient (in $x$) of $u(x,t)$, we see that for all $t > 0$,
$$u(x,t) = \sum_{n \in \mathbb{Z}} \psi_n(t)e_n(x). \tag{11.3.18}$$
Problem 11.3.8 shows that for $t > 0$,
$$\psi_n''(t) = -4\pi^2 n^2\, \psi_n(t). \tag{11.3.19}$$
Since Lemma 3.6.20 and Theorem 3.6.23 together imply that
$$\psi_n(t) = \int_0^1 u(x,t)\,\overline{e_n(x)}\,dx, \qquad \psi_n'(t) = \int_0^1 \frac{\partial u}{\partial t}(x,t)\,\overline{e_n(x)}\,dx \tag{11.3.20}$$
are continuous for $t \in [0,+\infty)$, Theorem 4.6.3 then shows for $n \neq 0$ that
$$\psi_n(t) = \hat{f}(n)\cos(2\pi n t) + \frac{\hat{g}(n)}{2\pi n}\sin(2\pi n t). \tag{11.3.21}$$
As for $n = 0$, it is a straightforward fact from calculus that if $\psi_0 \in C^2((0,+\infty))$, $\psi_0'$ is continuous at $t = 0$, and $\psi_0''(t) = 0$ for $t > 0$, then $\psi_0(t) = \psi_0(0) + \psi_0'(0)t$. The theorem follows.

d'Alembert's formula. The wave equation was actually first solved in 1746, quite some time before Fourier's work, by d'Alembert, who found the solution
$$u(x,t) = \frac{1}{2}\left(f(x+t) + f(x-t)\right) + \frac{1}{2}\int_{x-t}^{x+t} g(y)\,dy. \tag{11.3.22}$$
We can use calculus to verify directly that if $f \in C^2(S^1)$ and $g \in C^1(S^1)$, then (11.3.22) is a solution to (9.1.11); see Problem 11.3.9. For our purposes, then, the Fourier series solution to the wave equation is most notable as a second example of how eigenbases give a general method for solving differential equations. We also note that (11.3.3) actually does give the same answer as (11.3.22); see Problem 11.3.10.

Remark 11.3.10. We note that d'Alembert's formula (11.3.22) still makes sense when we only assume that (for example) $f$ and $g$ are integrable, and we therefore still think of (11.3.22) as a solution to the wave equation, even when $u(x,t)$ is no longer differentiable. We also note that (11.3.22) preserves, and even propagates, any singularities present in the initial conditions $f$ and $g$, a notable characteristic of hyperbolic PDEs (see Evans [Eva10]). In some applications, we actually regard this lack of smoothing as a feature, as, for example, the "broadcast" of initial singularities helps to make the transmission of information by electromagnetic waves work in practice.
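The agreement between the Fourier series solution (11.3.3) and d'Alembert's formula (11.3.22) can be spot-checked numerically. The sketch below is not a proof; it uses the sample data $f(x) = \cos(2\pi x)$ and $g(x) = \tfrac{1}{2} + \sin(4\pi x)$, whose Fourier coefficients are entered by hand, and it approximates the integral in (11.3.22) with the composite midpoint rule. The function names, the sample data, and the number of quadrature steps are assumptions for illustration.

```python
import cmath
import math

def fourier_wave(fhat, ghat, x, t):
    """Finite version of the series solution, with coefficients given as dicts."""
    total = fhat.get(0, 0) + ghat.get(0, 0) * t      # constant and drift terms
    for n in set(fhat) | set(ghat):
        if n == 0:
            continue
        e = cmath.exp(2j * math.pi * n * x)
        total += fhat.get(n, 0) * e * math.cos(2 * math.pi * n * t)
        total += ghat.get(n, 0) / (2 * math.pi * n) * e * math.sin(2 * math.pi * n * t)
    return total

# Sample data: f(x) = cos(2 pi x) and g(x) = 1/2 + sin(4 pi x).
f = lambda x: math.cos(2 * math.pi * x)
g = lambda y: 0.5 + math.sin(4 * math.pi * y)
fhat = {1: 0.5, -1: 0.5}                 # cos(2 pi x) = (e_1 + e_{-1}) / 2
ghat = {0: 0.5, 2: -0.5j, -2: 0.5j}      # sin(4 pi x) = (e_2 - e_{-2}) / (2i)

def dalembert(x, t, steps=2000):
    """d'Alembert's formula, with the integral approximated by the midpoint rule."""
    width = 2 * t / steps
    integral = sum(g(x - t + (k + 0.5) * width) for k in range(steps)) * width
    return 0.5 * (f(x + t) + f(x - t)) + 0.5 * integral

series_value = fourier_wave(fhat, ghat, 0.2, 0.3)
formula_value = dalembert(0.2, 0.3)
difference = abs(series_value - formula_value)
```

Since $f$ and $g$ here are trigonometric polynomials, the series is a finite sum, so any discrepancy comes from the quadrature and from rounding.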


Problems

For Problems 11.3.1–11.3.7, we make the "drift-free" assumption $\hat{g}(0) = 0$.

11.3.1. Suppose $f, g \in L^2(S^1)$. Prove that for fixed $t$, both sums in
$$u_0(x,t) = \sum_{n \in \mathbb{Z}} \hat{f}(n)e_n(x)\cos(2\pi n t) + \sum_{n \neq 0} \frac{\hat{g}(n)}{2\pi n}e_n(x)\sin(2\pi n t) \tag{11.3.23}$$
converge in $L_x^2(S^1)$, and similarly, prove for fixed $x$ that both sums converge in $L_t^2(S^1)$. (Suggestion: Hilbert Space Comparison Test 7.6.6.)

11.3.2. (Proves Lemma 11.3.2) Suppose $f \in C^1(S^1)$ and $u_0$ is given by (11.3.23). Prove that
$$\frac{\partial u_0}{\partial t}(x,t) = -\sum_{n \in \mathbb{Z}} (2\pi n)\hat{f}(n)e_n(x)\sin(2\pi n t) + \sum_{n \neq 0} \hat{g}(n)e_n(x)\cos(2\pi n t), \tag{11.3.24}$$
with convergence in $L_t^2(S^1)$. (Suggestion: First use the Diagonalization Theorem 10.4.3 to prove a formula like (11.3.24) for $-i\dfrac{\partial u_0}{\partial t}$ and the basis $\{e_n(t)\}$; then rearrange terms.)

11.3.3. (Proves Theorem 11.3.3) Suppose $f \in C^1(S^1)$, $g \in L^2(S^1)$, and $u_0(x,t)$ is defined by (11.3.23) (and therefore, $\dfrac{\partial u_0}{\partial t}$ is given by (11.3.24); see Problem 11.3.2).

(a) Define $h_0, h_1 : [0,+\infty) \to \mathbb{R}$ by
$$\begin{aligned}
h_0(t) &= \sum_{n \in \mathbb{Z}} |1 - \cos(2\pi n t)|^2 \left|\hat{f}(n)\right|^2 + \sum_{n \neq 0} \left|\frac{\hat{g}(n)}{2\pi n}\right|^2 |\sin(2\pi n t)|^2, \\
h_1(t) &= \sum_{n \in \mathbb{Z}} \left|(2\pi n)\hat{f}(n)\right|^2 |\sin(2\pi n t)|^2 + \sum_{n \neq 0} |\hat{g}(n)|^2\, |1 - \cos(2\pi n t)|^2,
\end{aligned} \tag{11.3.25}$$
which we think of as function series in the variable $t$. Prove that for $i = 0, 1$, $h_i$ converges uniformly to a continuous function on $[0,+\infty)$ such that $h_i(0) = 0$. (Suggestion: $M$-test.)

(b) Now let $\eta_0(t) = \|u_0(x,t) - f(x)\|^2$ and $\eta_1(t) = \left\|\dfrac{\partial u_0}{\partial t}(x,t) - g(x)\right\|^2$. Prove that for $i = 0, 1$, we have $\eta_i(t) = h_i(t)$. (Suggestion: Isomorphism Theorem for Fourier Series.)

11.3.4. (Proves Theorem 11.3.4) Suppose $u_0(x,t)$ and $\dfrac{\partial u_0}{\partial t}$ are given by (11.3.23) and (11.3.24), respectively, and assume also that $f \in C^2(S^1)$ and $g \in C^1(S^1)$. Define
$$\eta_0(t) = \|u_0(x,t) - f(x)\|_\infty, \tag{11.3.26}$$
$$\eta_1(t) = \left\|\frac{\partial u_0}{\partial t}(x,t) - g(x)\right\|_\infty, \tag{11.3.27}$$
$$h_0(t) = \sum_{n \in \mathbb{Z}} |1 - \cos(2\pi n t)| \left|\hat{f}(n)\right| + \sum_{n \neq 0} \left|\frac{\hat{g}(n)}{2\pi n}\right| |\sin(2\pi n t)|, \tag{11.3.28}$$
$$h_1(t) = \sum_{n \in \mathbb{Z}} \left|(2\pi n)\hat{f}(n)\right| |\sin(2\pi n t)| + \sum_{n \neq 0} |\hat{g}(n)|\, |1 - \cos(2\pi n t)|. \tag{11.3.29}$$

(a) Prove for $i = 0, 1$ that $h_i(t)$ converges uniformly on $[0,+\infty)$ to a continuous function. (Suggestion: Theorem 6.4.1 and the Extra Derivative Lemma 8.4.7.)

(b) Prove, for $i = 0, 1$, that $\eta_i(t) \le h_i(t)$ for $t \in [0,+\infty)$.

11.3.5. (Proves Theorem 11.3.5) Suppose $u_0(x,t)$ is given by (11.3.23) and assume now that $f \in C^3(S^1)$ and $g \in C^2(S^1)$.

(a) Prove that for $k \le 2$, the series
$$h_k(x,t) = \sum_{n \in \mathbb{Z}} |2\pi n|^k \left|\hat{f}(n)\right| + \sum_{n \neq 0} |2\pi n|^{k-1}\, |\hat{g}(n)| \tag{11.3.30}$$
converges. (Suggestion: Extra Derivative Lemma 8.4.7.)

(b) Prove that the series (11.3.23) for $u_0(x,t)$ can be differentiated term-by-term twice in both $x$ and $t$. (Suggestion: Theorem 4.3.12.)

(c) Prove that $\dfrac{\partial^2 u_0}{\partial x^2}(x,t) = \dfrac{\partial^2 u_0}{\partial t^2}(x,t)$ for all $x \in S^1$ and $t \ge 0$.

11.3.6. (Proves Theorem 11.3.7) Let $\Delta$ be the Laplacian on $S^1$, and let $\Delta^+$ be the extended operator
$$\Delta^+\left(\sum_{n \in \mathbb{Z}} \hat{f}(n)e_n(x)\right) = \sum_{n \in \mathbb{Z}} 4\pi^2 n^2 \hat{f}(n)e_n(x) \tag{11.3.31}$$
from Definition 11.3.6.

(a) Prove that if $f \in C^2(S^1)$, then $f$ is in
$$D(\Delta^+) = \left\{ \sum_{n \in \mathbb{Z}} \hat{f}(n)e_n(x) \;\middle|\; \sum_{n \in \mathbb{Z}} 4\pi^2 n^2 \hat{f}(n)e_n(x) \text{ converges in } L^2 \right\}. \tag{11.3.32}$$
(Suggestion: What can you conclude from the fact that $f'' \in C^0(S^1)$? Use Section 6.4 and Theorem 8.1.1.)

(b) Prove that if $f \in D(\Delta)$, $\Delta^+(f) = \Delta(f)$. (Suggestion: Section 10.4.)


11.3.7. (Proves Theorem 11.3.8) Prove that if $f \in C^2(S^1)$ and $g \in C^1(S^1)$, then (11.3.23) is a solution to
$$\Delta_x^+(u_0) = \Delta_t^+(u_0), \tag{11.3.33}$$
as an equation in either $L_x^2(S^1)$ or $L_t^2(S^1)$. (Suggestion: Section 6.4.)

11.3.8. (Proves Theorem 11.3.9) Suppose that $u : S^1 \times (0,+\infty) \to \mathbb{C}$ is such that $\dfrac{\partial^2 u}{\partial x^2} = \dfrac{\partial^2 u}{\partial t^2}$ is continuous (and therefore, so is $u$). Let $\psi_n(t)$ be the $n$th Fourier coefficient (in $x$) of $u(x,t)$. Prove that $\psi_n''(t) = -4\pi^2 n^2\, \psi_n(t)$. (Suggestion: Compute $\psi_n(t)$ from the definition and apply Theorem 3.6.23 and integration by parts.)

11.3.9. Prove that if $f \in C^2(S^1)$ and $g \in C^1(S^1)$, then d'Alembert's formula
$$u(x,t) = \frac{1}{2}\left(f(x+t) + f(x-t)\right) + \frac{1}{2}\int_{x-t}^{x+t} g(y)\,dy \tag{11.3.34}$$
gives a valid solution to Question 11.3.1. (Suggestion: Fundamental Theorem of Calculus and the chain rule.)

11.3.10. Working formally (i.e., without worrying about convergence of infinite series, assuming term-by-term operations work, etc.), prove that d'Alembert's formula (11.3.34) gives the same solution as
$$u(x,t) = \hat{f}(0) + \hat{g}(0)t + \sum_{n \neq 0} \left(\hat{f}(n)e_n(x)\cos(2\pi n t) + \frac{\hat{g}(n)}{2\pi n}e_n(x)\sin(2\pi n t)\right). \tag{11.3.35}$$

11.4 Boundary value problems

We next turn to a situation that, in many ways, represents the most "applied" of the various ways in which we will solve the heat and wave equations: namely, on a closed and bounded interval, with various boundary conditions. For reasons that will become clear, we will stick to the interval $\left[0, \tfrac{1}{2}\right]$, but to be sure, our results hold for any interval $[a,b]$ after suitable scaling and translation.

We first restate the heat and wave equations as boundary value problems, making Question 9.1.3 precise.

Question 11.4.1. Given an initial value $f(x) \in L^2\left(\left[0, \tfrac{1}{2}\right]\right)$, find $u(x,t)$ ($t \ge 0$) such that:

1. (Differentiable) For fixed $t_0 > 0$, $u(x,t_0) \in C_x^2\left(\left[0, \tfrac{1}{2}\right]\right)$, and for fixed $x_0 \in \left[0, \tfrac{1}{2}\right]$, $u(x_0,t) \in C_t^1((0,+\infty))$.

2. (Initial value) For any $x \in \left[0, \tfrac{1}{2}\right]$, $\lim_{t \to 0^+} u(x,t) = f(x)$.

3. (PDE) For all $t > 0$ and $x \in \left[0, \tfrac{1}{2}\right]$,
$$-\frac{\partial^2 u}{\partial x^2} = -\frac{\partial u}{\partial t}. \tag{11.4.1}$$

4. (Boundary values) $u(x,t)$ satisfies one of the following sets of boundary conditions:

(a) Dirichlet boundary conditions: $u(0,t) = u\left(\tfrac{1}{2},t\right) = 0$ for all $t$.

(b) Neumann boundary conditions: $\dfrac{\partial u}{\partial x}(0,t) = \dfrac{\partial u}{\partial x}\left(\tfrac{1}{2},t\right) = 0$ for all $t$.

Question 11.4.2. Given initial values $f(x), g(x) \in L^2\left(\left[0, \tfrac{1}{2}\right]\right)$, find $u(x,t)$ ($t \ge 0$) such that:

1. (Differentiable) For fixed $t_0 > 0$, $u(x,t_0) \in C_x^2\left(\left[0, \tfrac{1}{2}\right]\right)$, and for fixed $x_0 \in \left[0, \tfrac{1}{2}\right]$, $u(x_0,t) \in C_t^2((0,+\infty))$.

2. (Initial value) For any $x \in \left[0, \tfrac{1}{2}\right]$,
$$\lim_{t \to 0^+} u(x,t) = f(x), \qquad \lim_{t \to 0^+} \frac{\partial u}{\partial t}(x,t) = g(x). \tag{11.4.2}$$

3. (PDE) For all $t > 0$,
$$-\frac{\partial^2 u}{\partial x^2} = -\frac{\partial^2 u}{\partial t^2}. \tag{11.4.3}$$

4. (Boundary values) $u(x,t)$ satisfies one of the following sets of boundary conditions:

(a) Dirichlet boundary conditions: $u(0,t) = u\left(\tfrac{1}{2},t\right) = 0$ for all $t$.

(b) Neumann boundary conditions: $\dfrac{\partial u}{\partial x}(0,t) = \dfrac{\partial u}{\partial x}\left(\tfrac{1}{2},t\right) = 0$ for all $t$.

As in Sections 11.1 and 11.3, to solve the heat and wave equation boundary value problems fully, we would begin by finding a formal solution, and consider regularity later. However, since the regularity proofs for the boundary value problems greatly resemble the proofs we have seen previously, to avoid repetition, we restrict our attention to the formal problem, and leave regularity to the reader (perhaps in another course).

As previously discussed, solving the heat and wave boundary value problems (Questions 11.4.1 and 11.4.2) formally is another application of the eigenbasis method (Section 11.2). The principal new content lies in expressing an arbitrary $f \in L^2\left(\left[0, \tfrac{1}{2}\right]\right)$ in terms of an eigenbasis corresponding to the desired boundary conditions. Towards this end, for $f \in L^2\left(\left[0, \tfrac{1}{2}\right]\right)$, recall (Section 6.3.3) that the even extension and odd extension of $f$ are respectively the functions $f_{\mathrm{even}}, f_{\mathrm{odd}} \in L^2(S^1)$ given by
$$f_{\mathrm{even}}(x) = \begin{cases} f(x) & \text{if } 0 \le x \le \tfrac{1}{2}, \\ f(-x) & \text{if } -\tfrac{1}{2} \le x < 0, \end{cases} \qquad f_{\mathrm{odd}}(x) = \begin{cases} f(x) & \text{if } 0 < x < \tfrac{1}{2}, \\ 0 & \text{if } x = 0, \\ -f(-x) & \text{if } -\tfrac{1}{2} < x < 0, \\ 0 & \text{if } x = \pm\tfrac{1}{2}. \end{cases} \tag{11.4.4}$$


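Recall also (Theorem 6.3.6) that the (complex) Fourier series of the even and odd extensions of $f$ become cosine and sine series, respectively:
$$f_{\mathrm{even}}(x) = \sum_{n \in \mathbb{Z}} \hat{f}_{\mathrm{even}}(n)e_n(x) = \frac{a_0}{2} + \sum_{n=1}^{\infty} a_n \cos(2\pi n x), \tag{11.4.5}$$
$$f_{\mathrm{odd}}(x) = \sum_{n \in \mathbb{Z}} \hat{f}_{\mathrm{odd}}(n)e_n(x) = \sum_{n=1}^{\infty} b_n \sin(2\pi n x), \tag{11.4.6}$$
where
$$a_n = 4\int_0^{1/2} f(x)\cos(2\pi n x)\,dx, \qquad b_n = 4\int_0^{1/2} f(x)\sin(2\pi n x)\,dx, \tag{11.4.7}$$
and convergence in $L^2$ holds by Theorem 8.1.1. We also recall that by (6.3.7)–(6.3.9), the sets $B_{\mathrm{even}} = \{\cos(2\pi n x) \mid n \ge 0\}$ and $B_{\mathrm{odd}} = \{\sin(2\pi n x) \mid n \ge 1\}$ are each orthogonal (but not orthonormal) sets of nonzero vectors. Finally, we observe that if a series converges to $f$ in $L^2(S^1)$, it must also converge to $f$ in $L^2\left(\left[0, \tfrac{1}{2}\right]\right)$, as
$$\int_0^{1/2} |f(x) - f_n(x)|^2\,dx \le \int_{-1/2}^{1/2} |f(x) - f_n(x)|^2\,dx. \tag{11.4.8}$$

Therefore, by the definition of orthogonal basis, we see that:

Corollary 11.4.3. Each of the sets
$$B_{\mathrm{even}} = \{\cos(2\pi n x) \mid n \ge 0\}, \tag{11.4.9}$$
$$B_{\mathrm{odd}} = \{\sin(2\pi n x) \mid n \ge 1\}, \tag{11.4.10}$$
is an orthogonal basis for $L^2\left(\left[0, \tfrac{1}{2}\right]\right)$.

Next, we define operators $\Delta_{\mathrm{Dir}}$ and $\Delta_{\mathrm{Neu}}$ in $L^2\left(\left[0, \tfrac{1}{2}\right]\right)$ as follows.

1. Let $\Delta_{\mathrm{Dir}}$ have the domain
$$D(\Delta_{\mathrm{Dir}}) = \left\{ f \in C^2\left(\left[0, \tfrac{1}{2}\right]\right) \;\middle|\; f(0) = 0 = f\left(\tfrac{1}{2}\right) \right\} \tag{11.4.11}$$
and the formula $\Delta_{\mathrm{Dir}}(f) = -f''$.

2. Let $\Delta_{\mathrm{Neu}}$ have the domain
$$D(\Delta_{\mathrm{Neu}}) = \left\{ f \in C^2\left(\left[0, \tfrac{1}{2}\right]\right) \;\middle|\; f'(0) = 0 = f'\left(\tfrac{1}{2}\right) \right\} \tag{11.4.12}$$
and the formula $\Delta_{\mathrm{Neu}}(f) = -f''$.

Then we see that:

Theorem 11.4.4. The set $B_{\mathrm{odd}} = \{\sin(2\pi n x) \mid n \ge 1\}$ is an eigenbasis for $\Delta_{\mathrm{Dir}}$, and the set $B_{\mathrm{even}} = \{\cos(2\pi n x) \mid n \ge 0\}$ is an eigenbasis for $\Delta_{\mathrm{Neu}}$.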


Proof. Problems 11.4.1 and 11.4.2.

Therefore, if
$$f(x) = \frac{a_0}{2} + \sum_{n=1}^{\infty} a_n \cos(2\pi n x) = \sum_{n=1}^{\infty} b_n \sin(2\pi n x) \tag{11.4.13}$$
are the cosine and sine expansions of $f$, respectively, with convergence in $L^2$, the eigenbasis method (Section 11.2) gives formal solutions
$$u(x,t) = \sum_{n=1}^{\infty} b_n \sin(2\pi n x)\, e^{-4\pi^2 n^2 t}, \tag{11.4.14}$$
$$u(x,t) = \frac{a_0}{2} + \sum_{n=1}^{\infty} a_n \cos(2\pi n x)\, e^{-4\pi^2 n^2 t}, \tag{11.4.15}$$
for the heat equation on $\left[0, \tfrac{1}{2}\right]$, with Dirichlet and Neumann boundary conditions, respectively. Similarly, if
$$f(x) = \frac{a_0}{2} + \sum_{n=1}^{\infty} a_n \cos(2\pi n x) = \sum_{n=1}^{\infty} b_n \sin(2\pi n x), \tag{11.4.16}$$
$$g(x) = \frac{c_0}{2} + \sum_{n=1}^{\infty} c_n \cos(2\pi n x) = \sum_{n=1}^{\infty} d_n \sin(2\pi n x), \tag{11.4.17}$$
are the cosine and sine expansions of $f$ and $g$, respectively, with convergence in $L^2$, the eigenbasis method gives formal solutions
$$u(x,t) = \sum_{n=1}^{\infty} \left(b_n \sin(2\pi n x)\cos(2\pi n t) + \frac{d_n}{2\pi n}\sin(2\pi n x)\sin(2\pi n t)\right), \tag{11.4.18}$$
$$u(x,t) = \frac{a_0}{2} + \frac{c_0}{2}t + \sum_{n=1}^{\infty} \left(a_n \cos(2\pi n x)\cos(2\pi n t) + \frac{c_n}{2\pi n}\cos(2\pi n x)\sin(2\pi n t)\right), \tag{11.4.19}$$
for the wave equation on $\left[0, \tfrac{1}{2}\right]$, with Dirichlet and Neumann boundary conditions, respectively.

Remark 11.4.5. As we discussed in Remark 6.3.7, it may strike some readers as strange that we can have initial values $f$, $g$, etc., that do not satisfy the desired boundary conditions, but nonetheless produce solutions $u(x,t)$ that do. The reason is that the sine/cosine series for $f$, $g$, etc., converge in $L^2$, but not necessarily pointwise, which means that, in effect, we change $f$ and $g$ at the boundary to get functions that satisfy the boundary conditions. For example, if our problem involves Dirichlet boundary conditions and an initial value $f \in L^2\left(\left[0, \tfrac{1}{2}\right]\right)$ given by $f(x) = 1$, the corresponding sine series is
$$f(x) \sim \sum_{n=1}^{\infty} \frac{2(1 - (-1)^n)}{\pi n}\sin(2\pi n x). \tag{11.4.20}$$


(See Problem 11.4.3.) By Theorem 8.5.17, (11.4.20) converges to 1 for $x \in \left(0, \tfrac{1}{2}\right)$ and to 0 for $x = 0$ and $x = \tfrac{1}{2}$; see Figure 11.4.1 for what this looks like.

[Figure 11.4.1: Sine series (N = 30) and odd extensions force Dirichlet]

Similarly, if our problem involves Neumann boundary conditions and an initial value $g \in L^2\left(\left[0, \tfrac{1}{2}\right]\right)$ given by $g(x) = x$, the corresponding cosine series is
$$g(x) \sim \frac{1}{4} - \sum_{n=1}^{\infty} \frac{(1 - (-1)^n)}{\pi^2 n^2}\cos(2\pi n x). \tag{11.4.21}$$
(See Problem 11.4.4.) In fact, (11.4.21) converges to $x$ for $x \in \left[0, \tfrac{1}{2}\right]$ (Problem 11.4.5); more relevantly, since the term-by-term derivative of (11.4.21) is precisely (11.4.20), the "formal derivative" of $g$ converges to 1 for $x \in \left(0, \tfrac{1}{2}\right)$ and to 0 for $x = 0, \tfrac{1}{2}$. See Figure 11.4.2 for some intuition as to what this looks like, and note how much faster the Fourier series of $g$ converges in comparison to the Fourier series of $f$.

Remark 11.4.6. As promised back in Section 1.2, we can now supply some mathematical details of what happens when one presses down at the exact midpoint of a string on a stringed instrument. In one (highly simplified) standard model of a stringed instrument, a string of length $\tfrac{1}{2}$ (say) held fixed at both ends is modelled by the wave equation with Dirichlet boundary conditions $u(0,t) = u\left(\tfrac{1}{2},t\right) = 0$ (left-hand side of Figure 11.4.3). It follows that the height of the string at time $t$ must have the form
$$u(x,t) = \sum_{n=1}^{\infty} \left(a_n \cos(2\pi n t) + b_n \sin(2\pi n t)\right)\sin(2\pi n x) \tag{11.4.22}$$
for some coefficients $a_n, b_n \in \mathbb{R}$.


[Figure 11.4.2: Cosine series (N = 2) and even extensions force Neumann]

Pressing the string at the halfway point $x = \tfrac{1}{4}$ (right-hand side of Figure 11.4.3) imposes an extra "boundary" condition $u\left(\tfrac{1}{4},t\right) = 0$, which suppresses the odd harmonics $n = 2k+1$, leaving the waveform
$$u(x,t) = \sum_{k=1}^{\infty} \left(a_{2k}\cos(2\pi(2k)t) + b_{2k}\sin(2\pi(2k)t)\right)\sin(2\pi(2k)x). \tag{11.4.23}$$

In particular, the lowest remaining frequency is $n = 2(1) = 2$, twice the usual ground frequency, which a musician hears as being an "octave up". Moreover, removing the odd harmonics that make the string sound more complex also results in a purer tone quality.
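The suppression of odd harmonics in Remark 11.4.6 comes down to the values of the spatial factor $\sin(2\pi n x)$ at $x = \tfrac{1}{4}$. The short sketch below tabulates which harmonics have a node there; the function name `node_value`, the range $1 \le n \le 8$, and the tolerance used to decide what counts as zero in floating point are illustrative choices.

```python
import math

def node_value(n, x=0.25):
    """Spatial factor sin(2 pi n x) of the n-th harmonic, evaluated at x."""
    return math.sin(2 * math.pi * n * x)

TOLERANCE = 1e-12  # floating-point stand-in for "exactly zero"

# Harmonics that vanish at x = 1/4 are compatible with the extra condition
# u(1/4, t) = 0; the others must have zero coefficient.
surviving = [n for n in range(1, 9) if abs(node_value(n)) < TOLERANCE]
suppressed = [n for n in range(1, 9) if abs(node_value(n)) >= TOLERANCE]

print("surviving harmonics:", surviving)
print("suppressed harmonics:", suppressed)
```

The even harmonics survive because $\sin(2\pi(2k)\cdot\tfrac{1}{4}) = \sin(\pi k) = 0$, while for odd $n$ the factor is $\pm 1$.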

[Figure 11.4.3: Pressing halfway on a string suppresses odd harmonics. The string runs from the neck at x = 0 to the bridge at x = 1/2 and is pressed at x = 1/4.]
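Returning to the sine and cosine series of Remark 11.4.5, partial sums make the contrast in convergence speed concrete. The sketch below evaluates partial sums of (11.4.20) and (11.4.21) at a few points; the function names and the particular values of $N$ and $x$ are assumptions chosen for illustration.

```python
import math

def sine_partial_sum(x, N):
    """Partial sum of the sine series of f(x) = 1 on [0, 1/2]."""
    return sum(2 * (1 - (-1) ** n) / (math.pi * n) * math.sin(2 * math.pi * n * x)
               for n in range(1, N + 1))

def cosine_partial_sum(x, N):
    """Partial sum of the cosine series of g(x) = x on [0, 1/2]."""
    return 0.25 - sum((1 - (-1) ** n) / (math.pi**2 * n**2)
                      * math.cos(2 * math.pi * n * x)
                      for n in range(1, N + 1))

# Interior point: the sine series approaches 1, but only slowly.
sine_interior = sine_partial_sum(0.25, 2001)
# Boundary point: every term vanishes, so the series "forces" the value 0.
sine_boundary = sine_partial_sum(0.0, 30)
# The cosine series is already close to g(x) = x with far fewer terms.
cosine_interior = cosine_partial_sum(0.1, 31)
cosine_boundary = cosine_partial_sum(0.0, 31)
```

The coefficients of the sine series decay like $1/n$ and those of the cosine series like $1/n^2$, which is consistent with the difference in speed seen in the partial sums.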

Problems

11.4.1. (Proves Theorem 11.4.4) Prove that $B_{\mathrm{odd}} = \{\sin(2\pi n x) \mid n \ge 1\}$ is an eigenbasis for $\Delta_{\mathrm{Dir}}$. (Suggestion: See Section 10.4.)

11.4.2. (Proves Theorem 11.4.4) Prove that $B_{\mathrm{even}} = \{\cos(2\pi n x) \mid n \ge 0\}$ is an eigenbasis for $\Delta_{\mathrm{Neu}}$. (Suggestion: See Section 10.4.)

11.4.3. Define $f \in L^2\left(\left[0, \tfrac{1}{2}\right]\right)$ by $f(x) = 1$. Prove that the sine series of $f$ is given by
$$\sum_{n=1}^{\infty} \frac{2(1 - (-1)^n)}{\pi n}\sin(2\pi n x). \tag{11.4.24}$$
(Suggestion: (11.4.7).)

11.4.4. Define $g \in L^2\left(\left[0, \tfrac{1}{2}\right]\right)$ by $g(x) = x$. Prove that the cosine series of $g$ is given by
$$\frac{1}{4} - \sum_{n=1}^{\infty} \frac{(1 - (-1)^n)}{\pi^2 n^2}\cos(2\pi n x). \tag{11.4.25}$$
(Suggestion: (11.4.7) and integration by parts.)

11.4.5. Prove that the cosine series of $g(x) = x$ on $\left[0, \tfrac{1}{2}\right]$ converges uniformly to $g$ on $\left[0, \tfrac{1}{2}\right]$. (Suggestion: $M$-test on the even extension of $g$ and imitate the proof of Theorem 8.1.2 in Section 8.4.)

11.5 Legendre polynomials

In this section, we consider the set $B$ of Legendre polynomials, which have applications in a number of physical problems, including the Laplacian in spherical coordinates (see Dym and McKean [DM85, 4.12]). However, reversing our usual practice of starting with a differential equation and finding an eigenbasis we can use to solve it, we start with $B$ and find a differential equation we can use to prove that $B$ is an orthogonal basis for $L^2([-1,1])$, or more precisely, an eigenbasis for a particular differential operator. We begin by defining the polynomials in question.

Definition 11.5.1. For $n \ge 0$, the $n$th Legendre polynomial $P_n(x)$ is defined to be
$$P_n(x) = \frac{1}{2^n n!}\left(\frac{d}{dx}\right)^n (x^2 - 1)^n. \tag{11.5.1}$$

We have the following immediate observation.

Theorem 11.5.2. The Legendre polynomial $P_n(x)$ is a polynomial of degree $n$ with leading coefficient $\dfrac{(2n)!}{2^n (n!)^2}$.

Proof. Problem 11.5.1.

For example, the first six Legendre polynomials are:
$$\begin{aligned}
P_0(x) &= 1, & P_1(x) &= x, \\
P_2(x) &= \tfrac{1}{2}(3x^2 - 1), & P_3(x) &= \tfrac{1}{2}(5x^3 - 3x), \\
P_4(x) &= \tfrac{1}{8}(35x^4 - 30x^2 + 3), & P_5(x) &= \tfrac{1}{8}(63x^5 - 70x^3 + 15x).
\end{aligned} \tag{11.5.2}$$
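Definition 11.5.1 is easy to carry out mechanically, which gives a way to double-check the list (11.5.2). The sketch below expands $(x^2-1)^n$ with the binomial theorem, differentiates $n$ times, and rescales, using exact rational arithmetic. Representing a polynomial as a list of coefficients with the lowest degree first, and the function name `legendre`, are choices made here for illustration.

```python
from fractions import Fraction
from math import comb, factorial

def legendre(n):
    """Coefficients [c_0, ..., c_n] of P_n (lowest degree first), computed
    directly from P_n = (1 / (2^n n!)) (d/dx)^n (x^2 - 1)^n."""
    # Binomial expansion: the coefficient of x^(2k) in (x^2 - 1)^n.
    poly = [Fraction(0)] * (2 * n + 1)
    for k in range(n + 1):
        poly[2 * k] = Fraction(comb(n, k) * (-1) ** (n - k))
    # Differentiate n times: x^j becomes j * x^(j-1).
    for _ in range(n):
        poly = [j * c for j, c in enumerate(poly)][1:]
    scale = Fraction(1, 2 ** n * factorial(n))
    return [scale * c for c in poly]

for n in range(6):
    print(n, [str(c) for c in legendre(n)])
```

Exact fractions avoid any rounding, so the printed coefficients can be compared directly with (11.5.2) and with the leading coefficient in Theorem 11.5.2.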

Our first main task is to show that {Pn (x)} is an orthogonal subset of L2 ([−1, 1]). As promised, our proof uses the following operator.


Definition 11.5.3. We define an operator $L$ in $L^2([-1,1])$ with domain $D(L) = C^2([-1,1])$ by the formula
$$L(f) = \frac{d}{dx}\left((1 - x^2)\frac{df}{dx}\right) = (1 - x^2)f'' - 2xf'. \tag{11.5.3}$$

To use the theory of Chapter 10 fully, we need to know that:

Theorem 11.5.4. The operator $L$ of Definition 11.5.3 is Hermitian.

Proof. Problem 11.5.2.

We next digress slightly to make an algebraic observation about multiplication by $x$ and differentiation, considered as operators.

Lemma 11.5.5. Let $I$ be an interval in $\mathbb{R}$ (possibly $I = \mathbb{R}$), and suppose $f \in C^n(I)$ for some $n \ge 1$. Then
$$\left(\frac{d}{dx}\right)^n (xf(x)) = x\left(\frac{d}{dx}\right)^n (f(x)) + n\left(\frac{d}{dx}\right)^{n-1} (f(x)), \tag{11.5.4}$$
$$\left(\frac{d}{dx}\right)^n ((x^2 - 1)f(x)) = (x^2 - 1)\left(\frac{d}{dx}\right)^n (f(x)) + 2nx\left(\frac{d}{dx}\right)^{n-1} (f(x)) + n(n-1)\left(\frac{d}{dx}\right)^{n-2} (f(x)), \tag{11.5.5}$$
where (11.5.4) holds for $n \ge 1$ and (11.5.5) holds for $n \ge 2$. In other words, if we consider $D(f) = f'$ and $X(f(x)) = xf(x)$ as operators, then (11.5.4) and (11.5.5) become
$$D^n X = XD^n + nD^{n-1} \quad \text{for } n \ge 1, \tag{11.5.6}$$
$$D^n (X^2 - 1) = (X^2 - 1)D^n + 2nXD^{n-1} + n(n-1)D^{n-2} \quad \text{for } n \ge 2, \tag{11.5.7}$$
where multiplication denotes composition of operators.

Proof. Problem 11.5.3.

Orthogonality then comes from the following result.

Theorem 11.5.6. For $n \ge 0$, the $n$th Legendre polynomial $P_n(x)$ is an eigenvector of the operator $L$ of Definition 11.5.3 with eigenvalue $-n(n+1)$. Consequently, $\{P_n(x)\}$ is an orthogonal subset of $L^2([-1,1])$.

Proof. Problem 11.5.4.

To finish the proof that $B = \{P_n(x)\}$ is an orthogonal basis (Definition 7.3.12), it remains to show that the generalized Fourier series of any $f \in L^2([-1,1])$ with respect to $B$ (Definition 7.3.7) converges to $f$ in $L^2$. Again, we first require an algebraic digression.


Lemma 11.5.7. Suppose that for each $n \ge 0$, $p_n(x)$ is a polynomial of degree $n$. Then any polynomial $q(x)$ of degree $N$ can be expressed as a linear combination of $p_0, \dots, p_N$; in other words, for every polynomial $q(x)$ of degree $N$, there exist $a_n \in \mathbb{C}$ such that
$$q(x) = \sum_{n=0}^{N} a_n p_n(x). \tag{11.5.8}$$

Proof. Problem 11.5.5.

We come to the main result of this section.

Theorem 11.5.8. The set $B = \{P_n(x)\}$ of Legendre polynomials is an orthogonal basis for $L^2([-1,1])$.

Proof. Comparing Definition 7.3.12 and Theorem 11.5.6, we see that it remains to show that for any $f \in L^2([-1,1])$, the generalized Fourier series of $f$ with respect to $B$ converges to $f$ in $L^2$. This is proved in Problem 11.5.6.

In principle, we now know that $B = \{P_n(x)\}$ is an orthogonal basis for $L^2([-1,1])$. However, to use $B$ for calculations, we need to know $\|P_n(x)\|$. We therefore conclude this section with the following result.

Theorem 11.5.9. We have that $\langle P_n, P_n \rangle = \dfrac{2}{2n+1}$.

Proof. Problem 11.5.7.

Remark 11.5.10. We should mention that in other sources, our definition of Legendre polynomials (Definition 11.5.1) may appear as a theorem known as Rodrigues' formula. Legendre polynomials can alternately be defined as polynomial eigenfunctions of $L$ (Holland [Hol07, 2.8]) or as coefficients of the power series expansion
$$\frac{1}{\sqrt{1 - 2xt + t^2}} = \sum_{n=0}^{\infty} P_n(x) t^n. \tag{11.5.9}$$
The latter approach occurs naturally when studying eigenfunctions of the Laplacian in spherical coordinates (Dym and McKean [DM85, 4.12]). Legendre polynomials can also be obtained by applying the Gram-Schmidt orthogonalization process to the set $\{x^n\}$ (Dym and McKean [DM85, 1.3]).
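The three facts stated in Theorems 11.5.6 and 11.5.9 (eigenvalue $-n(n+1)$, orthogonality, and $\langle P_n, P_n \rangle = 2/(2n+1)$) can be spot-checked for small $n$ with exact arithmetic. The sketch below is only a finite check, not a proof; the helper names are illustrative, and the inner product omits complex conjugation because all coefficients here are real.

```python
from fractions import Fraction
from math import factorial

def mul(a, b):
    """Multiply polynomials stored as coefficient lists, lowest degree first."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def diff(a):
    return [k * a[k] for k in range(1, len(a))] or [Fraction(0)]

def trim(a):
    """Drop trailing zero coefficients so equal polynomials compare equal."""
    a = list(a)
    while len(a) > 1 and a[-1] == 0:
        a.pop()
    return a

def legendre(n):
    """P_n from Definition 11.5.1."""
    p = [Fraction(1)]
    for _ in range(n):
        p = mul(p, [Fraction(-1), Fraction(0), Fraction(1)])
    for _ in range(n):
        p = diff(p)
    return [c / (2**n * factorial(n)) for c in p]

def L(f):
    """The operator of Definition 11.5.3: L(f) = d/dx((1 - x^2) f')."""
    return trim(diff(mul([Fraction(1), Fraction(0), Fraction(-1)], diff(f))))

def inner(f, g):
    """<f, g> on [-1, 1] for real polynomials."""
    # The integral of x^k over [-1, 1] is 2/(k+1) for even k and 0 for odd k.
    return sum(c * Fraction(2, k + 1) for k, c in enumerate(mul(f, g)) if k % 2 == 0)

P = [legendre(n) for n in range(6)]
for n, Pn in enumerate(P):
    assert L(Pn) == trim([-n * (n + 1) * c for c in Pn])   # eigenvalue -n(n+1)
    assert inner(Pn, Pn) == Fraction(2, 2 * n + 1)         # Theorem 11.5.9
    assert all(inner(Pn, P[m]) == 0 for m in range(n))     # orthogonality
print("checked n = 0..5")
```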

Problems

11.5.1. (Proves Theorem 11.5.2) Prove by direct calculation that the $n$th Legendre polynomial
$$P_n(x) = \frac{1}{2^n n!} \left(\frac{d}{dx}\right)^n (x^2 - 1)^n \tag{11.5.10}$$
is a polynomial of degree $n$ with leading coefficient $\dfrac{(2n)!}{2^n (n!)^2}$.


11.5.2. (Proves Theorem 11.5.4) Recall that the inner product on $L^2([-1,1])$ is given by
$$\langle f, g \rangle = \int_{-1}^{1} f(x)\overline{g(x)}\,dx. \tag{11.5.11}$$
Prove that the operator
$$L(f) = \frac{d}{dx}\left((1 - x^2)\frac{df}{dx}\right) = (1 - x^2)f'' - 2xf' \tag{11.5.12}$$
is Hermitian. (Suggestion: Integration by parts; as usual, be sure to justify any terms that vanish.)

11.5.3. (Proves Lemma 11.5.5) Consider $D(f) = f'$ and $X(f(x)) = xf(x)$ as operators, and let multiplication denote composition of operators.
(a) For $n \ge 1$, prove that $D^n X = XD^n + nD^{n-1}$. (Suggestion: Apply both sides to some $n$ times differentiable $f$, and use induction.)
(b) For $n \ge 2$, prove that $D^n (X^2 - 1) = (X^2 - 1)D^n + 2nXD^{n-1} + n(n-1)D^{n-2}$.

11.5.4. (Proves Theorem 11.5.6) Consider the operator $L$ from (11.5.12).
(a) Check by direct calculation that $P_0(x) = 1$ and $P_1(x) = x$ are eigenfunctions of $L$.
(b) Check by direct calculation that
$$(x^2 - 1)\frac{d}{dx}(x^2 - 1)^n = 2nx(x^2 - 1)^n. \tag{11.5.13}$$
(c) For $n \ge 2$, by differentiating both sides of (11.5.13) $n + 1$ times, prove that
$$y_n = D^n (x^2 - 1)^n \tag{11.5.14}$$
is an eigenvector of $L$ with eigenvalue $-n(n+1)$, and prove that $P_n(x) = \dfrac{1}{2^n n!}\,y_n$ is also an eigenvector of $L$ with eigenvalue $-n(n+1)$. (Suggestion: Lemma 11.5.5.)
(d) Prove that $\{P_n(x)\}$ is an orthogonal subset of $L^2([-1,1])$. (Suggestion: Section 10.3.)

11.5.5. (Proves Lemma 11.5.7) Suppose that for each $n \ge 0$, $p_n(x)$ is a polynomial of degree $n$. Prove that for every polynomial $q(x)$ of degree $N$, there exist $a_n \in \mathbb{C}$ such that
$$q(x) = \sum_{n=0}^{N} a_n p_n(x). \tag{11.5.15}$$
(Suggestion: Induction on $N$.)
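The statement of Lemma 11.5.7 can also be read as an algorithm: match the top-degree coefficient, subtract, and repeat on what is left. The sketch below is one possible way to organize that computation, with the first four Legendre polynomials from (11.5.2) hard-coded as the basis; the function name `expand` is a hypothetical choice, and the code illustrates the statement rather than replacing the requested proof.

```python
from fractions import Fraction

def expand(q, basis):
    """Return [a_0, ..., a_N] with q = sum of a_n * basis[n].

    q and each basis[n] are coefficient lists (lowest degree first),
    and basis[n] is assumed to have degree exactly n.
    """
    r = [Fraction(c) for c in q]
    coeffs = [Fraction(0)] * len(r)
    for n in range(len(r) - 1, -1, -1):
        a = r[n] / basis[n][n]        # match the x^n coefficient
        coeffs[n] = a
        for k, c in enumerate(basis[n]):
            r[k] -= a * c             # the remainder now has degree below n
    return coeffs

# P_0, ..., P_3 as listed in (11.5.2)
legendre_basis = [
    [Fraction(1)],
    [Fraction(0), Fraction(1)],
    [Fraction(-1, 2), Fraction(0), Fraction(3, 2)],
    [Fraction(0), Fraction(-3, 2), Fraction(0), Fraction(5, 2)],
]

# x^3 = (3/5) P_1 + (2/5) P_3
print([str(a) for a in expand([0, 0, 0, 1], legendre_basis)])
```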


11.5.6. (Proves Theorem 11.5.8) Let $B = \{P_n(x)\}$ be the set of Legendre polynomials, and suppose $f \in L^2([-1,1])$. Prove that
$$\lim_{N \to \infty} \sum_{n=1}^{N} \hat{f}(n) u_n = f. \tag{11.5.16}$$
(Suggestion: Start with Weierstrass approximation (Theorem 8.5.4) and use Lemma 11.5.7 to imitate the proof of Theorem 8.1.1 in Section 8.4.)

11.5.7. (Proves Theorem 11.5.9) The goal of this problem is to calculate $\langle P_n, P_n \rangle$.
(a) Prove that for $0 \le k \le n - 1$, $\langle x^k, P_n \rangle = 0$. (Suggestion: Lemma 11.5.7.)
(b) Prove that for $0 \le k \le n - 1$, $\left(\dfrac{d}{dx}\right)^k (x^2 - 1)^n = q_k(x)(1 - x^2)^{n-k}$, where $q_k(x)$ is a polynomial. (Suggestion: Finite induction on $k$.)
(c) Prove that $\langle P_n, P_n \rangle = \langle a_n x^n, P_n \rangle$, where $a_n x^n$ is the leading term of $P_n(x)$. (Suggestion: Part (a).)
(d) Prove that $\langle P_n, P_n \rangle = \dfrac{2}{2n+1}$. (Suggestion: Calculate $\left\langle x^n, \left(\dfrac{d}{dx}\right)^n (x^2 - 1)^n \right\rangle$ using (what else?) integration by parts; again, carefully justify vanishing.)

11.6

Hermite functions

In this section, we introduce the eigenbasis of Hermite functions $\{h_n(x)\}$ for a differential operator in $L^2(\mathbb{R})$ coming from Schrödinger's equation (9.2.8) for the quantum harmonic oscillator (see Section 9.2). As we shall see, the orthogonal basis $\{h_n(x)\}$ is not just useful for solving Schrödinger's equation, to which we return in Section 11.7, but also turns out to be an eigenbasis for the Fourier transform, the subject of Chapters 12 and 13.

As in the previous section, we begin by defining the functions in question. In keeping with our choices of conventions and constants, we follow the presentation and somewhat unusual conventions of Hermite functions and polynomials from Dym and McKean [DM85, 2.5]; see Remark 11.6.11 for a comparison with other definitions.

Definition 11.6.1. For $n \ge 0$, the $n$th Hermite function $h_n(x)$ is defined to be
$$h_n(x) = \frac{(-1)^n}{n!}\, e^{\pi x^2} \left(\frac{d}{dx}\right)^n e^{-2\pi x^2}. \tag{11.6.1}$$

We again begin with an immediate observation.

Theorem 11.6.2. The Hermite function $h_n(x)$ satisfies
$$h_n(x) = H_n(x) e^{-\pi x^2}, \tag{11.6.2}$$
where $H_n(x)$ is a polynomial of degree $n$ with leading coefficient $\dfrac{(4\pi)^n}{n!}$.


Proof. Problem 11.6.1.

Definition 11.6.3. We define the $n$th Hermite polynomial to be the polynomial $H_n(x)$ described in Theorem 11.6.2.

For example, the first six Hermite polynomials are:
$$\begin{aligned}
H_0(x) &= 1, & H_3(x) &= \frac{1}{3!}(64\pi^3 x^3 - 48\pi^2 x), \\
H_1(x) &= 4\pi x, & H_4(x) &= \frac{1}{4!}(256\pi^4 x^4 - 384\pi^3 x^2 + 48\pi^2), \\
H_2(x) &= \frac{1}{2!}(16\pi^2 x^2 - 4\pi), & H_5(x) &= \frac{1}{5!}(1024\pi^5 x^5 - 2560\pi^4 x^3 + 960\pi^3 x).
\end{aligned} \tag{11.6.3}$$

As mentioned above, our main application of Hermite functions in this chapter is to study the following differential operator.

Definition 11.6.4. We define the operator $K$ in $L^2(\mathbb{R})$ with domain $D(K) = S(\mathbb{R})$ (the space of Schwartz functions from Section 4.7) by
$$K(f) = -\frac{d^2 f}{dx^2} + 4\pi^2 x^2 f. \tag{11.6.4}$$

Note that since the Schwartz space $S(\mathbb{R})$ contains all functions of the form $p(x)e^{-ax^2}$, where $p(x)$ is a polynomial and $a > 0$ (Theorem 4.7.4), the Hermite functions $h_n(x)$ are contained in the domain of $K$. Note also that by Examples 10.2.6 and 10.2.12 and Theorem 10.2.15, $K$ is Hermitian.

In any case, as with the Legendre polynomials, our first main task is to prove that the $h_n(x)$ are eigenfunctions of $K$, which we do using the following formulas.

Lemma 11.6.5. Setting $h_{-1}(x) = 0$, for $n \ge 0$, the Hermite functions $h_n(x)$ satisfy the following identities:
$$\left(\frac{d}{dx} - 2\pi x\right) h_n = h_n' - 2\pi x h_n = -(n+1)h_{n+1}, \tag{11.6.5}$$
$$\left(\frac{d}{dx} + 2\pi x\right) h_n = h_n' + 2\pi x h_n = 4\pi h_{n-1}. \tag{11.6.6}$$
Consequently, if $D(f) = f'$ and $X(f(x)) = xf(x)$, we sometimes call the operator $D - 2\pi X$ a raising operator, and the operator $D + 2\pi X$ a lowering operator.

Proof. Problems 11.6.2 and 11.6.3.

Theorem 11.6.6. For $n \ge 0$, the $n$th Hermite function $h_n(x)$ is an eigenfunction of the operator $K$ of Definition 11.6.4 with eigenvalue $4\pi\left(n + \frac{1}{2}\right)$. Consequently, $\{h_n(x)\}$ is an orthogonal subset of $L^2(\mathbb{R})$.

Proof. Problem 11.6.4.
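The identities above can be spot-checked at the level of the polynomials $H_n$. Substituting $h_n = H_n e^{-\pi x^2}$ and cancelling the exponential (a short calculation that is not carried out in the text), the raising identity (11.6.5) becomes $H_{n+1} = (4\pi x H_n - H_n')/(n+1)$, the lowering identity (11.6.6) becomes $H_n' = 4\pi H_{n-1}$, and the eigenvalue equation $K(h_n) = 4\pi(n + \frac{1}{2})h_n$ becomes $-H_n'' + 4\pi x H_n' = 4\pi n H_n$. The sketch below builds $H_0, \dots, H_5$ from the first of these and checks the other two in floating point; all helper names are illustrative.

```python
from math import pi, factorial

def diff(a):
    """Derivative of a polynomial given as a coefficient list, lowest degree first."""
    return [k * a[k] for k in range(1, len(a))] or [0.0]

def times_x(a):
    return [0.0] + list(a)

def scale(alpha, a):
    return [alpha * c for c in a]

def sub(a, b):
    """a - b for coefficient lists of possibly different lengths."""
    n = max(len(a), len(b))
    a = list(a) + [0.0] * (n - len(a))
    b = list(b) + [0.0] * (n - len(b))
    return [x - y for x, y in zip(a, b)]

def close(a, b, tol=1e-9):
    """Coefficientwise comparison with a tolerance relative to the largest entry."""
    size = 1.0 + max(abs(c) for c in list(a) + list(b))
    return all(abs(d) <= tol * size for d in sub(a, b))

def hermite_polys(count):
    """H_0, ..., H_{count-1} via H_{n+1} = (4 pi x H_n - H_n') / (n + 1)."""
    H = [[1.0]]
    for n in range(count - 1):
        H.append(scale(1.0 / (n + 1), sub(scale(4 * pi, times_x(H[n])), diff(H[n]))))
    return H

H = hermite_polys(6)
for n, Hn in enumerate(H):
    assert close([Hn[-1]], [(4 * pi) ** n / factorial(n)])     # Theorem 11.6.2
    if n >= 1:
        assert close(diff(Hn), scale(4 * pi, H[n - 1]))        # lowering identity
    lhs = sub(scale(4 * pi, times_x(diff(Hn))), diff(diff(Hn)))
    assert close(lhs, scale(4 * pi * n, Hn))                   # eigenvalue equation
print("checked n = 0..5")
```

Because the check runs in floating point, agreement is only up to rounding error; an exact version would have to track powers of $\pi$ symbolically.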


It therefore remains to show that for any $f \in L^2(\mathbb{R})$, the generalized Fourier series of $f$ with respect to $\{h_n(x)\}$ converges to $f$. However, we will be able to prove this fact more naturally once we have the Fourier transform available to us, so we will delay the proof of the following theorem until Section 13.4.

Theorem 11.6.7. The set $B = \{h_n(x)\}$ of Hermite functions is an orthogonal basis for $L^2(\mathbb{R})$; moreover, it is an eigenbasis for the operator $K$ of Definition 11.6.4.

Remark 11.6.8. Recall that $S(\mathbb{R})$ contains all functions of the form $p(x)e^{-ax^2}$, where $p(x)$ is a polynomial and $a > 0$ (Theorem 4.7.4). Actually, Theorem 11.6.7 implies that such functions are dense in $L^2(\mathbb{R})$, so we may think of functions $p(x)e^{-ax^2}$ as being, in some sense, typical elements of $S(\mathbb{R})$, or more loosely speaking, (approximately) typical elements of $L^2(\mathbb{R})$.

Again, it will be helpful for calculations to know the values of $\|h_n\|$:

Theorem 11.6.9. We have that $\langle h_n, h_n \rangle = \dfrac{(4\pi)^n}{\sqrt{2}\, n!}$.

Proof. Problem 11.6.5.

Definition 11.6.10. We define the normalized Hermite functions $\psi_n(x)$ to be
$$\psi_n(x) = \left(\frac{2^{1/4}\sqrt{n!}}{(4\pi)^{n/2}}\right) h_n(x). \tag{11.6.7}$$

Note that Theorems 11.6.7 and 11.6.9 together imply that $\{\psi_n(x)\}$ is an orthonormal basis for $L^2(\mathbb{R})$.

Remark 11.6.11. As mentioned at the beginning of this section, our definition of the Hermite polynomial includes some unusual choices. For comparison, two more common definitions for the Hermite polynomials (see [DLM, Sect. 18]) are:
$$H1_n(x) = (-1)^n e^{x^2/2} \left(\frac{d}{dx}\right)^n e^{-x^2/2}, \tag{11.6.8}$$
$$H2_n(x) = (-1)^n e^{x^2} \left(\frac{d}{dx}\right)^n e^{-x^2}. \tag{11.6.9}$$
These polynomials $H1_n(x)$ and $H2_n(x)$ are related to our polynomials $H_n(x)$ by
$$H1_n(x) = \frac{n!}{(4\pi)^{n/2}}\, H_n\!\left(\frac{x}{\sqrt{4\pi}}\right), \tag{11.6.10}$$
$$H2_n(x) = \frac{n!}{(2\pi)^{n/2}}\, H_n\!\left(\frac{x}{\sqrt{2\pi}}\right). \tag{11.6.11}$$
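The norm formula of Theorem 11.6.9 and the normalization (11.6.7) can be spot-checked numerically. Since $h_n h_m = H_n H_m e^{-2\pi x^2}$, each inner product reduces to Gaussian moments; the sketch below uses the standard closed form $\int_{-\infty}^{\infty} x^{2m} e^{-2\pi x^2}\,dx = \dfrac{(2m-1)!!}{\sqrt{2}\,(4\pi)^m}$, which is assumed here rather than taken from the text, and a recurrence for $H_n$ obtained by rewriting (11.6.5) in terms of $H_n$. Function names are illustrative.

```python
from math import pi, sqrt, factorial

def hermite_polys(count):
    """H_0..H_{count-1} via H_{n+1} = (4 pi x H_n - H_n') / (n + 1)."""
    H = [[1.0]]
    for n in range(count - 1):
        prev = H[n]
        nxt = [0.0] * (len(prev) + 1)
        for k, c in enumerate(prev):
            nxt[k + 1] += 4 * pi * c      # contribution of 4 pi x H_n
            if k >= 1:
                nxt[k - 1] -= k * c       # contribution of -H_n'
        H.append([c / (n + 1) for c in nxt])
    return H

def gaussian_moment(k):
    """Integral over R of x^k e^{-2 pi x^2} dx (assumed closed form)."""
    if k % 2 == 1:
        return 0.0
    m, double_fact = k // 2, 1.0
    for j in range(1, 2 * m, 2):          # (2m - 1)!! = 1 * 3 * ... * (2m - 1)
        double_fact *= j
    return double_fact / ((4 * pi) ** m * sqrt(2))

def inner(Hn, Hm):
    """<h_n, h_m> for real polynomials H_n, H_m."""
    return sum(a * b * gaussian_moment(i + j)
               for i, a in enumerate(Hn) for j, b in enumerate(Hm))

def normalizer(n):
    """The constant in front of h_n in (11.6.7)."""
    return 2 ** 0.25 * sqrt(factorial(n)) / (4 * pi) ** (n / 2)

H = hermite_polys(6)
for n in range(6):
    print(n, inner(H[n], H[n]), (4 * pi) ** n / (sqrt(2) * factorial(n)))
```

The two printed columns should agree up to rounding, and multiplying the first column by `normalizer(n) ** 2` should give values close to 1.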


Problems

11.6.1. (Proves Theorem 11.6.2) Prove that the $n$th Hermite function
$$h_n(x) = \frac{(-1)^n}{n!}\, e^{\pi x^2} \left(\frac{d}{dx}\right)^n e^{-2\pi x^2} \tag{11.6.12}$$
has the form $h_n(x) = H_n(x)e^{-\pi x^2}$, where $H_n(x)$ is a polynomial of degree $n$ with leading coefficient $\dfrac{(4\pi)^n}{n!}$. (Suggestion: Induction.)

11.6.2. (Proves Lemma 11.6.5) Prove that for $n \ge 0$, the Hermite functions $h_n(x)$ satisfy $h_n' - 2\pi x h_n = -(n+1)h_{n+1}$. (Suggestion: Compute $h_n'$ from the definition.)

11.6.3. (Proves Lemma 11.6.5) Let $h_{-1}(x) = 0$.
(a) Prove that for $n \ge 1$,
$$4\pi x \left(\frac{d}{dx}\right)^n (e^{-2\pi x^2}) = -\left(\frac{d}{dx}\right)^{n+1} (e^{-2\pi x^2}) - 4\pi n \left(\frac{d}{dx}\right)^{n-1} (e^{-2\pi x^2}). \tag{11.6.13}$$
(Suggestion: Lemma 11.5.5.)
(b) Prove that for $n \ge 0$, $h_n' + 2\pi x h_n = 4\pi h_{n-1}$. (Suggestion: Do $n = 0$ separately; otherwise, use Problem 11.6.2.)

11.6.4. (Proves Theorem 11.6.6) Let $h_n(x)$ be the $n$th Hermite function, and let
$$K(f) = -\frac{d^2 f}{dx^2} + 4\pi^2 x^2 f. \tag{11.6.14}$$
(a) Prove that $h_n(x)$ is an eigenfunction of $K$ with eigenvalue $4\pi\left(n + \frac{1}{2}\right)$. (Suggestion: Lemma 11.6.5.)
(b) Prove that $\{h_n(x)\}$ is an orthogonal subset of $L^2(\mathbb{R})$. (Suggestion: Section 10.3.)

11.6.5. (Proves Theorem 11.6.9) The goal of this problem is to calculate $\langle h_n, h_n \rangle$.
(a) Prove that for $0 \le k \le n - 1$, $\left\langle x^k e^{-\pi x^2}, h_n \right\rangle = 0$. (Suggestion: Lemma 11.5.7.)
(b) Prove that $\langle h_n, h_n \rangle = \left\langle a_n x^n e^{-\pi x^2}, h_n \right\rangle$, where $a_n x^n$ is the leading term of $H_n(x)$. (Suggestion: Part (a).)
(c) Prove that $\langle h_n, h_n \rangle = \dfrac{a_n}{\sqrt{2}}$. (Suggestion: Theorem 4.8.6, induction on $n \ge 0$, and (yet again!) integration by parts; again, carefully justify vanishing.)


11.7

The quantum harmonic oscillator

With the Hermite functions in hand, we now return to the solution of Schrödinger's equation for the quantum harmonic oscillator (Question 9.2.2), restated here more precisely.

Question 11.7.1. Given an initial value $f(x) \in L^2(\mathbb{R})$, find $\Psi(x, t)$ ($t > 0$) such that:
1. (Differentiable) For fixed $t_0 > 0$, $\Psi(x, t_0)$ is a twice differentiable function on $\mathbb{R}$, and for fixed $x_0 \in \mathbb{R}$, $\Psi(x_0, t)$ is differentiable on $(0, +\infty)$.
2. (Initial value) For any $x \in \mathbb{R}$, $\lim_{t \to 0^+} \Psi(x, t) = f(x)$.
3. (PDE) For all $t > 0$,
$$-\frac{\partial^2 \Psi}{\partial x^2} + 4\pi^2 x^2 \Psi = i\frac{\partial \Psi}{\partial t}. \tag{11.7.1}$$

Note that (11.7.1) can be written as
$$K(\Psi) = i\frac{\partial \Psi}{\partial t}, \tag{11.7.2}$$
where $K$ is the operator $K(f) = -\dfrac{d^2 f}{dx^2} + 4\pi^2 x^2 f$ from Definition 11.6.4. Again confining our interests to formal solutions, as we saw in Section 11.6, the normalized Hermite functions $\{\psi_n(x)\}$ form an orthonormal eigenbasis for $K$, where $\psi_n$ has the eigenvalue $4\pi\left(n + \frac{1}{2}\right)$. In particular, for any $f \in L^2(\mathbb{R})$,
$$f(x) = \sum_{n=0}^{\infty} \hat{f}(n)\psi_n(x), \tag{11.7.3}$$
where $\hat{f}(n)$ is the generalized Fourier coefficient $\hat{f}(n) = \langle f, \psi_n \rangle$ and convergence is in $L^2$. The eigenbasis method (Section 11.2), in the case $T(\Psi) = i\dfrac{\partial \Psi}{\partial t}$, then gives the solution
$$\Psi(x, t) = \sum_{n=0}^{\infty} \hat{f}(n)\psi_n(x) e^{-4\pi i \left(n + \frac{1}{2}\right) t} \tag{11.7.4}$$
to Question 11.7.1, as the reader should re-verify (Problem 11.7.1).

Having solved Schrödinger's equation, we turn to what in many ways is a much more difficult question: What does the state function $\Psi$ mean? In fact, this question is not at all straightforward, and is even still, as of this writing (in 2017), a matter of quite some debate. To give one common interpretation, we first rephrase the question in the following way.

Question 11.7.2. If the state of a given particle is given by $\Psi(x, t)$, what can we observe, or measure, about that particle?


The following axiom then provides a mathematically precise answer to Question 11.7.2. (See Section 14.4 for a further refinement.)

Axiom 11.7.3. An observable quantity of our system is represented by a Hermitian operator $M$ in $L^2(\mathbb{R})$. If $\{\psi_n\}$ is an orthonormal eigenbasis for $M$ with $M(\psi_n) = \lambda_n \psi_n$, and the state of our particle is $\Psi = \sum c_n \psi_n$, with $\|\Psi\| = 1$, then the only possible values of the observable quantity are the eigenvalues $\lambda_n$ of $M$, and upon measurement, the state of the system collapses into the single state $\psi_n$ corresponding to the observed value $\lambda_n$ with probability $|c_n|^2$.

For example, the energy of the particle in the quantum harmonic oscillator is represented by the operator $K$, so if we try to measure the energy of a particle in the state $\Psi(x, t)$ from (11.7.4), the particle will enter the state $\psi_n$ with probability $|\hat{f}(n)|^2$.

Axiom 11.7.3 has the following notable features.

• Perhaps the most famous, and most unsettling, aspect of this interpretation of quantum mechanics is the randomized nature of observation, famously described by Einstein as “God rolling dice”. The question of whether this feature is desirable or defensible is beyond the scope of this book; we note only that the great majority of experiments involve taking averages over an enormous number of particles, and in such cases, a probabilistic approach gives useful predictions that have an excellent record of being verified by experiment.

• More concretely, the reader should verify that the collapsing event represents a genuine probability, or in other words, that the sum of the probabilities of all possible post-collapse states is actually 1; see Problem 11.7.2.

• Returning to the energy levels of an oxygen molecule (Question 9.2.1), Axiom 11.7.3 and Theorems 11.6.6 and 11.6.7 predict that (up to scaling constants) the only possible total energy levels of a quantum harmonic oscillator are $4\pi\left(n + \frac{1}{2}\right)$. This phenomenon is the reason for the “quantum” in quantum mechanics, and explains why we only observe discrete energy levels in oxygen.

For more about quantum mechanics, including a more general mathematical formulation of quantum mechanics and further references, see Sections 13.5 and 14.4.
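To make the bookkeeping in (11.7.4) and Axiom 11.7.3 concrete, the following sketch works with a finite list of coefficients $c_n$ in the basis $\{\psi_n\}$ rather than with the functions themselves. The state `c0` is a hypothetical example chosen so that $\sum |c_n|^2 = 1$, and the function names are illustrative. Time evolution multiplies each $c_n$ by a phase of modulus 1, so the energy probabilities $|c_n|^2$ do not change with $t$.

```python
import cmath
from math import pi

def evolve(coeffs, t):
    """Coefficients of Psi(., t) in the basis {psi_n}, following (11.7.4)."""
    return [c * cmath.exp(-4j * pi * (n + 0.5) * t) for n, c in enumerate(coeffs)]

def energy_distribution(coeffs):
    """Pairs (energy level 4 pi (n + 1/2), probability |c_n|^2), as in Axiom 11.7.3."""
    return [(4 * pi * (n + 0.5), abs(c) ** 2) for n, c in enumerate(coeffs)]

c0 = [0.6, 0.0, 0.8j]          # hypothetical state: |c_0|^2 + |c_2|^2 = 0.36 + 0.64
for t in (0.0, 0.1, 0.37):
    dist = energy_distribution(evolve(c0, t))
    print(t, [round(p, 12) for _, p in dist], round(sum(p for _, p in dist), 12))
```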

Problems

11.7.1. Working formally (i.e., assuming that all series converge and all operators commute with infinite sums), prove that
$$\Psi(x, t) = \sum_{n=0}^{\infty} \hat{f}(n)\psi_n(x) e^{-4\pi i \left(n + \frac{1}{2}\right) t} \tag{11.7.5}$$
is a solution to $K(\Psi) = i\dfrac{\partial \Psi}{\partial t}$, where $\{\psi_n(x)\}$ is an orthonormal eigenbasis for $K$.


11.7.2. Suppose the state of a particle at time $t$ is given by
$$\Psi(x, t) = \sum_{n=0}^{\infty} c_n \psi_n(x) e^{-4\pi i \left(n + \frac{1}{2}\right) t}, \tag{11.7.6}$$
with $\|\Psi\| = 1$. Prove that $\sum_{n=0}^{\infty} |c_n|^2 = 1$. (In other words, the collapse event of Axiom 11.7.3 is governed by a genuine probability distribution.)

11.8

Sturm-Liouville theory

In this chapter, we have used several different eigenbases to solve differential equations:

1. $\{e_n(x)\}$ for the Laplacian $\Delta = -\dfrac{d^2}{dx^2}$ in $L^2(S^1)$.
2. $\{\sin(2\pi n x)\}$ for the Laplacian in $L^2\left(\left[0, \frac{1}{2}\right]\right)$ with Dirichlet boundary conditions.
3. $\{\cos(2\pi n x)\}$ for the Laplacian in $L^2\left(\left[0, \frac{1}{2}\right]\right)$ with Neumann boundary conditions.
4. $\{P_n(x)\}$ (Legendre polynomials) for the operator $L(f) = \dfrac{d}{dx}\left((1 - x^2)\dfrac{df}{dx}\right)$ in $L^2([-1,1])$.
5. $\{h_n(x)\}$ (Hermite functions) for the operator $K(f) = -\dfrac{d^2 f}{dx^2} + 4\pi^2 x^2 f$ in $L^2(\mathbb{R})$.

The reader may notice that all of the operators in question are some variation on the Laplacian; more precisely, they all have the following form.

Definition 11.8.1. Let $X$ be either a closed (possibly infinite) interval in $\mathbb{R}$ or $S^1$. We define a Sturmian operator to be an operator in $L^2(X)$ of the form
$$L(f(x)) = \frac{d}{dx}\left(p(x)\frac{df}{dx}\right) + r(x)f(x) \tag{11.8.1}$$
for some real-valued $p \in C^1(X)$ and $r \in C^0(X)$. If $X$ is closed and bounded and $p(x) \neq 0$ for $x \in X$, we say that $L$ is regular; otherwise, we say that $L$ is singular.

Specifically, $p(x) = -1$ and $r(x) = 0$ gives the Laplacian; $p(x) = (1 - x^2)$ and $r(x) = 0$ gives the Legendre operator; and $p(x) = -1$ and $r(x) = 4\pi^2 x^2$ gives the operator associated with the Hermite functions.

A Sturmian operator $L$ will be Hermitian under conditions on $D(L)$ that are often met in practice. More precisely, in the case where $X$ is a closed and bounded interval, we have:

Theorem 11.8.2. Let
$$L(f(x)) = \frac{d}{dx}\left(p(x)\frac{df}{dx}\right) + r(x)f(x) \tag{11.8.2}$$
be a Sturmian operator in $L^2([a, b])$. Then $L$ is Hermitian if and only if for every $f, g \in D(L)$,
$$p(x)\left(f(x)g'(x) - f'(x)g(x)\right)\Big|_a^b = 0. \tag{11.8.3}$$

We can rewrite (11.8.3) using the Wronskian
$$W(f_1, f_2) = \det\begin{pmatrix} f_1(x) & f_2(x) \\ f_1'(x) & f_2'(x) \end{pmatrix} = f_1(x)f_2'(x) - f_2(x)f_1'(x). \tag{11.8.4}$$
In these terms, (11.8.3) becomes
$$p(x)W(f, g)\Big|_a^b = 0. \tag{11.8.5}$$

For more on the Wronskian, see Hartman [Har02, IV.8]. Note that (11.8.3) holds if $D(L)$ satisfies Dirichlet or Neumann boundary conditions or if $p(a) = p(b) = 0$ (as is true for the Legendre operator). We also have the analogue of (11.8.3) if $X = S^1$, by periodicity, or if $X = \mathbb{R}$ and $D(L) = S(\mathbb{R})$ (the Schwartz space), by taking limits.

Proof. Problem 11.8.1.

Sturm-Liouville theory studies the cases and conditions under which a Sturmian operator can be guaranteed to have an eigenbasis, and also studies the resulting eigenbases. For example, the following result, whose proof is beyond the scope of this book, has a relatively straightforward statement.

Theorem 11.8.3. Let $L$ be a regular (i.e., $p(x) > 0$) Sturmian operator in $L^2([a, b])$, suppose that $D(L) \subseteq C^1([a, b])$, and suppose that any $f \in D(L)$ satisfies the boundary conditions
$$\alpha_0 f(a) + \alpha_1 f'(a) = 0, \qquad \beta_0 f(b) + \beta_1 f'(b) = 0, \tag{11.8.6}$$
for fixed $\alpha_i, \beta_i \in \mathbb{R}$ such that $(\alpha_0, \alpha_1)$ and $(\beta_0, \beta_1) \neq (0, 0)$. Then there exists an orthonormal eigenbasis $\{\varphi_n\}$ for $L$ with associated eigenvalues $\{-\lambda_n\}$ such that the sequence $\lambda_n$ is strictly increasing and $\lim_{n \to \infty} \lambda_n = +\infty$.

Note that the boundary conditions (11.8.6) imply that $L$ is Hermitian (Problem 11.8.2). For a proof of Theorem 11.8.3 and much more about Sturm-Liouville theory, see AlGwaiz [AG08].
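The role of the boundary term in Theorem 11.8.2 can be seen concretely on polynomials. For real-valued $f$ and $g$, two integrations by parts give $\langle L f, g \rangle - \langle f, L g \rangle = -\,p(x)W(f, g)\big|_a^b$; this sign convention is worked out here for the sketch and is not quoted from the text. The code below checks that identity exactly for polynomial $p$, $r$, $f$, $g$ with rational coefficients; all function names are illustrative, and complex conjugation is omitted because everything is real.

```python
from fractions import Fraction

def mul(a, b):
    """Multiply polynomials stored as coefficient lists, lowest degree first."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def add(a, b):
    n = max(len(a), len(b))
    return [(a[k] if k < len(a) else 0) + (b[k] if k < len(b) else 0) for k in range(n)]

def diff(a):
    return [k * a[k] for k in range(1, len(a))] or [Fraction(0)]

def evaluate(a, x):
    total = Fraction(0)
    for c in reversed(a):            # Horner's rule
        total = total * x + c
    return total

def integral(a, lo, hi):
    return sum(Fraction(c) / (k + 1) * (Fraction(hi) ** (k + 1) - Fraction(lo) ** (k + 1))
               for k, c in enumerate(a))

def sturm(p, r, f):
    """L(f) = (p f')' + r f, as in (11.8.1)."""
    return add(diff(mul(p, diff(f))), mul(r, f))

def defect(p, r, f, g, lo, hi):
    """<L f, g> - <f, L g> on [lo, hi] for real polynomials."""
    return integral(mul(sturm(p, r, f), g), lo, hi) - integral(mul(f, sturm(p, r, g)), lo, hi)

def boundary_term(p, f, g, lo, hi):
    """p(x) W(f, g) evaluated from lo to hi, with W(f, g) = f g' - g f'."""
    w = add(mul(f, diff(g)), [-c for c in mul(g, diff(f))])
    pw = mul(p, w)
    return evaluate(pw, hi) - evaluate(pw, lo)

f, g = [1, 2, 0, 1], [0, 1, 5]
# Legendre case p = 1 - x^2: p(-1) = p(1) = 0, so both sides vanish.
print(defect([1, 0, -1], [0], f, g, -1, 1), boundary_term([1, 0, -1], f, g, -1, 1))
# p = 2 + x with no boundary conditions: the defect is minus the boundary term.
print(defect([2, 1], [1, 1], f, g, -1, 1), boundary_term([2, 1], f, g, -1, 1))
```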

Problems

11.8.1. Let
$$L(f(x)) = \frac{d}{dx}\left(p(x)\frac{df}{dx}\right) + r(x)f(x) \tag{11.8.7}$$
be a Sturmian operator in $L^2([a, b])$. Prove that $L$ is Hermitian if and only if for every $f, g \in D(L)$,
$$p(x)\left(f(x)g'(x) - f'(x)g(x)\right)\Big|_a^b = 0. \tag{11.8.8}$$
(Suggestion: As always, parts.)

11.8.2. Let $L$ be a Sturmian operator in $L^2([a, b])$, and suppose that for some $\alpha_i, \beta_i \in \mathbb{R}$ such that $(\alpha_0, \alpha_1)$ and $(\beta_0, \beta_1) \neq (0, 0)$, we have
$$\alpha_0 f(a) + \alpha_1 f'(a) = 0, \qquad \beta_0 f(b) + \beta_1 f'(b) = 0, \tag{11.8.9}$$
for all $f \in D(L)$. Prove that $L$ is Hermitian. (Suggestion: Use the fact from linear algebra that the determinant of a square matrix is 0 if and only if its rows are linearly dependent.)

Part IV

The Fourier transform and beyond


Chapter 12

The Fourier transform

The integrals which we have obtained are not only general expressions which satisfy the differential equation, they represent in the most distinct manner the natural effect which is the object of the phenomenon. . . . [W]hen this condition is fulfilled, the integral is, properly speaking, the equation of the phenomenon; it expresses clearly the character and progress of it, in the same manner as the finite equation of a line or curved surface makes known all the properties of those forms.
-- Joseph Fourier, The Analytical Theory of Heat

In this chapter, we introduce the Fourier transform, which one may view as the continuous analogue of Fourier series; that is, instead of Fourier coefficients $\hat{f}(n)$ for $n \in \mathbb{Z}$, we look at the transform $\hat{f}(\gamma)$ for $\gamma \in \mathbb{R}$. After discussing some context (Section 12.1) and establishing fundamental tools (Section 12.2), we first establish the Fourier transform in the friendly confines of $S(\mathbb{R})$ (Sections 12.3 and 12.4) and then extend it to all of $L^2(\mathbb{R})$ (Section 12.5).

12.1

The big picture

At this point, it seems appropriate to take stock, with the benefit of hindsight, of what we have done so far. In terms of overall theory, one way to look at Part II of this book is that we answered the following question:

Question 12.1.1. Given $f \in L^2(S^1)$, to what extent can we recover $f$ from its Fourier coefficients
$$\hat{f}(n) = \langle f, e_n \rangle = \int_0^1 f(x)\overline{e_n(x)}\,dx? \tag{12.1.1}$$

As the reader may recall, Question 12.1.1 is not a question we initially set out to answer in Chapter 6, but it is a question for which we found a complete answer in the Inversion Theorem for Fourier Series (Theorem 8.1.1), which we repeat here in slightly different terms.

Theorem 12.1.2. If $f \in L^2(S^1)$, then $f = \sum_{n \in \mathbb{Z}} \hat{f}(n)e_n(x)$ in the $L^2$ metric. In particular, we can recover $f$ completely from its Fourier coefficients $\hat{f}(n)$.

From a theoretical point of view, the Fourier transform is the analogue of the Fourier coefficient mapping $f \mapsto \hat{f}(n)$ that we get when we replace $S^1$ with $\mathbb{R}$ and $\mathbb{Z}$ with $\mathbb{R}$. More precisely, consider the following question.

Question 12.1.3. Given $f \in L^2(\mathbb{R})$, to what extent can we recover $f$ from the function $\hat{f} : \mathbb{R} \to \mathbb{C}$ defined by
$$\hat{f}(\gamma) = \int_{-\infty}^{\infty} f(x)e^{-2\pi i \gamma x}\,dx? \tag{12.1.2}$$

The function $\hat{f}$ defined by (12.1.2) is called the Fourier transform of $f$. Note that with the Fourier transform, instead of having coefficients $\hat{f}(n)$ that define a function on $\mathbb{Z}$, we have coefficients $\hat{f}(\gamma)$ that define a function on $\mathbb{R}$. In any case, we will eventually end up with much the same result:

Theorem 12.1.4 (Inversion Theorem for the Fourier Transform). If $f \in L^2(\mathbb{R})$ and $\hat{f}(\gamma)$ is defined by (12.1.2), then $f$ can be recovered from $\hat{f}(\gamma)$ by the inverse Fourier transform
$$f(x) = \int_{-\infty}^{\infty} \hat{f}(\gamma)e^{2\pi i x \gamma}\,d\gamma. \tag{12.1.3}$$
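For functions that are integrable and effectively supported on a bounded interval, (12.1.2) can be approximated directly by a Riemann sum. The sketch below uses the midpoint rule and compares against two closed forms: the indicator function of $[-1, 1]$, whose transform $\sin(2\pi\gamma)/(\pi\gamma)$ is the subject of Problem 12.1.3, and the Gaussian $e^{-\pi x^2}$, whose transform under this convention is $e^{-\pi\gamma^2}$ (a standard fact assumed here, not established in this section). The function names and the truncation interval $[-8, 8]$ are illustrative choices.

```python
import cmath
from math import pi, sin, exp

def fourier_transform(f, gamma, lo, hi, steps=4000):
    """Midpoint-rule approximation of the integral of f(x) e^{-2 pi i gamma x} over [lo, hi]."""
    dx = (hi - lo) / steps
    total = 0j
    for k in range(steps):
        x = lo + (k + 0.5) * dx
        total += f(x) * cmath.exp(-2j * pi * gamma * x)
    return total * dx

def box(x):
    """Indicator function of [-1, 1], as in Problem 12.1.3."""
    return 1.0 if -1.0 <= x <= 1.0 else 0.0

def gaussian(x):
    return exp(-pi * x * x)

for gamma in (0.3, 1.25):
    approx = fourier_transform(box, gamma, -1.0, 1.0)
    print(gamma, approx.real, sin(2 * pi * gamma) / (pi * gamma))

for gamma in (0.0, 0.7):
    approx = fourier_transform(gaussian, gamma, -8.0, 8.0)
    print(gamma, approx.real, exp(-pi * gamma * gamma))
```

In each printed row the last two numbers should agree to several digits. This only illustrates the definition for well-behaved $f$; it says nothing about the convergence issues for general $f \in L^2(\mathbb{R})$ discussed next.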

By now, the reader may be used to our delaying the proof of key results like the Inversion Theorem until later (Theorem 12.5.5). What may not be as apparent is that not only are there problems to overcome in the proof of Theorem 12.1.4; there are actually problems in the statement of Question 12.1.3. Most notably, if $f \in L^1(\mathbb{R})$, then the integral (12.1.2) is well-defined (Problem 12.1.1), but if we only know that $f \in L^2(\mathbb{R})$, then (12.1.2) might not be well-defined (Problem 12.1.2). (In particular, note that (12.1.2) is not an inner product in $L^2$, because $e^{2\pi i \gamma x} \notin L^2(\mathbb{R})$.) Conversely, if we only know that $f \in L^1(\mathbb{R})$, $\hat{f}$ may not be in $L^1(\mathbb{R})$ (Problem 12.1.3), causing problems defining the inverse transform (12.1.3).

We therefore need to find a way to extend the definition of (12.1.2) to all of $L^2(\mathbb{R})$. Our basic strategy will be to develop the necessary theory in the function space $S(\mathbb{R})$ first and then, since $S(\mathbb{R})$ is dense in $L^2(\mathbb{R})$, extend to $L^2(\mathbb{R})$ by taking limits. Exactly how this works will unfold in the rest of this chapter, but in any case, we hope the reader now has some idea of why we will need to prove many of our main results twice.

We also take this opportunity to review/recap material we will need about calculus on the space of Schwartz functions $S(\mathbb{R})$. Specifically, we remind the reader:

• Section 4.7 describes $S(\mathbb{R})$ and its basic properties.

• Section 4.8 carries over results for integrals on finite intervals to improper integrals on $\mathbb{R}$. In particular, we have improper versions of integration by parts (Theorem 4.8.7), differentiating an integral (Theorem 4.8.8), and Fubini's Theorem (Theorem 4.8.11). We also have the integral $\int_{-\infty}^{\infty} e^{-\pi x^2}\,dx = 1$ (Theorem 4.8.6).


In connection with the latter material, we will need several background lemmas. As with Section 4.8 itself, the first-time reader may choose to take these lemmas as given and return to their proofs later.

Our first lemma is a “separation of variables” trick borrowed from Stein and Shakarchi [SS03, Ch. 5, Prop. 1.11].

Lemma 12.1.5. If $f \in C^0(\mathbb{R})$ is rapidly decaying (Definition 4.7.1), then for any $k \ge 0$, there exists some $C_k > 0$ such that
$$|f(x - y)| \le C_k \left(\frac{(1 + |y|)^k}{|x|^k}\right) \tag{12.1.4}$$
for all $x, y \in \mathbb{R}$.

Proof. Problem 12.1.4.

Our other lemma describes some specific cases when Fubini's Theorem 4.8.11 applies.

Lemma 12.1.6. Suppose $f, g \in S(\mathbb{R})$, and $G \in C^1(\mathbb{R}^2)$ is such that $G$, $\dfrac{\partial G}{\partial x}$, and $\dfrac{\partial G}{\partial y}$ are all bounded. Then the following functions satisfy the hypotheses of Fubini's Theorem 4.8.11.
1. The function $F(x, y) = G(x, y)f(x)g(y)$.
2. The function $F(x, y) = G(x, y)f(x - y)g(y)$.

Proof. Problem 12.1.5.

Problems

12.1.1. Prove that if $f \in L^1(\mathbb{R})$, then (12.1.2) is well-defined. (Suggestion: See Definition 7.5.1.)

12.1.2. Define $f : \mathbb{R} \to \mathbb{R}$ by
$$f(x) = \begin{cases} 0 & \text{if } x < 1, \\ \dfrac{1}{x} & \text{if } x \ge 1. \end{cases} \tag{12.1.5}$$
Prove that $f \in L^2(\mathbb{R})$, but for any $\gamma \in \mathbb{R}$, $f(x)e^{2\pi i \gamma x}$ is not Lebesgue integrable on $\mathbb{R}$. (Suggestion: Definition 7.5.1 and Theorem 7.5.2.)

12.1.3. Define $f : \mathbb{R} \to \mathbb{R}$ by
$$f(x) = \begin{cases} 1 & \text{if } -1 \le x \le 1, \\ 0 & \text{otherwise.} \end{cases} \tag{12.1.6}$$
(a) Prove that for $\gamma \neq 0$,
$$\hat{f}(\gamma) = \int_{-\infty}^{\infty} f(x)e^{-2\pi i \gamma x}\,dx = \int_{-1}^{1} e^{-2\pi i \gamma x}\,dx = \frac{\sin(2\pi\gamma)}{\pi\gamma}. \tag{12.1.7}$$
(b) Let $g(x) = \dfrac{\sin(\pi x)}{\pi x}$. Prove that for $n \in \mathbb{N}$,
$$\int_n^{n+1} |g(x)|\,dx \ge \left(\frac{1}{\pi(n+1)}\right)\left(\frac{2}{\pi}\right). \tag{12.1.8}$$
(Suggestion: Bound the integrand $|g(x)|$ below by the “worst case” denominator.)
(c) Prove that the improper integral $\displaystyle\int_1^{\infty} \left|\frac{\sin(\pi x)}{\pi x}\right| dx$ diverges, and therefore, $\hat{f} \notin L^1(\mathbb{R})$. (Suggestion: Sum the result of (b).)

12.1.4. (Proves Lemma 12.1.5) Prove that if $f \in C^0(\mathbb{R})$ is rapidly decaying (Definition 4.7.1), then for any $k \ge 0$, there exists some $C_k > 0$ such that
$$|f(x - y)| \le C_k \left(\frac{(1 + |y|)^k}{|x|^k}\right) \tag{12.1.9}$$
for all $x, y \in \mathbb{R}$. In other words, prove that $|f(x - y)| \left(\dfrac{|x|^k}{(1 + |y|)^k}\right)$ is bounded on $\mathbb{R} \times \mathbb{R}$. (Suggestion: Consider two cases, $|x| \le 2|y|$ and $|x| \ge 2|y|$.)

12.1.5. (Proves Lemma 12.1.6) Suppose $f, g \in S(\mathbb{R})$ and $G(x, y) \in C^1(\mathbb{R}^2)$ is such that $G$, $\dfrac{\partial G}{\partial x}$, and $\dfrac{\partial G}{\partial y}$ are all bounded.
(a) Let $F(x, y) = G(x, y)f(x)g(y)$. Prove that $F$, $\dfrac{\partial F}{\partial x}$, and $\dfrac{\partial F}{\partial y}$ are all integrable by separation (Definition 4.8.9). (Suggestion: Example 4.8.5.)
(b) Let $F(x, y) = G(x, y)f(x - y)g(y)$. Prove that $F$, $\dfrac{\partial F}{\partial x}$, and $\dfrac{\partial F}{\partial y}$ are all integrable by separation. (Suggestion: Lemma 12.1.5.)

12.2

Convolutions, Dirac kernels, and calculus on R

Recall that convolution (Section 8.2), Dirac kernels (Section 8.3), and well-chosen substitutions (such as Lemma 8.2.1) were key tools in proving the Inversion Theorem for Fourier Series in Section 8.4. In this section, we establish the analogous results for $S(\mathbb{R})$, replacing integrals on $S^1$ with (improper) integrals on $\mathbb{R}$.

We begin with the $\mathbb{R}$ versions of translation invariance and scaling.

Lemma 12.2.1. For $f \in S(\mathbb{R})$, $a \in \mathbb{R}$, $a \neq 0$, we have
$$\int_{-\infty}^{\infty} f(x + a)\,dx = \int_{-\infty}^{\infty} f(x)\,dx, \tag{12.2.1}$$
$$\int_{-\infty}^{\infty} f(ax)\,dx = \frac{1}{|a|}\int_{-\infty}^{\infty} f(x)\,dx. \tag{12.2.2}$$


Note the sign when $a < 0$ in (12.2.2), which is perhaps the only surprise here.

Proof. Problem 12.2.1.

Our first big idea to consider is convolution on $\mathbb{R}$.

Definition 12.2.2. For $f, g \in L^2(\mathbb{R})$, the convolution $f * g : \mathbb{R} \to \mathbb{C}$ is defined by the formula
$$(f * g)(x) = \int_{-\infty}^{\infty} f(x - t)g(t)\,dt. \tag{12.2.3}$$

Note that if $f(t) \in L^2(\mathbb{R})$ as a function of $t$, so are $f(-t)$ and $f(x - t)$, which means that (12.2.3) is well-defined; in fact, for fixed $x$, (12.2.3) is the inner product of $f(x - t)$ and $g(t)$ as functions of $t$.

Convolutions on $\mathbb{R}$ have the same properties as convolutions on $S^1$ (Theorem 8.2.4), though again, the new wrinkle is that we need to be careful about convergence of integrals on $\mathbb{R}$. In addition, we have the key additional property that convolution preserves $S(\mathbb{R})$, or more generally, the property of rapid decay.

Theorem 12.2.3. If $f, g \in C^0(\mathbb{R})$ are rapidly decaying (Definition 4.7.1), then $f * g$ is rapidly decaying. Moreover, suppose $f, g, h \in S(\mathbb{R})$. Then:
1. $(f * g)(x) = (g * f)(x)$.
2. $((f * g) * h)(x) = (f * (g * h))(x)$.
3. $\dfrac{d}{dx}\left((f * g)(x)\right) = \left(\dfrac{df}{dx} * g\right)(x)$.
4. $f * g \in S(\mathbb{R})$.

Proof. Problems 12.2.2–12.2.6.

Next, we define Dirac kernels in much the same way as the analogous kernels defined on $S^1$. The main difference is that instead of an integer parameter $n \to \infty$, we use a continuous parameter $t \to 0$.

Definition 12.2.4. A Dirac kernel on $\mathbb{R}$ is a one-parameter family of continuous functions $K_t : \mathbb{R} \to \mathbb{R}$ ($t \in \mathbb{R}$, $t > 0$) that are integrable on $\mathbb{R}$ such that:
1. For all $t > 0$ and all $x \in \mathbb{R}$, $K_t(x) \ge 0$;
2. For all $t > 0$, $\displaystyle\int_{-\infty}^{\infty} K_t(x)\,dx = 1$; and
3. For any fixed $\eta > 0$, we have
$$\lim_{t \to 0} \int_{|x| \ge \eta} K_t(x)\,dx = 0, \tag{12.2.4}$$
or in other words, for any $\eta > 0$ and $\epsilon > 0$, there exists some $\delta(\eta, \epsilon) > 0$ such that for $0 < t < \delta(\eta, \epsilon)$, we have
$$1 - \epsilon < \int_{-\eta}^{\eta} K_t(x)\,dx \le 1. \tag{12.2.5}$$

As in Section 8.4, the key result is:

Theorem 12.2.5. If $\{K_t\}$ is a Dirac kernel, and $f \in S(\mathbb{R})$, then
$$\lim_{t \to 0} (f * K_t)(x) = f(x) \tag{12.2.6}$$
uniformly on $\mathbb{R}$ (i.e., with convergence independent of $x \in \mathbb{R}$).

To prove Theorem 12.2.5, following the proof of Theorem 8.4.1, we first bound the integral of $|f(x - y) - f(x)|\,|K_t(y)|$ for $y$ close to 0.

Lemma 12.2.6. For any $\epsilon_1 > 0$, there exists some $\eta_1(\epsilon_1) > 0$ such that for $0 < \eta < \eta_1(\epsilon_1)$, any $x \in \mathbb{R}$, and any $t > 0$, we have
$$\int_{-\eta}^{\eta} |f(x - y) - f(x)|\,|K_t(y)|\,dy < \epsilon_1. \tag{12.2.7}$$

Proof. Problem 12.2.7.

Secondly, for fixed $\eta > 0$, by keeping $y$ away from 0 and letting $t \to 0$, we can also force the integral of $|f(x - y) - f(x)|\,|K_t(y)|$ on $|y| \ge \eta$ to be as small as we like.

Lemma 12.2.7. For any fixed $\eta > 0$ and $\epsilon_2 > 0$, there exists some $\delta_2(\eta, \epsilon_2)$ such that for $0 < t < \delta_2(\eta, \epsilon_2)$ and any $x \in \mathbb{R}$, we have
$$\int_{|y| \ge \eta} |f(x - y) - f(x)|\,|K_t(y)|\,dy < \epsilon_2. \tag{12.2.8}$$

Proof. Problem 12.2.8.

As before, Lemmas 12.2.6 and 12.2.7 combine to prove the desired theorem.

Proof of Theorem 12.2.5. Problem 12.2.9.

Of course, Theorem 12.2.5 is not much use without a concrete example of a Dirac kernel.

Example 12.2.8. The Gauss kernel $\{G_t\}$ is
$$G_t(x) = \frac{1}{t}\exp\left(\frac{-\pi x^2}{t^2}\right), \tag{12.2.9}$$
as shown in Figure 12.2.1.


Figure 12.2.1: The Gauss kernel $G_t(x)$ ($t = 1, \frac{1}{2}, \frac{1}{4}$)

Theorem 12.2.9. The Gauss kernel is a Dirac kernel.

Proof. Problem 12.2.10.

Remark 12.2.10. We pause here to give an interpretation of the convolution $f * g$ that complements the one given in Remark 8.2.5. Suppose $\int_{-\infty}^{\infty} g(t)\,dt = 1$ and $g(t) \ge 0$. If we think of the integrand $f(x - t)g(t)\,dt$ as being the value of $f$ taken from $x - t$ with weight $g(t)$, then
$$(f * g)(x) = \int_{-\infty}^{\infty} f(x - t)g(t)\,dt, \tag{12.2.10}$$
the value of $f * g$ at $x$, is obtained by averaging the values of $f$ taken from $x - t$ with weight $g(t)$. Very loosely speaking, keeping Figure 12.2.1 and the example $g(x) = G_t(x)$ in mind, the values of $(f * g)(x)$ are obtained by taking each $f(x)$ and smearing it out to nearby values $x + t$ with weight $g(t)$. (Note that the $+$ in $x + t$ is not a mistake: If the value of $f$ is taken from $x - t$ with weight $g(t)$, then the value of $f$ is sent to $x + t$ with weight $g(t)$.) Again, see Sections 13.3 and 14.2 for further related discussion and other interpretations of convolution.
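The three conditions of Definition 12.2.4 for the Gauss kernel, and the conclusion of Theorem 12.2.5, can be observed numerically. The sketch below is only an illustration under stated assumptions: integrals are approximated by the midpoint rule, the convolution integral is truncated to $|y| \le 6t$ (where the kernel is negligible), and the test function `f` is a hypothetical Schwartz function chosen for the example.

```python
from math import exp, pi

def gauss_kernel(t, x):
    """G_t(x) = (1/t) exp(-pi x^2 / t^2), as in (12.2.9)."""
    return exp(-pi * x * x / (t * t)) / t

def integrate(fn, lo, hi, steps=4000):
    """Midpoint rule on [lo, hi]."""
    dx = (hi - lo) / steps
    return sum(fn(lo + (k + 0.5) * dx) for k in range(steps)) * dx

def convolve_at(f, t, x, radius=6.0):
    """Approximate (f * G_t)(x), truncating the integral to |y| <= radius * t."""
    return integrate(lambda y: f(x - y) * gauss_kernel(t, y), -radius * t, radius * t)

def f(x):
    return (1 + x) * exp(-pi * x * x)    # hypothetical test function in S(R)

eta = 0.5
for t in (1.0, 0.5, 0.25):
    total = integrate(lambda x: gauss_kernel(t, x), -6 * t, 6 * t)
    inside = integrate(lambda x: gauss_kernel(t, x), -eta, eta)
    print(t, round(total, 9), "mass outside |x| >= eta:", total - inside)

for t in (0.5, 0.1, 0.02):
    print(t, abs(convolve_at(f, t, 0.3) - f(0.3)))
```

The first loop shows the total mass staying at 1 while the mass outside $|x| \ge \eta$ shrinks as $t$ decreases; the second shows $|(f * G_t)(x) - f(x)|$ shrinking at a sample point. Uniformity in $x$, which is the real content of Theorem 12.2.5, is not tested here.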

Problems

12.2.1. (Proves Lemma 12.2.1) Suppose f ∈ S(R), a ∈ R.
(a) Prove that $\int_{-\infty}^{\infty} f(x+a)\,dx = \int_{-\infty}^{\infty} f(x)\,dx$.
(b) For a > 0, prove that $\int_{-\infty}^{\infty} f(ax)\,dx = \frac{1}{a}\int_{-\infty}^{\infty} f(x)\,dx$.
(c) For a < 0, prove that $\int_{-\infty}^{\infty} f(ax)\,dx = -\frac{1}{a}\int_{-\infty}^{\infty} f(x)\,dx$.
(Suggestion for all parts: Finite substitution and take limits. Note the sign in (c).)

12.2.2. (Proves Theorem 12.2.3) Suppose $f, g \in C^0(\mathbb{R})$ are rapidly decaying (Definition 4.7.1). Prove that (f ∗ g)(x) is rapidly decaying. (Suggestion: Use Lemma 12.1.5 to bound $|x|^k (f * g)(x)$.)

12.2.3. (Proves Theorem 12.2.3) For f, g ∈ S(R), prove that (f ∗ g)(x) = (g ∗ f)(x). (Suggestion: Substitution and translation invariance.)

12.2.4. (Proves Theorem 12.2.3) For f, g, h ∈ S(R), prove that
$$((f * g) * h)(x) = (f * (g * h))(x). \tag{12.2.11}$$
(Suggestion: Substitution, translation invariance, Lemma 12.1.6, and Fubini's Theorem 4.8.11.)

12.2.5. (Proves Theorem 12.2.3) For f, g ∈ S(R), prove that
$$\frac{d}{dx}\big((f * g)(x)\big) = \left(\frac{df}{dx} * g\right)(x). \tag{12.2.12}$$
(Suggestion: Theorem 4.8.8 and the fact that f and f′ are bounded.)

12.2.6. (Proves Theorem 12.2.3) Suppose f, g ∈ S(R). Prove that f ∗ g ∈ S(R). (Suggestion: Use the previous parts of Theorem 4.8.8.)

For Problems 12.2.7–12.2.9, assume that $\{K_t\}$ is a Dirac kernel (Definition 12.2.4) and f ∈ S(R).

12.2.7. (Proves Lemma 12.2.6) Prove that for any $\epsilon_1 > 0$, there exists some $\eta_1(\epsilon_1) > 0$ such that for $0 < \eta < \eta_1(\epsilon_1)$, any $x \in \mathbb{R}$, and any $t > 0$, we have
$$\int_{-\eta}^{\eta} |f(x-y) - f(x)|\,|K_t(y)|\,dy < \epsilon_1. \tag{12.2.13}$$

(Suggestion: Corollary 4.7.3.)

12.2.8. (Proves Lemma 12.2.7) Prove that for any fixed $\eta > 0$ and $\epsilon_2 > 0$, there exists some $\delta_2(\eta, \epsilon_2)$ such that for $0 < t < \delta_2(\eta, \epsilon_2)$ and any $x \in \mathbb{R}$, we have
$$\int_{|y| \ge \eta} |f(x-y) - f(x)|\,|K_t(y)|\,dy < \epsilon_2. \tag{12.2.14}$$

(Suggestion: Use the fact that f is bounded on R.)


12.2.9. (Proves Theorem 12.2.5) Prove that for any $\epsilon > 0$, there exists some $\delta(f, \epsilon)$, not depending on x ∈ R, such that for all x ∈ R and all t > 0, if $t < \delta(f, \epsilon)$, then $|(f * K_t)(x) - f(x)| < \epsilon$. In other words, prove that $f * K_t$ converges uniformly to f on R. (Suggestion: Use the fact that $f(x) = \int_{-\infty}^{\infty} f(x)K_t(y)\,dy$ (why?) and compare with $(f * K_t)(x)$.)

12.2.10. (Proves Theorem 12.2.9) Define
$$G_t(x) = \frac{1}{t}\exp\left(-\frac{\pi x^2}{t^2}\right). \tag{12.2.15}$$
(a) Prove that for any t > 0, $\int_{-\infty}^{\infty} G_t(x)\,dx = 1$. (Suggestion: Theorem 4.8.6 and substitution.)
(b) Fix η > 0. Prove that $\lim_{t \to 0} \int_{-\eta}^{\eta} G_t(x)\,dx = 1$. (Suggestion: Substitute y = x/t and pay close attention to the limits of integration.)

12.3 The Fourier transform on S(R)

We can now finally define the Fourier transform on S(R).

Definition 12.3.1. For f ∈ S(R), we define the Fourier transform of f to be
$$\hat{f}(\gamma) = \int_{-\infty}^{\infty} f(x)e^{-2\pi i\gamma x}\,dx. \tag{12.3.1}$$
Note that the integral in (12.3.1) is well-defined by Example 4.8.5.

Remark 12.3.2. We will sometimes write the Fourier transform of f as $U(f) = \hat{f}$. When using this notation, it will sometimes be useful to let the transform variable also be x, or in other words,
$$(U(f))(x) = \hat{f}(x) = \int_{-\infty}^{\infty} f(y)e^{-2\pi i x y}\,dy. \tag{12.3.2}$$

As we shall see, the alternate choice of variables in (12.3.2) is especially useful when we consider the Fourier transform as an operator, so we call this choice of variables and the notation U (f ) for fˆ the operator notation for the Fourier transform. We collect some important properties of the Fourier transform in the following theorem. Theorem 12.3.3. If the Fourier transform of f ∈ S(R) is fˆ(γ), and a, b ∈ R, b > 0, then the Fourier transforms of certain transformations of f are given by Table 12.3.1. In particular, the Fourier transform of f ∈ S(R) is differentiable. Proof. Problems 12.3.1–12.3.3.

Function (in x)                 Fourier transform (in γ)
$f(x+a)$                        $e^{2\pi i a\gamma}\hat{f}(\gamma)$
$e^{2\pi i a x}f(x)$            $\hat{f}(\gamma - a)$
$f(bx)$                         $\frac{1}{b}\hat{f}\left(\frac{\gamma}{b}\right)$
$f(-x)$                         $\hat{f}(-\gamma)$
$f'(x)$                         $(2\pi i\gamma)\hat{f}(\gamma)$
$(-2\pi i x)f(x)$               $\hat{f}'(\gamma)$

Table 12.3.1: Some Fourier transform identities

Remark 12.3.4. It will also be useful to restate Table 12.3.1 using operator notation (Remark 12.3.2). First, for any a, b ∈ R and any polynomial p(x), we define operators $\tau_a$, $\mu_a$, $s_b$, and $M_{p(x)}$ on S(R) by
$$\begin{aligned}
(\tau_a(f))(x) &= f(x+a), & (\mu_a(f))(x) &= e^{2\pi i a x}f(x), \\
(s_b(f))(x) &= f(bx), & (M_{p(x)}(f))(x) &= p(x)f(x).
\end{aligned} \tag{12.3.3}$$
Note that one must be careful when considering compositions of operators; for example, since $s_{-1}(f)(x) = f(-x)$,
$$(\tau_{-a}(s_{-1}(f)))(x) = (s_{-1}(f))(x-a) = f(-(x-a)) = f(a-x). \tag{12.3.4}$$
In any case, we may restate Table 12.3.1 as:
$$\begin{aligned}
U(\tau_a(f)) &= \mu_a(U(f)), & U(\mu_a(f)) &= \tau_{-a}(U(f)), \\
U(s_b(f)) &= \frac{1}{b}\,s_{1/b}(U(f)), & U(s_{-1}(f)) &= s_{-1}(U(f)), \\
U\left(\frac{d}{dx}f\right) &= M_{2\pi i x}(U(f)), & U(M_{-2\pi i x}(f)) &= \frac{d}{dx}U(f),
\end{aligned} \tag{12.3.5}$$

where again, we assume that b > 0.

As a result of Theorem 12.3.3, we see that the Fourier transform preserves S(R):

Corollary 12.3.5. If f ∈ S(R), then $\hat{f} \in S(\mathbb{R})$.

Proof. Problem 12.3.4.

The following formula should be familiar from its $S^1$ analogue (Theorem 8.2.4), and is proven similarly. (Note that f ∗ g ∈ S(R) by Theorem 12.2.3.)

Theorem 12.3.6. If f, g ∈ S(R), then $\widehat{f * g}(\gamma) = \hat{f}(\gamma)\hat{g}(\gamma)$.


Proof. Problem 12.3.5.

We also have the following handy theorem, which we call the “Pass the Hat” formula.

Theorem 12.3.7 (Pass the Hat). For f, g ∈ S(R), we have that
$$\int_{-\infty}^{\infty} \hat{f}(x)g(x)\,dx = \int_{-\infty}^{\infty} f(x)\hat{g}(x)\,dx. \tag{12.3.6}$$

Proof. Problem 12.3.6.

The reader may have noticed that we have yet to calculate any specific examples of Fourier transforms. There are two good reasons: First, many natural examples are not in S(R); and second, even for functions in S(R), this calculation is not easy. We do, however, have the following crucial example.

Theorem 12.3.8. The Fourier transform of $f(x) = e^{-\pi x^2}$ is $\hat{f}(\gamma) = e^{-\pi\gamma^2}$; in other words, f is its own Fourier transform, or U(f) = f. More generally, for t > 0, let $G_t(x) = \frac{1}{t}\exp\left(-\frac{\pi x^2}{t^2}\right)$ be the Gauss kernel. Then
$$\hat{G}_t(\gamma) = e^{-\pi t^2\gamma^2}, \qquad U(U(G_t)) = \hat{\hat{G}}_t = G_t. \tag{12.3.7}$$

Proof. Problem 12.3.7.
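As a sanity check (not a substitute for Problem 12.3.7), the formula $\hat{G}_t(\gamma) = e^{-\pi t^2\gamma^2}$ and the first row of Table 12.3.1 can be compared against a direct numerical approximation of the integral in Definition 12.3.1. The sketch below assumes a midpoint rule on the truncated interval [−8, 8]; the helper name `fourier` and the sample values of t, a, and γ are illustrative choices only.

```python
import cmath
import math

def gauss_kernel(t, x):
    """Gauss kernel G_t(x) = (1/t) exp(-pi x^2 / t^2)."""
    return math.exp(-math.pi * x * x / (t * t)) / t

def fourier(f, gamma, half_width=8.0, n=4000):
    """Midpoint-rule approximation of the integral of f(x) e^{-2 pi i gamma x}
    over [-half_width, half_width]; the truncation is an assumption that is
    harmless for rapidly decaying f."""
    h = 2.0 * half_width / n
    total = 0j
    for k in range(n):
        x = -half_width + (k + 0.5) * h
        total += f(x) * cmath.exp(-2j * math.pi * gamma * x)
    return total * h

if __name__ == "__main__":
    t, a = 0.5, 0.3  # illustrative parameters
    for gamma in (0.0, 0.4, 1.1):
        claimed = math.exp(-math.pi * t * t * gamma * gamma)        # Theorem 12.3.8
        direct = fourier(lambda x: gauss_kernel(t, x), gamma)
        shifted = fourier(lambda x: gauss_kernel(t, x + a), gamma)  # transform of f(x + a)
        shift_rule = cmath.exp(2j * math.pi * a * gamma) * claimed  # Table 12.3.1, row 1
        print(gamma, abs(direct - claimed), abs(shifted - shift_rule))
```

Both printed differences should be at the level of rounding error, since the midpoint rule is extremely accurate for smooth, rapidly decaying integrands.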

Problems

12.3.1. (Proves Theorem 12.3.3) Suppose f ∈ S(R).
(a) Let g(x) = f(x + a). Prove that $\hat{g}(\gamma) = e^{2\pi i a\gamma}\hat{f}(\gamma)$.
(b) Let $h(x) = e^{2\pi i a x}f(x)$. Prove that $\hat{h}(\gamma) = \hat{f}(\gamma - a)$.

12.3.2. (Proves Theorem 12.3.3) Suppose f ∈ S(R) and b > 0.
(a) Let g(x) = f(bx). Prove that $\hat{g}(\gamma) = \frac{1}{b}\hat{f}\left(\frac{\gamma}{b}\right)$.
(b) Let h(x) = f(−x). Prove that $\hat{h}(\gamma) = \hat{f}(-\gamma)$.
(Suggestion: Lemma 12.2.1.)

12.3.3. (Proves Theorem 12.3.3) Suppose f ∈ S(R).
(a) Let g(x) = f′(x). Prove that $\hat{g}(\gamma) = (2\pi i\gamma)\hat{f}(\gamma)$. (Suggestion: Parts.)
(b) Let $h(x) = (-2\pi i x)f(x)$. Prove that $\hat{h}(\gamma) = \hat{f}'(\gamma)$. (Suggestion: Theorem 4.8.8.)

12.3.4. (Proves Corollary 12.3.5) Suppose f ∈ S(R).
(a) Prove that for any γ ∈ R, $|\hat{f}(\gamma)| \le \int_{-\infty}^{\infty} |f(x)|\,dx$. (It follows that $\hat{f}$ is bounded.)
(b) Prove that for any n, k ≥ 0, $\gamma^n \hat{f}^{(k)}(\gamma)$ is the Fourier transform of $\left(\frac{1}{2\pi i}\frac{d}{dx}\right)^n \left((-2\pi i x)^k f(x)\right)$.
(c) Prove that for any n, k ≥ 0, $\gamma^n \hat{f}^{(k)}(\gamma)$ is bounded. (Suggestion: Use Problem 4.7.2.)

12.3.5. (Proves Theorem 12.3.6) For f, g ∈ S(R), prove that $\widehat{f * g}(\gamma) = \hat{f}(\gamma)\hat{g}(\gamma)$. (Suggestion: Lemma 12.1.6, Fubini's Theorem 4.8.11, and Lemma 12.2.1.)

12.3.6. (Proves Theorem 12.3.7) Prove that if f, g ∈ S(R), then $\int_{-\infty}^{\infty} \hat{f}(x)g(x)\,dx = \int_{-\infty}^{\infty} f(x)\hat{g}(x)\,dx$. (Suggestion: Lemma 12.1.6 and Fubini's Theorem 4.8.11. The choice of variables in (12.3.2) may also be helpful.)

12.3.7. (Proves Theorem 12.3.8) Let $f(x) = e^{-\pi x^2}$, and let $y = F(\gamma) = \hat{f}(\gamma)$.
(a) Prove that $F'(\gamma) = -2\pi\gamma F(\gamma)$. (Suggestions: Use Theorems 3.6.23 and 12.3.3.)
(b) Find the value of F(0) by direct calculation.
(c) Prove that $\hat{f} = f$ by solving the differential equation $F'(\gamma) = -(2\pi\gamma)F(\gamma)$.
(d) For t ∈ R, t > 0, let $G_t(x)$ be the Gauss kernel $G_t(x) = \frac{1}{t}\exp\left(-\frac{\pi x^2}{t^2}\right)$. Prove that $\hat{G}_t(\gamma) = e^{-\pi t^2\gamma^2}$. (Suggestion: Use Theorem 12.3.3 instead of doing more integrals.)
(e) Prove that $\hat{\hat{G}}_t = G_t$.

12.4 Inversion and the Plancherel theorem

In this section, we prove two important theorems about Fourier transforms on S(R). First:

Theorem 12.4.1 (Inversion Theorem in S(R)). For f ∈ S(R), we have that
$$\hat{\hat{f}}(x) = \int_{-\infty}^{\infty} \hat{f}(\gamma)e^{-2\pi i\gamma x}\,d\gamma = f(-x). \tag{12.4.1}$$
In other words, replacing x with −x, for f ∈ S(R), we have
$$\int_{-\infty}^{\infty} \hat{f}(\gamma)e^{2\pi i\gamma x}\,d\gamma = f(x). \tag{12.4.2}$$

The operation on the left-hand side of (12.4.2) is therefore known as the inverse Fourier transform. Note that because of our conventions of where to put 2π, our inverse transform greatly resembles our forward transform. With other conventions, a factor of 2π appears more prominently in the inverse; see Remark 13.1.1. In any case, as indicated by its title, Theorem 12.4.1 (or rather, its L2 analogue yet to come) is the Fourier transform analogue of the Inversion Theorem for Fourier Series 8.1.1.


Note also that in operator notation (Remarks 12.3.2 and 12.3.4), Theorem 12.4.1 is equivalent to saying that $U(U(f)) = s_{-1}(f)$ for all f ∈ S(R).

Our second theorem is the transform version of the Isomorphism Theorem for Fourier Series 7.6.7, though the statement may not appear analogous at first.

Theorem 12.4.2 (Isomorphism Theorem in S(R)). For f, g ∈ S(R), we have that $\langle \hat{f}, \hat{g} \rangle = \langle f, g \rangle$, or in operator terms,
$$\langle U(f), U(g) \rangle = \langle f, g \rangle. \tag{12.4.3}$$
In particular, $\|f\| = \|\hat{f}\|$.

In operator terms, Theorem 12.4.2 (or again, its $L^2$ analogue to come) says that the operator U is an isomorphism of Hilbert spaces. (Compare Remark 7.6.8.) Operators that satisfy (12.4.3) are also known as unitary operators.

Turning to the proof of Theorems 12.4.1 and 12.4.2, we begin with the following lemmas, which we use to hide a bit of grunge.

Lemma 12.4.3. For f ∈ S(R) and constant x ∈ R, let $h_x(y) = f(-x-y)$. Then $\hat{\hat{f}}(x-y) = \hat{\hat{h}}_x(y)$, where the Fourier transform is calculated in the variable y.

Proof. Using operator notation (Remark 12.3.4) in the variable y, we first observe that since $(s_{-1}(f))(y) = f(-y)$,
$$(\tau_x(s_{-1}(f)))(y) = f(-(y+x)) = f(-x-y) = h_x(y). \tag{12.4.4}$$
Also recall that by (12.3.5), we have that
$$\tau_{-x}s_{-1}UU = \tau_{-x}UUs_{-1} = U\mu_xUs_{-1} = UU\tau_xs_{-1}. \tag{12.4.5}$$
It follows that, still working with operator notation in the variable y,
$$\hat{\hat{f}}(x-y) = (\tau_x(\hat{\hat{f}}))(-y) = (\tau_{-x}(s_{-1}(\hat{\hat{f}})))(y) = (U(U(\tau_x(s_{-1}(f)))))(y) = \hat{\hat{h}}_x(y). \tag{12.4.6}$$
The lemma follows.

Lemma 12.4.4. For g ∈ S(R), let $h(x) = \overline{g(x)}$. Then $\overline{\hat{g}(\gamma)} = \hat{h}(-\gamma)$.

Proof. Problem 12.4.1.

We now prove our main results.

Proof of Theorem 12.4.1. Suppose f ∈ S(R). Define $g(x) = f(-x) = (s_{-1}(f))(x)$, and for fixed x ∈ R, define $h_x(y) = f(-x-y) = g(x+y)$, as in Lemma 12.4.3.

Then for any fixed t > 0, we have
$$\begin{aligned}
(\hat{\hat{f}} * G_t)(x) &= \int_{-\infty}^{\infty} \hat{\hat{f}}(x-y)G_t(y)\,dy \\
&= \int_{-\infty}^{\infty} \hat{\hat{h}}_x(y)G_t(y)\,dy && \text{(Lemma 12.4.3)} \\
&= \int_{-\infty}^{\infty} h_x(y)\hat{\hat{G}}_t(y)\,dy && \text{(Pass the Hat 12.3.7)} \\
&= \int_{-\infty}^{\infty} g(x+y)G_t(y)\,dy && \text{(Theorem 12.3.8)} \\
&= \int_{-\infty}^{\infty} g(x-y)G_t(-y)\,dy && \text{(Lemma 12.2.1)} \\
&= \int_{-\infty}^{\infty} g(x-y)G_t(y)\,dy && \text{($G_t$ is an even function)} \\
&= (g * G_t)(x).
\end{aligned} \tag{12.4.7}$$
The theorem follows by taking $\lim_{t \to 0}$ on both sides and applying Theorem 12.2.5.

Finally, for the proof of Theorem 12.4.2, see Problem 12.4.2.
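Theorems 12.4.1 and 12.4.2 can also be checked numerically on a sample function. The sketch below is only an illustration under stated assumptions: the function `f` is a hypothetical non-even element of S(R), all integrals are truncated to [−6, 6], and the transform is approximated by a midpoint rule on a shared grid, none of which comes from the text.

```python
import cmath
import math

def midpoint_grid(half_width, n):
    """Step size and midpoints of n equal cells covering [-half_width, half_width]."""
    h = 2.0 * half_width / n
    return h, [-half_width + (k + 0.5) * h for k in range(n)]

def transform(values, points, h, gamma):
    """Midpoint-rule approximation of the Fourier transform at gamma,
    given samples `values` of the function at `points`."""
    return h * sum(v * cmath.exp(-2j * math.pi * gamma * x)
                   for v, x in zip(values, points))

def f(x):
    # Hypothetical test function in S(R); it is not even, so f(-x) != f(x).
    return (1.0 + x) * math.exp(-math.pi * x * x)

h, grid = midpoint_grid(6.0, 600)
f_vals = [f(x) for x in grid]
f_hat = [transform(f_vals, grid, h, g) for g in grid]  # f-hat sampled on the same grid

def double_hat(x):
    """Approximation of the transform of f-hat, evaluated at x."""
    return transform(f_hat, grid, h, x)

norm_f_sq = h * sum(abs(v) ** 2 for v in f_vals)       # approximates ||f||^2
norm_fhat_sq = h * sum(abs(v) ** 2 for v in f_hat)     # approximates ||f-hat||^2

if __name__ == "__main__":
    print(abs(double_hat(0.4) - f(-0.4)))   # inversion: should be tiny
    print(abs(double_hat(0.4) - f(0.4)))    # not tiny, since f is not even
    print(abs(norm_f_sq - norm_fhat_sq))    # isomorphism: should be tiny
```

The second printed number is included to emphasize the sign in (12.4.1): applying the transform twice returns f(−x), not f(x).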

Problems

12.4.1. For g ∈ S(R), let $h(x) = \overline{g(x)}$. Prove that $\overline{\hat{g}(\gamma)} = \hat{h}(-\gamma)$.

12.4.2. Suppose f, g ∈ S(R). Prove that $\langle \hat{f}, \hat{g} \rangle = \langle f, g \rangle$. (Suggestions: Pass the hat, inversion, and Lemma 12.4.4. You may also need the fact that $\widehat{h(-\gamma)} = \hat{h}(-\gamma)$ (why?).)

12.5 The $L^2$ Fourier transform

In this section, we extend the definition of the Fourier transform and the results of Sections 12.3 and 12.4 from the Schwartz space S(R) to $L^2(\mathbb{R})$. Instead of hard work, we will achieve this mainly via:

TOTAL ABSTRACT NONSENSE

Recall that by Corollary 8.5.7 and Theorem 2.4.15, every $f \in L^2(\mathbb{R})$ is the limit in $L^2$ of some sequence of functions in S(R). The following definition therefore at least makes some initial sense.

Definition 12.5.1. For $f \in L^2(\mathbb{R})$, choose some sequence $f_n$ in S(R) such that $\lim_{n \to \infty} f_n = f$. We define the Fourier transform $\hat{f}$ of f to be
$$\hat{f} = \lim_{n \to \infty} \hat{f}_n, \tag{12.5.1}$$
where $\hat{f}_n$ is the Fourier transform of $f_n$ as a function in S(R) (Definition 12.3.1).


Theorem 12.5.2. For $f \in L^2(\mathbb{R})$, the Fourier transform $\hat{f}$ from Definition 12.5.1 is a well-defined function in $L^2(\mathbb{R})$. Specifically:
1. If $f_n$ is a sequence of functions in S(R) such that $\lim_{n \to \infty} f_n = f$, then the sequence $\hat{f}_n$ converges to some $\hat{f} \in L^2(\mathbb{R})$.
2. If $f_n$ and $g_n$ are two sequences in S(R) such that $\lim_{n \to \infty} f_n = f = \lim_{n \to \infty} g_n$, then $\lim_{n \to \infty} \hat{f}_n = \lim_{n \to \infty} \hat{g}_n$.

Proof. Problems 12.5.1 and 12.5.2.

To make sure the theory works correctly, we also need to extend the inversion and isomorphism theorems to $L^2$ functions; and for applications, we need to extend some of the formal properties of the Fourier transform (Theorems 12.3.3 and 12.3.6) to $L^2$ functions. We begin with the following lemma.

Lemma 12.5.3. As in Remark 12.3.4, let $s_{-1} : L^2(\mathbb{R}) \to L^2(\mathbb{R})$ be defined by $(s_{-1}(f))(x) = f(-x)$. Then $s_{-1}$ is unitary, and therefore, bounded and continuous.

Proof. For $f, g \in L^2(\mathbb{R})$, Lemma 12.2.1 shows that
$$\langle s_{-1}(f), s_{-1}(g) \rangle = \int_{-\infty}^{\infty} f(-x)\overline{g(-x)}\,dx = \int_{-\infty}^{\infty} f(x)\overline{g(x)}\,dx = \langle f, g \rangle. \tag{12.5.2}$$
The lemma follows.

Curiously, it will be convenient to reverse the order of proof of the inversion and isomorphism theorems that we used in Section 12.4.

Theorem 12.5.4 (Isomorphism Theorem for the Fourier Transform). For $f, g \in L^2(\mathbb{R})$, we have that $\langle \hat{f}, \hat{g} \rangle = \langle f, g \rangle$. In particular, $\|f\| = \|\hat{f}\|$.

Proof. Problem 12.5.3.

Theorem 12.5.5 (Inversion Theorem for the Fourier Transform). For $f \in L^2(\mathbb{R})$, we have that $\hat{\hat{f}}(x) = (s_{-1}(f))(x)$, or in other words, $f(x) = \hat{\hat{f}}(-x)$.

Proof. Problem 12.5.4.

Perhaps the trickiest part of defining the $L^2$ Fourier transform is actually making sure that a concrete formula like (12.3.1) in our original definition (Definition 12.3.1) makes sense. The main technical issue is that if $f \in L^2(\mathbb{R})$ but $f \notin L^1(\mathbb{R})$, then because
$$\int_{-\infty}^{\infty} \left|f(x)e^{-2\pi i\gamma x}\right| dx = \int_{-\infty}^{\infty} |f(x)|\,dx = +\infty, \tag{12.5.3}$$

the formula (12.3.1) is not well-defined as an ordinary Lebesgue integral (see Definition 7.5.1). We therefore need to use “improper Lebesgue integrals” in the following sense.

Theorem 12.5.6. For $f \in L^2(\mathbb{R})$ and b > 0, define
$$f_b(x) = \begin{cases} f(x) & \text{if } |x| \le b, \\ 0 & \text{otherwise.} \end{cases} \tag{12.5.4}$$
Then
$$\hat{f}(\gamma) = \lim_{b \to \infty} \hat{f}_b(\gamma) = \lim_{b \to \infty} \int_{-b}^{b} f(x)e^{-2\pi i\gamma x}\,dx. \tag{12.5.5}$$

Proof. We first consider the special case of $g \in L^2(\mathbb{R})$ with support contained in [−b, b], that is, g(x) = 0 for |x| > b. In that case, the theorem reduces to proving that
$$\hat{g}(\gamma) = \int_{-\infty}^{\infty} g(x)e^{-2\pi i\gamma x}\,dx = \int_{-b}^{b} g(x)e^{-2\pi i\gamma x}\,dx. \tag{12.5.6}$$

Note that while (12.5.6) certainly seems reasonable, $\hat{g}(\gamma)$ is only defined to be the limit of Fourier transforms of functions in S(R), so we need to find a sequence providing the desired result. The idea is that we want to find functions $g_n(x) \in C_c^{\infty}(\mathbb{R})$ (see Theorem 8.5.6) that approximate g(x) closely on [−b, b] and vanish outside a small neighborhood of [−b, b]; see Figure 12.5.1, in which the dashed lines represent g(x) and the solid lines represent $g_n(x)$, for what this might look like.

Figure 12.5.1: Smoothly approximating a function with compact support (horizontal axis marked at −b − δ and b + δ)

To be precise, for each n ∈ N, by Corollary 8.5.3, choose $h_n(x) \in C^{\infty}(\mathbb{R})$ (for example, a trigonometric polynomial) such that
$$\int_{-b}^{b} |h_n(x) - g(x)|^2\,dx < \frac{1}{2n}. \tag{12.5.7}$$
Note that we are not yet done because it may be the case that $h_n(x)$ is large outside [−b, b]. Next, let
$$M_n = \max\{|h_n(x)| \mid -b-1 \le x \le b+1\}, \qquad \delta_n = \frac{1}{4n(M_n+1)^2} < 1, \tag{12.5.8}$$


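and by Theorem 8.5.5, choose a bump function $\varphi_n(x) \in C^{\infty}(\mathbb{R})$ with $\varphi_n(x) = 1$ on [−b, b] and $\varphi_n(x) = 0$ for $x \notin [-b-\delta_n, b+\delta_n]$. Finally, let $g_n(x) = h_n(x)\varphi_n(x)$, which is in S(R) because $\varphi_n(x)$ has compact support. Then
$$\begin{aligned}
\int_{-\infty}^{\infty} |g_n(x) - g(x)|^2\,dx &= \int_{-b-\delta_n}^{-b} |g_n(x)|^2\,dx + \int_{-b}^{b} |h_n(x) - g(x)|^2\,dx + \int_{b}^{b+\delta_n} |g_n(x)|^2\,dx \\
&< \frac{M_n^2}{4n(M_n+1)^2} + \frac{1}{2n} + \frac{M_n^2}{4n(M_n+1)^2} \le \frac{1}{n}.
\end{aligned} \tag{12.5.9}$$
It follows that $\lim_{n \to \infty} g_n = g$ in $L^2(\mathbb{R})$, and therefore, in $L^2([-b-1, b+1])$. Moreover,
$$\left| \hat{g}_n(\gamma) - \int_{-b}^{b} g(x)e^{-2\pi i\gamma x}\,dx \right| \le \int_{-b-1}^{b+1} |g_n(x) - g(x)|\,dx. \tag{12.5.10}$$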

By Theorem 7.5.15, both sides of (12.5.10) go to 0 as n → ∞, and the theorem follows in the special case.

Returning to the general case, since $\|f_b\| \le \|f\|$, $f_b \in L^2(\mathbb{R})$, which means that $f_b$ has a well-defined Fourier transform. Moreover, by the Isomorphism Theorem for Fourier Transforms 12.5.4,
$$\lim_{b \to \infty} \|\hat{f} - \hat{f}_b\| = \lim_{b \to \infty} \|f - f_b\| = 0, \tag{12.5.11}$$
so $\hat{f}_b$ converges to $\hat{f}$ in the $L^2$ metric. The theorem then follows from the special case.

In any case, Theorem 12.5.6 now ensures that we may compute the Fourier transforms of functions in $L^2(\mathbb{R})$ by (improper) integration. See Section 13.1 for some examples.

For applications, we also need to know that many of the formal properties of the Fourier transform on S(R) extend to $L^2$ transforms. We first require a lemma.

Lemma 12.5.7. If $f \in C^1(\mathbb{R}) \cap L^2(\mathbb{R})$ and $f' \in L^2(\mathbb{R})$, then $\lim_{x \to \pm\infty} f(x) = 0$.

Proof. By the Fundamental Theorem of Calculus, we see that
$$(f(x))^2 = (f(0))^2 + \int_0^x \frac{d(f(t)^2)}{dt}\,dt = (f(0))^2 + 2\int_0^x f(t)f'(t)\,dt. \tag{12.5.12}$$
However, since $f, f' \in L^2(\mathbb{R})$, $\int_0^{\infty} f(t)f'(t)\,dt$ is finite, and so $\lim_{x \to \infty} f(x) = L$ for some L ∈ C. However, if L ≠ 0, then $\int_0^{\infty} f(x)\,dx$ diverges, so $\lim_{x \to \infty} f(x) = 0$. The same argument shows that $\lim_{x \to -\infty} f(x) = 0$.

Theorem 12.5.8. If $f, g \in C^0(\mathbb{R}) \cap L^2(\mathbb{R})$, then the $L^2$ Fourier transforms of f and g have the following properties.
1. The first four rows of Table 12.3.1 (see Theorem 12.3.3) hold.
2. If $f \in C^1(\mathbb{R}) \cap L^2(\mathbb{R})$ and $f'(x) \in L^2(\mathbb{R})$, then $\widehat{f'}(\gamma) = (2\pi i\gamma)\hat{f}(\gamma)$.
3. Let F(x, y) = f(x − y)g(y). If $f, g \in C^1(\mathbb{R}) \cap L^2(\mathbb{R})$ and $F$, $\frac{\partial F}{\partial x}$, and $\frac{\partial F}{\partial y}$ are all integrable by separation (Definition 4.8.9), then $\widehat{f * g}(\gamma) = \hat{f}(\gamma)\hat{g}(\gamma)$.

Proof. Since the proofs of the first four rows of Table 12.3.1 rely only on substitution, they extend to prove Claim (1) under the above hypotheses. The previous proof of Claim (2) also extends under the stated conditions, as it uses only integration by parts and the results of Lemma 12.5.7. Finally, since the limit definition (Definition 12.5.1) and improper integral formula for the Fourier transform agree for functions in C 0 (R) ∩ L2 (R) (Theorem 12.5.6), and the proof of Theorem 12.3.6 relies only on Fubini’s Theorem and substitution, that proof extends to the above hypotheses. Remark 12.5.9. The awkwardness of our hypotheses for Theorem 12.5.8(3) stems from the awkwardness of the hypotheses for our version of Fubini’s Theorem (Theorem 4.8.11). Given a full development of the Lebesgue integral and measure theory, one can use the much more natural hypotheses that f, g ∈ L1 (R); see Rudin [Rud86, Thm. 9.2] for details and a proof.
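To illustrate the improper integrals of Theorem 12.5.6, consider $f(x) = \sin(2\pi x)/(\pi x)$, which is in $L^2(\mathbb{R})$ but not in $L^1(\mathbb{R})$. Combining the $\chi_t$ row of Table 13.1.1 (with t = 1) with the Inversion Theorem 12.5.5 suggests that $\hat{f} = \chi_1$. The sketch below evaluates the truncated integrals in (12.5.5) at two sample values of γ; the choice of f, the midpoint rule, and the step size are illustrative assumptions. Note that Theorem 12.5.6 asserts convergence in the $L^2$ metric, so a pointwise check like this one is only suggestive.

```python
import math

def f(x):
    """f(x) = sin(2 pi x) / (pi x), extended by continuity to f(0) = 2."""
    if x == 0.0:
        return 2.0
    return math.sin(2.0 * math.pi * x) / (math.pi * x)

def truncated_transform(gamma, b, step=0.005):
    """Midpoint-rule value of the integral of f(x) cos(2 pi gamma x) over [-b, b].
    Because f is even, the sine part of e^{-2 pi i gamma x} integrates to zero,
    so this equals the truncated integral in (12.5.5)."""
    n = int(round(2.0 * b / step))
    h = 2.0 * b / n
    total = 0.0
    for k in range(n):
        x = -b + (k + 0.5) * h
        total += f(x) * math.cos(2.0 * math.pi * gamma * x)
    return total * h

if __name__ == "__main__":
    for b in (5, 20, 80):
        # gamma = 0.5 lies inside [-1, 1]; gamma = 2.0 lies outside.
        print(b, truncated_transform(0.5, b), truncated_transform(2.0, b))
```

As b grows, the first column of values should drift toward 1 and the second toward 0, but only at a rate of roughly 1/b, which reflects the fact that f is not absolutely integrable.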

Problems

12.5.1. (Proves Theorem 12.5.2) Suppose $f_n$ is a sequence of functions in S(R) such that $\lim_{n \to \infty} f_n = f$ (in $L^2$). Prove that the sequence $\hat{f}_n$ converges to some $g \in L^2(\mathbb{R})$. (Suggestion: Use the Isomorphism Theorem in S(R) 12.4.2 to prove that the sequence $\hat{f}_n$ is Cauchy.)

12.5.2. (Proves Theorem 12.5.2) Prove that if $f_n$ and $g_n$ are two sequences in S(R) such that $\lim_{n \to \infty} f_n = f = \lim_{n \to \infty} g_n$, then $\lim_{n \to \infty} \hat{f}_n = \lim_{n \to \infty} \hat{g}_n$. (Suggestion: Put everything on one side.)

12.5.3. (Proves Theorem 12.5.4) Suppose $f, g \in L^2(\mathbb{R})$. Prove that $\langle \hat{f}, \hat{g} \rangle = \langle f, g \rangle$. (Suggestion: Use the continuity of the inner product and the Isomorphism Theorem in S(R) 12.4.2.)

12.5.4. (Proves Theorem 12.5.5) Prove that if $f \in L^2(\mathbb{R})$, then $\hat{\hat{f}}(x) = (s_{-1}(f))(x)$. (Suggestion: Given $\lim_{n \to \infty} f_n = f$, compute $\hat{\hat{f}}$ by finding a sequence of functions in S(R) whose limit is $\hat{f}$. Lemma 12.5.3 also helps.)

Chapter 13

Applications of the Fourier transform

I think it isn't a stretch to say that [the Fourier transform] is one of the most widely applicable mathematical discoveries, with applications ranging from optics to quantum physics, radio astronomy, MP3 and JPEG compression, X-ray crystallography, voice recognition, and PET or MRI scans. . . . [It] was even used by James Watson and Francis Crick to decode the double helix structure of DNA from the X-ray patterns produced by Rosalind Franklin. You probably use a descendant of Fourier's idea every day, whether you're playing an MP3, viewing an image on the web, asking Siri a question, or tuning in to a radio station.
Aatish Bhatia, “The Math Trick Behind MP3s, JPEGs, and Homer Simpson's Face”, Nautilus, Nov 6, 2013

In this chapter, we discuss some applications of the Fourier transform. Specifically, after presenting a table of Fourier transforms needed for applications (Section 13.1), we use Fourier transforms to solve differential equations, both ODEs (Section 13.2) and PDEs (Section 13.3), we extend our previous discussions of Hermite functions (Section 13.4) and quantum mechanics (Section 13.5), we discuss the Poisson summation formula and its consequences (Section 13.6), and we describe a few miscellaneous applications (Section 13.7).

13.1 A table of Fourier transforms

When applying the Fourier transform, it is often useful to have a table of known transforms. In this section, we establish such a table, with the calculations left to the reader, as usual.

We first introduce some notation that may be unfamiliar to the reader. Two functions that appear naturally when using Fourier transforms are
$$\chi_t(x) = \begin{cases} 1 & \text{if } -t \le x \le t, \\ 0 & \text{otherwise,} \end{cases} \qquad \operatorname{sinc}(x) = \begin{cases} \dfrac{\sin x}{x} & \text{if } x \ne 0, \\ 1 & \text{if } x = 0, \end{cases} \tag{13.1.1}$$
where t > 0. The Heaviside function
$$u(x) = \begin{cases} 1 & \text{if } x \ge 0, \\ 0 & \text{otherwise,} \end{cases} \tag{13.1.2}$$
is also useful for the sake of brevity, as
$$u(x)f(x) = \begin{cases} f(x) & \text{if } x \ge 0, \\ 0 & \text{otherwise,} \end{cases} \qquad u(-x)f(x) = \begin{cases} f(x) & \text{if } x \le 0, \\ 0 & \text{otherwise.} \end{cases} \tag{13.1.3}$$

In any case, without further ado, see Table 13.1.1 for a brief list of transforms we will use, along with the problems where they are computed. For constants, we assume α = a + bi ∈ C, with a > 0, t > 0, and n ≥ 1.

Function (in x)                                        Fourier transform (in γ)                      Source
$G_t(x) = \frac{1}{t}\exp(-\pi x^2/t^2)$               $\exp(-\pi t^2\gamma^2)$                      Problem 12.3.7
$u(x)e^{-\alpha x}$                                    $\frac{1}{\alpha + 2\pi i\gamma}$             Problem 13.1.1
$u(x)\frac{x^{n-1}}{(n-1)!}e^{-\alpha x}$              $\frac{1}{(\alpha + 2\pi i\gamma)^n}$         Problem 13.1.2
$u(-x)\frac{(-x)^{n-1}}{(n-1)!}e^{\alpha x}$           $\frac{1}{(\alpha - 2\pi i\gamma)^n}$         Problem 13.1.3
$e^{-\alpha|x|}$                                       $\frac{2\alpha}{\alpha^2 + 4\pi^2\gamma^2}$   Problem 13.1.4
$\chi_t(x)$                                            $2t\operatorname{sinc}(2\pi t\gamma)$         Problem 13.1.5

Table 13.1.1: Some Fourier transform examples
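Before working Problems 13.1.1–13.1.4, it can be reassuring to spot-check a few rows of Table 13.1.1 numerically. The following sketch assumes a midpoint rule on a truncated interval; the helper `fourier`, the sample value of α, the cutoff at |x| = 30, and the sample frequencies are illustrative choices rather than anything prescribed by the text.

```python
import cmath
import math

def fourier(f, gamma, a, b, n):
    """Midpoint-rule approximation of the integral of f(x) e^{-2 pi i gamma x} over [a, b]."""
    h = (b - a) / n
    total = 0j
    for k in range(n):
        x = a + (k + 0.5) * h
        total += f(x) * cmath.exp(-2j * math.pi * gamma * x)
    return total * h

alpha = complex(1.5, 2.0)  # illustrative alpha = a + bi with a > 0

def one_sided(x):
    """u(x) e^{-alpha x}."""
    return cmath.exp(-alpha * x) if x >= 0 else 0.0

def one_sided_power(x, n=3):
    """u(x) x^{n-1} e^{-alpha x} / (n-1)!, here with n = 3."""
    return x ** (n - 1) / math.factorial(n - 1) * cmath.exp(-alpha * x) if x >= 0 else 0.0

def two_sided(x):
    """e^{-alpha |x|}."""
    return cmath.exp(-alpha * abs(x))

if __name__ == "__main__":
    for gamma in (0.0, 0.7):
        v = alpha + 2j * math.pi * gamma
        print(gamma,
              abs(fourier(one_sided, gamma, 0.0, 30.0, 20000) - 1 / v),
              abs(fourier(one_sided_power, gamma, 0.0, 30.0, 20000) - 1 / v ** 3),
              abs(fourier(two_sided, gamma, -30.0, 30.0, 40000)
                  - 2 * alpha / (alpha ** 2 + 4 * math.pi ** 2 * gamma ** 2)))
```

Each printed difference should be small compared to the size of the transforms themselves; the one-sided integrals start at 0 so that the jump of the Heaviside function falls on a cell boundary of the quadrature rule.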

Remark 13.1.1. The reader interested in applications should note that ω = 2πγ is a more commonly used variable for the Fourier transform, and that many tables use j instead of i for $\sqrt{-1}$. Note that in the variable ω, the Fourier transform becomes
$$\hat{f}(\omega) = \int_{-\infty}^{\infty} f(x)e^{-i\omega x}\,dx, \tag{13.1.4}$$
and the inverse transform becomes
$$f(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \hat{f}(\omega)e^{i\omega x}\,d\omega. \tag{13.1.5}$$

Problems

13.1.1. For α = a + bi ∈ C, a > 0, let
$$f(x) = \begin{cases} e^{-\alpha x} & \text{if } x \ge 0, \\ 0 & \text{if } x < 0. \end{cases} \tag{13.1.6}$$
Compute the Fourier transform $\hat{f}(\gamma)$.

13.1.2. Let
$$f(x) = \begin{cases} \dfrac{x^{n-1}}{(n-1)!}e^{-\alpha x} & \text{if } x \ge 0, \\ 0 & \text{if } x < 0. \end{cases} \tag{13.1.7}$$
Compute the Fourier transform $\hat{f}(\gamma)$. (Suggestion: Induction and parts; the base case n = 1 is Problem 13.1.1.)

13.1.3. Let
$$f(x) = \begin{cases} \dfrac{(-x)^{n-1}}{(n-1)!}e^{\alpha x} & \text{if } x \le 0, \\ 0 & \text{if } x > 0. \end{cases} \tag{13.1.8}$$
Compute the Fourier transform $\hat{f}(\gamma)$. (Suggestion: Use previous results instead of integrating.)

13.1.4. Let $f(x) = e^{-\alpha|x|}$. Compute the Fourier transform $\hat{f}(\gamma)$. (Suggestion: Either integrate or combine previous results.)

13.1.5. For t ∈ R, t > 0, let
$$f(x) = \begin{cases} 1 & \text{if } |x| \le t, \\ 0 & \text{otherwise.} \end{cases} \tag{13.1.9}$$
Compute the Fourier transform $\hat{f}(\gamma)$. (Note that γ = 0 is a separate case.)

13.2 Linear differential equations with constant coefficients

Recall that Table 12.3.1 shows that the Fourier transform turns $\frac{d}{dx}$ into multiplication by 2πiγ, and therefore, turns constant coefficient differential equations in x into algebraic equations in γ. We may therefore use the Fourier transform to find particular solutions, naturally expressed as convolutions, of linear differential equations with constant coefficients. To be able to discuss solutions coming from applications, we will concentrate on formal solutions and use results about convolution described (but not proven) in Remark 12.5.9.

To be specific, consider the differential equation
$$c_n y^{(n)} + c_{n-1} y^{(n-1)} + \cdots + c_1 y' + c_0 y = h(x), \tag{13.2.1}$$
where each $c_i \in \mathbb{C}$ and $h \in L^1(\mathbb{R})$. (Note that our initial data is in $L^1(\mathbb{R})$ and not $L^2(\mathbb{R})$, as has been more typical for us.) Let $p(t) = c_n t^n + c_{n-1} t^{n-1} + \cdots + c_0$ and v = 2πiγ. Taking the Fourier transform of both sides of (13.2.1), we get (Problem 13.2.1):
$$c_n v^n \hat{y}(\gamma) + c_{n-1} v^{n-1} \hat{y}(\gamma) + \cdots + c_1 v \hat{y}(\gamma) + c_0 \hat{y}(\gamma) = \hat{h}(\gamma). \tag{13.2.2}$$
In other words, $p(v)\hat{y}(\gamma) = \hat{h}(\gamma)$. Formally, at least, we may then solve for $\hat{y}(\gamma)$ to get $\hat{y}(\gamma) = \frac{1}{p(v)}\hat{h}(\gamma)$, and find the solution
$$y = U^{-1}\left(\frac{1}{p(v)}\hat{h}(\gamma)\right). \tag{13.2.3}$$
If we happen to know the inverse Fourier transform $U^{-1}\left(\frac{1}{p(v)}\right)$, then by Theorem 12.3.6, we may express y as the convolution
$$y = U^{-1}\left(\frac{1}{p(2\pi i\gamma)}\right) * h, \tag{13.2.4}$$
at least in the case where both $U^{-1}\left(\frac{1}{p(2\pi i\gamma)}\right)$ and h(x) are in $L^1(\mathbb{R})$.

As it turns out, as long as p(v) has no zeros of the form it, where t ∈ R (including t = 0), then we can use Table 13.1.1 and the method of partial fractions to find $U^{-1}\left(\frac{1}{p(v)}\right)$, as follows:

1. Factor p(v) into linear terms, which is always possible (at least in principle) over C.

2. As the reader may recall from calculus, the method of partial fractions shows that, if p(v) has no repeated zeros, then
$$\frac{1}{p(v)} = \frac{A_1}{\alpha_1 \pm v} + \frac{A_2}{\alpha_2 \pm v} + \frac{A_3}{\alpha_3 \pm v} + \cdots \tag{13.2.5}$$
for some $A_i \in \mathbb{C}$, where the signs of the $A_i$ and the ±v are chosen so that $\alpha_i$ has positive real part. (Here is where we assume that p(v) has no purely imaginary zeros.) If p(v) has some k-fold zero ±α, replace the corresponding linear term in (13.2.5) with
$$\frac{B_0 + B_1 v + \cdots + B_{k-1} v^{k-1}}{(\alpha \pm v)^k} = \frac{C_1}{(\alpha \pm v)^1} + \cdots + \frac{C_k}{(\alpha \pm v)^k}. \tag{13.2.6}$$

3. Solve for each coefficient $A_i$ of a multiplicity 1 term using Heaviside's method: Multiply both sides by $(\alpha_i \pm v)$ and plug in $v = \mp\alpha_i$ to erase most of the terms and solve for $A_i$. For higher multiplicity, one may need to do honest linear algebra, though if we have only one term of higher multiplicity, we can solve for everything else with Heaviside and determine the remaining term by subtraction.

4. In any case, we can now express $\frac{1}{p(v)}$ as a linear combination of terms found in the second column of Table 13.1.1, so we can use Table 13.1.1 to calculate $U^{-1}\left(\frac{1}{p(v)}\right)$ as a corresponding linear combination of inverse transforms.

See Problems 13.2.2–13.2.5 for some examples.

Remark 13.2.1. In fact, it can be shown that in a suitably generalized sense, for any polynomial p(t), (13.2.4) exists and is a solution to (13.2.1). A complete account can be found in Hörmander [Hör90, Ch. 7], but see also Section 14.2 for more on this point.
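As an illustration of steps 1–4 on an equation that is not among the problems, consider y″ − y = h, so that $p(v) = v^2 - 1$. Partial fractions give $\frac{1}{p(v)} = -\frac{1}{2}\left(\frac{1}{1+v} + \frac{1}{1-v}\right)$, and the rows of Table 13.1.1 for $u(x)e^{-\alpha x}$ and $u(-x)e^{\alpha x}$ (with α = 1, n = 1) then suggest $U^{-1}\left(\frac{1}{p(v)}\right) = -\frac{1}{2}e^{-|x|}$, that is, $y = -\frac{1}{2}e^{-|x|} * h$. The sketch below checks this formal solution by finite differences for a hypothetical right-hand side; the Gaussian `h`, the integration cutoff, and the step sizes are assumptions made only for the check.

```python
import math

def h(x):
    # Hypothetical right-hand side (a Gaussian, which is in L^1(R)).
    return math.exp(-math.pi * x * x)

def integrate(func, a, b, n=2000):
    """Composite midpoint rule on [a, b]."""
    step = (b - a) / n
    return step * sum(func(a + (k + 0.5) * step) for k in range(n))

def y(x, lo=-8.0, hi=8.0):
    """Formal solution y = (-1/2) e^{-|x|} * h, truncated to [lo, hi].
    The integral is split at s = x, where e^{-|x - s|} has a corner."""
    left = integrate(lambda s: math.exp(-(x - s)) * h(s), lo, x)
    right = integrate(lambda s: math.exp(-(s - x)) * h(s), x, hi)
    return -0.5 * (left + right)

def residual(x, d=0.02):
    """Finite-difference estimate of y''(x) - y(x) - h(x)."""
    y_xx = (y(x + d) - 2.0 * y(x) + y(x - d)) / (d * d)
    return y_xx - y(x) - h(x)

def partial_fraction_gap(gamma):
    """|1/p(v) - (-1/2)(1/(1+v) + 1/(1-v))| at v = 2 pi i gamma."""
    v = 2j * math.pi * gamma
    return abs(1.0 / (v * v - 1.0) + 0.5 * (1.0 / (1.0 + v) + 1.0 / (1.0 - v)))

if __name__ == "__main__":
    for x in (-1.2, 0.0, 0.3):
        print(x, y(x), residual(x))
    print(partial_fraction_gap(0.6))
```

The residuals should be small relative to h, limited mainly by the finite-difference step d. Note that the zeros of p(v) here are v = ±1, which are not purely imaginary, as the method requires.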

Problems

13.2.1. Use the properties of the Fourier transform to show that if we take the Fourier transform of both sides of (13.2.1), we get (13.2.2).

13.2.2. For $h \in L^1(\mathbb{R})$, find a formal solution to the differential equation
$$y'' - y' - 2y = h. \tag{13.2.7}$$
Express your answer as a convolution with h.

13.2.3. Same as Problem 13.2.2, but for
$$y'' + 2y' + y = h. \tag{13.2.8}$$

13.2.4. Same as Problem 13.2.2, but for
$$y''' - y = h. \tag{13.2.9}$$

13.2.5. Same as Problem 13.2.2, but for
$$y''' - y'' - y' + y = h. \tag{13.2.10}$$

13.3 The heat and wave equations on R

Back in Chapter 11, we solved the heat and wave equations both on the circle and on an interval [a, b] with suitable boundary conditions. With the Fourier transform, we can consider these problems on R, stated precisely as follows.

Question 13.3.1. Given an initial value $f(x) \in L^2(\mathbb{R})$, find u(x, t) (t > 0) such that:
1. (Differentiable) For fixed $t_0 > 0$, $u(x, t_0) \in C^2(\mathbb{R})$, and for fixed $x_0 \in \mathbb{R}$, $u(x_0, t) \in C^1((0, +\infty))$.
2. (Initial value) For any x ∈ R, $\lim_{t \to 0^+} u(x, t) = f(x)$.
3. (PDE) For all t > 0,
$$\frac{\partial^2 u}{\partial x^2} = \frac{\partial u}{\partial t}. \tag{13.3.1}$$

Question 13.3.2. Given initial values $f(x), g(x) \in L^2(\mathbb{R})$, find u(x, t) (t > 0) such that:
1. (Differentiable) For fixed $t_0 > 0$, $u(x, t_0) \in C^2(\mathbb{R})$, and for fixed $x_0 \in \mathbb{R}$, $u(x_0, t) \in C^2((0, +\infty))$.
2. (Initial value) For any x ∈ R,
$$\lim_{t \to 0^+} u(x, t) = f(x), \qquad \lim_{t \to 0^+} \frac{\partial u}{\partial t}(x, t) = g(x). \tag{13.3.2}$$
3. (PDE) For all t > 0,
$$\frac{\partial^2 u}{\partial x^2} = \frac{\partial^2 u}{\partial t^2}. \tag{13.3.3}$$

Our first observation is that, under the following conditions, taking the Fourier transform of u(x, t) in the variable x commutes with differentiation in the variable t.

Theorem 13.3.3. Suppose u : R × [a, b] → C is continuous as a function of two variables and continuously differentiable in the variable t, and suppose that the sequences
$$\lim_{N \to \infty} \int_{-N}^{N} |u(x, t)|\,dx, \qquad \lim_{N \to \infty} \int_{-N}^{N} \left|\frac{\partial u}{\partial t}(x, t)\right| dx \tag{13.3.4}$$
converge uniformly (i.e., independently of t) on [a, b] to the corresponding improper integrals. Then for all t ∈ [a, b] and all x ∈ R,
$$\widehat{\frac{\partial u}{\partial t}} = \frac{\partial \hat{u}}{\partial t}. \tag{13.3.5}$$

Proof. Problem 13.3.1.

So now, if L is a differential operator with constant coefficients and $T(y) = \frac{\partial y}{\partial t}$ or $\frac{\partial^2 y}{\partial t^2}$, assuming (13.3.5) always holds, we may solve a PDE of the form L(u) = T(u) as follows.

1. Let v = 2πiγ. By (13.3.5), take the Fourier transform (in the variable x) of both sides of L(u) = T(u) to get $p(v)\hat{u}(\gamma, t) = \frac{\partial \hat{u}}{\partial t}$ or $\frac{\partial^2 \hat{u}}{\partial t^2}$, where p(v) is a polynomial in v (see Problem 13.2.1).
2. Integrate both sides in t to solve for $\hat{u}(\gamma, t)$, treating p(v) as a constant and taking initial conditions $\hat{u}(\gamma, 0) = \hat{f}(\gamma)$, etc.
3. Take the inverse transform in x to solve for u(x, t).
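To see the three steps in action on a case where every transform is known in closed form, take the heat equation with the hypothetical initial value $f(x) = e^{-\pi x^2}$. Step 1 gives $\frac{\partial \hat{u}}{\partial t} = (2\pi i\gamma)^2\hat{u} = -4\pi^2\gamma^2\hat{u}$; step 2 gives $\hat{u}(\gamma, t) = \hat{f}(\gamma)e^{-4\pi^2\gamma^2 t} = e^{-\pi(1 + 4\pi t)\gamma^2}$ by Theorem 12.3.8; and step 3, again by Theorem 12.3.8 with $s = \sqrt{1 + 4\pi t}$ in place of t, gives $u(x, t) = G_s(x)$. The sketch below checks this candidate against the PDE by finite differences; the step size and sample points are illustrative assumptions.

```python
import math

def u(x, t):
    """Candidate solution u(x, t) = G_s(x) with s^2 = 1 + 4 pi t,
    obtained formally from steps 1-3 for f(x) = exp(-pi x^2)."""
    s_squared = 1.0 + 4.0 * math.pi * t
    return math.exp(-math.pi * x * x / s_squared) / math.sqrt(s_squared)

def heat_residual(x, t, d=1e-3):
    """Finite-difference estimate of u_xx - u_t at (x, t)."""
    u_xx = (u(x + d, t) - 2.0 * u(x, t) + u(x - d, t)) / (d * d)
    u_t = (u(x, t + d) - u(x, t - d)) / (2.0 * d)
    return u_xx - u_t

if __name__ == "__main__":
    for (x, t) in ((0.0, 0.1), (0.3, 0.2), (1.5, 1.0)):
        print(x, t, u(x, t), heat_residual(x, t))
    # Initial value: u(x, 0) should reproduce f(x) = exp(-pi x^2).
    print(abs(u(0.3, 0.0) - math.exp(-math.pi * 0.09)))
```

The residuals should be close to zero, up to the accuracy of the difference quotients, and u(x, 0) reproduces the initial value exactly.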

13.3. THE HEAT AND WAVE EQUATIONS ON R

287

Specifically, for the heat equation, we have the following solution. Theorem 13.3.4. Suppose f ∈ S(R), and in the notation of Example 12.2.8, let g(x, t) = G2π

√

1 x2 √ exp − 2 . t (x) = 4π t 2π t

(13.3.6)

Let $u(x,t) = (g * f)(x,t)$, where the convolution is in the variable $x$ with $t$ fixed. Then $u(x,t)$ is a solution to the heat equation (Question 13.3.1).

Note that our hypothesis of $f \in \mathcal{S}(\mathbb{R})$ is far stronger than necessary, both in terms of the smoothness and the decay rate of $f$. In fact, it can be shown, using similar methods but with better-developed knowledge of integration, that the theorem holds if we only assume $f \in L^1(\mathbb{R})$ (see Dym and McKean [DM85, Sect. 2.7.2]). Nevertheless, we stick with the case of $f \in \mathcal{S}(\mathbb{R})$ because it simplifies the technical details while conveying the main idea of what the Fourier transform contributes.

Proof. Problem 13.3.2 formally justifies guessing the above solution $u(x,t)$, as long as Theorem 13.3.3 and Table 12.3.1 hold. Problem 13.3.3 then verifies that $u(x,t)$ actually does solve the heat equation.

Remark 13.3.5. Recall that in Remark 12.2.10, we described the convolution $f * G_{2\pi\sqrt{t}}$ as "smearing" the function $f$ around its domain $\mathbb{R}$ with a distribution described by $G_{2\pi\sqrt{t}}$. Recall also that as $t \to 0^+$, $G_{2\pi\sqrt{t}}$ approaches a delta function-like spike at $0$, and as $t \to \infty$, $G_{2\pi\sqrt{t}}$ becomes a wider and wider "bell curve" distribution, as illustrated by Figure 12.2.1. It follows that the solution $f * G_{2\pi\sqrt{t}}$ starts out as $f$, and then as time $t$ moves forward, becomes evenly smeared out over $\mathbb{R}$. As mentioned previously in Remark 11.1.6, for a related (and entertaining) discussion of "pulse shape" over time and the resulting difficulties in building transatlantic cables in the 19th century, see Körner [Kör89, Chs. 62, 65–66].

Applying the same formal manipulation used in the proof of Theorem 13.3.4 to the wave equation yields d'Alembert's solution (see Section 11.3) by yet another route. Because this solution is not new, we restrict our attention to its derivation as a formal solution.

Theorem 13.3.6. Consider the wave equation (Question 13.3.2) with initial values $f, g \in L^2(\mathbb{R})$. Working formally (e.g., assuming that Theorem 13.3.3 and Table 12.3.1 apply) and using the above method, we obtain the formal solution
\[ u(x,t) = \frac{1}{2}\bigl(f(x+t) + f(x-t)\bigr) + \frac{1}{2}\int_{x-t}^{x+t} g(y)\,dy. \tag{13.3.7} \]

Proof. Problem 13.3.4.
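As a numerical sanity check of (13.3.7), the following minimal sketch evaluates d'Alembert's formula for one hypothetical choice of initial data and confirms, by finite differences, that the result behaves like a solution of $\partial^2 u/\partial x^2 = \partial^2 u/\partial t^2$ with the right initial values. The particular $f$ and $g$, the helper names, and the step sizes are assumptions made for illustration and do not come from the text.

```python
import math

# Hypothetical initial data, chosen only for illustration.
def f(x):
    """Initial position u(x, 0)."""
    return math.exp(-x * x)

def g(x):
    """Initial velocity du/dt(x, 0)."""
    return x * math.exp(-x * x)

def g_antiderivative(x):
    """An antiderivative of g, so the integral in (13.3.7) has a closed form."""
    return -0.5 * math.exp(-x * x)

def u(x, t):
    """d'Alembert's formula (13.3.7)."""
    average = 0.5 * (f(x + t) + f(x - t))
    integral = g_antiderivative(x + t) - g_antiderivative(x - t)
    return average + 0.5 * integral

def second_difference(func, s, h=1e-3):
    """Centered finite-difference approximation of the second derivative."""
    return (func(s + h) - 2.0 * func(s) + func(s - h)) / (h * h)

def wave_residual(x, t):
    """Approximate u_xx - u_tt, which should vanish for a solution."""
    u_xx = second_difference(lambda s: u(s, t), x)
    u_tt = second_difference(lambda s: u(x, s), t)
    return u_xx - u_tt

def initial_velocity(x, h=1e-4):
    """Centered finite-difference approximation of du/dt at t = 0."""
    return (u(x, h) - u(x, -h)) / (2.0 * h)

if __name__ == "__main__":
    print("u(0.3, 0) - f(0.3)  =", u(0.3, 0.0) - f(0.3))
    print("u_xx - u_tt         =", wave_residual(0.3, 0.7))
    print("u_t(0.3, 0) - g(0.3) =", initial_velocity(0.3) - g(0.3))
```

This is only a spot check at a few points, not a proof; the formal derivation is the content of Problem 13.3.4.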


Problems

13.3.1. (Proves Theorem 13.3.3) Suppose $u : \mathbb{R} \times [a,b] \to \mathbb{C}$ is continuous as a function of two variables and continuously differentiable in the variable $t$, and suppose that the sequences
\[ \lim_{N\to\infty} \int_{-N}^{N} |u(x,t)|\,dx, \qquad \lim_{N\to\infty} \int_{-N}^{N} \left|\frac{\partial u}{\partial t}(x,t)\right| dx \tag{13.3.8} \]
converge uniformly (i.e., independently of $t$) on $[a,b]$ to the corresponding improper integrals. Prove that for all $t \in [a,b]$ and all $x \in \mathbb{R}$,
\[ \widehat{\frac{\partial u}{\partial t}} = \frac{\partial \hat{u}}{\partial t}. \tag{13.3.9} \]
(Suggestion: Theorems 4.8.4 and 4.8.8.)

13.3.2. (Proves Theorem 13.3.4) Consider the heat equation as described in Question 13.3.1, with initial values $f \in \mathcal{S}(\mathbb{R})$.
(a) Assuming (13.3.9) always holds, take the Fourier transform (in $x$) of both sides of (13.3.1) and solve in $t$ for constant $\gamma$ to show that $\hat{u}(\gamma,t) = C(\gamma)e^{-4\pi^2\gamma^2 t}$ for some function $C(\gamma)$.
(b) Assuming that $C(\gamma) = \hat{f}(\gamma)$ and that Table 12.3.1 holds for the functions in question, prove that $u(x,t) = (g * f)(x,t)$ for
\[ g(x,t) = G_{2\pi\sqrt{t}}(x) = \frac{1}{2\pi\sqrt{t}} \exp\left(-\frac{x^2}{4\pi^2 t}\right). \tag{13.3.10} \]
(Suggestion: See also Table 13.1.1.)

13.3.3. (Proves Theorem 13.3.4) To check that the solution found in Problem 13.3.2 actually works, consider the heat equation as formulated in Question 13.3.1. Suppose $f \in \mathcal{S}(\mathbb{R})$, let
\[ g(x,t) = G_{2\pi\sqrt{t}}(x) = \frac{1}{2\pi\sqrt{t}} \exp\left(-\frac{x^2}{4\pi^2 t}\right), \tag{13.3.11} \]
and let $u(x,t) = (g * f)(x,t) = \int_{-\infty}^{\infty} g(x-y,t) f(y)\,dy$ (i.e., convolution in the first variable).
(a) Fix $t \ge 0$. Prove that $\hat{u}(\gamma,t) = \hat{f}(\gamma)e^{-4\pi^2\gamma^2 t}$, where again, the Fourier transform is taken in the first variable.
(b) Suppose $h \in \mathcal{S}(\mathbb{R})$, and let
\[ F(x,t) = (g * h)(x,t) = \int_{-\infty}^{\infty} g(x-y,t) h(y)\,dy. \tag{13.3.12} \]


For $0 < a \le t \le b$, prove that
\[ |F(x,t)| \le \frac{1}{2\pi\sqrt{a}} \int_{-\infty}^{\infty} \exp\left(-\frac{(x-y)^2}{4\pi^2 b}\right) |h(y)|\,dy, \tag{13.3.13} \]
and prove that the right-hand side of (13.3.13) is rapidly decaying as a function of $x$. (Suggestion: Theorem 12.2.3.)
(c) Prove that for $0 < a < b$, the sequences
\[ \lim_{N\to\infty} \int_{-N}^{N} |u(x,t)|\,dx, \qquad \lim_{N\to\infty} \int_{-N}^{N} \left|\frac{\partial u}{\partial t}(x,t)\right| dx \tag{13.3.14} \]
converge uniformly (i.e., independently of $t$) for $t \in [a,b]$ to the corresponding improper integrals.
(d) Prove that $u(x,t)$ is a solution to the heat equation with initial values $f(x)$. In particular, prove that for fixed $x \in \mathbb{R}$, $\lim_{t\to 0} u(x,t) = f(x)$. (Suggestions: Theorem 13.3.3 and Section 12.2.)

13.3.4. (Proves Theorem 13.3.6) Consider the wave equation as described in Question 13.3.2. In this problem, we work formally, assuming the validity of $\widehat{\dfrac{\partial u}{\partial t}} = \dfrac{\partial \hat{u}}{\partial t}$, convolution formulas, and so on. Suppose
\[ \frac{\partial^2 u}{\partial x^2} = \frac{\partial^2 u}{\partial t^2}. \tag{13.3.15} \]
(a) Assuming Theorem 13.3.3 applies, take the Fourier transform (in the variable $x$) of both sides of (13.3.15) to show that
\[ \hat{u}(\gamma,t) = A(\gamma)\cos(2\pi\gamma t) + B(\gamma)\sin(2\pi\gamma t) \tag{13.3.16} \]
for some functions $A(\gamma)$, $B(\gamma)$. (Suggestion: Theorem 4.6.3.)
(b) Assuming that $\hat{u}(\gamma,0) = \hat{f}(\gamma)$ and $\dfrac{\partial \hat{u}}{\partial t}(\gamma,0) = \hat{g}(\gamma)$, solve for $A(\gamma)$ and $B(\gamma)$.
(c) Assuming that Theorem 13.3.3 and Tables 12.3.1 and 13.1.1 hold, prove that
\[ u(x,t) = \frac{1}{2}\bigl(f(x+t) + f(x-t)\bigr) + \frac{1}{2}\int_{x-t}^{x+t} g(y)\,dy. \tag{13.3.17} \]
(Suggestion: Each term is a convolution.)

13.4 An eigenbasis for the Fourier transform

Recall that in Section 11.6 we considered the Hermite functions
\[ h_n(x) = \frac{(-1)^n}{n!}\, e^{\pi x^2} \left(\frac{d}{dx}\right)^{n} e^{-2\pi x^2}, \tag{13.4.1} \]


which have the form (Theorem 11.6.2)
\[ h_n(x) = H_n(x)\, e^{-\pi x^2}, \tag{13.4.2} \]
where $H_n(x)$ is the Hermite polynomial of degree $n$. Recall also that if $K$ is the operator in $L^2(\mathbb{R})$ with domain $D(K) = \mathcal{S}(\mathbb{R})$ defined by
\[ K(f) = -\frac{d^2 f}{dx^2} + 4\pi^2 x^2 f, \tag{13.4.3} \]
then $\{h_n(x)\}$ is an orthogonal set of eigenfunctions of $K$ in $L^2(\mathbb{R})$ (Theorem 11.6.6). Finally, we recall (Lemma 11.6.5) that the $h_n(x)$ satisfy the recurrence relation
\[ h_n' - (2\pi x) h_n = -(n+1) h_{n+1}. \tag{13.4.4} \]

In this section, we complete the proof of Theorem 11.6.7, which states that $\{h_n(x)\}$ is an eigenbasis for $K$, in the course of proving the following remarkable result.

Theorem 13.4.1. Let $U$ be the Fourier transform considered as an operator on $L^2(\mathbb{R})$. Then the set $\{h_n(x)\}$ is an eigenbasis for $U$.

Our approach is taken from Dym and McKean [DM85, Sect. 2.5]. We begin by showing that the $h_n(x)$ are eigenfunctions of $U$.

Theorem 13.4.2. The $n$th Hermite function $h_n(x)$ is an eigenfunction of the Fourier transform $U$ with eigenvalue $(-i)^n$.

Proof. Let $g_n = i^n \hat{h}_n$. To prove the theorem, it suffices to show that $g_n = h_n$, and this is Problem 13.4.1.

Having confirmed the eigenfunction portion of Theorem 11.6.7, it remains to show that $\{h_n(x)\}$ satisfies one of the equivalent conditions of the Isomorphism Theorem for Fourier Series 7.6.7. We first have some preliminary lemmas.

Lemma 13.4.3. Suppose $f \in L^2(\mathbb{R})$ satisfies $\langle f, h_n \rangle = 0$ for any Hermite function $h_n$. Then for any polynomial $p(x)$, $\left\langle f, p(x)e^{-\pi x^2} \right\rangle = 0$.

Proof. Problem 13.4.2.

Lemma 13.4.4. For any $x, \gamma \in \mathbb{R}$ and any $N \in \mathbb{N}$, we have that
\[ \left| \sum_{n=0}^{N} \frac{(-2\pi i \gamma x)^n}{n!} \right| \le e^{2\pi|\gamma x|}. \tag{13.4.5} \]

Proof. Problem 13.4.3.
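The bound in Lemma 13.4.4 is easy to test numerically. The sketch below, which is an illustration rather than a proof, computes the partial sums of the exponential series for a few hypothetical sample values of $\gamma$ and $x$ and compares their absolute values with $e^{2\pi|\gamma x|}$. The function names, sample points, and tolerance are assumptions made here.

```python
import math

def partial_sum(gamma, x, N):
    """Sum of (-2*pi*i*gamma*x)^n / n! for n = 0, ..., N."""
    z = complex(0.0, -2.0 * math.pi * gamma * x)
    term = complex(1.0, 0.0)   # the n = 0 term
    total = term
    for n in range(1, N + 1):
        term *= z / n          # z^n / n! from z^(n-1) / (n-1)!
        total += term
    return total

def bound(gamma, x):
    """Right-hand side of (13.4.5)."""
    return math.exp(2.0 * math.pi * abs(gamma * x))

def lemma_holds(gamma, x, max_N=40, rel_tol=1e-12):
    """Check |partial sum| <= bound for N = 0, ..., max_N, up to rounding."""
    b = bound(gamma, x) * (1.0 + rel_tol)
    return all(abs(partial_sum(gamma, x, N)) <= b for N in range(max_N + 1))

# Hypothetical sample points (gamma, x).
SAMPLES = [(0.3, 1.7), (-1.2, 0.5), (2.0, -0.8), (0.0, 4.0)]

if __name__ == "__main__":
    for gamma, x in SAMPLES:
        print(gamma, x, lemma_holds(gamma, x))
```

The check reflects the idea suggested in Problem 13.4.3: each term of the partial sum is bounded in absolute value by the corresponding term of the power series for $e^{2\pi|\gamma x|}$.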


Proof of Theorem 13.4.1. Suppose $f \in L^2(\mathbb{R})$ and $\langle f, h_n \rangle = 0$ for all $n \in \mathbb{N}$. Let $F(x) = f(x)e^{-\pi x^2}$. Then
\[ \hat{F}(\gamma) = \int_{-\infty}^{\infty} f(x) e^{-\pi x^2} e^{-2\pi i \gamma x}\,dx = \int_{-\infty}^{\infty} f(x) e^{-\pi x^2} \left( \sum_{n=0}^{\infty} \frac{(-2\pi i \gamma x)^n}{n!} \right) dx. \tag{13.4.6} \]
Let
\[ G_N(x) = f(x) e^{-\pi x^2} \left( \sum_{n=0}^{N} \frac{(-2\pi i \gamma x)^n}{n!} \right), \tag{13.4.7} \]
\[ G(x) = |f(x)|\, e^{-\pi x^2 + 2\pi|\gamma x|}. \tag{13.4.8} \]
By Lemma 13.4.4, we see that $|G_N(x)| \le G(x)$ for all $N \in \mathbb{N}$. Furthermore, since
\[ e^{-\pi x^2 + 2\pi|\gamma x|} \le e^{-\pi x^2 + 2\pi\gamma x} + e^{-\pi x^2 - 2\pi\gamma x}, \tag{13.4.9} \]
the sum of two nonnegative functions in $L^2(\mathbb{R})$, $e^{-\pi x^2 + 2\pi|\gamma x|} \in L^2(\mathbb{R})$, and $G \in L^1(\mathbb{R})$ (Theorem 7.5.10). It follows that we may apply dominated convergence (Lebesgue Axiom 4) to (13.4.6). Therefore,
\[ \hat{F}(\gamma) = \lim_{N\to\infty} \int_{-\infty}^{\infty} f(x) \left( \sum_{n=0}^{N} \frac{(-2\pi i \gamma x)^n}{n!} \right) e^{-\pi x^2}\,dx = 0, \tag{13.4.10} \]
by Lemma 13.4.3. Taking inverse transforms (Theorem 12.4.1), we see that $F(x) = 0$ a.e., which means that $f(x) = 0$ a.e. Condition (4) of the Isomorphism Theorem for Fourier Series 7.6.7 is therefore satisfied, and the theorem follows.
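To see the eigenvalue claim of Theorem 13.4.2 in action, the following sketch approximates the Fourier transform $\hat{h}(\gamma) = \int h(x)e^{-2\pi i\gamma x}\,dx$ by the trapezoid rule and compares $\hat{h}_n$ with $(-i)^n h_n$ for $n = 0, 1, 2$. The explicit formulas $h_0 = e^{-\pi x^2}$, $h_1 = 4\pi x e^{-\pi x^2}$, and $h_2 = (8\pi^2x^2 - 2\pi)e^{-\pi x^2}$ are worked out here from (13.4.1) and the recurrence (13.4.4); they, the truncation interval, and the step count are assumptions of this illustration rather than statements from the text.

```python
import cmath
import math

def fourier_transform(h, gamma, half_width=8.0, steps=3200):
    """Trapezoid-rule approximation of the integral of h(x) e^{-2 pi i gamma x}."""
    dx = 2.0 * half_width / steps
    total = 0j
    for k in range(steps + 1):
        x = -half_width + k * dx
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * h(x) * cmath.exp(-2j * math.pi * gamma * x)
    return total * dx

def h0(x):
    return math.exp(-math.pi * x * x)

def h1(x):
    return 4.0 * math.pi * x * math.exp(-math.pi * x * x)

def h2(x):
    return (8.0 * math.pi ** 2 * x * x - 2.0 * math.pi) * math.exp(-math.pi * x * x)

HERMITE = [h0, h1, h2]

def eigen_error(n, gamma):
    """|h_n-hat(gamma) - (-i)^n h_n(gamma)|, which should be near zero."""
    expected = (-1j) ** n * HERMITE[n](gamma)
    return abs(fourier_transform(HERMITE[n], gamma) - expected)

if __name__ == "__main__":
    for n in range(3):
        worst = max(eigen_error(n, gamma) for gamma in (0.0, 0.4, 1.1))
        print("n =", n, "max error =", worst)
```

Because the integrands are Gaussians times polynomials, the trapezoid rule on a wide interval is extremely accurate here, so small errors are consistent with (though of course do not prove) the theorem.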

Problems

13.4.1. (Proves Theorem 13.4.2) In this problem, we use the operator notation for the Fourier transform (Remark 12.3.2), and we let $g_n(x) = i^n \hat{h}_n(x)$, where $h_n$ is the $n$th Hermite function.
(a) By taking the Fourier transform of (13.4.4), prove that
\[ g_n' - (2\pi x) g_n = -(n+1) g_{n+1}. \tag{13.4.11} \]
(Suggestion: Remark 12.3.4.)
(b) Prove that $g_n = h_n$. (Suggestion: Induction.)

13.4.2. (Proves Lemma 13.4.3) Suppose $f \in L^2(\mathbb{R})$ satisfies $\langle f, h_n \rangle = 0$ for any Hermite function $h_n$. Prove that for any polynomial $p(x)$, $\left\langle f, p(x)e^{-\pi x^2} \right\rangle = 0$. (Suggestion: Lemma 11.5.7.)


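13.4.3. (Proves Lemma 13.4.4) Prove that for any $x, \gamma \in \mathbb{R}$ and any $N \in \mathbb{N}$,
\[ \left| \sum_{n=0}^{N} \frac{(-2\pi i \gamma x)^n}{n!} \right| \le e^{2\pi|\gamma x|}. \tag{13.4.12} \]
(Suggestion: Consider the power series definition of $e^{2\pi|\gamma x|}$.)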

13.5 Continuous-valued quantum observables

Recall that in Section 11.7, we saw that the mathematical model of quantum mechanics found in Schrödinger's equation can be abstracted as follows.

1. The state of a quantum-mechanical system (e.g., a particle) at time $t$ can be expressed as a function $\Psi(x,t)$ such that for fixed $t$, $\Psi(x,t) \in L^2(\mathbb{R})$ and $\|\Psi(x,t)\| = 1$.
2. An observable quantity of our system is represented by a Hermitian operator $M$ in $L^2(\mathbb{R})$.
3. In the special case of an observable $M$ for which we can find an orthonormal eigenbasis $\{\psi_n\}$ for $M$ with $M(\psi_n) = \lambda_n \psi_n$:
   • The only possible values of the observable quantity are the eigenvalues $\lambda_n$ of $M$; and
   • If $\Psi = \sum c_n \psi_n$, then upon measurement, the state of the system collapses into the single state $\psi_n$ corresponding to the observed value $\lambda_n$ with probability $|c_n|^2$.

Two important observables in quantum mechanics are:

• The position operator $M_x(f(x)) = x f(x)$; and
• The momentum operator $\dfrac{1}{2\pi i}\dfrac{d}{dx}(f(x)) = \dfrac{f'(x)}{2\pi i}$. (Note that the factor of $2\pi$ here is nonstandard, and appears because our definition of the Fourier transform uses $e^{-2\pi i \gamma x}$ instead of $e^{-i\pi\gamma x}$.)

However, as the reader may recall (Example 10.3.4 and Theorem 10.3.7), neither of these operators has any eigenfunctions or eigenvalues, so they cannot possibly have corresponding eigenbases. The question arises, then: What are the possible values of position and momentum for a quantum system in R, and how are these observables modelled mathematically? Or to be more specific, what replaces the eigenbasis in the framework mentioned above? In this section, we focus less on results and more on developing a language for describing observables with continuous spectra, at least in one special case. In fact, while the Fourier transform is used at one key point, most of what we discuss here does not rely on it. Nevertheless, since the Fourier transform is the continuous analogue of Fourier series, this seems to be an appropriate place to consider the continuous analogue to the eigenbasis


decomposition interpretation of measurement from Section 11.7. For simplicity and concreteness, we restrict our attention to one particular type of measurement of a continuous observable, taken from Braginsky, Khalili, and Thorne [BKT92, Sect. 2.6]; see the entirety of that reference for much more about quantum measurement.

Turning first to the position operator, we have the following replacement for point 3 (above). Suppose a quantum system representing a particle is in the state $\Psi(x,t)$ at a given (fixed) time $t$. Then:

• The possible values of position are all $x \in \mathbb{R}$.
• The only kind of measurement we consider is a "YES/NO detector" on a closed interval $[a,b] \subseteq \mathbb{R}$. That is, we are only allowed to ask the question, "Is the position of the particle between $x = a$ and $x = b$?"
• Upon measurement at time $t$, a particle is found to have position in $[a,b]$ with probability $\int_a^b |\Psi(x,t)|^2\,dx$, and when the answer for a given particle is YES, the state of the system collapses to the state $\Psi_1 = \Psi_0 / \|\Psi_0\|$, where
\[ \Psi_0(x) = \begin{cases} \Psi(x,t) & \text{if } a \le x \le b, \\ 0 & \text{otherwise.} \end{cases} \tag{13.5.1} \]

Note that we never consider the probability of the particle's position collapsing to a single $x \in \mathbb{R}$; it only makes sense to discuss the probability of the particle being in some range $[a,b]$.

For the momentum operator $\dfrac{1}{2\pi i}\dfrac{d}{dx}$, the possible values taken by the system in the state $\Psi(x,t)$ are not as immediately visible as they are with the position operator. What makes them visible is the Fourier transform, because by Remark 12.3.4,
\[ U\left(\frac{1}{2\pi i}\frac{d}{dx}\right) = M_x U. \tag{13.5.2} \]
In physical terms, the Fourier transform turns the momentum operator in position space into the multiplication operator $M_\gamma$ in the transform space, which is therefore called momentum space. We may therefore apply the same interpretation to $\hat{\Psi}$ and momentum that we did for $\Psi$ and position, namely, that:

• The possible values of momentum are all $\gamma \in \mathbb{R}$.
• Again, the only measurement we consider is the case of a "YES/NO detector" on a closed interval $[a,b]$ in momentum space. That is, we only measure the answer to the question, "Is the momentum of the particle between $\gamma = a$ and $\gamma = b$?"
• Upon measurement at time $t$, a particle is found to have momentum in $[a,b]$ with probability $\int_a^b |\hat{\Psi}(\gamma,t)|^2\,d\gamma$, and when the answer for a given particle is "YES", the state of the system collapses to the state $\hat{\Psi}_1 = \hat{\Psi}_0 / \|\hat{\Psi}_0\|$, where
\[ \hat{\Psi}_0(\gamma) = \begin{cases} \hat{\Psi}(\gamma,t) & \text{if } a \le \gamma \le b, \\ 0 & \text{otherwise.} \end{cases} \tag{13.5.3} \]

Note that by the isomorphism theorem for the Fourier transform 12.5.4, we have that $\int_{-\infty}^{\infty} |\hat{\Psi}(\gamma,t)|^2\,d\gamma = 1$; in other words, $|\hat{\Psi}(\gamma,t)|^2$ is indeed a genuine probability distribution.
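For a concrete instance of the YES/NO detector, the sketch below uses the hypothetical state $\Psi(x) = 2^{1/4}e^{-\pi x^2}$, which has norm 1 and, being a multiple of $h_0$, equals its own Fourier transform (Theorem 13.4.2 with $n = 0$), so the same numbers describe both position and momentum measurements. The choice of state, the Simpson's-rule helper, and the closed form via the error function are assumptions made here for illustration.

```python
import math

def psi_squared(x):
    """|Psi(x)|^2 for the hypothetical state Psi(x) = 2^(1/4) exp(-pi x^2)."""
    return math.sqrt(2.0) * math.exp(-2.0 * math.pi * x * x)

def simpson(func, a, b, steps=4000):
    """Composite Simpson's rule with an even number of subintervals."""
    h = (b - a) / steps
    total = func(a) + func(b)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * func(a + k * h)
    return total * h / 3.0

def detector_probability(a, b):
    """Probability that the detector on [a, b] answers YES."""
    return simpson(psi_squared, a, b)

def detector_probability_exact(a, b):
    """The same probability in closed form, using the error function."""
    s = math.sqrt(2.0 * math.pi)
    return 0.5 * (math.erf(s * b) - math.erf(s * a))

if __name__ == "__main__":
    print("P(-0.2 <= x <= 0.3) =", detector_probability(-0.2, 0.3))
    print("closed form         =", detector_probability_exact(-0.2, 0.3))
    print("total probability   =", detector_probability(-6.0, 6.0))
```

The last line approximates $\int_{-\infty}^{\infty}|\Psi|^2\,dx$ by truncating to $[-6, 6]$, where the neglected tails are far below rounding error.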

To continue our discussion of the interpretation of $\Psi(x,t)$ and observables, we consider the expected value (or in other terms, the mean value) of an observable $M$ for a given state $\Psi$. If $M$ has an eigenbasis $\{\psi_n\}$ with the eigenvalue $\lambda_n$ representing the observed value in the eigenstate $\psi_n$, and $\Psi = \sum c_n \psi_n$, then because our system collapses to the state $\psi_n$ with probability $|c_n|^2$, we see that over many measurements, the expected value will be
\[ \sum \lambda_n |c_n|^2 = \left\langle \sum \lambda_n c_n \psi_n, \sum c_n \psi_n \right\rangle = \left\langle M\left(\sum c_n \psi_n\right), \sum c_n \psi_n \right\rangle = \langle M(\Psi), \Psi \rangle, \tag{13.5.4} \]
where the first equality holds by the Isomorphism Theorem for Fourier Series 7.6.7, and the second holds by the Diagonalization Theorem 10.4.3. Similarly, for the position operator $M_x$, we see that the expected value of position for a particle in state $\Psi(x,t)$ is
\[ \int_{-\infty}^{\infty} x\,|\Psi(x,t)|^2\,dx = \int_{-\infty}^{\infty} x\Psi(x,t)\overline{\Psi(x,t)}\,dx = \langle x\Psi(x,t), \Psi(x,t) \rangle = \langle M_x(\Psi), \Psi \rangle, \tag{13.5.5} \]
and for the momentum operator $\dfrac{1}{2\pi i}\dfrac{d}{dx}$, we see that the expected value of momentum is
\[ \int_{-\infty}^{\infty} \gamma\,|\hat{\Psi}(\gamma,t)|^2\,d\gamma = \left\langle M_\gamma(\hat{\Psi}), \hat{\Psi} \right\rangle = \left\langle \frac{1}{2\pi i}\frac{d}{dx}(\Psi), \Psi \right\rangle, \tag{13.5.6} \]
where the last equality follows from the Isomorphism Theorem for the Fourier Transform 12.5.4. All of this is meant to motivate the following definition.

Definition 13.5.1. Let $T$ be an operator in $L^2(\mathbb{R})$, not necessarily assumed to be Hermitian or representing an observable. We define the expected value of $T$ to be the function
\[ \langle T \rangle = \langle T(\Psi), \Psi \rangle \tag{13.5.7} \]
for all $\Psi \in D(T)$ with $\|\Psi\| = 1$. The reader should note the (well-established) abuse of notation whereby dependency on $\Psi$ is implied on one side of (13.5.7) and explicit on the other. By a further abuse of notation, if $\Psi$ is understood, we also use $\langle T \rangle$ to denote the operation of scalar multiplication by $\langle T(\Psi), \Psi \rangle$. So for example, with $\Psi$ fixed, we have
\[ \langle \langle T \rangle \Psi, \Psi \rangle = \langle T \rangle \langle \Psi, \Psi \rangle = \langle T \rangle = \langle T(\Psi), \Psi \rangle. \tag{13.5.8} \]

With Definition 13.5.1 as our starting point, in the rest of this section, following Nielsen and Chuang [NC00, Sect. 2.2.5], we combine some probability theory and operator algebra to prove Heisenberg's famous uncertainty principle. Throughout, we fix a Hilbert space $H$. We assume that each operator $M$ in $H$ has the same domain $D(M) = H_0$, and that $H_0$ is invariant under each operator $M$ (i.e., $M(H_0) \subseteq H_0$); as a consequence, we may form arbitrary compositions and linear combinations of operators. (Lest this last assumption seem restrictive, we note that for all of the operators we have mentioned in connection with quantum mechanics, $H_0 = \mathcal{S}(\mathbb{R})$ works as a common invariant domain in $H = L^2(\mathbb{R})$.)

Definition 13.5.2. Let $T$ be an operator in $H$. To say that $T$ is skew-Hermitian means that for every $f, g \in D(T)$, $\langle Tf, g \rangle = -\langle f, Tg \rangle$.

Theorem 13.5.3. Let $T$ be an operator in $H$. If $T$ is Hermitian, then the expected value $\langle T \rangle$ (Definition 13.5.1) is real, or in other words, for every $\Psi \in D(T)$, $\langle T(\Psi), \Psi \rangle \in \mathbb{R}$. Similarly, if $T$ is skew-Hermitian, then $\langle T \rangle$ is purely imaginary, or in other words, for every $\Psi \in D(T)$, $\langle T(\Psi), \Psi \rangle = bi$ for some $b \in \mathbb{R}$.

Proof. Problem 13.5.1.

Definition 13.5.4. Let $M$ be a Hermitian operator in $H$, and note that as discussed in Definition 13.5.1, if we fix $\Psi$, since $\langle M \rangle$ is real-valued (Theorem 13.5.3), we can think of $\langle M \rangle$ as a Hermitian operator (a real multiple of the identity). We may therefore define the (squared) standard deviation, or variance, of $M$ to be
\[ \sigma(M)^2 = \left\langle (M - \langle M \rangle)^2 \right\rangle. \tag{13.5.9} \]
Note that, continuing the abuse of notation from Definition 13.5.1, (13.5.9) depends implicitly on $\Psi$, $\|\Psi\| = 1$, so that $\sigma(M)^2$ applied to $\Psi$ is (applying the Hermitian property)
\[ \left\langle (M - \langle M \rangle)^2 \right\rangle = \langle (M - \langle M \rangle)\Psi, (M - \langle M \rangle)\Psi \rangle = \|(M - \langle M \rangle)\Psi\|^2. \tag{13.5.10} \]
Note also that, as in probability and statistics, since $\left\langle (M - \langle M \rangle)^2 \right\rangle$ represents the expected value of the squared distance between $M\Psi$ and $\langle M \rangle \Psi$, a larger variance for an observable quantity indicates a wider-spread distribution, on average, for that quantity, which we can think of as being a "greater uncertainty" in the value of the observable.

To practice proper notational abuse, the reader may want to try proving the following standard result from probability.

Theorem 13.5.5. If $M$ is a Hermitian operator in a Hilbert space $H$, then
\[ \sigma(M)^2 = \left\langle M^2 \right\rangle - \langle M \rangle^2. \tag{13.5.11} \]


Proof. Problem 13.5.2. In any case, continuing with operator algebra: Definition 13.5.6. Let A and B be operators in H with common invariant domain H0 . We define the commutator of A and B to be [A, B] = AB − BA,

(13.5.12)

and we define the anti-commutator of A and B to be {A, B} = AB + BA.

(13.5.13)

Lemma 13.5.7. Let $A$ and $B$ be Hermitian operators in $H$ with common invariant domain $H_0$. Then the commutator $[A,B]$ is skew-Hermitian, and the anti-commutator $\{A,B\}$ is Hermitian.

Proof. Problem 13.5.3.

We now come to Heisenberg's uncertainty principle itself.

Theorem 13.5.8 (Uncertainty Principle). Let $S$ and $T$ be Hermitian operators in $H$ with common invariant domain $H_0$. Then
\[ \sigma(S)^2 \sigma(T)^2 \ge \frac{|\langle [S,T] \rangle|^2}{4}. \tag{13.5.14} \]

Proof. Problem 13.5.4.

For example, if $S = M_x$ is the position operator in $L^2(\mathbb{R})$ and $T = \dfrac{1}{2\pi i}\dfrac{d}{dx}$ is the momentum operator, then because
\[ \frac{d}{dx}(M_x(f)) = \frac{d}{dx}(x f(x)) = x f'(x) + f(x) = \left(M_x \frac{d}{dx} + 1\right)(f), \tag{13.5.15} \]
we have that
\[ [S,T] = ST - TS = \frac{1}{2\pi i}\left(M_x \frac{d}{dx} - \left(M_x \frac{d}{dx} + 1\right)\right) = -\frac{1}{2\pi i}. \tag{13.5.16} \]
The Uncertainty Principle then says that $\sigma(S)^2\sigma(T)^2 \ge \dfrac{1}{16\pi^2}$, no matter what $\Psi$ is. Note that contrary to popular belief, the point is not that we can be certain about only one of the quantities position and momentum; indeed, one of the fundamental precepts of quantum mechanics is that we can be certain of neither! The point is actually that the more certain we are about the position of a particle (i.e., the smaller $\sigma(S)^2$ is), the less certain we can be about its momentum (i.e., the larger $\sigma(T)^2$ must be), and vice versa.
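The bound $\sigma(S)^2\sigma(T)^2 \ge 1/(16\pi^2)$ can be checked numerically for specific states. The sketch below is an illustration under stated assumptions: it handles only real-valued, rapidly decaying $\Psi$ (for which $\langle T \rangle = 0$ and $\sigma(T)^2 = \|T\Psi\|^2 = \frac{1}{4\pi^2}\int |\Psi'|^2$), it takes the derivative of $\Psi$ as a second input rather than differentiating numerically, and the two test states are hypothetical choices. For the Gaussian $e^{-\pi x^2}$ the product comes out equal to the bound, while for $x e^{-\pi x^2}$ it is nine times larger.

```python
import math

def simpson(func, a=-8.0, b=8.0, steps=4000):
    """Composite Simpson's rule with an even number of subintervals."""
    h = (b - a) / steps
    total = func(a) + func(b)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * func(a + k * h)
    return total * h / 3.0

def variance_product(psi, dpsi):
    """sigma(S)^2 * sigma(T)^2 for a real-valued state psi with derivative dpsi.

    The state is normalized internally, so psi need not have norm 1.
    """
    norm_sq = simpson(lambda x: psi(x) ** 2)
    mean_x = simpson(lambda x: x * psi(x) ** 2) / norm_sq
    var_x = simpson(lambda x: (x - mean_x) ** 2 * psi(x) ** 2) / norm_sq
    # For real psi, <T> = 0, so sigma(T)^2 = ||T psi||^2 = (1/(4 pi^2)) * int psi'^2.
    var_p = simpson(lambda x: dpsi(x) ** 2) / norm_sq / (4.0 * math.pi ** 2)
    return var_x * var_p

HEISENBERG_BOUND = 1.0 / (16.0 * math.pi ** 2)

def gaussian(x):
    return math.exp(-math.pi * x * x)

def gaussian_derivative(x):
    return -2.0 * math.pi * x * math.exp(-math.pi * x * x)

def excited(x):
    return x * math.exp(-math.pi * x * x)

def excited_derivative(x):
    return (1.0 - 2.0 * math.pi * x * x) * math.exp(-math.pi * x * x)

if __name__ == "__main__":
    print("bound            =", HEISENBERG_BOUND)
    print("Gaussian state   =", variance_product(gaussian, gaussian_derivative))
    print("x * Gaussian     =", variance_product(excited, excited_derivative))
```

The Gaussian case illustrates that the inequality in Theorem 13.5.8 cannot be improved for this pair of operators.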


Remark 13.5.9. Note that we have stated and proved the Uncertainty Principle 13.5.8 in the setting of operators in an arbitrary Hilbert space, with no reference to either integrals or series. For a discussion of the foundations of quantum mechanics at roughly this level of abstraction, see Section 14.4.

Problems

For all problems, let $H$ be a Hilbert space.

13.5.1. (Proves Theorem 13.5.3) Let $T$ be an operator in $H$.
(a) Prove that if $T$ is Hermitian and $\Psi \in D(T)$, then $\langle T(\Psi), \Psi \rangle \in \mathbb{R}$. (Suggestion: $\overline{\langle T(\Psi), \Psi \rangle}$.)
(b) Prove that if $T$ is skew-Hermitian and $\Psi \in D(T)$, then $\langle T(\Psi), \Psi \rangle = bi$ for some $b \in \mathbb{R}$.

13.5.2. Let $M$ be a Hermitian operator in $H$. Prove that $\sigma(M)^2 = \left\langle M^2 \right\rangle - \langle M \rangle^2$, or in other words, prove that for $\Psi \in D(M)$ with $\|\Psi\| = 1$, we have
\[ \langle (M - \langle M \rangle)\Psi, (M - \langle M \rangle)\Psi \rangle = \langle M\Psi, M\Psi \rangle - \langle \langle M \rangle \Psi, \langle M \rangle \Psi \rangle. \tag{13.5.17} \]

13.5.3. (Proves Lemma 13.5.7) Let $A$ and $B$ be Hermitian operators in $H$ with common invariant domain $H_0$.
(a) Prove that the commutator $[A,B]$ is skew-Hermitian.
(b) Prove that the anti-commutator $\{A,B\}$ is Hermitian.

13.5.4. (Proves Theorem 13.5.8) Let $A$, $B$, $S$, and $T$ be Hermitian operators in $H$ with common invariant domain $H_0$.
(a) Prove that $\langle \{A,B\} \rangle$ and $\langle [A,B] \rangle$ are the real and imaginary parts, respectively, of $2\langle AB \rangle$, and consequently,
\[ |\langle \{A,B\} \rangle|^2 + |\langle [A,B] \rangle|^2 = 4\,|\langle AB \rangle|^2. \tag{13.5.18} \]
(Suggestion: Theorem 13.5.3. Keep the implied $\Psi \in H_0$ with $\|\Psi\| = 1$ in mind.)
(b) Prove that $|\langle AB \rangle|^2 \le \left\langle A^2 \right\rangle \left\langle B^2 \right\rangle$. (Suggestion: Cauchy-Schwarz.)
(c) Taking $A = S - \langle S \rangle$ and $B = T - \langle T \rangle$, prove that
\[ \sigma(S)^2 \sigma(T)^2 \ge \frac{|\langle [S,T] \rangle|^2}{4}. \tag{13.5.19} \]

13.6 Poisson summation and theta functions

The Poisson summation formula is the following striking result.

Theorem 13.6.1 (Poisson Summation). Suppose $f \in C^1(\mathbb{R})$ and there exist constants $C$ and $p > 1$ such that
\[ |f(x)| \le \frac{C}{|x|^p}, \qquad |f'(x)| \le \frac{C}{|x|^p} \tag{13.6.1} \]
for all $x \in \mathbb{R}$. Then
\[ \sum_{n\in\mathbb{Z}} f(n) = \sum_{n\in\mathbb{Z}} \hat{f}(n), \tag{13.6.2} \]
where $\hat{f}$ is the Fourier transform of $f$.

Proof. Problem 13.6.1.

One classic application of the Poisson formula is the study of the following function.

Definition 13.6.2. For $\tau \in \mathbb{R}$, $\tau > 0$, the Jacobi theta function is defined to be
\[ \Theta(\tau) = \sum_{n\in\mathbb{Z}} e^{-\pi n^2 \tau}. \tag{13.6.3} \]

Corollary 13.6.3. The Jacobi theta function satisfies the identity
\[ \Theta(\tau) = \frac{1}{\sqrt{\tau}}\,\Theta\left(\frac{1}{\tau}\right) \tag{13.6.4} \]
for all $\tau > 0$. Furthermore, for $y \in \mathbb{R}$, $z = iy$, $y > 0$, if we let $q = e^{\pi i z}$ and
\[ \theta(z) = \sum_{n\in\mathbb{Z}} q^{n^2} = \sum_{n\in\mathbb{Z}} e^{\pi n^2 i z}, \tag{13.6.5} \]
then we have
\[ \theta(z+2) = \theta(z), \tag{13.6.6} \]
\[ \theta\left(-\frac{1}{z}\right) = \left(\sqrt{z/i}\right)\theta(z), \tag{13.6.7} \]
where we define $\sqrt{z/i} = \sqrt{y}$.

Note that we must be specific about how we define $\sqrt{z/i}$ because the square root function cannot be defined consistently over all of $\mathbb{C}$; see, for example, Waterhouse [Wat12].

Proof. Problem 13.6.2.
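The identity (13.6.4) is striking partly because the two sides converge at very different speeds: for small $\tau$ the series for $\Theta(\tau)$ needs many terms, while the series for $\Theta(1/\tau)$ is dominated by its first term. The following sketch compares the two sides numerically for a few hypothetical values of $\tau$; the truncation at 200 terms and the function names are assumptions of this illustration.

```python
import math

def theta(tau, terms=200):
    """Truncated Jacobi theta function: sum over |n| <= terms of exp(-pi n^2 tau)."""
    tail = sum(math.exp(-math.pi * n * n * tau) for n in range(1, terms + 1))
    return 1.0 + 2.0 * tail   # the n = 0 term plus the symmetric pairs +/- n

def transformed(tau, terms=200):
    """Right-hand side of (13.6.4): tau^(-1/2) * Theta(1 / tau)."""
    return theta(1.0 / tau, terms) / math.sqrt(tau)

def relative_gap(tau):
    """Relative difference between the two sides of (13.6.4)."""
    left = theta(tau)
    return abs(left - transformed(tau)) / left

if __name__ == "__main__":
    for tau in (0.05, 0.5, 1.0, 3.0):
        print("tau =", tau, " Theta =", theta(tau), " gap =", relative_gap(tau))
```

For the values of $\tau$ used here, the neglected terms of both truncated series are far smaller than double-precision rounding error, so any visible gap would indicate a mistake in the code rather than in the corollary.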


Remark 13.6.4. If we rewrite $\theta(z)$ from (13.6.5) as a function of $q = e^{\pi i z}$, we get
\[ \theta(q) = \sum_{n\in\mathbb{Z}} q^{n^2} = \sum_{k=0}^{\infty} a_k q^k, \tag{13.6.8} \]
where
\[ a_k = (\text{number of } n \in \mathbb{Z} \text{ such that } n^2 = k) = \begin{cases} 2 & \text{if } k \text{ is a nonzero square,} \\ 1 & \text{if } k = 0, \\ 0 & \text{otherwise.} \end{cases} \tag{13.6.9} \]
We say that $\theta$ is a generating function for $a_k$, or in other words, $\theta$ is a series whose coefficients $a_k$ have an interesting combinatorial interpretation. Many interesting generating functions have both notable combinatorial and notable analytic properties; for example, (13.6.6) and (13.6.7), extended to all $z \in \mathbb{C}$ with positive imaginary part, imply that $\theta(z)$ is what is called a modular form. (For the reader who is somewhat familiar with modular forms, the fact that $\theta$ has period 2 and not period 1 means that $\theta$ is modular with respect to a proper subgroup of the full modular group.)

Jacobi used modularity to prove a quantitative version of Lagrange's four squares theorem, which states that any nonnegative integer is the sum of four squares. Roughly speaking, Jacobi's proof starts with
\[ \theta(z)^4 = \left(\sum_{n_1\in\mathbb{Z}} q^{n_1^2}\right)\left(\sum_{n_2\in\mathbb{Z}} q^{n_2^2}\right)\left(\sum_{n_3\in\mathbb{Z}} q^{n_3^2}\right)\left(\sum_{n_4\in\mathbb{Z}} q^{n_4^2}\right) = \sum_{(n_i)\in\mathbb{Z}^4} q^{n_1^2+n_2^2+n_3^2+n_4^2} = \sum_{k=0}^{\infty} b_k q^k \tag{13.6.10} \]
and derives a formula for $b_k$ coming from the fact that $\theta(z)^4$ is modular. In particular, this formula implies that $b_k \ge 1$ for all $k \ge 0$ (the four squares theorem). See, for example, Diamond and Shurman [DS05].

To give one other application of Poisson Summation, we tie up a loose end from Section 11.1. Recall that we found the solution
\[ u(x,t) = \sum_{n\in\mathbb{Z}} \hat{f}(n) e^{-4\pi n^2 t} e_n(x) \tag{13.6.11} \]
to the heat equation on the circle, where $f \in L^2(S^1)$ is our given initial value. We then claimed that the pointwise initial value condition
\[ \lim_{t\to 0^+} u(x,t) = f(x) \tag{13.6.12} \]
still holds even if we relax the assumption $f \in C^1(S^1)$ to $f \in C^0(S^1)$, and we can now use Poisson Summation to prove this claim. We begin with the following slight variation on a familiar definition.


Definition 13.6.5. Combining Definitions 8.3.1 and 12.2.4, to say that a one-parameter family of continuous functions $K_t : S^1 \to \mathbb{R}$ ($t \in \mathbb{R}$, $t > 0$) is a Dirac kernel means that:

1. For all $t > 0$ and all $x \in \left[-\frac{1}{2}, \frac{1}{2}\right]$, $K_t(x) \ge 0$;
2. For all $t > 0$, $\int_{-1/2}^{1/2} K_t(x)\,dx = 1$; and
3. For any fixed $\delta > 0$, we have
\[ \lim_{t\to 0^+} \int_{\delta \le |x| \le \frac{1}{2}} K_t(x)\,dx = 0. \tag{13.6.13} \]

Combining the proofs of Theorems 8.4.1 and 12.2.5 then gives the following result. (The details of the proof, which are essentially nothing new, are omitted to avoid excess repetition.)

Theorem 13.6.6. If $\{K_t\}$ is a Dirac kernel, and $f \in C^0(S^1)$, then
\[ \lim_{t\to 0^+} (f * K_t)(x) = f(x), \tag{13.6.14} \]
where convergence is uniform on $S^1$.

So now, let
\[ H_t(x) = \sum_{n\in\mathbb{Z}} e^{-4\pi n^2 t} e_n(x) = 1 + 2\sum_{n=1}^{\infty} e^{-4\pi n^2 t} \cos(2\pi n x) \tag{13.6.15} \]
be the heat kernel. We can then reformulate our solution (13.6.11) as follows.

Theorem 13.6.7. For $f \in C^0(S^1)$, let $u$ be the solution (13.6.11) to the heat equation. Then for $x \in S^1$ and $t > 0$, we have that $u(x,t) = (f * H_t)(x)$.

Proof. Problem 13.6.3.

Therefore, by Theorem 13.6.6, it remains to show that $H_t$ is a Dirac kernel. Looking at examples (see Figure 13.6.1), this certainly seems plausible, but rigorously speaking, it is not obvious even that $H_t(x) \ge 0$. The following result, however, clarifies matters considerably.

Theorem 13.6.8. For all $x \in S^1$, we have that
\[ H_t(x) = \sum_{n\in\mathbb{Z}} G_{2\pi\sqrt{t}}(x+n). \tag{13.6.16} \]
In other words, $H_t(x)$ is the "periodized" version of the Gauss kernel $G_{2\pi\sqrt{t}}(x)$ (Example 12.2.8).


Figure 13.6.1: The heat kernel $H_t(x)$ ($t = 0.1, 0.01, 0.001$)

Proof. Problem 13.6.4.

The fact that the heat kernel is a Dirac kernel in the sense of Definition 13.6.5 is then a consequence of the following more general result.

Theorem 13.6.9. Let $K_t : \mathbb{R} \to \mathbb{R}$ be a Dirac kernel on $\mathbb{R}$ (Definition 12.2.4), and suppose that for fixed $t > 0$,
\[ H_t(x) = \sum_{n\in\mathbb{Z}} K_t(x+n) \tag{13.6.17} \]
converges uniformly on $\left[-\frac{1}{2}, \frac{1}{2}\right]$. Then $H_t(x)$ is a Dirac kernel on $S^1$, in the sense of Definition 13.6.5.

Proof. Uniform convergence proves that $H_t$ is continuous (Theorem 4.3.8), and the condition $H_t(x) \ge 0$ follows because $K_t(x) \ge 0$. The other conditions of Definition 13.6.5 are proven in Problem 13.6.5.

In any case, we can now finally prove the result promised in Section 11.1.

Corollary 13.6.10. For $f \in C^0(S^1)$, let
\[ u(x,t) = \sum_{n\in\mathbb{Z}} \hat{f}(n) e^{-4\pi n^2 t} e_n(x). \tag{13.6.18} \]
Then $\lim_{t\to 0^+} u(x,t) = f(x)$.


Proof. Fix $t > 0$. Since $G_{2\pi\sqrt{t}}(x)$ is strictly decreasing as $|x| \to \infty$, we see that for $n \ne 0$ and $x \in \left[-\frac{1}{2}, \frac{1}{2}\right]$,
\[ G_{2\pi\sqrt{t}}(x+n) \le G_{2\pi\sqrt{t}}(|n|-1) = \frac{1}{2\pi\sqrt{t}} \exp\left(\frac{-\pi(|n|-1)^2}{4\pi^2 t}\right) \le C a^{|n|-1} \tag{13.6.19} \]
for some $C > 0$, $0 < a < 1$. Therefore, since $\sum C a^{|n|-1}$ is a convergent geometric series, by the Weierstrass $M$-test (Theorem 4.3.7), the heat kernel (13.6.16) satisfies the convergence hypothesis of Theorem 13.6.9. The corollary then follows by Theorems 13.6.6, 13.6.8, and 13.6.9.

Problems

13.6.1. (Proves Theorem 13.6.1) Suppose $f \in C^1(\mathbb{R})$ and there exist constants $C$ and $p > 1$ such that $|f(x)| \le \dfrac{C}{|x|^p}$ and $|f'(x)| \le \dfrac{C}{|x|^p}$ for all $x \in \mathbb{R}$. Let
\[ g(x) = \sum_{n\in\mathbb{Z}} f(x+n), \qquad h(x) = \sum_{n\in\mathbb{Z}} f'(x+n). \tag{13.6.20} \]
(a) Prove that both $g$ and $h$ converge absolutely and uniformly on $[0,1]$. (Suggestion: Weierstrass $M$-test. What is the largest possible value of $f(x)$ on $[n, n+1]$?)
(b) Prove that $g(x+1) = g(x)$, or in other words, $g$ is a well-defined function on $S^1$. (Suggestion: Absolutely convergent series sum independently of the order of summation.)
(c) Prove that $g'(x) = h(x)$, and therefore, that $g \in C^1(S^1)$.
(d) Prove that for $n \in \mathbb{Z}$, $\hat{g}(n) = \hat{f}(n)$, where $\hat{g}(n)$ is the $n$th Fourier series coefficient of $g$, and $\hat{f}(n)$ is the Fourier transform of $f$ evaluated at $n$. (Suggestion: Uniform convergence and substitution.)
(e) Prove that
\[ \sum_{n\in\mathbb{Z}} f(n) = \sum_{n\in\mathbb{Z}} \hat{f}(n). \tag{13.6.21} \]
(Suggestion: Theorem 8.1.2.)

13.6.2. (Proves Corollary 13.6.3) Let $\Theta(\tau) = \sum_{n\in\mathbb{Z}} e^{-\pi n^2 \tau}$ for $\tau > 0$, and let $\theta(z) = \sum_{n\in\mathbb{Z}} e^{\pi n^2 i z}$ for $y \in \mathbb{R}$, $z = iy$, $y > 0$.
(a) Use Poisson Summation to prove that
\[ \Theta(\tau) = \frac{1}{\sqrt{\tau}}\,\Theta\left(\frac{1}{\tau}\right) \tag{13.6.22} \]
for all $\tau > 0$. (Suggestion: Table 12.3.1.)


(b) Prove that for $y \in \mathbb{R}$, $z = iy$, $y > 0$, we have
\[ \theta(z+2) = \theta(z), \tag{13.6.23} \]
\[ \theta\left(-\frac{1}{z}\right) = \left(\sqrt{z/i}\right)\theta(z), \tag{13.6.24} \]
where we define $\sqrt{z/i} = \sqrt{y}$. (Suggestion: Substitution.)

13.6.3. (Proves Theorem 13.6.7) Suppose $f \in C^0(S^1)$, $H_t(x) = \sum_{n\in\mathbb{Z}} e^{-4\pi n^2 t} e_n(x)$, $x \in S^1$, and $t > 0$. Prove that $u(x,t) = (f * H_t)(x)$. (Suggestion: Section 8.2.)

13.6.4. (Proves Theorem 13.6.8) For $x \in \mathbb{R}$, prove that
\[ \sum_{n\in\mathbb{Z}} e^{-4\pi n^2 t} e_n(x) = \sum_{n\in\mathbb{Z}} G_{2\pi\sqrt{t}}(x+n). \tag{13.6.25} \]
(Suggestion: Poisson Summation.)

13.6.5. (Proves Theorem 13.6.9) Let $K_t : \mathbb{R} \to \mathbb{R}$ be a Dirac kernel on $\mathbb{R}$ (Definition 12.2.4), and suppose that for fixed $t > 0$,
\[ H_t(x) = \sum_{n\in\mathbb{Z}} K_t(x+n) \tag{13.6.26} \]
converges uniformly on $\left[-\frac{1}{2}, \frac{1}{2}\right]$.
(a) Prove that for fixed $N \in \mathbb{N}$,
\[ \int_{-1/2}^{1/2} \sum_{n=-N}^{N} K_t(x+n)\,dx = \int_{-N-1/2}^{N+1/2} K_t(x)\,dx. \tag{13.6.27} \]
(b) Prove that for fixed $t > 0$, $\int_{-1/2}^{1/2} H_t(x)\,dx = 1$. (Suggestion: Uniform convergence.)
(c) Fix $\delta > 0$. By rewriting
\[ H_t(x) = K_t(x) + \sum_{n=1}^{\infty} K_t(x+n) + \sum_{n=1}^{\infty} K_t(x-n), \tag{13.6.28} \]
prove that
\[ \lim_{t\to 0^+} \int_{\delta \le |x| \le \frac{1}{2}} H_t(x)\,dx = 0. \tag{13.6.29} \]
(Suggestion: Use the idea of part (a) and apply Definition 12.2.4 with two different values of $\eta$.)

13.7 Miscellaneous applications of the Fourier transform

In this section, we collect some other miscellaneous applications of the Fourier transform: Local-global decay-smoothness relations (Section 13.7.1), the sampling theorem (Section 13.7.2) and the continuous Wiener-Khinchin theorem (Section 13.7.3).

13.7.1 Decay and smoothness for Fourier transforms

Recall that in Theorems 6.4.2 and 8.5.1, we exhibited a close connection between the smoothness of a function $f \in L^2(S^1)$ and the decay rate of its Fourier coefficients. Here, we show that similar results hold for the Fourier transform. We first prove a direct analogue of Theorem 6.4.2.

Theorem 13.7.1. Suppose $f : \mathbb{R} \to \mathbb{C}$ is a function such that for some $k \ge 1$, we have that $f, f', \dots, f^{(k)}$ are all in $C^0(\mathbb{R}) \cap L^2(\mathbb{R})$ and $f^{(k)} \in L^1(\mathbb{R})$. Then there exists some $C > 0$ such that $\left|\hat{f}(\gamma)\right| \le \dfrac{C}{|2\pi\gamma|^k}$ for all $\gamma \in \mathbb{R}$.

Proof. Problem 13.7.1.

We also have the following converse, which has no direct analogue for Fourier series. (Note, however, that by Fourier inversion, this result also gives a transform analogue of Theorem 8.5.1.)

Theorem 13.7.2. Suppose $f \in L^2(\mathbb{R})$ has the property that $|f(x)| \le \dfrac{C}{|x|^p}$ for some constants $C > 0$ and $p > 2$. Then $\hat{f} \in C^k(\mathbb{R})$ for $0 \le k \le p - 1$.

Proof. Problem 13.7.2.

13.7.2 The sampling theorem

Suppose a signal $f(x)$ has the property that for some $b > 0$, $\hat{f}(\gamma) = 0$ for all $|\gamma| > b$. We may think of this condition as stating that the frequencies present in $f$ are limited to some finite range, and so we say that $f$ is band-limited. The sampling theorem states that if $f$ is band-limited, then $f$ may be reconstructed by taking "samples" (finding values) of $f$ at discrete time intervals. Specifically:

Theorem 13.7.3. For $f \in L^2(\mathbb{R})$, suppose that $\hat{f}(\gamma) = 0$ for all $\gamma \in \mathbb{R}$ such that $|\gamma| > \frac{1}{2}$. Then
\[ f(x) = \sum_{n\in\mathbb{Z}} f(n)\,\operatorname{sinc}(\pi(x-n)). \tag{13.7.1} \]

The sampling theorem says, roughly, that if $f$ is band-limited with (absolute) frequency $|\gamma| \le \frac{1}{2}$, then at least in principle, we can recover $f$ by sampling at twice that rate (frequency 1). As a real-life example, the reason that audio compact discs (and later, MP3 sound files) were designed to work at a sample rate of 44.1 kHz (44,100 samples per second) is that human hearing is only sensitive to frequencies ranging up to about 20 kHz, so a sample rate slightly above twice that limit suffices.
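The reconstruction formula (13.7.1) can be tried out on a concrete band-limited signal. The sketch below assumes that $\operatorname{sinc}(y) = \sin(y)/y$ with $\operatorname{sinc}(0) = 1$, and uses the hypothetical test signal $f(x) = \operatorname{sinc}(\pi x/2)^2$, whose Fourier transform is a triangle supported on $|\gamma| \le \frac{1}{2}$. It compares $f(x)$ at non-integer points with a truncated version of the sum in (13.7.1); the truncation level $N$ and the function names are choices made for this illustration.

```python
import math

def sinc(y):
    """sin(y) / y, with the removable singularity at 0 filled in."""
    return 1.0 if y == 0.0 else math.sin(y) / y

def f(x):
    """Hypothetical band-limited test signal with f-hat supported on [-1/2, 1/2]."""
    return sinc(math.pi * x / 2.0) ** 2

def reconstruct(x, N=400):
    """Truncation of (13.7.1): uses only the samples f(n) with |n| <= N."""
    return sum(f(n) * sinc(math.pi * (x - n)) for n in range(-N, N + 1))

def reconstruction_error(x, N=400):
    return abs(reconstruct(x, N) - f(x))

if __name__ == "__main__":
    for x in (0.3, 1.75, -2.4):
        print("x =", x, " f(x) =", f(x), " error =", reconstruction_error(x))
```

Because only finitely many samples are used, the reconstruction is approximate; for this signal the samples decay like $1/n^2$, so the truncation error shrinks as $N$ grows.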


Proof. Problem 13.7.3.

Remark 13.7.4. A number of authors have been cited as discovering the sampling theorem. The first explicit published result is due to Shannon (1949), but the basic idea appeared in print as early as 1928 in work of Nyquist. The sampling theorem is therefore sometimes called the Nyquist-Shannon sampling theorem (or just the Nyquist sampling theorem), and the frequencies corresponding to $\frac{1}{2}$ and $1$ in Theorem 13.7.3, scaled appropriately, are known as the Nyquist frequency and the Nyquist rate, respectively. See Oppenheim, Willsky, and Nawab [OWN97, Sec. 7.1.1] for a discussion.

13.7.3 The continuous-time Wiener-Khinchin theorem

Recall that in Section 8.5.4, we considered statistics coming from (discrete) time series $x : \mathbb{Z} \to \mathbb{C}$ and their frequency responses (i.e., discrete-time Fourier transforms) $\hat{x} : S^1 \to \mathbb{C}$. Specifically, if we define the power spectrum of $x$ to be $S_x(\gamma) = |\hat{x}(\gamma)|^2$, which we think of as describing the (continuous) distribution of signal power among all frequencies, and we define the autocorrelation function of $x(t)$ with time lag $\tau$ to be $r_x(\tau) = \langle x(t), x(t-\tau) \rangle$, which we think of as describing how much $x$ and $x$ time-shifted by $\tau$ are correlated, then the discrete-time Wiener-Khinchin Theorem 8.5.12 says
\[ r_x(\tau) = \int_0^1 S_x(\gamma) e_\tau(\gamma)\,d\gamma. \tag{13.7.2} \]
In other words, $r_x(\tau)$ is precisely the $(-\tau)$th Fourier coefficient of the power spectrum $S_x(\gamma)$.

In the continuous setting, we proceed in exactly the same manner to get the analogous result. For the convenience of the reader, we will proceed independently of the discrete discussion, though we will be a bit more terse to avoid excess repetition. Throughout, we use the variable $t$ for the function variable and $\gamma$ for the transform variable.

Definition 13.7.5. Let $x \in L^2(\mathbb{R})$ represent a (continuous-time) signal. Because Theorem 12.5.4 can be interpreted as saying that the Fourier transform preserves the total power of $x(t)$, we can think of the function $S_x(\gamma) = |\hat{x}(\gamma)|^2$ as describing the (continuous-frequency) distribution of signal power among all frequencies. We therefore define $S_x(\gamma)$ to be the power spectrum of $x(t)$.

Definition 13.7.6. Statistically, we may interpret the $L^2(\mathbb{R})$ inner product $\langle x(t), y(t) \rangle = \int_{-\infty}^{\infty} x(t)\overline{y(t)}\,dt$ as describing the extent to which $x$ and $y$ are "pointed in the same direction", or correlated. For $\tau \in \mathbb{R}$, we therefore define
\[ r_x(\tau) = \langle x(t), x(t-\tau) \rangle = \int_{-\infty}^{\infty} x(t)\overline{x(t-\tau)}\,dt \tag{13.7.3} \]
to be the autocorrelation function of $x(t)$ with time lag $\tau$, since $r_x(\tau)$ describes how $x(t)$ is correlated with $x(t-\tau)$ (i.e., $x(t)$ shifted by time $\tau$).


CHAPTER 13. APPLICATIONS OF THE FOURIER TRANSFORM

In the above terms, the following theorem describes the relationship between the autocorrelation function and the power spectrum of a time series $x(t)$.

Theorem 13.7.7 (Continuous-time Wiener-Khinchin). For $x(t) \in L^2(\mathbf{R})$, we have
$$r_x(\tau) = \int_{-\infty}^{\infty} S_x(\gamma) e^{2\pi i \gamma \tau}\, d\gamma. \tag{13.7.4}$$

In other words, the autocorrelation function value rx (τ ) is precisely the inverse Fourier transform of the power spectrum Sx (γ). Proof. Problem 13.7.4. Remark 13.7.8. Again, as in Section 8.5.4, Theorem 13.7.7 is really an “easy” case of the Wiener-Khinchin theorem, in that Wiener’s theorem extends (13.7.4) to a situation where the required Fourier transforms may not be defined, and Khinchin’s contribution is to extend Wiener’s result to the case where x(t) is a random variable. As before, however, we hope that Theorem 13.7.7 gives a flavor of what Wiener-Khinchin says.
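As a sanity check on Theorem 13.7.7, the following Python sketch compares both sides of (13.7.4) numerically. The Gaussian test signal, the grids, and the function names are our own assumptions for illustration. We use the standard fact that $x(t) = e^{-\pi t^2}$ has Fourier transform $e^{-\pi\gamma^2}$, so that $S_x(\gamma) = e^{-2\pi\gamma^2}$, and a direct computation gives $r_x(\tau) = e^{-\pi\tau^2/2}/\sqrt{2}$.

```python
import numpy as np

t = np.linspace(-8.0, 8.0, 3201)      # grid for the function variable t
gamma = np.linspace(-8.0, 8.0, 3201)  # grid for the transform variable gamma
dt = t[1] - t[0]
dgamma = gamma[1] - gamma[0]

def x(t):
    # Test signal (our choice): x(t) = exp(-pi t^2).
    return np.exp(-np.pi * t ** 2)

def autocorrelation(tau):
    # Left side of (13.7.4): integral of x(t) conj(x(t - tau)) dt as a Riemann sum.
    return np.sum(x(t) * np.conj(x(t - tau))) * dt

def inverse_transform_of_power_spectrum(tau):
    # Right side of (13.7.4), with S_x(gamma) = exp(-2 pi gamma^2) for this signal.
    S = np.exp(-2.0 * np.pi * gamma ** 2)
    return np.sum(S * np.exp(2j * np.pi * gamma * tau)) * dgamma

for tau in (0.0, 0.5, 1.3):
    exact = np.exp(-np.pi * tau ** 2 / 2.0) / np.sqrt(2.0)
    print(tau, autocorrelation(tau),
          inverse_transform_of_power_spectrum(tau).real, exact)
```

For each lag, the three printed numbers (autocorrelation, inverse transform of the power spectrum, and closed form) should agree closely.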

Problems

13.7.1. (Proves Theorem 13.7.1) Suppose $f : \mathbf{R} \to \mathbf{C}$ is a function such that for some $k \ge 1$, we have that $f, f', \dots, f^{(k)}$ are all in $C^0(\mathbf{R}) \cap L^2(\mathbf{R})$ and $f^{(k)} \in L^1(\mathbf{R})$.

(a) Prove that $\widehat{f^{(k)}}$ is a bounded function on $\mathbf{R}$.

(b) Prove that there exists some $C > 0$ such that $\left|\hat{f}(\gamma)\right| \le \dfrac{C}{|2\pi\gamma|^k}$ for all $\gamma \in \mathbf{R}$. (Suggestion: Theorem 12.5.8.)

13.7.2. (Proves Theorem 13.7.2) Suppose $f \in L^2(\mathbf{R})$ has the property that $|f(x)| \le \dfrac{C}{|x|^p}$ for some constants $C > 0$ and $p > 2$.

(a) Prove that $|f(x)| \le \max(1, |f(x)|^2) \le 1 + |f(x)|^2$.

(b) Prove that for $0 \le k \le p - 1$, we have that $x^k f(x) \in L^1(\mathbf{R})$. (Suggestion: Consider $\int_{-1}^{1} x^k f(x)\, dx$, $\int_{1}^{\infty} x^k f(x)\, dx$, and $\int_{-\infty}^{-1} x^k f(x)\, dx$ separately.)

(c) Prove that if $g \in L^2(\mathbf{R})$ and $|g(x)| \le D/|x|^q$ for some constants $D > 0$ and $q > 1$, then $\hat{g}(\gamma)$ is continuous on $\mathbf{R}$.

(d) Prove that
$$\hat{f}^{(k)}(\gamma) = \int_{-\infty}^{\infty} (2\pi i x)^k f(x) e^{2\pi i \gamma x}\, dx \tag{13.7.5}$$
for $0 \le k \le p - 1$. (Suggestion: Theorem 4.8.8.)

13.7. MISCELLANEOUS APPLICATIONS OF THE FOURIER TRANSFORM


13.7.3. (Proves Theorem 13.7.3) Suppose $f \in L^2(\mathbf{R})$ has the property that for all $\gamma \in \mathbf{R}$ such that $|\gamma| > \frac{1}{2}$, $\hat{f}(\gamma) = 0$.

(a) Considering $\hat{f}$ as a function in $L^2([-\frac{1}{2}, \frac{1}{2}]) = L^2(S^1)$, let $c_n$ be the $n$th Fourier series coefficient of $\hat{f}$. Prove that $c_n = f(-n)$.

(b) Prove that
$$\hat{f}(\gamma) = \begin{cases} \displaystyle\sum_{n \in \mathbf{Z}} f(n) e^{-2\pi i n \gamma} & \text{if } |\gamma| \le \frac{1}{2}, \\ 0 & \text{if } |\gamma| > \frac{1}{2}, \end{cases} \tag{13.7.6}$$
where convergence is in $L^2(\mathbf{R})$. (Note the signs and variable names.)

(c) Use Fourier inversion to prove that
$$f(x) = \sum_{n \in \mathbf{Z}} f(n)\, \mathrm{sinc}(\pi(x - n))$$
in $L^2$, where sinc is defined in Section 13.1. (Suggestion: Use continuity of the inner product; be careful about which inner product you use.)

13.7.4. (Proves Theorem 13.7.7) Taking $t$ as our function variable and $\gamma$ as our transform variable, for $x \in L^2(\mathbf{R})$, define $S_x(\gamma) = |\hat{x}(\gamma)|^2$.

(a) Explain why the integral $\int_{-\infty}^{\infty} S_x(\gamma) e^{2\pi i \gamma \tau}\, d\gamma$ is well-defined, even though $S_x$ may not be in $L^2(\mathbf{R})$.

(b) Prove that
$$\langle x(t), x(t - \tau) \rangle = \int_{-\infty}^{\infty} S_x(\gamma) e^{2\pi i \gamma \tau}\, d\gamma. \tag{13.7.7}$$
(Suggestion: Table 12.3.1 and Theorem 12.5.4.)


Chapter 14

What’s next?

One of the most profound ideas in mathematics, the Langlands program, relates number theory to function theory (harmonic analysis) on very special moduli spaces. . . . This is an extremely exciting and active area of mathematics, which counts among its recent triumphs the proof of Fermat’s Last Theorem. . . .
David Ben-Zvi, “Moduli spaces”, The Princeton Companion to Mathematics

The reason why quantum probability is potentially useful for modeling the ambiguity and contextuality of humor is that, whereas in classical probability theory, events are drawn from a common sample space, in quantum probability theory, events can be drawn from different sample spaces. States and variables are defined with respect to a particular context, represented using a basis vector in a Hilbert space. . . .
Liane Gabora, “Toward a Quantum Model of Humor”, Psychology Today, April 6, 2017

As mentioned in the Introduction, while this book aims to be a satisfying “last math book you’ll ever read”, our not-so-secret hope is that it won’t be the last math book you’ll ever read. This raises the question: What’s next? As the reader can see from the epigraphs above, the places one can go next range from the sublime to (the study of) the ridiculous, and in this chapter, we briefly discuss a few of those places.

Specifically, besides the most natural next stop of learning more analysis (Section 14.1), we describe where the reader can go to learn more about signal processing and distributions (Section 14.2), wavelets (Section 14.3), quantum mechanics (Section 14.4), spectral methods and number theory (Section 14.5), and abstract harmonic analysis (Section 14.6). Note that while not all of the references we give in each section will be accessible to the reader based only on the background from this book, we believe that each section gives some kind of a starting place on each topic for a reader who has absorbed the material presented here.

Throughout, we stay mainly in descriptive mode, avoiding proofs to concentrate on the big picture.


14.1 What’s next: More analysis

As also discussed in the Introduction, probably the biggest gap in this book is our axiomatic approach to the Lebesgue integral, and the reader interested in continuing to study analysis should now go back and learn integration theory for real. Canonical sources include Royden and Fitzpatrick [RF10] and Rudin [Rud86]; the former has long been used in graduate analysis, and while the latter can be tough sledding for newcomers, it is an authoritative reference. More recent, and more accessible, sources include Nelson [Nel15] and Johnston [Joh15]; the former takes a fairly standard approach, and the latter takes an unusual approach (the Daniell-Riesz integral) that defines the Lebesgue integral without having to develop measure theory first.

For viewpoints on Fourier analysis that are complementary to what we have done in this book, see Körner [Kör89] and Stein and Shakarchi [SS03]. We should also give special mention to Holland [Hol07], which is more oriented towards applications than proofs, but whose choice of topics nevertheless greatly influenced ours. In particular, Holland is a good place to see more of the Sturm-Liouville theory found in our Chapter 11, as is AlGwaiz [AG08]. For graduate-level Fourier analysis, we mention Dym and McKean [DM85] and Terras [Ter13]. The former is a classic, and the latter bends in the direction of number theory, but both are favorites of the author.

The reader interested in Fourier analysis and number theory will eventually need a foundation in complex analysis, that is, the calculus of functions $f : \mathbf{C} \to \mathbf{C}$, as opposed to the functions $f : \mathbf{R} \to \mathbf{C}$ that are our main focus. Classic sources include Conway [Con78, Con96] and Ahlfors [Ahl79]; more recent and more accessible texts include Bak and Newman [BN10] and Needham [Nee99].

14.2 What’s next: Signal processing and distributions

Probably the most common everyday use of Fourier analysis is, in one form or another, the study of signal processing. To paraphrase the foreword to Oppenheim, Willsky, and Nawab [OWN97], the basic idea of signal processing is to study a system that takes as input a signal $f(t)$ and produces some corresponding output $\tilde{f}(t)$. In these terms, two of the main problems in signal processing can be stated (somewhat abstractly) as follows.

1. Given a system and an input signal $f(t)$, describe/predict what the output $\tilde{f}(t)$ will be.
2. Design a system that will produce an output $\tilde{f}(t)$ with desired characteristics from a given $f(t)$.

To give one concrete example, recall (Theorem 12.5.8) that for sufficiently “nice” functions $f, g : \mathbf{R} \to \mathbf{C}$, the convolution $f * g$ has the property
$$\widehat{f * g}(\gamma) = \hat{f}(\gamma)\hat{g}(\gamma). \tag{14.2.1}$$


So if, for example, we want to remove all frequencies with $|\gamma| > \Gamma$ from the signal $f$, we can take the convolution of $f$ with a function $g$ such that
$$\hat{g}(\gamma) = \begin{cases} 1 & \text{if } |\gamma| \le \Gamma, \\ 0 & \text{otherwise.} \end{cases} \tag{14.2.2}$$
By Table 13.1.1, the inversion theorem, and the fact that sinc is an even function, we see that taking the convolution of $f(t)$ with
$$g(t) = 2\Gamma\, \mathrm{sinc}(2\pi\Gamma t) \tag{14.2.3}$$
works. Convolution with $g(t)$ is therefore called an ideal lowpass filter, in that only the low-magnitude frequencies of $f(t)$ are allowed to “pass through”. Convolution can also be used, for example, to simulate the reverb (echoing, or lack thereof) of a specific audio environment, real or imagined; see Opitz [Opi96].

One curious aspect of signal processing, which is the most obviously applicable of the topics discussed in this chapter, is that everyday signal processing often uses one of the more sophisticated concepts in analysis we will discuss here, namely, that of a distribution. To be specific, the reader who looks at a standard engineering table of Fourier transforms may notice that many rows have mysterious entries like
$$\hat{\delta}(\gamma) = 1, \tag{14.2.4}$$
or in other words, the Fourier transform of the Dirac delta function is the constant function 1. On the one hand, this is useful, as delta functions and constant signals certainly model the real-life phenomena of impulse signals and constant inputs/outputs. On the other hand, the alert reader will recall that we have said many times that $\delta(x)$ is not a function (see Section 8.3), and may also notice that the constant function 1 is contained in neither $L^2(\mathbf{R})$ nor $L^1(\mathbf{R})$. What this means is that, for entirely practical reasons, we need to extend the Fourier transform to the following class of objects.

Definition 14.2.1. Recall that $C_c^\infty(\mathbf{R})$ is the space of smooth functions with compact support. A distribution is a linear function $\Lambda : C_c^\infty(\mathbf{R}) \to \mathbf{C}$ that is continuous on $C_c^\infty(\mathbf{R})$ in a particular sense (see Rudin [Rud91, Ch. 6] for precise details), and a tempered distribution is a continuous linear function $\Lambda : \mathcal{S}(\mathbf{R}) \to \mathbf{C}$. (Note that a fortiori, every tempered distribution defines a distribution, and in fact one can think of tempered distributions as distributions that are not too “wild” at infinity.)

For example, if $f : \mathbf{R} \to \mathbf{C}$ is a locally (Lebesgue) integrable function (see Definition 4.8.1), then the function $\Lambda_f : C_c^\infty(\mathbf{R}) \to \mathbf{C}$ given by
$$\Lambda_f(\varphi) = \int_{-\infty}^{\infty} f(x)\varphi(x)\, dx \tag{14.2.5}$$
is a distribution. It follows that distributions generalize a wide class of functions on $\mathbf{R}$. For another example, define $\delta : C_c^\infty(\mathbf{R}) \to \mathbf{C}$ by
$$\delta(\varphi) = \varphi(0). \tag{14.2.6}$$


This is also a distribution, and so we finally have a rigorous definition of the Dirac delta “function”.

Distributions allow us to extend differential calculus to many functions and distributions (like $\delta(x)$) that are not differentiable in the ordinary sense. For example, if $\Lambda$ is a distribution, then imitating the identity
$$\int_{-\infty}^{\infty} f'(x)\varphi(x)\, dx = -\int_{-\infty}^{\infty} f(x)\varphi'(x)\, dx \tag{14.2.7}$$
coming from (what else?) integration by parts for $f \in C^1(\mathbf{R})$ and $\varphi \in C_c^\infty(\mathbf{R})$, we define the derivative of $\Lambda$ to be $\Lambda'(\varphi) = -\Lambda(\varphi')$. Similarly, if $\Lambda$ is a tempered distribution, then imitating “pass the hat” (Theorem 12.3.7), we define the Fourier transform of $\Lambda$ by $\hat{\Lambda}(\varphi) = \Lambda(\hat{\varphi})$. For more about distributions and their use in solving differential equations, see Rudin [Rud91] and Hörmander [Hör90].

For the discrete/algebraic-minded, we would be remiss not to mention the discrete Fourier transform (DFT), which, for a fixed $N \in \mathbf{N}$, is a transform on functions $f : \{0, \dots, N-1\} \to \mathbf{C}$ defined by
$$\hat{f}(k) = \frac{1}{N} \sum_{n=0}^{N-1} f(n) e^{-2\pi i n k / N}. \tag{14.2.8}$$
We see that the DFT is both an analogue of the Fourier transform and, taking $x = n/N$, also a discrete approximation of Fourier series on $S^1$. Perhaps most notably, the DFT can be computed via the fast Fourier transform (FFT), with a dramatic ($N \log N$ vs. $N^2$) decrease in required computing time. That speedup, and the ubiquity of signal processing in modern life, is why the FFT was named one of the top 10 algorithms of the 20th century (Dongarra and Sullivan [DS00]). For more on the DFT and FFT, see Rockmore [Roc00].
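To illustrate (14.2.8) and the lowpass idea behind (14.2.2) in a discrete setting, here is a minimal Python sketch. The test signal, the cutoff frequency 10, and the helper name `dft` are our own assumptions; the sketch relies on the fact that NumPy's `np.fft.fft` computes the same sum as (14.2.8) without the factor $1/N$.

```python
import numpy as np

def dft(f):
    # Direct O(N^2) evaluation of (14.2.8), including the 1/N factor.
    N = len(f)
    n = np.arange(N)
    k = n.reshape(-1, 1)
    return np.exp(-2j * np.pi * k * n / N) @ f / N

N = 256
n = np.arange(N)
# Test signal (our choice): a slow oscillation (frequency 3) plus a fast one (frequency 40).
slow_part = np.cos(2 * np.pi * 3 * n / N)
f = slow_part + 0.5 * np.cos(2 * np.pi * 40 * n / N)

# NumPy's FFT omits the 1/N factor of (14.2.8), so divide by N before comparing.
fft_matches = np.allclose(dft(f), np.fft.fft(f) / N)

# Discrete analogue of the ideal lowpass filter: zero out all |frequency| > 10.
freqs = np.fft.fftfreq(N, d=1.0 / N)   # integer frequencies in FFT ordering
filtered = np.fft.ifft(np.where(np.abs(freqs) <= 10, np.fft.fft(f), 0)).real

print(fft_matches, np.max(np.abs(filtered - slow_part)))
```

Multiplying the transform by a 0/1 mask is the frequency-side description of the filter; by (14.2.1), it corresponds to a (circular) convolution on the time side.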

14.3 What’s next: Wavelets

As we have seen, in terms of signal processing (i.e., taking time $t$ to be our function variable), Fourier analysis (either series or transform) takes a signal $f(t)$ and encapsulates the part of that signal occurring at a single frequency $n \in \mathbf{Z}$ (or $\gamma \in \mathbf{R}$) in the Fourier coefficient $\hat{f}(n)$ (or $\hat{f}(\gamma)$). However, the resulting transform $\hat{f}$ (in either sense) is not localized in time. Indeed, from the beginning, we have thought of Fourier coefficients as being averages over time (see Section 6.1), and one of the most interesting features of Fourier series is how they transform local features of $f(t)$, like differentiability, into global features of $\hat{f}(n)$, like decay as $n \to \pm\infty$ (see Sections 6.4 and 8.5.1). In contrast, wavelets localize a signal both in frequency and in time. While a fuller discussion is beyond the scope of this book, we will illustrate the idea of time localization with the simplest, and oldest (Haar [Haa10]) example of a wavelet transform: the Haar wavelet basis for $L^2([0, 1])$.


Definition 14.3.1. The Haar wavelet family, as illustrated schematically in Figure 14.3.1, is defined to be the set of all $w_{nk} \in L^2([0, 1])$ ($n \ge 0$, $0 \le k \le 2^n - 1$) defined by
$$w_{nk}(x) = \begin{cases} -1 & \text{if } \dfrac{k}{2^n} \le x < \dfrac{k}{2^n} + \dfrac{1}{2^{n+1}}, \\[1ex] 1 & \text{if } \dfrac{k}{2^n} + \dfrac{1}{2^{n+1}} \le x < \dfrac{k+1}{2^n}, \\[1ex] 0 & \text{otherwise,} \end{cases} \tag{14.3.1}$$
along with the constant function 1.

Figure 14.3.1: The functions $w_{nk}$ from the Haar wavelet family (panels $n = 0$, $n = 1$, $n = 2$, each drawn over the interval from 0 to 1)

In terms of theory, our main point about the Haar family $\{1, w_{nk}\}$ is:

Theorem 14.3.2. The Haar family $B = \{1, w_{nk}\}$ is an orthogonal basis for $L^2([0, 1])$.

Proof. Problem 14.3.1 shows that $B$ is an orthogonal set. To prove that $B$ is an orthogonal basis, we first show that (Problem 14.3.2): For every $g \in C^0([0, 1])$ and every $\epsilon > 0$, there exists a finite subset $S \subseteq [0, 1]$ and a (finite) linear combination $h(x)$ of elements of $B = \{1, w_{nk}\}$ such that for all $x \in [0, 1]$ and $x \notin S$, we have that $|g(x) - h(x)| < \epsilon$. Since the integral of a piecewise continuous function is not affected by its values at finitely many points, this means that for every $\epsilon > 0$, there exists a (finite) linear combination $h(x)$ of elements of $B = \{1, w_{nk}\}$ such that the $L^2$ norm $\|g - h\| < \epsilon$. The theorem then follows by the same reasoning as the proof of the Inversion Theorem for Fourier Series 8.1.1 in Section 8.4.

In practical terms, the time localization properties of wavelets make them particularly useful for many applications. For example, consider a periodic signal $f(t)$ with a discontinuity at one particular location in $S^1$. While that discontinuity will affect every Fourier coefficient


$\hat{f}(n)$, the time-localized nature of a wavelet series (e.g., the generalized Fourier series of $f$ with respect to the Haar basis) means that the same discontinuity will affect only a relatively sparse set of wavelet coefficients.

The multiresolution nature of many wavelet families also makes them useful for data compression. To give an idea of why this might be true, consider the Haar wavelet series of $f \in L^2([0, 1])$ converging to $f$ in $L^2$: The 1 term is a function with the same average value as $f$ on $[0, 1]$; the $w_{00}$ term then corrects this to a function with the same average value as $f$ on both $[0, \frac{1}{2}]$ and $[\frac{1}{2}, 1]$; the $w_{1k}$ terms correct this to a function with the same average value as $f$ on each interval $[\frac{k}{4}, \frac{k+1}{4}]$, and so on. This idea of an approximation being resolved first coarsely and then at successively finer scales may seem familiar to any reader who has ever tried to view a slow-loading image over the internet, and this is no coincidence: Multiresolution wavelets are actually incorporated in, for example, the JPEG2000 image compression standard. See Van Fleet [Fle] for a discussion of both the wavelets used in JPEG2000 and the discrete cosine transform used in the older JPEG standard.

For authoritative references on wavelets, see, for example, Daubechies [Dau92] and Mallat [Mal08]; for approachable introductions, see Nievergelt [Nie00] and Walker [Wal08].
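The multiresolution description above can be explored with a short Python sketch. Everything here beyond the formula (14.3.1) is our own assumption for illustration: the grid size `M`, the test signal with one jump, and the helper names. Functions on $[0,1]$ are sampled at midpoints of a dyadic grid, so inner products of the step functions involved are computed exactly by Riemann sums.

```python
import numpy as np

M = 1024                                  # number of grid cells (a power of 2)
x = (np.arange(M) + 0.5) / M              # midpoints of a uniform grid on [0, 1]

def haar(n, k):
    # w_nk from (14.3.1): -1 on the left half of [k/2^n, (k+1)/2^n), +1 on the right half.
    left = k / 2 ** n
    mid = left + 1 / 2 ** (n + 1)
    right = (k + 1) / 2 ** n
    return np.where((x >= left) & (x < mid), -1.0,
                    np.where((x >= mid) & (x < right), 1.0, 0.0))

def inner(u, v):
    # L^2([0, 1]) inner product of real sampled functions, as a midpoint Riemann sum.
    return np.sum(u * v) / M

def haar_partial_sum(f, levels):
    # Constant term plus the terms for all w_nk with n < levels.
    approx = np.full(M, inner(f, np.ones(M)))
    for n in range(levels):
        for k in range(2 ** n):
            w = haar(n, k)
            approx += inner(f, w) / inner(w, w) * w
    return approx

# Orthogonality check for the constant function and all w_nk with n < 4.
family = [np.ones(M)] + [haar(n, k) for n in range(4) for k in range(2 ** n)]
gram = np.array([[inner(u, v) for v in family] for u in family])
is_orthogonal = np.allclose(gram, np.diag(np.diag(gram)))

f = np.where(x < 0.3, x, x - 1.0)         # test signal with a single jump at x = 0.3
errors = []
for levels in range(1, 7):
    r = f - haar_partial_sum(f, levels)
    errors.append(np.sqrt(inner(r, r)))   # L^2 error after adding finer scales
print(is_orthogonal, errors)
```

The list `errors` records how the $L^2$ error falls as successively finer scales are included, which is the coarse-to-fine behavior described in the text.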

Problems

14.3.1. (Proves Theorem 14.3.2) For $n \ge 0$, $0 \le k \le 2^n - 1$, let
$$w_{nk}(x) = \begin{cases} -1 & \text{if } \dfrac{k}{2^n} \le x < \dfrac{k}{2^n} + \dfrac{1}{2^{n+1}}, \\[1ex] 1 & \text{if } \dfrac{k}{2^n} + \dfrac{1}{2^{n+1}} \le x < \dfrac{k+1}{2^n}, \\[1ex] 0 & \text{otherwise,} \end{cases} \tag{14.3.2}$$
and let $B = \{1, w_{nk}\}$. Prove that $B$ is an orthogonal set in $L^2([0, 1])$. (Suggestion: Consider the cases $\langle 1, w_{nk} \rangle$, $\langle w_{nk}, w_{n\ell} \rangle$, and $\langle w_{nk}, w_{m\ell} \rangle$ for $n < m$, and keep Figure 14.3.1 in mind.)

14.3.2. (Proves Theorem 14.3.2) Let $w_{nk}$ be defined as in (14.3.2), and for $n \ge 0$, $0 \le k \le 2^n - 1$, let
$$\chi_{nk}(x) = \begin{cases} 1 & \text{if } \dfrac{k}{2^n} \le x < \dfrac{k+1}{2^n}, \\[1ex] 0 & \text{otherwise.} \end{cases} \tag{14.3.3}$$
Note that $\chi_{00}$ is precisely the constant function 1, except at $x = 1$.

(a) Prove that each of $\chi_{10}$ and $\chi_{11}$ is a linear combination of $\chi_{00}$ and $w_{00}$.

(b) Use induction on $n$ to prove that each $\chi_{nk}$ ($n \ge 0$, $0 \le k \le 2^n - 1$) is a linear combination of $\{\chi_{00}\} \cup \{w_{mk} \mid m < n,\ 0 \le k \le 2^m - 1\}$.

(c) Fix $g \in C^0([0, 1])$ and $\epsilon > 0$. Prove that there exists a finite subset $S \subseteq [0, 1]$ and a (finite) linear combination $h(x)$ of elements of $B = \{1, w_{nk}\}$ such that for all $x \in [0, 1]$ and $x \notin S$, we have that $|g(x) - h(x)| < \epsilon$. (Suggestion: Use $\{\chi_{nk}\}$ instead of $B$ and use the uniform continuity of $g$.)

14.4 What’s next: Quantum mechanics

Another prominent application of the Fourier analysis and the Hilbert space theory we have discussed is in quantum mechanics. Here, we will go into a bit more detail than with the other sections in this chapter, because the general framework we discuss here, adapted from Nielsen and Chuang [NC11], is not easy to find in one place, and we think it may be helpful to the reader as a guide in studying other sources.

Specifically, so far in this book we have discussed the quantum mechanics of a single particle moving in one dimension. In Section 11.7, we discussed solutions to Schrödinger’s equation and the discrete-valued observable energy, and in Section 13.5, we looked at the continuous-valued observables position and momentum. In this section, we describe a set of axioms for quantum mechanics that encompasses all of these aspects in a common framework.

We begin by defining calculus for functions taking values in a Hilbert space. The definitions should seem familiar, but they do need to be restated in this setting.

Definition 14.4.1. Let $X$ be a subinterval of $\mathbf{R}$ and let $\mathcal{H}$ be a Hilbert space. For $c \in X$, to say that $\Phi : (X \setminus \{c\}) \to \mathcal{H}$ has a limit $f \in \mathcal{H}$ as $t$ approaches $c$ means that for every $\epsilon > 0$, there exists some $\delta(\epsilon) > 0$ such that if $t \neq c$ and $|t - c| < \delta(\epsilon)$, then $\|\Phi(t) - f\| < \epsilon$. In that case, we write
$$\lim_{t \to c} \Phi(t) = f. \tag{14.4.1}$$

Definition 14.4.2. Let $X$ be a subinterval of $\mathbf{R}$ and let $\mathcal{H}$ be a Hilbert space. To say that $\Phi : X \to \mathcal{H}$ is differentiable at $c \in X$ means that the limit
$$f = \lim_{t \to c} \frac{1}{t - c}\left(\Phi(t) - \Phi(c)\right) \tag{14.4.2}$$
exists. In that case, we define $\dfrac{d\Phi}{dt}(c) = \Phi'(c) = f$.

We also need to define some new operator terminology, though to shorten the discussion, we will keep certain technical points vague. Let $T$ be an operator in a Hilbert space $\mathcal{H}$. To say that $T$ is self-adjoint means that $T$ is Hermitian and $D(T)$ is “sufficiently large” (see Reed and Simon [RS80, VIII.2] for a precise definition). If $T$ is a self-adjoint operator, then the spectrum $\sigma(T)$ of $T$ is the set of all $\lambda \in \mathbf{C}$ such that the operator $\lambda I - T$ ($I$ is the identity on $\mathcal{H}$) is not a bijection of $D(T)$ onto $\mathcal{H}$ with a bounded inverse. For example, if $\lambda$ is an eigenvalue of $T$, then $\lambda I - T$ is not invertible, so $\lambda \in \sigma(T)$.

We may now define quantum mechanics in terms of four axioms.

1. Axiom 1: State space. For any isolated physical system (e.g., an electron, or the universe), there is a Hilbert space $\mathcal{H}$, called the state space of the system. The state of that system at a given time $t$ is represented by a unit vector $\Psi(t) \in \mathcal{H}$. Note that if, for example, $\mathcal{H} = L^2(\mathbf{R})$, then $\Psi(t)$ is a function on $\mathbf{R}$, so we may prefer to think of $\Psi$ as a function of two variables $\Psi(x, t)$. Note also, however, that it is sometimes useful to consider finite-dimensional (and therefore, more algebraic in flavor) Hilbert spaces, like $\mathbf{C}^n$ with the dot product (Example 7.1.5).


2. Axiom 2: Time evolution. For any isolated physical system with state space $\mathcal{H}$, the state $\Psi(t)$ of the system changes in one of the following (equivalent) ways:

• (2C) Continuous time evolution. There exists a self-adjoint linear operator $H$ on $\mathcal{H}$, called the Hamiltonian of the system, such that for any time $t$,
$$H(\Psi(t)) = i\hbar \frac{d\Psi}{dt}, \tag{14.4.3}$$
where the derivative is taken in the sense of Definition 14.4.2, and $\hbar$ is Planck’s constant. (For simplicity, we have been pretending that $\hbar = 1$ in this book, but the reader should be aware of its presence in other sources.)

• (2D) Discrete time evolution. For any two times $t_1 < t_2$, there exists a unitary operator (see (12.4.3) in the statement of Theorem 12.4.2) $U(t_1, t_2)$ on $\mathcal{H}$ such that
$$U(t_1, t_2)\Psi(t_1) = \Psi(t_2). \tag{14.4.4}$$

For an explanation (in one direction) as to why (2C) and (2D) describe the same idea, see Problem 14.4.1.

3. Axiom 3: Observables. An observable quantity (position, momentum, spin, etc.) of our isolated system is represented by a self-adjoint operator $M$ on $\mathcal{H}$. The basic idea is that when we have a system that is currently in state $\Psi$, and we measure the observable corresponding to $M$, the state of the system collapses into a randomly chosen (generalized) eigenstate (i.e., generalized eigenfunction) of $M$, with a probability distribution determined, roughly speaking, by the coordinates of $\Psi$ relative to an orthogonal basis of (generalized) eigenstates. More specifically, here are the two cases of measurement we have discussed previously.

• Discrete spectrum. If $\{\psi_n\}$ is an orthonormal eigenbasis for $M$ with corresponding eigenvalues $\lambda_n$, then as mentioned in Section 11.7, the only possible values of the observable are the $\lambda_n$, and upon measurement in state $\Psi \in \mathcal{H}$, the state of the system collapses into the eigenstate $\psi_n$ corresponding to the observed value $\lambda_n$ with probability $|c_n|^2$, where $c_n = \langle \Psi, \psi_n \rangle$ is the coefficient of $\psi_n$ in the expansion of $\Psi$.

• Continuous spectrum. If $\mathcal{H} = L^2(\mathbf{R})$ and $M$ is the multiplication operator $M(f(x)) = xf(x)$, then as mentioned in Section 13.5, the possible values of the observable are all $x \in \mathbf{R}$. If we measure the YES/NO question “Is the value of the observable $x \in [a, b]$?”, the answer is YES with probability $\int_a^b |\Psi(x, t)|^2\, dx$, and when the answer is YES, the state of the system collapses to the state $\Psi_1 = \Psi_0 / \|\Psi_0\|$, where
$$\Psi_0(x) = \begin{cases} \Psi(x, t) & \text{if } a \le x \le b, \\ 0 & \text{otherwise.} \end{cases} \tag{14.4.5}$$


In general, an observable, or actually, any self-adjoint operator, can have a mix of the two behaviors (eigenvalues and multiplication operators); see Reed and Simon [RS80, Thm. VIII.4] for a precise statement. For more on the physical interpretation of measurement, see Braginsky, Khalili, and Thorne [BKT92].

4. Axiom 4: Composite systems. The last axiom is the most algebraic, so we only discuss it briefly and for the sake of completeness. Suppose $\mathcal{H}_1$ and $\mathcal{H}_2$ are the state spaces of two isolated physical systems. Then the state space of the composite system is the tensor product $\mathcal{H}_1 \otimes \mathcal{H}_2$. Without going into the details of the definition of $\mathcal{H}_1 \otimes \mathcal{H}_2$, suffice it to say that $\mathcal{H}_1 \otimes \mathcal{H}_2$ is also a Hilbert space, and if $\{\varphi_{1m}\}$ and $\{\varphi_{2n}\}$ are orthogonal bases for $\mathcal{H}_1$ and $\mathcal{H}_2$, respectively, then there exists an orthogonal basis $\{\varphi_{1m} \otimes \varphi_{2n}\}$ for $\mathcal{H}_1 \otimes \mathcal{H}_2$. For finite-dimensional $\mathcal{H}_1$ and $\mathcal{H}_2$, this means that instead of adding the dimensions of $\mathcal{H}_1$ and $\mathcal{H}_2$ to get the dimension of $\mathcal{H}_1 \otimes \mathcal{H}_2$, we multiply them.

It turns out that one of the more interesting recent applications of this abstract framework for quantum mechanics comes in the study of quantum computing. In one common approach (see Nielsen and Chuang [NC11]), the fundamental unit of quantum computing is the qubit, which is a quantum system with state space $\mathcal{H} = \mathbf{C}^2$. The Hamiltonian of such a system is represented by a $2 \times 2$ Hermitian matrix $H$, and Axiom 2 becomes
$$H(\Psi(t)) = i\hbar \frac{d\Psi}{dt}, \tag{14.4.6}$$
where both sides are interpreted in terms of functions with values in $\mathbf{C}^2$.

In quantum computation, we think of a qubit as the quantum analogue of a 0/1 bit from classical computation. Thanks to Axiom 4, the complexity (dimension) of an $n$-qubit quantum system increases exponentially as a function of $n$ (in fact, $\dim = 2^n$), as opposed to the linear growth of a classical $n$-bit system, which is why certain apparently exponential problems, like factoring integers, can, in theory, be solved efficiently on a quantum computer.

Probably the most notable “killer app” of quantum computation is Shor’s celebrated factoring algorithm, which, if implemented in a scalable fashion, would render most of the commonly used public-key encryption schemes used in almost every secure Internet transaction useless. We mention Shor’s algorithm because the heart of that algorithm is a further speedup, this time exponential, in the time required to compute a certain very specialized DFT ($(\log N)^3$ vs. $N \log N$ for the FFT). Compare Section 14.2, and see Nielsen and Chuang [NC11, Ch. 5] for details.

For the reader interested in learning quantum mechanics from a physics point of view, see Griffiths [Gri04] and Shankar [Sha94]. For graduate-level mathematical introductions to quantum mechanics, see Hall [Hal13] and Teschl [Tes09], and for the operator theory used in quantum mechanics, see Reed and Simon [RS80].
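For a single qubit, Axiom 2 can be checked numerically. The following Python sketch is only an illustration under stated assumptions: the particular Hermitian matrix `H`, the initial state `psi0`, and the choice $\hbar = 1$ are ours. It builds $U(t) = e^{-iHt}$ from the eigendecomposition of $H$ and checks that $U(t)$ is unitary (as in (2D)) and that $\Psi(t) = U(t)\Psi_0$ satisfies (14.4.6) up to finite-difference error (as in (2C)).

```python
import numpy as np

# Hypothetical 2x2 Hermitian Hamiltonian for one qubit (with hbar = 1).
H = np.array([[1.0, 0.5 - 0.5j],
              [0.5 + 0.5j, -1.0]])

def U(t):
    # U(t) = exp(-i H t), built from the eigendecomposition H = V diag(E) V*.
    E, V = np.linalg.eigh(H)
    return V @ np.diag(np.exp(-1j * E * t)) @ V.conj().T

psi0 = np.array([1.0, 0.0], dtype=complex)   # initial unit vector in C^2

def psi(t):
    # State at time t, evolved from psi0 by the unitary U(t).
    return U(t) @ psi0

t, h = 0.7, 1e-6
# Centered difference approximation to dPsi/dt at time t.
dpsi_dt = (psi(t + h) - psi(t - h)) / (2 * h)

is_unitary = np.allclose(U(t) @ U(t).conj().T, np.eye(2))
norm = np.linalg.norm(psi(t))                           # should stay equal to 1
residual = np.linalg.norm(H @ psi(t) - 1j * dpsi_dt)    # H Psi - i dPsi/dt
print(is_unitary, norm, residual)
```

The residual is limited by the finite-difference step `h`, so it is small but not exactly zero.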

Problems

14.4.1. In this problem, if $A$ is a complex matrix, then $A^*$ denotes the conjugate transpose of $A$ (i.e., take the transpose of $A$ and the complex conjugate of its entries). You may also take it as given that if $A$ and $B$ are complex matrices, and the product $AB$ is defined, then $(AB)^* = B^* A^*$; and if furthermore, the entries of $A$ and $B$ are differentiable functions of $t$, then
$$\frac{d}{dt}(AB) = \frac{dA}{dt} B + A \frac{dB}{dt}. \tag{14.4.7}$$
(Note that since matrices do not commute in general, we must be careful of the order of multiplication here.)

(a) Let $\langle x, y \rangle$ denote the standard dot product in $\mathbf{C}^n$ (Example 7.1.5). Prove that for $x, y \in \mathbf{C}^n$, $\langle Ax, y \rangle = \langle x, A^* y \rangle$.

(b) Recall that for $n \times n$ matrices $U$ and $H$, to say that $U$ is unitary means that $\langle Ux, Uy \rangle = \langle x, y \rangle$ for all $x, y \in \mathbf{C}^n$, and to say that $H$ is Hermitian means that $\langle Hx, y \rangle = \langle x, Hy \rangle$ for all $x, y \in \mathbf{C}^n$. Prove that $U$ is unitary if and only if $UU^* = I$, and $H$ is Hermitian if and only if $H = H^*$.

(c) Now suppose $U(t)$ is a family of unitary matrices whose entries are differentiable functions of $t$. Prove that $i \dfrac{dU}{dt} U^*$ is Hermitian. (Suggestion: Differentiate both sides of $UU^* = I$.)

(d) Suppose $\Psi(t) = U(t)\Psi_0$, where $U(t)$ is a family of unitary matrices whose entries are differentiable functions of $t$, and $\Psi_0 \in \mathbf{C}^n$ is some (constant) initial state. Prove that $i \dfrac{d\Psi}{dt} = H(t)\Psi(t)$ for some family $H(t)$ of Hermitian matrices. (Suggestion: Insert $I = U^* U$ at an appropriate place.)

14.5 What’s next: Spectra and number theory

Recall that in Chapter 11, we solved the heat and wave equations in 1 physical dimension, and the key point was to find the eigenvalues of the 1-dimensional Laplacian $\Delta = -\dfrac{\partial^2}{\partial x^2}$, subject to our desired boundary conditions, such as the Dirichlet conditions that $u(x, t) = 0$ when $x$ is on the boundary of the physical region in question. In this section, we begin by asking, what happens if we try to solve the heat and wave equations on a domain $U$ in $\mathbf{R}^n$?

First, the PDEs for the heat and wave equations are not that different from their 1-dimensional versions. Specifically, if we define the $n$-dimensional Laplacian to be
$$\Delta = -\frac{\partial^2}{\partial x_1^2} - \cdots - \frac{\partial^2}{\partial x_n^2}, \tag{14.5.1}$$
then the heat and wave equations become
$$\Delta(u) = -\frac{\partial u}{\partial t}, \qquad \Delta(u) = -\frac{\partial^2 u}{\partial t^2}, \tag{14.5.2}$$
respectively. It follows that if we know the eigenvalues of $\Delta$ (again, subject to our desired boundary conditions for our domain $U$), then the eigenbasis method of Section 11.2 works just as well as in the 1-dimensional versions, and we can proceed as before.
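To get a feel for the eigenvalues of the Laplacian (14.5.1) with Dirichlet boundary conditions, here is a Python sketch for one especially simple domain, the unit square in $\mathbf{R}^2$. The choice of domain, the grid size `m`, and the finite-difference discretization are our own assumptions, not a method taken from the text. On the unit square, the eigenfunctions $\sin(j\pi x)\sin(k\pi y)$ have eigenvalues $\pi^2(j^2 + k^2)$, which gives exact values to compare against.

```python
import numpy as np

m = 30                      # interior grid points per direction
h = 1.0 / (m + 1)           # grid spacing on the unit square [0, 1] x [0, 1]

# 1-dimensional finite-difference matrix for -d^2/dx^2 with Dirichlet conditions.
D = (2.0 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)) / h ** 2

# 2-dimensional Laplacian (14.5.1) as a Kronecker sum: -d^2/dx^2 - d^2/dy^2.
I = np.eye(m)
L = np.kron(D, I) + np.kron(I, D)

# Four smallest eigenvalues of the discretized operator.
approx = np.sort(np.linalg.eigvalsh(L))[:4]

# Exact Dirichlet eigenvalues pi^2 (j^2 + k^2), j, k >= 1, four smallest.
exact = np.sort([np.pi ** 2 * (j ** 2 + k ** 2)
                 for j in range(1, 4) for k in range(1, 4)])[:4]
print(approx)
print(exact)
```

The approximate values should sit slightly below the exact ones and approach them as `m` increases; for domains without this kind of symmetry, no such closed-form comparison is available.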


However, as for computing those eigenvalues, because the geometry of a 2- or 3-dimensional domain U can be far more complicated than that of an interval, the geometry of U and its boundary becomes a dominant factor. For clarity, let us focus on one specific problem: Dirichlet eigenvalue problem: For a given domain U ⊆ R2 with boundary a simple closed curve, find all nonzero u : U → C such that ∆(u) = λu on the interior of U and u(x, y) = 0 for all (x, y) on the boundary of U . The reader who now pauses to draw a few random examples of closed curves can probably imagine how complicated the geometry of U and its boundary can get. Indeed, aside from the cases of a few highly symmetric domains like rectangles and discs (see Holland [Hol07, Ch. 7]), computing exact values of eigenfunctions and eigenvalues is an intractable task. What we can do instead is to try to understand the qualitative behavior, especially the asymptotic behavior, of the eigenvalues of the Laplacian on a given domain U . To focus on one classic example, Kac [Kac66] famously asked: Can you hear the shape of a drum? That is, if the Laplacian (with Dirichlet boundary conditions) has exactly the same eigenvalues on two domains U1 and U2 , is it always the case that U1 is congruent to U2 ? This is called “hearing” the shape of a drum because the sound of an idealized drum of shape U is determined by the eigenvalues of the Laplacian on U with Dirichlet boundary conditions. Now, while this may seem unlikely at first, Kac points out that one can “hear” the area and the perimeter of U , and one can even hear if U is perfectly circular, making the question a little less clear. 
Indeed, it was not until about 25 years later that isospectral (i.e., same eigenvalues) but noncongruent drums were constructed by Gordon, Webb, and Wolpert [GWW92a, GWW92b]; moreover, Zelditch [Zel00, Zel09] later showed that if you know that a drum satisfies some mild symmetry conditions (e.g., having at least one mirror symmetry), then remarkably, you can hear its shape.

Going in another direction, recall (Section 8.5.3) that for $\operatorname{Re}(s) > 1$, the Riemann zeta function is defined by
$$\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s}. \tag{14.5.3}$$

Remarkably, this definition can be extended uniquely to a holomorphic (i.e., complex differentiable) function on all $s \neq 1$ in $\mathbf{C}$. This extended function is known to have zeros when $s$ is a negative even integer (the so-called trivial zeros) and when 1 (possibly $\dim V = \infty$), then the set $GL(V)$ of all invertible linear operators on $V$, with the operation of composition, is a group. If $\mathcal{H}$ is a Hilbert space, then the set $U(\mathcal{H})$ of all unitary linear operators $T$ on $\mathcal{H}$ (i.e., operators such that $\langle T(f), T(g) \rangle = \langle f, g \rangle$ for all $f, g \in \mathcal{H}$) is a group.

The first part of the next definition should be familiar to the reader who has taken abstract algebra, but the rest of the definition may be less so.

Definition 14.6.4. A homomorphism from a group $G$ to a group $H$ is a map $\varphi : G \to H$ such that $\varphi(g_1 g_2) = \varphi(g_1)\varphi(g_2)$ for all $g_1, g_2 \in G$. A (linear) representation of $G$ is a homomorphism $\varphi : G \to GL(V)$ for some vector space $V$, and a unitary representation of $G$ is a homomorphism $\varphi : G \to U(\mathcal{H})$ for some Hilbert space $\mathcal{H}$. The dimension of a representation $\varphi : G \to GL(V)$ is the dimension of its underlying vector space $V$.

We also have the following definition, which even a reader who has experience with both abstract algebra and topology may not have seen before. (Note that this is actually only a special case of the usual definition, chosen to avoid discussing point-set topology.)

Definition 14.6.5. If $G$ is a group with operation $*$ that is also a subset of a metric space $X$, to say that $G$ is a topological group means that the operations of $*$ and inversion define continuous maps $* : G \times G \to G$ and ${}^{-1} : G \to G$. In that case, to say that $G$ is compact means that $G$ is a compact subset of $X$ (Definition 2.6.6), and to say that $G$ is locally compact means that for every $g \in G$, there is an open neighborhood $U$ of $g$ (Definition 2.6.1) whose closure (intersection of every closed set containing $U$) is compact.

Example 14.6.6.
It can be shown that the groups $S^1$ and $SO_3(\mathbb{R})$ are compact, and the groups $\mathbb{R}$ and $SL_2(\mathbb{R})$ are locally compact but not compact.

The above background material provides the language we now use to describe abstract harmonic analysis, the generalization of Fourier analysis to many topological groups $G$. For example, when $G$ is abelian, we have the following key idea.

Definition 14.6.7. For an abelian group $G$, let $\hat{G}$ be the set of all 1-dimensional unitary representations of $G$ (which can be shown to be precisely the homomorphisms $\phi : G \to S^1$). We call $\hat{G}$ the dual group of $G$, as it can be shown that $\hat{G}$ has the structure of an abelian group under the operation of multiplication of functions.

When $G$ is a compact abelian group, $\hat{G}$ is countable, and functions on $G$ are the $L^2$ limit of a generalized Fourier series; in other words, we have much the same situation as Fourier series on $S^1$. In fact, if $G = S^1$, then $\hat{G} = \mathbb{Z}$, and we get ordinary Fourier series on $S^1$. If $G$ is a locally compact, but not compact, abelian group, $\hat{G}$ will be more like $\mathbb{R}$, and instead of Fourier series, we have something more like the Fourier transform on $\mathbb{R}$. In fact, if $G = \mathbb{R}$, then $\hat{G} = \mathbb{R}$, and we get the ordinary Fourier transform on $\mathbb{R}$. See Loomis [Loo11] for an account.

When $G$ is a compact nonabelian group, we again get series-type behavior, but instead of only 1-dimensional representations, we need to use representations of finite (but possibly arbitrarily high) dimension. To give a concrete example, take $G = SO_3(\mathbb{R})$. There (and indeed, in any locally compact group) one can define an analogue of the Lebesgue integral via what is known as Haar measure, and one can define spaces like $L^2(SO_3(\mathbb{R}))$. Harmonic analysis on $SO_3(\mathbb{R})$ can then be expressed as follows: For every $n \ge 0$, there exists a $(2n+1)$-dimensional unitary representation $\phi_n$ of $G$ such that for any $f \in L^2(SO_3(\mathbb{R}))$, we have
$$f(g) = \sum_{n=0}^{\infty} (2n+1) \operatorname{tr}\bigl(\hat{f}(n)\phi_n(g)\bigr), \tag{14.6.3}$$

where $\hat{f}(n)$ is a $(2n+1) \times (2n+1)$ matrix-valued generalized Fourier coefficient, and $\operatorname{tr}$ is the matrix trace (sum of the diagonal entries). See Dym and McKean [DM85, Ch. 4] for the specifics of $SO_3(\mathbb{R})$, and see Loomis [Loo11, Ch. VIII] for the general compact nonabelian group.

When $G$ is a locally compact, but not compact, nonabelian group, the situation is much more complicated, and is beyond the scope of this book. For an account of what happens with $G = SL_2(\mathbb{R})$ and other matrix groups over $\mathbb{R}$ or $\mathbb{C}$, see Knapp [Kna01]. For a matrix group $G(k)$ defined by a set of polynomial equations with a field $k$ as a parameter, or in other words, an affine algebraic group, the problem of finding a suitable generalization of the Fourier transform is one of the central problems of what is known as the Langlands program (see Grojnowski [Gro08]).

The reader should be cautioned, however, that describing the Langlands program as the search for a general notion of nonabelian Fourier transform is a bit like describing a nuclear reactor as a mechanism for boiling water: technically true, but missing out on the full flavor of what’s going on. Indeed, the Langlands program is something like a grand unified theory of classical mathematics, combining algebra, analysis, and number theory as a massive whole; in fact, with the possible exception of wavelets, every topic discussed in this chapter (including quantum mechanics; see Frenkel [Fre07]) is related to the Langlands program. Introductions to the Langlands program can be found in Gelbart [Gel84] and Knapp [Kna97]. However, perhaps the best exhibition to date of the potential power of the Langlands program is the proof of Fermat’s Last Theorem, which follows from the proof of the long-standing Taniyama-Shimura-Weil conjecture, which is in turn just one piece of the Langlands program; see Cornell, Silverman, and Stevens [CSS97] and Darmon [Dar99].

Appendix A

Rearrangements of series

In this appendix, we examine the question: When does the order of summation affect the convergence or divergence of a series? For convenience, we assume that (after renumbering) the domain of every sequence is $\mathbb{N}$.

Definition A.1. A rearrangement of a sequence $(a_n)$ is a sequence $(b_n)$ such that
$$b_n = a_{\sigma(n)} \tag{A.1}$$
for some bijection $\sigma : \mathbb{N} \to \mathbb{N}$. Similarly, if $(b_n)$ is a rearrangement of $(a_n)$, we also say that $\sum b_n$ is a rearrangement of $\sum a_n$.

Our main result is that for nonnegative series (Theorem A.2), or more generally, absolutely convergent series (Corollary A.3), convergence is independent of the order of summation.

Theorem A.2. Let $(a_n)$ be a sequence such that $a_n \ge 0$ for $n \in \mathbb{N}$, and let $(b_n)$ be a rearrangement of $(a_n)$. If $\sum a_n$ converges, then $\sum b_n$ converges.

Proof. Suppose $b_n = a_{\sigma(n)}$ for some bijection $\sigma : \mathbb{N} \to \mathbb{N}$. By the Cauchy Criterion, we know that for any $\epsilon > 0$, there exists some $N_a(\epsilon)$ such that if $m > k > N_a(\epsilon)$, then
$$\sum_{n=k}^{m} a_n < \epsilon. \tag{A.2}$$

So now, for $\epsilon > 0$, let
$$S(\epsilon) = \{ n \in \mathbb{N} \mid \sigma(n) \le N_a(\epsilon) \}. \tag{A.3}$$
Since $\sigma$ is a bijection, $S(\epsilon)$ is a finite set, so we may define $N(\epsilon) = \max S(\epsilon)$. Now suppose $m > k > N(\epsilon)$. Let
$$T = \{ \sigma(n) \mid k \le n \le m \}. \tag{A.4}$$
Since $\sigma$ is a bijection, it maps the indices $k, k+1, \dots, m$ injectively into $T$, which is contained (possibly properly) in the set $\{ n' \mid \min T \le n' \le \max T \}$. Therefore, since the $a_n$ are all nonnegative, we see that
$$\sum_{n=k}^{m} b_n = \sum_{n=k}^{m} a_{\sigma(n)} \le \sum_{n'=\min T}^{\max T} a_{n'}. \tag{A.5}$$
However, since $n > \max S(\epsilon)$ for $k \le n \le m$, by definition of $S(\epsilon)$, we see that
$$N_a(\epsilon) < \min T \le \max T. \tag{A.6}$$
Therefore, by (A.2),
$$\sum_{n=k}^{m} b_n \le \sum_{n'=\min T}^{\max T} a_{n'} < \epsilon. \tag{A.7}$$

The theorem follows by the Cauchy Criterion.

Corollary A.3. Any rearrangement $\sum b_n$ of an absolutely convergent series $\sum a_n$ also converges absolutely.

Proof. If $\sum a_n$ converges absolutely, then $\sum |b_n|$ converges because it is a rearrangement of the convergent nonnegative series $\sum |a_n|$. Therefore, $\sum b_n$ converges absolutely.

If $\sum a_n$ converges conditionally, then rearrangements are completely unpredictable. To be precise, we have the following remarkable result, due to Riemann.

Theorem A.4 (Riemann rearrangement theorem). If $\sum a_n$ is a real-valued series that converges conditionally, then for any $L \in \mathbb{R} \cup \{+\infty, -\infty\}$, there is a rearrangement of $\sum a_n$ that converges to $L$.

Sketch of proof. For simplicity, assume $a_n$ is never 0. Let $\sum b_n$ contain the positive terms of $\sum a_n$, and let $\sum c_n$ contain the negative terms. If both $\sum b_n$ and $\sum c_n$ converge, we would have
$$\sum |a_n| = \sum b_n - \sum c_n, \tag{A.8}$$
and $\sum a_n$ would converge absolutely. Furthermore, if $\sum b_n = +\infty$ and $\sum c_n$ is finite, then $\sum a_n$ would diverge, and similarly for the case where $\sum b_n$ is finite and $\sum c_n = -\infty$. Therefore, it must be that $\sum b_n = +\infty$ and $\sum c_n = -\infty$.

So now, since $\sum a_n$ converges, Corollary 4.1.10 implies that $\lim_{n \to \infty} a_n = 0$, which in turn implies that any subset of $\{a_n\}$ must have an element of largest absolute value. We may therefore rearrange the $b_n$ and $c_n$ so they are both in decreasing order of size, and assume, by symmetry, that $L \ge 0$. If $L < +\infty$, we arrange $\sum a_n$ as follows:

1. Begin with the minimum number of positive terms $b_n$ required to achieve a sum greater than $L$. (This is always possible, even after having already used any finite number of terms, because $\sum b_n = +\infty$.)

2. Then add the minimum number of negative terms $c_n$ required to bring the partial sum back down to less than $L$.

3. Keep alternating: Add positive terms until we “overshoot” $L$, add negative terms until we “undershoot” $L$, and so on.

Without going into the epsilonic details, because the overshoot is always at most $b_n$ and the undershoot is always at most $|c_n|$, both of which go to 0, this rearrangement has a sum that converges to $L$.

Similarly, if $L = +\infty$, we arrange $\sum a_n$ as follows:

1. Begin with the minimum number of positive terms $b_n$ required to achieve a sum greater than $|c_1| + 1$.

2. Then add the negative term $c_1$, giving a total greater than 1.

3. Keep alternating: Add new positive terms to achieve a sum greater than $|c_2| + 2$, then add $c_2$; add new positive terms to get a sum greater than $|c_3| + 3$, then add $c_3$; and so on.

Again, omitting the details, this rearrangement sums to $+\infty$.

Note that the following example shows that the issue of order of summation arises very naturally with Fourier series.

Example A.5. Consider $f : S^1 \to \mathbb{C}$ given by
$$f(x) = x \quad \text{for } -\tfrac{1}{2} \le x < \tfrac{1}{2}. \tag{A.9}$$
By Example 6.2.5 and Theorem 8.5.17, we see that
$$f(x) = -\sum_{n \neq 0} \frac{(-1)^n}{2\pi i n}\, e_n(x) \tag{A.10}$$
for all $x \in S^1$ except $x = \pm\tfrac{1}{2}$, as long as we sum the series in our standard order. However, by Theorem A.4, we see that in general, we can rearrange (A.10) so that it sums to an “incorrect” value. To take a particularly extreme example, for $x = 0$, the right-hand side of (A.10) becomes
$$\frac{1}{2\pi i}\left( 1 - 1 - \frac{1}{2} + \frac{1}{2} + \frac{1}{3} - \frac{1}{3} - \frac{1}{4} + \frac{1}{4} + \cdots \right), \tag{A.11}$$
which indeed sums to 0 in that order, but can be rearranged to sum to any purely imaginary number we like.
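The alternating procedure in the sketch of proof of Theorem A.4 can be tried numerically. The following is a minimal sketch, not part of the original text: it assumes the alternating harmonic series $1 - \frac{1}{2} + \frac{1}{3} - \cdots$ as the conditionally convergent series, and the function name `rearranged_partial_sum` is hypothetical. It adds a positive term whenever the running total is at most the target and a negative term otherwise, which is the overshoot/undershoot rule up to the treatment of exact ties.

```python
from itertools import count


def rearranged_partial_sum(target, num_terms):
    """Partial sum of a greedy rearrangement of 1 - 1/2 + 1/3 - ...

    Illustrative assumption: the series is the alternating harmonic
    series, whose positive terms are 1/(odd n) and whose negative
    terms are -1/(even n), each already in decreasing order of size.
    """
    positives = (1.0 / n for n in count(1, 2))   # 1, 1/3, 1/5, ...
    negatives = (-1.0 / n for n in count(2, 2))  # -1/2, -1/4, ...
    total = 0.0
    for _ in range(num_terms):
        if total <= target:
            # Not yet above the target: use the next unused positive term.
            total += next(positives)
        else:
            # Above the target: use the next unused negative term.
            total += next(negatives)
    return total


if __name__ == "__main__":
    for target in (0.25, 2.0, -1.0):
        print(target, rearranged_partial_sum(target, 100_000))
```

Because every term is drawn from one of the two generators exactly once and in order, each term of the original series is used at most once, and the divergence of both the positive and negative parts ensures that neither generator is abandoned forever, so the result is a genuine rearrangement.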


Appendix B

Linear algebra

In this appendix, we briefly describe the basic theory of an abstract vector space over a field $F$, where $F = \mathbb{C}$ or $\mathbb{R}$. This theory is not substantively necessary to the rest of this book, but the reader who has experience with linear algebra may appreciate the connections with that material. Indeed, we only ever use the case $F = \mathbb{C}$, even implicitly, but we include the case $F = \mathbb{R}$ because it takes no extra effort and may help the reader connect with prior experience.

We begin with the definition of an abstract vector space over the field of scalars $F$, where we will always assume that $F = \mathbb{C}$ or $\mathbb{R}$.

Definition B.1 (Vector space). We define a vector space over $F$ to be:

• A set $V$, whose elements are called vectors;
• A binary operation $+ : V \times V \to V$ (vector addition), written as $v + w$ for $v, w \in V$;
• An operation $F \times V \to V$ (scalar multiplication), written as $av$ for $a \in F$, $v \in V$;
• An element $0 \in V$ (zero element); and
• For each $v \in V$, a vector $-v \in V$ (negative of a vector);

such that the following properties hold for all $v, w, x \in V$ and $r, s \in F$:

(A1) $(v + w) + x = v + (w + x)$.
(A2) $v + w = w + v$.
(A3) $v + 0 = v$.
(A4) $v + (-v) = 0$.
(DL) $r(v + w) = rv + rw$ and $(r + s)v = rv + sv$.
(SMA) $r(sv) = (rs)v$.
(SM1) $1v = v$.


Example B.2 (Function spaces). For a nonempty set $X$, we define $\mathcal{F}(X, F)$ to be the set
$$\mathcal{F}(X, F) = \{ f : X \to F \}, \tag{B.1}$$
i.e., the set of all $F$-valued functions with domain $X$, and given $f, g \in \mathcal{F}(X, F)$ and $a \in F$, we define $f + g, af, 0, -f \in \mathcal{F}(X, F)$ by
$$(f + g)(x) = f(x) + g(x), \qquad (af)(x) = af(x), \qquad 0(x) = 0, \qquad (-f)(x) = -f(x). \tag{B.2}$$
One may then (tediously) verify that all of the axioms of a vector space hold in $\mathcal{F}(X, F)$ (Problem B.1).

Of course, when one defines an algebraic object foo, the next step is usually to define what a subfoo is, and vector spaces are no exception.

Definition B.3 (Subspace). Let $V$ be a vector space over $F$. To say that $W \subseteq V$ is a subspace of $V$ means that $W$ is a vector space over $F$ with addition and scalar multiplication defined by restricting the operations of $V$.

As the reader may recall from linear algebra, while Definition B.3 may be the “morally correct” definition of subspace, in practice, the following theorem serves as an equivalent definition of subspace.

Theorem B.4 (Subspace theorem). For a vector space $V$ over $F$ and $W \subseteq V$, the following are equivalent:

1. $W$ is a subspace of $V$.

2. The following conditions all hold:
• (Zero vector) $0 \in W$.
• (Closed under addition) For all $v, w \in W$, $v + w \in W$.
• (Closed under scalar multiplication) For all $a \in F$, $v \in W$, $av \in W$.

Sketch of proof. Since the zero vector condition shows that $W$ is nonempty, the most interesting thing to check is that vector addition and scalar multiplication in $V$ are well-defined in $W$; however, this is precisely the other two conditions. The axioms of a vector space all follow because they hold in the larger set $V$.

The rest of linear algebra also works much as one might expect from previous experience, with one exception: We need to define a linear combination of an infinite set properly, as follows.

Definition B.5 (Linear combination). Let $V$ be a vector space over $F$, and let $S$ be a subset of $V$, where we do not assume that $S$ is finite. We define a linear combination of $S$ to be a restricted sum of the form $\sum'_{v \in S} a_v v$ ($a_v \in F$), where by restricted sum (the symbol $\sum'$), we mean that $a_v = 0$ except for finitely many $v$. In other words, a linear combination of $S$ is the sum of finitely many scalar multiples of vectors in $S$, thus ensuring that the sum is well-defined. Note that by definition, the only linear combination of the empty set is the zero vector 0.

Definition B.6 (Linear independence). For a vector space $V$ over $F$ and $S \subseteq V$, to say that $S$ is linearly independent means that if we have a linear combination $\sum'_{v \in S} a_v v$ that is equal to 0, then every coefficient $a_v = 0$.

Definition B.7 (Algebraic span). For a vector space $V$ over $F$ and $S \subseteq V$, we define the algebraic span of $S$ to be the set of all linear combinations $\sum'_{v \in S} a_v v$. (Note that again, each such linear combination really only involves finitely many $v$.) To say that $S$ algebraically spans $V$ means that the algebraic span of $S$ contains $V$ (and is therefore equal to $V$).

As the reader may recall:

Theorem B.8. For a vector space $V$ over $F$ and $S \subseteq V$, the algebraic span of $S$ is a subspace of $V$.

Proof. Problem B.2.

Definition B.9 (Algebraic basis). For a vector space $V$ over $F$ and $S \subseteq V$, to say that $S$ is an algebraic basis for $V$ means that $S$ algebraically spans $V$ and is linearly independent.

Remark B.10. In contrast, when we define orthogonal bases (Definition 7.3.12), we implicitly extend the definition of span to include “linear combinations” that are convergent infinite series. In fact, we can show that an infinite orthogonal basis for a Hilbert space $H$ is never an algebraic basis for $H$ (Problem B.3). As for linear independence, while it is not included explicitly in the definition of orthogonal basis, it follows from the properties of orthogonal sets of nonzero vectors; see Problem 7.3.2.

Finally, recall that, in some sense, the point of linear algebra is to study the following type of functions.

Definition B.11 (Linear functions). Let $V$ and $W$ be vector spaces over $F$. To say that a function $T : V \to W$ is linear means that for all $v, w \in V$ and $c \in F$,
$$T(v + w) = T(v) + T(w), \qquad T(cv) = cT(v). \tag{B.3}$$


Problems

B.1. Prove that all of the axioms of a vector space hold in $\mathcal{F}(X, F)$. (Suggestion: Checking all of the axioms is too boring to be worthwhile, so we suggest only doing a representative selection, such as (A1), (A4), (DL), and (SMA). Remember that two functions with the same domain are equal precisely when they give the same output for any given input.)

B.2. (Proves Theorem B.8) For a vector space $V$ over $F$ and $S \subseteq V$, prove that the algebraic span of $S$ is a subspace of $V$.

B.3. Let $H = \ell^2(\mathbb{N})$ (Definition 5.3.2), and recall that $H$ is a Hilbert space (Theorem 7.6.2) with orthonormal basis $B = \{ e_n \mid n \in \mathbb{N} \}$ (Example 7.3.16).

(a) Prove that $B$ does not algebraically span $H$. (Suggestion: Find a specific element of $\ell^2(\mathbb{N})$ not contained in the algebraic span of $B$.)

(b) Prove that any infinite orthonormal basis for a Hilbert space $H$ does not algebraically span $H$. (Suggestion: Isomorphism Theorem for Fourier Series 7.6.7.)

Appendix C

Bump functions

In this appendix, we prove Theorem 8.5.5, which is repeated below as Theorem C.4 for the convenience of the reader. Most of the work in proving Theorem C.4 is in the very first step.

Theorem C.1. The function $\phi_1 : \mathbb{R} \to \mathbb{R}$ defined by
$$\phi_1(x) = \begin{cases} e^{-1/x} & \text{for } x > 0, \\ 0 & \text{for } x \le 0, \end{cases} \tag{C.1}$$
is in $C^\infty(\mathbb{R})$.

Proof. Problem C.1.

Figure C.1: The “seed” $\phi_1(x)$

The function $\phi_1(x)$, as shown in Figure C.1, can be thought of as the “seed” of all bump functions, as we can grow this seed into the bump functions we need using relatively straightforward calculus. For example, the following lemmas construct a “bump with compact support” $\phi_2$ and a “smooth step function” $\phi_3$, as shown in Figure C.2.

Figure C.2: The $C^\infty$ functions $\phi_2$ and $\phi_3$

Lemma C.2. If $\phi_1 : \mathbb{R} \to \mathbb{R}$ is defined by (C.1), then the function $\phi_2 : \mathbb{R} \to \mathbb{R}$ defined by
$$\phi_2(x) = \phi_1(x)\phi_1(1 - x) \tag{C.2}$$
is a $C^\infty$ function such that $\phi_2(x) > 0$ for $0 < x < 1$ and $\phi_2(x) = 0$ otherwise.

Proof. Problem C.2.

Lemma C.3. Let $\phi_2 : \mathbb{R} \to \mathbb{R}$ be defined by (C.2), and let $A = \int_0^1 \phi_2(x)\,dx$. Then the function $\phi_3 : \mathbb{R} \to \mathbb{R}$ defined by
$$\phi_3(x) = \frac{1}{A} \int_0^x \phi_2(t)\,dt \tag{C.3}$$
is an increasing $C^\infty$ function such that $\phi_3(x) = 0$ for $x \le 0$ and $\phi_3(x) = 1$ for $x \ge 1$.

Proof. Problem C.3.

Proving Theorem C.4 (illustrated in Figure C.3 for $a = 1$, $b = 3$, $\delta = 1$) is now a matter of precalculus.

Theorem C.4. For $a < b$ and $\delta > 0$, there exists some $\phi : \mathbb{R} \to \mathbb{R}$ such that:

1. $\phi \in C^\infty(\mathbb{R})$;
2. For $a \le x \le b$, $\phi(x) = 1$;
3. For $a - \delta \le x \le a$ and $b \le x \le b + \delta$, we have $0 \le \phi(x) \le 1$; and
4. For $x \le a - \delta$ and $b + \delta \le x$, $\phi(x) = 0$.

Proof. Problem C.4.


Figure C.3: The bump function ϕ(x) for a = 1, b = 3, δ = 1
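The chain from the seed to a plateau-shaped bump can be explored numerically. The following is a minimal sketch under stated assumptions, not part of the original text: the integral in (C.3) is approximated by a composite midpoint rule, the helper names (`integrate`, `bump`) are hypothetical, and the final product of two rescaled copies of the smooth step is one possible construction satisfying the four conditions of Theorem C.4, not necessarily the one intended in Problem C.4.

```python
import math


def phi1(x):
    """The seed (C.1): exp(-1/x) for x > 0, and 0 for x <= 0."""
    return math.exp(-1.0 / x) if x > 0 else 0.0


def phi2(x):
    """The bump (C.2): positive on (0, 1) and zero elsewhere."""
    return phi1(x) * phi1(1.0 - x)


def integrate(f, lo, hi, steps=2000):
    """Composite midpoint rule (a numerical stand-in for the integral)."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))


# The normalizing constant A of Lemma C.3, computed approximately.
A = integrate(phi2, 0.0, 1.0)


def phi3(x):
    """The smooth step (C.3): 0 for x <= 0, 1 for x >= 1."""
    if x <= 0:
        return 0.0
    if x >= 1:
        return 1.0
    return integrate(phi2, 0.0, x) / A


def bump(x, a, b, delta):
    """Assumed construction: rise on [a - delta, a], fall on [b, b + delta]."""
    rise = phi3((x - (a - delta)) / delta)  # equals 1 once x >= a
    fall = phi3(((b + delta) - x) / delta)  # equals 1 while x <= b
    return rise * fall


if __name__ == "__main__":
    # Sample the bump for a = 1, b = 3, delta = 1, as in Figure C.3.
    for x in (0.0, 0.5, 1.0, 2.0, 3.0, 3.5, 4.0):
        print(x, bump(x, 1.0, 3.0, 1.0))
```

Only shifts, scalings, and a product of copies of $\phi_3$ are involved in `bump`, which matches the remark that the last step is a matter of precalculus once the smooth step is available.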

Problems

C.1. (Proves Theorem C.1) Let $\phi_1 : \mathbb{R} \to \mathbb{R}$ be defined by (C.1).

(a) Prove that if $p(x)$ is a polynomial, then
$$\lim_{x \to 0^+} p(1/x)\, e^{-1/x} = 0. \tag{C.4}$$
(Suggestion: Asymptotics (Section 3.6).)

(b) Prove by induction on $k \ge 0$ that
$$\phi_1^{(k)}(x) = \begin{cases} p_k(1/x)\, e^{-1/x} & \text{for } x > 0, \\ 0 & \text{for } x \le 0, \end{cases} \tag{C.5}$$
where $p_k(u)$ is a polynomial defined recursively by
$$p_0(u) = 1, \qquad p_{k+1}(u) = u^2 \bigl( p_k(u) - p_k'(u) \bigr). \tag{C.6}$$
(Suggestion: Use the definition of the derivative at $x = 0$.)

C.2. (Proves Lemma C.2) Let $\phi_1$ be defined by (C.1) and $\phi_2 : \mathbb{R} \to \mathbb{R}$ be defined by
$$\phi_2(x) = \phi_1(x)\phi_1(1 - x). \tag{C.7}$$
Prove that $\phi_2$ is a $C^\infty$ function such that $\phi_2(x) > 0$ for $0 < x < 1$ and $\phi_2(x) = 0$ otherwise.

C.3. (Proves Lemma C.3) Let $\phi_1$ and $\phi_2$ be defined by (C.1) and (C.7), respectively, let $A = \int_0^1 \phi_2(x)\,dx$, and let $\phi_3 : \mathbb{R} \to \mathbb{R}$ be defined by
$$\phi_3(x) = \frac{1}{A} \int_0^x \phi_2(t)\,dt. \tag{C.8}$$
Prove that $\phi_3$ is an increasing $C^\infty$ function such that $\phi_3(x) = 0$ for $x \le 0$ and $\phi_3(x) = 1$ for $x \ge 1$.

C.4. (Proves Theorem C.4) Prove that for $a < b$ and $\delta > 0$, there exists some $\phi : \mathbb{R} \to \mathbb{R}$ such that:

1. $\phi \in C^\infty(\mathbb{R})$;
2. For $a \le x \le b$, $\phi(x) = 1$;
3. For $a - \delta \le x \le a$ and $b \le x \le b + \delta$, we have $0 \le \phi(x) \le 1$; and
4. For $x \le a - \delta$ and $b + \delta \le x$, $\phi(x) = 0$.

(Suggestion: Do precalculus operations on the function $\phi_3$ given by (C.8).)
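The recursion (C.6) from Problem C.1 lends itself to a quick numerical sanity check, which is no substitute for the induction the problem asks for. The following is a minimal sketch, not part of the original text: polynomials are stored as coefficient lists (index $i$ holds the coefficient of $u^i$), and the names `next_poly`, `poly_p`, and `phi1_derivative` are hypothetical.

```python
import math


def phi1(x):
    """The seed (C.1): exp(-1/x) for x > 0, and 0 for x <= 0."""
    return math.exp(-1.0 / x) if x > 0 else 0.0


def next_poly(p):
    """One step of (C.6): return u^2 * (p(u) - p'(u)) as a coefficient list."""
    # Coefficients of p'(u), padded with a zero to the same length as p.
    deriv = [i * p[i] for i in range(1, len(p))] + [0]
    diff = [c - d for c, d in zip(p, deriv)]
    return [0, 0] + diff  # prepending two zeros multiplies by u^2


def poly_p(k):
    """Coefficient list of p_k(u), starting from p_0(u) = 1."""
    p = [1]
    for _ in range(k):
        p = next_poly(p)
    return p


def phi1_derivative(k, x):
    """Right-hand side of (C.5) for x > 0: p_k(1/x) * exp(-1/x)."""
    u = 1.0 / x
    value = sum(c * u**i for i, c in enumerate(poly_p(k)))
    return value * math.exp(-u)


if __name__ == "__main__":
    h = 1e-5
    # Compare (C.5) for k = 1 against a central difference of phi1 at x = 0.5.
    finite_difference = (phi1(0.5 + h) - phi1(0.5 - h)) / (2 * h)
    print(finite_difference, phi1_derivative(1, 0.5))
```

The first two steps of the recursion give $p_1(u) = u^2$ and $p_2(u) = u^4 - 2u^3$, which agree with differentiating $e^{-1/x}$ by hand for $x > 0$.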

Bibliography

[AG08]

Mohammed Al-Gwaiz. Sturm-Liouville Theory and its Applications. Springer-Verlag, 2008.

[Ahl79]

Lars Ahlfors. Complex Analysis. McGraw-Hill, 3rd edition, 1979.

[Apo69]

Tom M. Apostol. Calculus, Vol. 2: Multi-Variable Calculus and Linear Algebra with Applications to Differential Equations and Probability. Wiley, 2nd edition, 1969.

[AW02]

Jeremy F. Alm and James S. Walker. Time-frequency analysis of musical instruments. SIAM Rev., 44(3):457–476, 2002.

[BBM17]

Carl M. Bender, Dorje C. Brody, and Markus P. Müller. Hamiltonian for the zeros of the Riemann zeta function. Phys. Rev. Lett., 118, 2017.

[BKT92]

Vladimir B. Braginsky, Farid Ya Khalili, and Kip S. Thorne. Quantum Measurement. Cambridge University Press, 1992.

[BN10]

Joseph Bak and Donald J. Newman. Complex Analysis. Undergraduate Texts in Mathematics. Springer-Verlag, 3rd edition, 2010.

[Büs10]

Peter Büser. Geometry and Spectra of Compact Riemann Surfaces. Modern Birkhäuser Classics. Birkhäuser, 2010.

[Car66]

Lennart Carleson. On convergence and growth of partial sums of Fourier series. Acta Math., 116:135–157, 1966.

[Con78]

John B. Conway. Functions of One Complex Variable I, volume 11 of Graduate Texts in Mathematics. Springer-Verlag, 2nd edition, 1978.

[Con96]

John B. Conway. Functions of One Complex Variable II, volume 159 of Graduate Texts in Mathematics. Springer-Verlag, 1996.

[CSS97]

Gary Cornell, Joseph H. Silverman, and Glenn Stevens, editors. Modular Forms and Fermat’s Last Theorem. Springer-Verlag, 1997.

[Dar99]

Henri Darmon. A proof of the full Shimura-Taniyama-Weil conjecture is announced. Notices Amer. Math. Soc., 46(11):1397–1401, 1999.

[Dau92]

Ingrid Daubechies. Ten Lectures on Wavelets. CBMS-NSF Regional Conference Series in Applied Mathematics. SIAM: Society for Industrial and Applied Mathematics, 1992.

[DD10]

Kenneth R. Davidson and Allan P. Donsig. Real Analysis and Applications: Theory in Practice. Undergraduate Texts in Mathematics. Springer-Verlag, 2010.

[DLM]

NIST digital library of mathematical functions. http://dlmf.nist.gov/, Release 1.0.14 of 2016-12-21. F. W. J. Olver, A. B. Olde Daalhuis, D. W. Lozier, B. I. Schneider, R. F. Boisvert, C. W. Clark, B. R. Miller and B. V. Saunders, eds.

[DM85]

Harry Dym and Henry P. McKean. Fourier Series and Integrals. Academic Press, 1985.

[DS00]

Jack Dongarra and Francis Sullivan. Guest editors’ introduction: The top 10 algorithms. Computing in Science & Engineering, 2(1):22–23, 2000.

[DS05]

Fred Diamond and Jerry Shurman. A First Course in Modular Forms, volume 228 of Graduate Texts in Mathematics. Springer-Verlag, 2005.

[Edw01]

Harold M. Edwards. Riemann’s Zeta Function. Dover Publications, 2001.

[ER85]

Robert Eisberg and Robert Resnick. Quantum Physics of Atoms, Molecules, Solids, Nuclei, and Particles. Wiley & Sons, 2nd edition, 1985.

[Eva10]

Lawrence C. Evans. Partial Differential Equations. American Mathematical Society, 2nd edition, 2010.

[Fey11]

Richard P. Feynman. The Feynman Lectures on Physics. Basic Books, New Millennium edition, 2011.

[Fle]

Patrick J. Van Fleet. Image compression: How math led to the JPEG2000 standard. http://www.whydomath.org/node/wavlets/index.html. Accessed: 2017-05-08.

[Fre07]

Edward Frenkel. Lectures on the Langlands program and conformal field theory. In Frontiers in number theory, physics, and geometry. II, pages 387–533. Springer, Berlin, 2007.

[Fri08]

Avner Friedman. Partial Differential Equations of Parabolic Type. Dover Publications, 2008.

[Gel84]

Stephen Gelbart. An elementary introduction to the Langlands program. Bull. Amer. Math. Soc. (N.S.), 10(2):177–219, 1984.

[Gri04]

David J. Griffiths. Introduction to Quantum Mechanics. Pearson Prentice Hall, 2nd edition, 2004.

[Gro08]

Ian Grojnowski. Representation theory. In Timothy Gowers, June Barrow-Green, and Imre Leader, editors, The Princeton companion to mathematics, chapter IV.9, pages 419–431. Princeton University Press, Princeton, NJ, 2008.

[GWW92a]

C. Gordon, D. Webb, and S. Wolpert. Isospectral plane domains and surfaces via Riemannian orbifolds. Invent. Math., 110(1):1–22, 1992.

[GWW92b]

Carolyn Gordon, David L. Webb, and Scott Wolpert. One cannot hear the shape of a drum. Bull. Amer. Math. Soc. (N.S.), 27(1):134–138, 1992.

[Haa10]

Alfred Haar. Zur Theorie der orthogonalen Funktionensysteme. Math. Ann., 69(3):331–371, 1910.

[Hal13]

Brian C. Hall. Quantum Theory for Mathematicians, volume 267 of Graduate Texts in Mathematics. Springer-Verlag, 2013.

[Har02]

Philip Hartman. Ordinary differential equations, volume 38 of Classics in Applied Mathematics. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2002. Corrected reprint of the second (1982) edition [Birkhäuser, Boston, MA; MR0658490 (83e:34002)], With a foreword by Peter Bates.

[Hol07]

Samuel S. Holland. Applied Analysis by the Hilbert Space Method: An Introduction with Applications to the Wave, Heat, and Schrödinger Equations. Dover Publications, 2007.

[Hör90]

Lars Hörmander. The Analysis of Linear Partial Differential Operators I. Springer-Verlag, 2nd edition, 1990.

[HS75]

Edwin Hewitt and Karl Stromberg. Real and Abstract Analysis, volume 25 of Graduate Texts in Mathematics. Springer-Verlag, 1975.

[IK04]

Henryk Iwaniec and Emmanuel Kowalski. Analytic Number Theory, volume 53 of Colloquium Publications. American Mathematical Society, 2004.

[Joh15]

William Johnston. The Lebesgue Integral for Undergraduates. MAA Textbooks. Mathematical Association of America, 2015.

[Kac66]

Mark Kac. Can one hear the shape of a drum? Amer. Math. Monthly, 73(4, part II):1–23, 1966.

[Kat66]

Yitzhak Katznelson. Sur les ensembles de divergence des séries trigonométriques. Studia Math., 26:301–304, 1966.

[Kna97]

Anthony W. Knapp. Introduction to the Langlands program. In Representation theory and automorphic forms (Edinburgh, 1996), volume 61 of Proc. Sympos. Pure Math., pages 245–302. Amer. Math. Soc., Providence, RI, 1997.


[Kna01]

Anthony W. Knapp. Representation Theory of Semisimple Groups: An Overview Based on Examples, volume 36 of Princeton Mathematical Series. Princeton University Press, 2001.

[Kol26]

Andrey Kolmogorov. Une série de Fourier-Lebesgue divergente partout. C. R. Acad. Sci. Paris Sér. A-B, 183:1327–1328, 1926.

[Kör89]

Thomas W. Körner. Fourier Analysis. Cambridge University Press, 1989.

[KS99]

Nicholas M. Katz and Peter Sarnak. Zeroes of zeta functions and symmetry. Bull. Amer. Math. Soc. (N.S.), 36(1):1–26, 1999.

[Loo11]

Lynn H. Loomis. Introduction to Abstract Harmonic Analysis. Dover Publications, 2011.

[Mal08]

Stephane Mallat. A Wavelet Tour of Signal Processing: The Sparse Way. Academic Press, 3rd edition, 2008.

[Mes97]

Robert Messer. Linear Algebra: Gateway to Mathematics. Pearson, 1997.

[Mun97]

James R. Munkres. Analysis on Manifolds. Westfield Press, 1997.

[Mun00]

James R. Munkres. Topology. Pearson, 2nd edition, 2000.

[NC00]

Michael A. Nielsen and Isaac L. Chuang. Quantum Computation and Quantum Information. Cambridge University Press, 2000.

[NC11]

Michael A. Nielsen and Isaac L. Chuang. Quantum Computation and Quantum Information. Cambridge University Press, 10th anniversary edition, 2011.

[Nee99]

Tristan Needham. Visual Complex Analysis. Clarendon Press, 1999.

[Nel15]

Gail S. Nelson. A User-friendly Introduction to Lebesgue Measure and Integration. Student Mathematical Library. American Mathematical Society, 2015.

[Nie00]

Yves Nievergelt. Wavelets Made Easy. Birkhäuser, corrected edition, 2000.

[Opi96]

M. Opitz. Method of simulating a room and/or sound impression, August 6 1996. US Patent 5,544,249.

[OWN97]

Alan V. Oppenheim, Alan S. Willsky, and S. Hamid Nawab. Signals and Systems. Prentice-Hall, 2nd edition, 1997.

[RF10]

Halsey Royden and Patrick Fitzpatrick. Real Analysis. Pearson, 4th edition, 2010.

[Roc00]

Daniel Rockmore. The FFT: An algorithm the whole family can use. Computing in Science & Engineering, 2(1):60–64, 2000.


[Ros13]

Kenneth A. Ross. Elementary Analysis: The Theory of Calculus. Springer, 2nd edition, 2013.

[RS80]

Michael Reed and Barry Simon. Functional Analysis. Academic Press, 1980.

[RS96]

Zeév Rudnick and Peter Sarnak. Zeros of principal L-functions and random matrix theory. Duke Math. J., 81(2):269–322, 1996. A celebration of John F. Nash, Jr.

[Rud76]

Walter Rudin. Principles of Mathematical Analysis. McGraw-Hill, 3rd edition, 1976.

[Rud86]

Walter Rudin. Real and Complex Analysis. McGraw-Hill, 3rd edition, 1986.

[Rud91]

Walter Rudin. Functional Analysis. McGraw-Hill, 2nd edition, 1991.

[Sha94]

Ramamurti Shankar. Principles of Quantum Mechanics. Plenum Press, 2nd edition, 1994.

[SS03]

Elias M. Stein and Rami Shakarchi. Fourier Analysis: An Introduction. Princeton University Press, 2003.

[Ter13]

Audrey Terras. Harmonic Analysis on Symmetric Spaces — Euclidean Space, the Sphere, and the Poincaré Upper Half-Plane. Springer-Verlag, 2nd (reprint) edition, 2013.

[Tes09]

Gerald Teschl. Mathematical Methods in Quantum Mechanics: With Applications to Schrödinger Operators, volume 99 of Graduate Studies in Mathematics. American Mathematical Society, 2009.

[Wal08]

James S. Walker. A Primer on Wavelets and Their Scientific Applications. Chapman and Hall, 2nd edition, 2008.

[Wat12]

William C. Waterhouse. Square root as a homomorphism. Amer. Math. Monthly, 119(3):235–239, 2012.

[Zel00]

Steve Zelditch. Spectral determination of analytic bi-axisymmetric plane domains. Geom. Funct. Anal., 10(3):628–677, 2000.

[Zel09]

Steve Zelditch. Inverse spectral problem for analytic domains. II. $\mathbb{Z}_2$-symmetric domains. Ann. of Math. (2), 170(1):205–269, 2009.

Index a.e., 152 absolute value, 14 abstract harmonic analysis, 321 acoustics, 4 affine algebraic group, 322 algebraic basis, 329 algebraic span, 329 almost all, 152 almost everywhere, 152 alternating series, 76 Always Better Theorem, 148 angle, 117 anti-commutator, 296 Arbitrarily Close Criterion, 11, 20 asymptotics, 60, 63 autocorrelation function, 187, 305 average value, 112 band-limited, 304 basis, 149 Best Approximation Theorem, 148 Black-Scholes equation, 201 Bolzano-Weierstrass Theorem, 24–26 bound upper, 10 boundary conditions, 203, 232, 242, 243 Dirichlet, 203, 243, 318 Neumann, 204, 243 boundary value problem, 203, 242 bounded operator, 214 set, 18, 141 bump function, 185 Cc0 (R), 162 Cc∞ (R), 185

C ∞ (X), 114 C r (X), 114 Cauchy, 23 in norm metric, 143 uniformly, 82 Cauchy Criterion for improper integrals, 102 for series, 74 Ces`aro sum, 177 circle S 1 , 115 closed, 19, 26, 27 closed disc, 19, 26 closed r-neighborhood, 26 coefficient of a power series, 90 of a trigonometric polynomial, 121 collapse (of a quantum state), 257, 292–294, 316 commutator, 296 compact, 26, 32, 321 Comparison Test, 74 for improper integrals, 102 complement, 19 completeness, 24 Cauchy, 24 in norm metric, 143 order, 10 complex analysis, 41 complex conjugate, 13 complex numbers, 13 complex plane, 13 conjugate, see complex conjugate continuous function, 30 -δ, 30 in norm metric, 142 340

INDEX -δ, 142 on a set, 142 sequentially, 142 metric, 30 -δ, 30 on a set, 30 sequentially, 30 on a set, 30 piecewise, 35 sequentially, 30 uniformly, 32 with compact support, 162 convergence absolute, 75 of a sequence, 17 of a series, 72 of a two-sided series, 73 pointwise, 77 uniform, 81 consequences of, 83–86 Convergence of Monotone Sequences, 21 convolution, 171 on R, 267 on S 1 , 173 correlated, 187, 305 critical line, 319

341 Dirac delta function, 174, 311 Dirac kernel, 171 on R, 267 on S 1 , 175, 300 Dirichlet kernel, 176 discrete Fourier transform, 312 discrete-time Fourier transform, 186 distribution, 311 tempered, 311 divergence of a sequence, 17 of a series, 72 of a two-sided series, 73 domain (of an operator), 211 dominate, 159 dot product, 111, 117, 136 dual group, 321 eigenbasis, 207, 208, 224 eigenbasis method, 227 eigenfunction, 221 eigenvalue, viii, 207, 211, 221 associated (with an eigenbasis), 224 eigenvector, 207, 211, 221 energy, 4 conservation of, 206 kinetic, 206 potential, 205 energy operator, 257, 315 even extension, 128, 243 even function, 128 expected value, 294 extended nonnegative real number, 157 Extra Derivative Lemma, 172, 181 Extreme Value Theorem, 33

Daniell-Riesz integral, 310 decay, 99, 182 Dedekind cuts, 10 dense, 21 diagonal, 207 diagonalization, 207, 213, 225 Diagonalization Theorem, 225 differentiable function, 37, 315 on a set, 37 fˆ(γ), 271 piecewise, 35 fˆ(n), 123 totally, 65 fN (x), 123 differential equation, 3 fast Fourier transform, 312 differentiating under the integral sign, 68, Fej´er kernel, 176 103 field, 10 diffusion, 201 ordered, 10 formal solution, 234 dimension, 321

342 Fourier coefficient, 123 generalized, 147 Fourier cosine series, 129 Fourier polynomial, 123 Fourier series, viii, 4, 123 generalized, 147 lacunary, 188 real, 127 Fourier sine series, 129 Fourier transform, viii, 252, 263, 264 on L2 , 276 on S(R), 271 Fourier’s Law, 200 frequency response, 305 Fubini’s Theorem, 66 on R, 105 function space, viii, 2, 111, 113 fundamental frequency, 4 Fundamental Theorems of Calculus, 56–58 Gauss kernel, 268 generating function, 299 geometric series, 75 Gram-Schmidt orthogonalization, 250 group, 320 abelian, 320 additive abelian, 10 Haar measure, 322 Haar wavelet, 312, 313 Hamiltonian, 316 harmonic first, 4 nth, 4 second, 4 heat equation, 199, 201 heat kernel, 300 Heaviside function, 282 Heaviside’s method, 284 Heisenberg Uncertainty Principle, 295, 296 Hermite functions, 252, 289 normalized, 254 Hermite polynomials, 253, 290 Hilbert space, 135, 165

Hilbert Space Absolute Convergence Theorem, 167
Hilbert Space Comparison Test, 168
holomorphic, 41
homomorphism, 321
ideal lowpass filter, 311
imaginary part (of a complex number), 14
incomplete, 24
infimum, 12
initial value problem, 200
inner product, 117, 135
  L^2, 136
inner product space, 135, 136
integrable by separation, 103
integral, see Lebesgue integral or Riemann integral
integration by parts, 59
  on R, 102
Intermediate Value Theorem, 33
interval chain, 153
interval of continuity, 35
inverse Fourier transform, 274
Inversion Theorem
  for Fourier Series, 172
  for the Fourier Transform, 264, 277
  in S(R), 274
isomorphism of Hilbert spaces, 169
Isomorphism Theorem
  for Fourier Series, 168
  for the Fourier Transform, 277
  in S(R), 275
isospectral, 319
Jacobi theta function, 298
L’Hôpital’s Rule, 60
Lagrange’s four squares theorem, 299
Langlands program, 322
Laplacian, 218
Laurent polynomial, 122
least upper bound, see supremum
Lebesgue integrable, 157
Lebesgue integral, 135, 144, 150, 151, 157

  axioms for, 157–162, 164
Legendre polynomials, 248
length (of an interval), 152
lim sup, 76
limit, 315
limit of a function, 33
  at infinity, 60
  ε-δ, 34
  sequential, 33
limit point, 33
linear combination, 329
linear map, 211, 329
linear operator, see operator
linearly independent, 116, 329
Lipschitz function, 191
  piecewise, 35, 191
local linear approximation, 38
Local Linearity, 38
locally compact, 321
locally integrable, 101
locally rectangular, 64
lowering operator, 253
M-test, Weierstrass, 83
mean value, 294
Mean Value Theorem, 39
measurable function, 157
measure, 151, 153
measure zero, 151, 152
measurement (of a quantum state), 257, 292, 293, 316
metric, 15, 113
  L^∞, 115
  L^2, viii, 117
  norm, 141
metric space, 15
modular form, 299
momentum operator, 292, 315
momentum space, 293
Montgomery-Odlyzko law, 320
multiresolution, 314
nonabelian, 320
norm, 14, 136, 140
  inner product, 136
  L^∞, 140
  L^1, 140
  L^2, 136
normed space, 140
nth Term Test for Divergence, 75
Nyquist frequency, 305
Nyquist rate, 305
Nyquist sampling theorem, 305
Nyquist-Shannon sampling theorem, 305
observable, 257, 292, 316
odd extension, 128, 243
odd function, 128
open, 26, 27
open cover, countable, 152
open disc, 19, 26
open r-neighborhood, 26
operator, viii, 207, 208, 211
  diagonal, 213
  Hermitian, 211, 216, 318
  on C^n, 207
  positive, 211, 216–218
  self-adjoint, 315
  skew-Hermitian, 295
  unitary, 275, 321
operator notation (Fourier transform), 271
order, total, 10
orderable, 10
orthogonal (vectors), 117, 137
orthogonal basis, 146, 149, 329
orthogonal set, 146
orthonormal, 117
orthonormal basis, 149
orthonormal set, 146
p-series, 75
Parseval’s identity, 170
partial derivative, 64, 65
partial differential equation (PDE), 199
  hyperbolic, 239
  parabolic, 229
partial fractions, 284
partial sums, 1, 72

  of a two-sided series, 73
partition, 43
  standard, 44
Pass the Hat, 273
path, 39
path-connected, 39
piecewise (property), 35
Planck’s constant, 316
plane wave, 203
Poisson summation, 281, 298
position operator, 292, 315
position space, 293
power series, 90
power spectrum, 187, 305
projection, 137, 147
Pythagorean Theorem, 137, 146
quantized, 205
quantum computing, 317
quantum harmonic oscillator, 206
qubit, 317
radius of convergence, 90, 91
raising operator, 253
rapidly decaying, 99, 131
Ratio Test, 75
real numbers, 10
real part (of a complex number), 14
rearrangement, 323
rectangle, 20, 64
refinement, 43
  common, 43
relative error, 38
representation, 321
restricted sum, 329
Riemann hypothesis, 186, 319
Riemann integrability, 45
  on R, 101
Riemann integral, 42, 45
  improper, 101
    convergence, 101
    existence, 101
  indefinite, 57
  lower, 44

  upper, 44
Riemann sum, 42
  lower, 44
  upper, 44
Riemann zeta function, 186, 319
Riemann-Lebesgue lemma, 191
ring, commutative with unity, 10
Rodrigues’ formula, 250
root test, 76
S(R), 99
sampling theorem, 304
scalar, 327
Schrödinger’s equation, 206
Schwartz space, 99, 212, 264
separation of variables, 228, 232
sequence, 17
  two-sided, 72
Sequential Criteria for Integrability, 45, 46, 48
series, 72
  of functions, 77
  two-sided, 72
shift operator, 213
signal processing, 310
Simultaneous Diagonalization, 225
smooth function, 92
span, 116
spectrum, 315
  continuous, 292, 316
  discrete, 316
square root (of an operator), 219
Squeeze Lemma
  for functions, 34
  for sequences, 21
standard deviation, 295
state function, 206, 256, 292
state space, 315
Sturm-Liouville equation, 227
Sturm-Liouville theory, 259
Sturmian operator, 258
  regular, 258
  singular, 258

subinterval, 43
subsequence, 17
subspace, 113, 328
Sup Inequality Trick, 11
support, 162
supremum, 10
tensor product, 317
term-by-term differentiation, 2, 88
timbre, 5
time series, 186, 305
tone, 4
topological group, 321
topology, 26, 27
  point-set, 27
translation invariant, 173
trigonometric polynomial, 121
trivial zeros (of zeta function), 319
unitary (matrix), 318
unitary representation, 321
variance, 295
vector, 113, 327
vector space, 327
wave equation, 199, 203
wavelets, 312
Weierstrass Approximation Theorem, 184
Wiener-Khinchin Theorem
  Continuous-time, 306
  Discrete-time, 187
Wronskian, 259
Zero Derivative Theorem, 40
zero function, 113
zero vector, 113
