Functional Data Analysis: An introduction
David Wooff
[email protected]
University of Durham
FDA – p. 1/42
Main steps in FDA
• Collect, clean, and organize the raw data.
• Convert the data to functional form.
• Explore the data through plots and summary statistics.
• Register the data, if necessary, so that important features occur at the same argument values.
• Carry out exploratory analysis, such as functional principal components analysis.
• Construct models, if appropriate.
• Evaluate model performance.
Example: Weather data
• Daily average temperature computed for N = 365 consecutive days at each of 35 Canadian weather stations.
• Are there common patterns of interest?
• Is there variation in amplitude (peakedness)?
• Is there variation in phase (timing of features)?
Raw data: Daily mean temperatures at 35 Canadian weather stations
[Figure: "Average daily temperature, 35 Canadian weather stations" — one curve per station; temperature (about −30 to 20) against day (0–365).]
Averaged data: Monthly mean of daily mean temperatures at 35 Canadian weather stations
[Figure: "Average monthly temperature, 35 Canadian weather stations" — one curve per station; temperature against month (1–12).]
Summary of visual features
• Clear that temperatures rise in the summer and fall in the winter.
• Differences in level: some very cold places in arctic Canada.
• Strong differences in amplitude.
• Coastal stations display little amplitude: cool winters and summers.
• Continental stations show high amplitude/peakedness: cold winters and hot summers.
• Some variation in phase: the peak for some (coastal?) stations is after the peak for others (continental?).
Basis construction
• Main idea is that we take each observed series $x_i(t)$ and approximate it by $\hat{x}_i(t)$ chosen from the same functional family.
• We represent a function $x$ by a linear expansion
  $$x(t) = \sum_{k=1}^{K} c_k \phi_k(t)$$
  in terms of $K$ known basis functions $\phi_k(t)$.
• $c_1, \dots, c_K$ are coefficients to be estimated.
• Let $n$ be the sample size, i.e. the number of observations in each series.
Basis construction
• We hope that a small $K$ leads to a reasonable fit and captures the essential characteristics, balanced against retaining enough residual degrees of freedom to allow computation of confidence intervals.
• Once we have the approximation, we have $x(t) = \hat{x}(t) + \epsilon(t)$, and observed residual series $\hat{\epsilon}_i(t) = x_i(t) - \hat{x}_i(t)$, from which we may obtain a standard measure of quality of fit for an individual series:
  $$s_i^2 = \frac{1}{n-K} \sum_{j=1}^{n} \hat{\epsilon}_i(t_j)^2.$$
Basis choice
• Several possibilities: B-splines, exponential, polynomial ($\phi_k = t^k$). B-splines are mostly used for aperiodic data.
• Reasonable to assume that the weather data are periodic.
• A reasonable choice is the Fourier basis system:
  $$\hat{x}(t) = c_1 + c_2 \sin \omega t + c_3 \cos \omega t + c_4 \sin 2\omega t + c_5 \cos 2\omega t + \dots,$$
  where $\omega = 2\pi/T$; here $T = 12$.
• Here we choose $K = 5$; for a Fourier basis system we choose an odd number to capture variation in phase, i.e. we require sine/cosine pairs.
• Various ways of estimating $c_1, \dots, c_K$: the simplest is ordinary least squares (used for this example), but other approaches, e.g. a roughness penalty, may be superior.
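As a concrete sketch (not the slides' own R/fda code), the OLS fit of a Fourier basis can be written in a few lines of Python/NumPy. The monthly grid, the period T = 12, and the toy series below are assumptions for illustration:

```python
import numpy as np

def fourier_basis(t, K, T=12.0):
    """Evaluate K Fourier basis functions (constant plus sine/cosine pairs) at times t."""
    omega = 2 * np.pi / T
    cols = [np.ones_like(t)]
    for m in range(1, (K - 1) // 2 + 1):
        cols.append(np.sin(m * omega * t))
        cols.append(np.cos(m * omega * t))
    return np.column_stack(cols)

def fit_fourier_ols(t, x, K=5, T=12.0):
    """Ordinary least squares estimate of the coefficients c_1, ..., c_K."""
    Phi = fourier_basis(t, K, T)
    c, *_ = np.linalg.lstsq(Phi, x, rcond=None)
    return c

# toy series: an annual cycle sampled at 12 monthly time points, plus noise
t = np.arange(1, 13, dtype=float)
rng = np.random.default_rng(0)
x = 5.0 + 10.0 * np.sin(2 * np.pi * t / 12.0) + rng.normal(0, 0.1, size=12)
c = fit_fourier_ols(t, x)
xhat = fourier_basis(t, 5) @ c
```

On an equally spaced grid covering a full period the basis columns are orthogonal, so the estimated coefficients recover the constant and first-harmonic amplitudes of the toy signal almost exactly.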
The basis curves, $\hat{x}(t)$, for the weather data
[Figure: "Average monthly temperature: 35 Canadian weather stations, Fourier basis, K = 5" — fitted temperature curves against month (1–12).]
Residual plots, $x_i(t) - \hat{x}_i(t)$, $i = 1, \dots, 35$, for $K = 5$.
[Figure: "Raw data minus basis representation, Fourier basis, K = 5" — residual temperature (about −4 to 4) against month.]
Residual plots
• Estimated curves appear plausible.
• Should also explore the fitted curves: plot the raw residuals $\hat{\epsilon}_i(t) = x_i(t) - \hat{x}_i(t)$.
• Periodicity in the residuals is probably an artefact of the basis representation.
• Do we get a better answer if we choose more basis functions?
Residual plots for $K = 5$ and $K = 7$
[Figure: two panels, "Raw data minus basis representation", for a Fourier basis with K = 5 and with K = 7 — residual temperature against month.]
Residual plots
• Raw residuals decrease as we increase $K$ (of course), but the model is more complicated.
• A simple way of assessing the number of basis functions to use is to examine $s_i^2 = \frac{1}{n-K} \sum_j (\hat{x}_i(t_j) - x_i(t_j))^2$.
• For example, we can plot $\sum_i s_i^2$ versus $K$; note that $s_i^2$ is penalized by the number of basis functions used.
• Analogous to the usual measure of variability in LS regression, $\frac{1}{n-p} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$.
Plots of $\sum_i s_i^2$ versus $K$
[Figure: penalized RSS (about 60–140) against the number of basis functions, $K = 3, \dots, 9$.]
Number of basis functions
• The plot suggests that there is a big advantage in using $K = 5$ rather than $K = 3$.
• Not much extra advantage in using $K = 7$: we are starting to lose degrees of freedom.
• The pragmatic choice seems to be $K = 5$.
• This is a black art!
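The penalized-RSS comparison across K can be sketched as follows. The toy data (two harmonics plus noise, so that K = 3 underfits while K = 5 captures the signal) are an assumption for illustration:

```python
import numpy as np

def fourier_basis(t, K, T=12.0):
    """Constant plus sine/cosine pairs, as in the OLS fit sketched earlier."""
    omega = 2 * np.pi / T
    cols = [np.ones_like(t)]
    for m in range(1, (K - 1) // 2 + 1):
        cols.append(np.sin(m * omega * t))
        cols.append(np.cos(m * omega * t))
    return np.column_stack(cols)

def penalized_rss(t, X, K):
    """Sum over series of s_i^2 = RSS_i / (n - K) for a K-term Fourier fit.
    X holds one series (station) per row, one time point per column."""
    n = len(t)
    Phi = fourier_basis(t, K)
    C, *_ = np.linalg.lstsq(Phi, X.T, rcond=None)  # fit all series at once
    resid = X.T - Phi @ C
    return float(np.sum(resid ** 2) / (n - K))

# toy data: 35 series with first and second harmonics plus noise
rng = np.random.default_rng(1)
t = np.arange(1, 13, dtype=float)
signal = 10 * np.sin(2 * np.pi * t / 12) + 3 * np.cos(4 * np.pi * t / 12)
X = rng.normal(0, 5, size=(35, 1)) + signal + rng.normal(0, 1, size=(35, 12))
scores = {K: penalized_rss(t, X, K) for K in (3, 5, 7, 9)}
```

Plotting `scores` against K reproduces the pattern on the slide: a large drop from K = 3 to K = 5, then little further gain.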
Exploring derivatives
• Derivatives of the fitted curves can be explored and interpreted.
• The first derivative may suggest common landmarks.
• Plots of derivatives may suggest certain models, e.g. the differential equation $Dx = -\alpha(x - b_1)$ is consistent with an exponential model, $x(t) = b_1 + b_2 e^{-\alpha t}$, so a plot showing linear association between $x(t)$ and $x'(t)$ suggests the exponential model.
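Because the fitted curve is a linear combination of basis functions, its derivatives come from differentiating the basis analytically rather than differencing the data. A sketch, assuming the Fourier basis and coefficient vector from the earlier fit (the coefficients below are illustrative, not the weather-data estimates):

```python
import numpy as np

def fourier_basis(t, K, T=12.0):
    omega = 2 * np.pi / T
    cols = [np.ones_like(t)]
    for m in range(1, (K - 1) // 2 + 1):
        cols.append(np.sin(m * omega * t))
        cols.append(np.cos(m * omega * t))
    return np.column_stack(cols)

def fourier_basis_deriv(t, K, T=12.0, order=1):
    """Exact derivative of each basis function, so xhat^(order)(t) = Phi_deriv @ c."""
    omega = 2 * np.pi / T
    cols = [np.zeros_like(t)]  # derivative of the constant term is 0
    for m in range(1, (K - 1) // 2 + 1):
        a = m * omega
        if order == 1:
            cols.append(a * np.cos(a * t))        # (sin at)'  =  a cos at
            cols.append(-a * np.sin(a * t))       # (cos at)'  = -a sin at
        elif order == 2:
            cols.append(-a ** 2 * np.sin(a * t))  # (sin at)'' = -a^2 sin at
            cols.append(-a ** 2 * np.cos(a * t))  # (cos at)'' = -a^2 cos at
    return np.column_stack(cols)

c = np.array([1.0, 2.0, 0.5, -1.0, 0.3])  # illustrative coefficients
t = np.linspace(1.0, 12.0, 50)
xdot = fourier_basis_deriv(t, 5) @ c       # first-derivative curve on the grid
```

Zero crossings of `xdot` locate the minima and maxima discussed on the next slides.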
The basis curves, $\hat{x}'(t)$
[Figure: "Average monthly temperature: 35 Canadian weather stations, Fourier basis, K = 5, first derivative" — derivative curves against month.]
The basis curves, $\hat{x}''(t)$
[Figure: "Average monthly temperature: 35 Canadian weather stations, Fourier basis, K = 5, second derivative" — second-derivative curves against month.]
Derivatives plots
• The plot of the first derivative shows the locations of winter minima and summer maxima at $\hat{x}'(t) = 0$.
• These don't quite align, indicating that the change points differ according to location.
• The plot of the second derivative can highlight more subtle differences, e.g. the July shape is peaked for most locations but troughed for others.
• We may need to be careful when interpreting such features: they may simply be artefacts of the fitting process.
• Avoid post-hoc rationalization.
Plotting the mean curve, $\bar{x}(t) = \frac{1}{N} \sum_{i=1}^{N} \hat{x}_i(t)$
[Figure: "Average monthly temperature: mean curve, Fourier basis, K = 5" — mean temperature against month.]
Superimposing the mean curve on the estimated curves
[Figure: "Average monthly temperature: 35 Canadian weather stations, Fourier basis, K = 5, mean curve added".]
The deviations from the mean curve show quite a lot of variation beyond the mean
[Figure: "Deviations from the mean: 35 Canadian weather stations, Fourier basis, K = 5" — deviation curves $\hat{x}_i(t) - \bar{x}(t)$ against month.]
Plotting the standard deviation curve, $s_X(t) = \sqrt{\frac{1}{N-1} \sum_{i=1}^{N} (\hat{x}_i(t) - \bar{x}(t))^2}$
[Figure: "Average monthly temperature: standard deviation curve, Fourier basis, K = 5" — $s_X(t)$ (about 4–9) against month.]
Mean curve with ad hoc SD envelope based on $\bar{x}(t) \pm \frac{1}{\sqrt{N}} s_X(t)$
[Figure: "Average monthly temperature: mean curve + standard deviation envelope, Fourier basis, K = 5".]
Interpretation
• Variation across the series is highest in the winter months and lowest in the summer.
• Can plot an ad hoc confidence envelope for the underlying mean curve, assuming these 35 stations to be a random sample from a superpopulation of weather stations.
• A prediction envelope for a new weather station would be much wider.
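The mean curve, SD curve, and ad hoc envelope are simple pointwise computations once the fitted curves are evaluated on a common grid. A minimal sketch; the toy stand-in for the 35 fitted station curves is an assumption:

```python
import numpy as np

def mean_and_envelope(Xhat):
    """Pointwise mean curve and the ad hoc envelope xbar(t) +/- s_X(t)/sqrt(N).
    Xhat holds the N fitted curves evaluated on a common grid, one per row."""
    N = Xhat.shape[0]
    xbar = Xhat.mean(axis=0)
    s_X = Xhat.std(axis=0, ddof=1)  # the standard deviation curve
    half = s_X / np.sqrt(N)
    return xbar, xbar - half, xbar + half

# toy stand-in for the 35 fitted station curves on a monthly grid
rng = np.random.default_rng(2)
t = np.arange(1, 13)
Xhat = 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 3, size=(35, 12))
xbar, lower, upper = mean_and_envelope(Xhat)
```

The division by sqrt(N) is what makes this an envelope for the mean curve rather than a prediction envelope for a new station, which would use s_X(t) itself and so be much wider.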
Principal components analysis for functional data
• To extract the first functional PC we seek a further function, $\zeta_1 = \int \beta(s) x(s)\,ds$, which in some sense maximises variation over the space of interest.
• This requires choosing a weight function $\beta(s)$.
• This is analogous to standard PCA, in which we choose weights $\beta_1, \dots, \beta_n$ to maximise $\mathrm{var}\left(\sum_{i=1}^{n} \beta_i x_i\right)$.
• Subsequent PCs are $\zeta_2, \dots, \zeta_K$, with the extraction arranged so that the PCs are orthogonal: $\int \zeta_i(s) \zeta_j(s)\,ds = 0, \ \forall i \ne j$.
Principal components estimation
• The estimation of the PCs is usually carried out on centred data, and so the components may be considered as perturbations of the mean curve.
• The estimation procedure is quite complicated: see Ramsay & Silverman.
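When the curves are evaluated on a common grid, a workable stand-in for the full functional treatment in Ramsay & Silverman is PCA of the discretized curves via an SVD of the centred data matrix. A sketch under that assumption, with toy curves dominated by one mode of variation:

```python
import numpy as np

def functional_pca(Xhat, n_pc=4):
    """PCA of curves evaluated on a common grid, via SVD of the centred matrix.
    Columns of W are the discretized weight functions; scores are inner
    products of each centred curve with the weight functions."""
    Xc = Xhat - Xhat.mean(axis=0)             # centre: PCs perturb the mean
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    var_explained = s ** 2 / np.sum(s ** 2)
    W = Vt[:n_pc].T                           # grid-evaluated weight functions
    scores = Xc @ W                           # one row of scores per curve
    return W, scores, var_explained[:n_pc]

# toy curves: random amplitudes on an annual cycle, plus noise
rng = np.random.default_rng(3)
t = np.arange(1, 13)
amp = rng.normal(0, 5, size=(35, 1))
Xhat = amp * np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.5, size=(35, 12))
W, scores, ve = functional_pca(Xhat)
```

With a single dominant mode, the first component absorbs most of the variance, mirroring the 89.1% figure for the weather data; the fda package's `pca.fd` handles the proper functional version.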
Monthly data PCA, raw components shown as weights
[Figure: four panels plotting each harmonic against month — PCA function 1 (89.1% of variability), PCA function 2 (8.5%), PCA function 3 (1.9%), PCA function 4 (0.4%).]
The first PC for the weather data
• $\zeta_1$ explains 89.1% of the variation. It contrasts winter months (high variation) with summer months (low variation), i.e. the weights placed on winter measurements are higher than the weights for summer measurements, and so lead to higher variability.
• Scores can be computed for each data series (weather station) as $\zeta_{1i} = \int \beta(s) x_i(s)\,ds$.
• Weather stations with high positive values of $\zeta_{1i}$ will have much warmer than average winters and warmer than average summers.
• Weather stations with high negative values of $\zeta_{1i}$ will have much colder than average winters and colder than average summers.
Subsequent PCs for the weather data
• $\zeta_2$ explains 8.5% of the variation. It contrasts warm winter months with cool summer months.
• Weather stations with high positive values of $\zeta_{2i}$ will have warmer than average winters and colder than average summers.
• Weather stations with high negative values of $\zeta_{2i}$ will have colder than average winters and warmer than average summers (for example, mid-continental stations).
• The remaining PCs appear consonant with random variation and explain little of the variation.
Re-adding the mean curve to the PCs
• An alternative visualization is obtained by plotting $\bar{x}(t) \pm \gamma \zeta_j$ for some appropriate value of $\gamma$.
• That is, we add the average to the PCs and so can see the implication of high positive and high negative values for each $\zeta(t)$.
• $\gamma$ can be chosen by inspection, to distinguish sufficiently between the curves; alternatively Ramsay et al. suggest a semi-automatic procedure for choosing $\gamma$.
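The perturbation display above is easy to sketch. The default choice of γ below (the RMS of the mean curve about its own level) is only an illustrative heuristic of my own, not the semi-automatic procedure of Ramsay et al.; the toy mean curve and component are also assumptions:

```python
import numpy as np

def mean_perturbations(xbar, zeta, gamma=None):
    """Curves xbar + gamma*zeta and xbar - gamma*zeta for plotting.
    If gamma is None, use a crude visual heuristic (an assumption, not
    Ramsay's rule): the RMS of xbar about its own mean level."""
    if gamma is None:
        gamma = float(np.sqrt(np.mean((xbar - xbar.mean()) ** 2)))
    return xbar + gamma * zeta, xbar - gamma * zeta

t = np.arange(1, 13)
xbar = 10 * np.sin(2 * np.pi * t / 12)   # toy mean curve
zeta = np.full(12, 1 / np.sqrt(12))      # toy unit-norm "level" component
upper, lower = mean_perturbations(xbar, zeta)
```

Plotting `upper` and `lower` alongside `xbar` shows what a high positive or high negative score on that component implies for a station's curve.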
Monthly data PCA: plotting components as perturbations of the mean
[Figure: four panels against month — PCA function 1 (89.1% of variability), 2 (8.5%), 3 (1.9%), 4 (0.4%) — each shown as the mean curve plus and minus a multiple of the harmonic.]
Interpreting the remaining PCs
• Using this visualization, we might now interpret $\zeta_3$ as a slight time-shift component: high values have slightly later summers and winters, and vice versa for low values.
• $\zeta_4$ is still hard to interpret, but might concern the timing of autumn and spring.
• It is also possible to carry out rotations of the PCs in order to improve interpretability.
• These can be orthogonal or oblique, as desired.
• Oblique rotations lead to correlated components, which is OK in my view, but . . .
• . . . rotation doesn't give appreciably better answers for these data.
Daily data PCA, $K = 65$. We don't seem to have lost much by smoothing from daily to monthly data.
[Figure: four panels against day (0–365) — PCA function 1 (88.4% of variability), PCA function 2 (8.5%), PCA function 3 (2%), PCA function 4 (0.5%).]
Plot of PC scores of 35 weather stations. Neighbouring stations have similar PC scores.
[Figure: scatter of station scores labelled by name (e.g. Resolute, Iqaluit, Vancouver, Winnipeg, Dawson). Horizontal axis, PC1: colder in the winter and colder in the summer. Vertical axis, PC2: colder in the winter and warmer in the summer.]
Registration
• The weather data have a natural starting and ending position. This is not so for other data.
• After transforming to functional form, it may be necessary to align the curves prior to further exploration.
• The process of aligning curves according to their principal features is called landmark registration.
• Example: 68 cross-sections of the ends of knee bones were taken; each datum is a bitmap image.
• 12 landmarks were identified as capturing the essential features.
• A smooth curve is fitted through the landmarks and standard FDA methods applied.
Warp functions
• One way of aligning functional objects is via a warping function.
• A warping function maps from $(0, T)$ to $(0, T)$ in the $x$ direction, but compresses part of the range whilst stretching the remainder.
• An example with warping parameter $\alpha$ is:
  $$h(x, \alpha) = \frac{e^{\alpha x} - 1}{e^{\alpha} - 1}, \quad \alpha \ne 0.$$
• $\alpha < 0$ deforms to the right and $\alpha > 0$ deforms to the left. We have $\lim_{\alpha \to 0} h(x, \alpha) = x$, so that a value of $\alpha = 0$ implies no warping.
• Each data object requires the warping parameter to be estimated.
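The exponential warping function is one line of code; the sketch below assumes the unit interval (T = 1), as in the porosity example that follows:

```python
import numpy as np

def warp(x, alpha):
    """h(x, alpha) = (exp(alpha*x) - 1) / (exp(alpha) - 1) on [0, 1].
    Fixes both endpoints and tends to the identity as alpha -> 0."""
    x = np.asarray(x, dtype=float)
    if abs(alpha) < 1e-12:
        return x  # limiting case: no warping
    return (np.exp(alpha * x) - 1.0) / (np.exp(alpha) - 1.0)

x = np.linspace(0.0, 1.0, 101)
warped = warp(x, -2.0)  # alpha < 0 pushes interior points to the right
```

Evaluating an observed curve at `warp(x, alpha)` instead of `x` shifts its features; estimating one `alpha` per data object aligns the features across objects.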
Warping applied to height in order to align oil reservoir porosities, $h(x, \alpha) = (e^{\alpha x} - 1)/(e^{\alpha} - 1)$
[Figure: four panels for location 117 with warp factors −2, −1, 1, and 2, each plotting actual and warped porosity (about 0.11–0.15) against elevation (0–1).]
Other techniques
• Canonical correlation: which modes of variability in two sets of curves are most closely associated?
• Discriminant analysis: determination of a function which separates two sets of curves.
• Functional linear models:
  • The response variable $x(t)$ is a function
  • At least one explanatory variable is a function
  • Both of the above
• Functional analysis of variance.
• Linear differential equations: principal differential analysis.
Summary
• Data treated as functions, with functional summaries and functional analogues to classical methods.
• Methods extend to more complicated objects such as shapes in many dimensions.
• We tend to work with smoothed versions of the data and not the raw data: this understates the actual variability.
• Powerful methods with much elegant theory.
• Needs some experience to make interpretations; there may be a danger of post-hoc rationalization.
• A useful exploratory tool. If the voice of the data is strong, it can also be a useful inferential tool.
• Used R and the fda package for computation and plots.
References and resources
• J.O. Ramsay & B.W. Silverman (2002) Applied functional data analysis: methods and case studies. New York: Springer. Contains a lot of examples, but little theory.
• http://www.stats.ox.ac.uk/~silverma/fdacasebook/ Data and some (Matlab, R, S+) code for the case studies book.
• J.O. Ramsay & B.W. Silverman (2005) Functional data analysis. New York: Springer. Second edition, much expanded.
• http://ego.psych.mcgill.ca/misc/fda/ Main FDA website, containing the R package and data sets.
• J.O. Ramsay, G. Hooker & S. Graves (2009) Functional data analysis with R and Matlab. New York: Springer.