Temporal disaggregation with cubic splines in R

August 1, 2016

I implement cubic spline interpolation as described here to disaggregate a low-frequency series to a higher-frequency series, subject to an additivity constraint (appropriate for flows such as GDP or indexes such as CPI).

The objective is to obtain a “smooth” curve whose integral over each period \([i-1, i]\) equals the observed value \(X_i\). This is the constraint of temporal additivity:

\[\int_{i-1}^{i} f(t) dt = X_i\]

A cubic spline function \(f(t)\) is a piecewise cubic polynomial, specified as follows:

\[f(t) = f_i(t) \text{ for } t \in [i-1,i]\] \[\text{where } f_i(t) = a_i + b_it + c_it^2 + d_it^3\]

To count as “smooth”, our spline should at the very least be continuous (no level jumps) and its derivative or slope should be continuous (no kinks). That is, at each join \( 1, 2, … , T-1\) where adjacent cubic polynomials meet, they should have the same level as well as the same slope:

\[f_t(t) = f_{t+1}(t)\] \[{f'}_t(t) = {f'}_{t+1}(t)\]

Continuity

\[f_t(t) = f_{t+1}(t)\] \[a_t + b_tt + c_tt^2 + d_tt^3 = a_{t+1} + b_{t+1}t + c_{t+1}t^2 + d_{t+1}t^3\] \[a_t + b_tt + c_tt^2 + d_tt^3 - a_{t+1} - b_{t+1}t - c_{t+1}t^2 - d_{t+1}t^3 = 0\]

A1 = matrix(0, 4*k, k-1)
# Constraint (LHS)
for (i in 1:(k-1)){
	idx = (4*i-3):(4*(i+1))
	A1[idx,i] = c(1, i, i^2, i^3, -1, -i, -i^2, -i^3)
}
b1 = rep(0, k-1)  # RHS

Continuity of first derivative

\[{f'}_t(t) = {f'}_{t+1}(t)\] \[b_t + 2c_tt + 3d_tt^2 = b_{t+1} + 2c_{t+1}t + 3d_{t+1}t^2\] \[b_t + 2c_tt + 3d_tt^2 - b_{t+1} - 2c_{t+1}t - 3d_{t+1}t^2 = 0\]

A2 = matrix(0, 4*k, k-1)
# Constraint (LHS)
for (i in 1:(k-1)){
	idx = (4*i-3):(4*(i+1))
	A2[idx,i] = c(0, 1, 2*i, 3*i^2, 0, -1, -2*i, -3*i^2)
}
b2 = rep(0, k-1)  # RHS

Additivity

\[\int_{i-1}^{i} f_i(t) dt = X_i\] \[\int_{i-1}^{i} a_t + b_tt + c_tt^2 + d_tt^3 dt = X_i\] \[a_i + \frac{b_i}{2}[i^2-(i-1)^2] + \frac{c_i}{3}[i^3-(i-1)^3] + \frac{d_i}{4}[i^4-(i-1)^4] = X_i\]

A3 = matrix(0, 4*k, k)
# Constraint (LHS)
for (i in 1:k){
    idx = (4*i - 3):(4*i)
    A3[idx, i] = c(1, (1/2)*(i^2-(i-1)^2), (1/3)*(i^3-(i-1)^3),
		   (1/4)*(i^4-(i-1)^4))
}
b3 = x  # RHS

A = cbind(A1, A2, A3)
b = c(b1, b2, b3)

\[\text{min} \sum_{i=1}^{k} \left(\frac{\partial^2f_i}{\partial t^2}\right)^2 dt\] \[\text{min} \sum_{i=1}^{k} 4c_i^2 + 12(2i-1)c_id_i + 12(3i^2 - 3i + 1)d_i^2\]

# Objective function
d = rep(0, 4*k)  # no linear terms
D = matrix(0, 4*k, 4*k)  # quadratic terms
for (i in 1:k){
	tmp = matrix(c(4, 12*(2*i-1), 12*(2*i-1), 12*(3*i^2 - 3*i + 1)),
		     2, 2)
	idx = (4*i-1):(4*i)
	D[idx,idx] = tmp
}

Together, \(A\), \(b\), \(D\), and \(d\) define a quadratic program: minimize \(\frac{1}{2}\theta^\top D\theta - d^\top\theta\) subject to \(A^\top\theta = b\). We solve it with solve.QP from the quadprog package. Because \(D\) is only positive semidefinite (the objective doesn’t penalize \(a_i\) or \(b_i\) directly), we first find the nearest positive definite approximation using Matrix::nearPD.

Given the spline coefficients, disaggregating by a factor f (e.g. f=3 for annual-to-quarterly) is a matter of integrating each piece over sub-intervals of width \(1/f\):

splint = function(x, f=3) {
  k = length(x)
  coef = solve.splint(x)
  coef.mat = matrix(coef, 4, k)

  x.int = rep(NA, k*f)
  for (i in 1:(k*f)){
    t1 = i/f
    t0 = (i-1)/f
    idx = (i-1) %/% f + 1
    x.int[i] = f*sum(coef.mat[,idx] * c(1/f, (1/2)*(t1^2 - t0^2),
                                              (1/3)*(t1^3 - t0^3),
                                              (1/4)*(t1^4 - t0^4)))
  }
  x.int
}

The full implementation is on GitHub: splint.R.