Some algorithms for logistic regression in Excel and R

In an effort to teach myself more about Excel/VBA programming and maximum likelihood estimation, I’ve been implementing various algorithms for estimating logistic regression models. In this note I try to document my progress.

Some technical preliminaries

Feel free to skip this section. Consider the generic logistic regression model specification:

\[\text{logit}(p_i) = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + ... + \beta_k x_{ki}\] \[y_i \sim \text{Bernoulli}(p_i)\]

Logistic regression finds the vector \(\hat{\beta} = (\hat{\beta}_0, ..., \hat{\beta}_k)\) that maximizes the likelihood function,

\[\mathcal{L}(\beta;y) = \prod_{i=1}^n h(\beta;x_i)^{y_i} (1-h(\beta;x_i))^{1-y_i}\]

or the log likelihood function,

\[\ell(\beta;y) = \sum_{i=1}^n y_i \text{log}(h(\beta;x_i)) + (1-y_i) \text{log}(1-h(\beta;x_i))\]

where

\[h(\beta;x_i) = \text{logit}^{-1}(\beta_0 + \beta_1 x_{1i} + ... + \beta_k x_{ki})\]

Here \(h(\beta;x_i)\) is the function that predicts the response probability \(p_i\), or \(\text{P}(y_i=1)\), from the parameter vector \(\beta\) and the feature vector \(x_i\), according to the model specification. Writing \(x_i = (1, x_{1i}, ..., x_{ki})^\top\), so that the intercept is absorbed into the feature vector, we can express this more compactly like so:

\[h(\beta;x_i) = \frac{1}{1+e^{-\beta^T x_i}}\]
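
To make this concrete, here is the same log-likelihood as a small R function (a minimal sketch: the name loglik and the convention that X carries a leading column of ones for the intercept are my own):

# Log-likelihood of a logistic regression model.
# beta: parameter vector (intercept first); X: model matrix whose first
# column is all ones; y: 0/1 response vector.
loglik <- function(beta, X, y) {
  h <- 1 / (1 + exp(-as.vector(X %*% beta)))  # predicted probabilities h(beta; X)
  sum(y * log(h) + (1 - y) * log(1 - h))
}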

A first stab using the Excel Solver

The Solver can optimize arbitrary (or at least well-behaved) continuous functions subject to various user-defined constraints. All you need to do is define your function (in practice, any spreadsheet that starts with numeric inputs and computes a numeric value somewhere), specify the input cells to optimize over, and optionally add constraints to the problem.

The optimization problem for logistic regression is easy to set up in Excel. We start with the data, \(y\) and \(X\), and an initial parameter vector \(\beta\). We create a spreadsheet that calculates the log-likelihood of this initial model. In an Excel macro we could do it as follows, after defining the appropriate named ranges:

' XBETA, H_XBETA, LOGLIK and LOGLIK_TOTAL are Range objects
' for the corresponding named ranges
XBETA.FormulaArray = "=MMULT(X, BETA)"                                ' linear predictor X*beta
H_XBETA.FormulaArray = "=1 / (1 + EXP(-XBETA))"                       ' inverse logit
LOGLIK.FormulaArray = "=Y * LN(H_XBETA) + (1 - Y) * LN(1 - H_XBETA)"  ' pointwise log-likelihood
LOGLIK_TOTAL.Formula = "=SUM(LOGLIK)"                                 ' total log-likelihood
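
Note that FormulaArray enters each formula as a legacy array formula (the kind you would confirm with Ctrl+Shift+Enter on the sheet), so the XBETA, H_XBETA, and LOGLIK ranges should each be sized to one cell per observation.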

Then we can use the Solver to maximize the log-likelihood over \(\beta\):

SolverReset
' Maximize LOGLIK_TOTAL (MaxMinVal:=1) by varying BETA,
' using the GRG Nonlinear engine
SolverOk SetCell:=LOGLIK_TOTAL.Address, MaxMinVal:=1, ValueOf:=0, _
    ByChange:=BETA.Address, Engine:=1, EngineDesc:="GRG Nonlinear"
SolverOptions AssumeNonNeg:=False   ' allow negative coefficients
SolverSolve UserFinish:=True        ' accept the result without showing the dialog
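
For this code to compile, the VBA project needs a reference to the Solver add-in (set via Tools > References in the VBA editor, after enabling the add-in in Excel).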

An Excel spreadsheet implementing this approach is available here. On the example dataset (the Bangladeshi well data from ARM), with 3020 observations and 5 variables, the process takes about a second, and the parameter estimates agree with those produced by the R function glm to 4 significant figures.

Newton-Raphson method

The Newton-Raphson method involves the following iteration, which converges to the maximum likelihood estimate \(\hat{\beta}\):

\[\beta := \beta - \left(\frac{\partial^2 \ell}{(\partial\beta)^2}(\beta; y)\right)^{-1} \frac{\partial \ell}{\partial\beta}(\beta; y)\]

Using the fact that \(\partial h/\partial\beta_j = h(1-h)\,x_{ij}\) for the inverse logit, the individual partial derivatives evaluate to

\[\frac{\partial \ell}{\partial\beta_j}(\beta; y) = \sum_{i=1}^n (y_i - h(\beta;x_i))x_{ij}\]

We can express the gradient in matrix form as follows:

\[\frac{\partial \ell}{\partial\beta}(\beta; y) = X^\top (y - h(\beta;X))\]

where \(h(\beta;X) = 1/(1+e^{-X\beta})\) is the vector of predicted response probabilities. The individual second derivatives are given by

\[\frac{\partial^2 \ell}{\partial\beta_j \partial\beta_k}(\beta; y) = - \sum_{i=1}^n h(\beta; x_i) (1-h(\beta; x_i)) x_{ij} x_{ik}\]

or in matrix form,

\[\frac{\partial^2 \ell}{(\partial\beta)^2}(\beta; y) = -X^\top V X\]

where \(V\) is the diagonal matrix with \(h(\beta;X) \circ (1 - h(\beta;X))\) on the diagonal (\(\circ\) is the elementwise product). We can then write the Newton-Raphson iteration compactly as follows:

\[\beta := \beta + (X^\top V X)^{-1} X^\top (y - h(\beta;X))\]
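
This iteration translates almost line for line into R. Here is a minimal sketch (the function name newton_logreg, the zero starting point, and the stopping rule are my own choices):

# Newton-Raphson iteration for logistic regression.
# X: model matrix with a leading column of ones; y: 0/1 response vector.
newton_logreg <- function(X, y, tol = 1e-8, max_iter = 25) {
  beta <- rep(0, ncol(X))                       # start from beta = 0
  for (iter in seq_len(max_iter)) {
    h <- 1 / (1 + exp(-as.vector(X %*% beta)))  # h(beta; X)
    grad <- crossprod(X, y - h)                 # gradient: X'(y - h)
    XtVX <- crossprod(X, X * (h * (1 - h)))     # X'VX: row i of X scaled by h_i(1 - h_i)
    step <- solve(XtVX, grad)                   # (X'VX)^{-1} X'(y - h)
    beta <- beta + as.vector(step)
    if (max(abs(step)) < tol) break             # stop when the update is tiny
  }
  beta
}

Since glm fits binomial models by iteratively reweighted least squares, which for the logistic link coincides with this Newton-Raphson update, the two should agree very closely.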

[I don’t remember where I was going with this.]