optimParallel: speed up your optim() calls


Optimizing target functions with long evaluation times can be tedious or even impossible. Especially if the target function itself is hard to parallelize, one might wonder whether it is possible to parallelize the optimization method instead. Indeed, parallel implementations are available for some stochastic optimizers; see the CRAN Task View on Optimization. However, the widely used gradient-based optimization methods, such as the “L-BFGS-B” method from optim(), can also profit from parallelization. More precisely, at each step the target function and the (approximate) gradient can be evaluated in parallel. Building on this idea, we have developed the R package optimParallel, which provides parallel versions of the gradient-based optimization methods of optim(). Its main function optimParallel() has the same usage and output as optim() and speeds up optimization significantly.
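To make the idea concrete, here is a minimal sketch (not the package's actual implementation) of how a function value and a central-difference gradient approximation can be computed in one parallel map, so that one "gradient step" costs roughly the time of a single serial evaluation of the target function:

```r
## Sketch: evaluate fn(par) and all 2p finite-difference points in parallel.
## 'parallelFnGrad' is a hypothetical helper, for illustration only.
library(parallel)

parallelFnGrad <- function(cl, fn, par, eps = 1e-8) {
    p <- length(par)
    ## points to evaluate: par itself, plus par +/- eps in each coordinate
    points <- c(list(par),
                lapply(1:p, function(i) { x <- par; x[i] <- x[i] + eps; x }),
                lapply(1:p, function(i) { x <- par; x[i] <- x[i] - eps; x }))
    vals <- parSapply(cl, points, fn)  # all evaluations run in parallel
    list(value    = vals[1],
         gradient = (vals[2:(p + 1)] - vals[(p + 2):(2 * p + 1)]) / (2 * eps))
}
```

With enough cores (2p + 1 for p parameters), the elapsed time per step is dominated by a single evaluation of fn rather than by 2p + 1 sequential evaluations.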

A simple example

Executing a gradient-based optim() call in parallel requires the following steps:

  1. install and load optimParallel from CRAN,
  2. setup a default cluster for parallel execution using the R package parallel,
  3. replace optim() by optimParallel().

For illustration, we consider the following optimization task. Note the use of Sys.sleep() to mimic a computationally intensive function.

set.seed(13)
x <- rnorm(1000, 5, 2)
negll <- function(par, x) {
    Sys.sleep(1)
    -sum(dnorm(x=x, mean=par[1], sd=par[2], log=TRUE))
}
optim(par=c(1,1), fn=negll, x=x, method = "L-BFGS-B",
      lower=c(-Inf, .0001))

The parallel version of the same task is:

install.packages("optimParallel")
library("optimParallel")
cl <- makeCluster(5)     # set the number of processor cores
setDefaultCluster(cl=cl) # set 'cl' as default cluster
optimParallel(par=c(1,1), fn=negll, x=x,
              method = "L-BFGS-B", lower=c(-Inf, .0001))

Reduction of the optimization time

The following figure shows the results of a benchmark experiment comparing the “L-BFGS-B” method from optimParallel() and optim(); see the arXiv preprint for more details. Plotted are the elapsed times per iteration (y-axis) and the evaluation time of the target function (x-axis). The colors indicate the number of parameters of the target function and whether an analytic gradient was specified.  The elapsed times of optimParallel() (solid line) are smaller for all tested scenarios.

[Figure: benchmark results comparing optimParallel() and optim()]

Trace the optimization path

Besides the parallelization, optimParallel() provides additional features. For example, it can return log information, which allows the user to trace the optimization path.

optimParallel(par=c(1,1), fn=negll, x=x,
              method = "L-BFGS-B", lower=c(-Inf, .0001),
              parallel=list(loginfo=TRUE))
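Assuming the returned object stores the log as a matrix in its loginfo element, with one row per iteration and columns for the step, the parameters, and the function value (the exact layout is documented in ?optimParallel), the path can be inspected and plotted along these lines:

```r
## Hypothetical usage sketch; column layout of 'loginfo' is an assumption.
o1 <- optimParallel(par=c(1,1), fn=negll, x=x,
                    method = "L-BFGS-B", lower=c(-Inf, .0001),
                    parallel=list(loginfo=TRUE))
head(o1$loginfo)                        # logged iterations
plot(o1$loginfo[, "par1"], o1$loginfo[, "par2"],
     type="b", xlab="mean", ylab="sd")  # trace of the two parameters
```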

[Figure: traced optimization path]

Links