Optimizing target functions with long evaluation times can be tedious or even infeasible. Especially when the target function itself is hard to parallelize, one might wonder whether the optimization method can be parallelized instead. Indeed, parallel implementations are available for some stochastic optimizers; see the CRAN Task View on Optimization. However, widely used gradient-based optimization methods such as the “L-BFGS-B” method of `optim()` can also profit from parallelization. More precisely, at each step the target function and the (approximate) gradient can be evaluated in parallel. Building on this idea, we have developed the R package **optimParallel**, which provides parallel versions of the gradient-based optimization methods of `optim()`. Its main function `optimParallel()` has the same usage and output as `optim()` and can speed up optimization significantly for target functions that are slow to evaluate.
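
To make the idea of evaluating the target function and its approximate gradient in parallel concrete, here is a minimal sketch that uses only the **parallel** package. It is not the implementation used by **optimParallel**; the helper `par_fn_and_grad()`, the toy target `fn_toy()`, and the central-difference scheme with step size `eps` are assumptions chosen for illustration.

```r
library(parallel)

# Hypothetical toy target; stands in for an expensive function.
fn_toy <- function(par) sum((par - c(1, 2))^2)

# Evaluate fn at `par` and at par +/- eps along each coordinate in parallel,
# then assemble a central-difference approximation of the gradient.
par_fn_and_grad <- function(cl, fn, par, eps = 1e-6) {
  p <- length(par)
  # (1 + 2p) x p matrix of offsets: no shift, +eps shifts, -eps shifts
  steps <- rbind(0, diag(eps, p), diag(-eps, p))
  points <- lapply(seq_len(nrow(steps)), function(i) par + steps[i, ])
  vals <- unlist(parLapply(cl, points, fn))
  list(value = vals[1],
       gradient = (vals[2:(p + 1)] - vals[(p + 2):(2 * p + 1)]) / (2 * eps))
}

cl <- makeCluster(2)
res <- par_fn_and_grad(cl, fn_toy, par = c(0, 0))
stopCluster(cl)
res
```

In this sketch, a target function with p parameters needs 1 + 2p evaluations per step. These evaluations are independent of each other, which is what makes them suitable for distribution over parallel workers.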

### A simple example

Executing a gradient-based `optim()` call in parallel requires the following steps:

- install and load **optimParallel** from CRAN,
- set up a default cluster for parallel execution using the R package **parallel**,
- replace `optim()` by `optimParallel()`.

For illustration, we consider the following optimization task. Note the use of `Sys.sleep()` to mimic a computationally intensive function.

```
set.seed(13)
x <- rnorm(1000, 5, 2)
# negative log-likelihood of a normal sample; par = c(mean, sd)
negll <- function(par, x) {
  Sys.sleep(1) # mimic an expensive evaluation
  -sum(dnorm(x=x, mean=par[1], sd=par[2], log=TRUE))
}
optim(par=c(1,1), fn=negll, x=x, method = "L-BFGS-B",
      lower=c(-Inf, .0001))
```

The parallel version of the same task is:

```
install.packages("optimParallel")
library("optimParallel")
cl <- makeCluster(5) # set the number of processor cores
setDefaultCluster(cl=cl) # set 'cl' as default cluster
optimParallel(par=c(1,1), fn=negll, x=x,
              method = "L-BFGS-B", lower=c(-Inf, .0001))
```

### Reduction of the optimization time

The following figure shows the results of a benchmark experiment comparing the “L-BFGS-B” method of `optimParallel()` and `optim()`; see the arXiv preprint for more details. Plotted are the elapsed times per iteration (y-axis) against the evaluation time of the target function (x-axis). The colors indicate the number of parameters of the target function and whether an analytic gradient was specified. The elapsed times of `optimParallel()` (solid line) are smaller in all tested scenarios.

### Trace the optimization path

Besides parallelization, `optimParallel()` provides additional innovations. For example, it can return log information, which allows the user to trace the optimization path.

```
optimParallel(par=c(1,1), fn=negll, x=x,
              method = "L-BFGS-B", lower=c(-Inf, .0001),
              parallel=list(loginfo=TRUE))
```
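
As a hedged sketch of how the log information might be inspected, the following stores the result and looks at its `loginfo` element. It assumes that the returned list carries the log as a matrix named `loginfo`, with one row per step. The variant `negll_fast()` (the same target without `Sys.sleep()`) and the cluster size of 2 are choices made here only to keep the example quick.

```r
library("parallel")
library("optimParallel")

set.seed(13)
x <- rnorm(1000, 5, 2)
# Same target as above, without Sys.sleep() so that the sketch runs quickly.
negll_fast <- function(par, x) {
  -sum(dnorm(x = x, mean = par[1], sd = par[2], log = TRUE))
}

cl <- makeCluster(2)
setDefaultCluster(cl = cl)
o <- optimParallel(par = c(1, 1), fn = negll_fast, x = x,
                   method = "L-BFGS-B", lower = c(-Inf, .0001),
                   parallel = list(loginfo = TRUE))
setDefaultCluster(cl = NULL) # unset the default cluster
stopCluster(cl)              # release the worker processes

# Assumption: 'loginfo' holds the visited parameters, function values,
# and gradients, one row per step.
head(o$loginfo)
```

Plotting the parameter columns of this matrix against each other gives a picture of the path the optimizer took from the starting values to the optimum.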