LBFGS minimizer without curvatures

This script shows how to use the LBFGS minimizer without curvatures (gradients only), with and without bounds.
Authors: Pavel V. Afonine, Dorothee Liebschner

The LBFGS algorithm

The limited-memory Broyden–Fletcher–Goldfarb–Shanno (L-BFGS) algorithm is an optimization method to find the minimum of a function f(x), with x being a vector in \( \mathbb{R}^n \) and f(x) a differentiable function.

The basic idea of minimization algorithms is to find the minimum of a function \( f(x) \) iteratively. For example, the gradient descent method starts at an initial point \( x_0 \) and iteratively takes steps proportional to the negative gradient of \( f(x) \) at the current point. As this is quite inefficient (a lot of iterations are required to find the minimum), the second-order behavior (second derivatives = curvatures) can be used in addition. This is the Newton method, where the iterative scheme is given by:

\begin{equation} x_{k+1} = x_k - [H(x_k)]^{-1} \nabla f(x_k) \end{equation}

The Newton method is quite efficient, but it is computationally expensive to calculate the inverse of the Hessian matrix. This is where the BFGS algorithm comes in: instead of explicitly calculating the second derivatives, it uses approximations of the Hessian matrix to converge faster. At each iteration k+1, an approximate second derivative is calculated from quantities of previous steps, namely the first derivatives at the two previous iterates \( x_k \) and \( x_{k-1} \) (written here for the one-dimensional case):

\begin{equation} x_{k+1} = x_k - f^\prime(x_k) \frac{x_k - x_{k-1}}{f^\prime(x_k) - f^\prime(x_{k-1})} \end{equation}
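
To make the two update rules concrete, here is a minimal one-dimensional sketch (plain Python, not cctbx code; the quadratic test function and the step size are chosen arbitrarily for illustration):

def f_prime(x):
  # derivative of f(x) = (x - 3)**2, whose minimum is at x = 3
  return 2.0*(x - 3.0)

# Gradient descent: steps proportional to the negative gradient
x = 0.0
for _ in range(50):
  x -= 0.1*f_prime(x)
print('gradient descent:', x)  # slowly approaches 3.0

# Secant-style update (equation 2): the curvature is approximated
# from the first derivatives at the two previous iterates
x_prev, x = 0.0, 1.0
for _ in range(10):
  g, g_prev = f_prime(x), f_prime(x_prev)
  if g == g_prev: break        # converged; avoid division by zero
  x_prev, x = x, x - g*(x - x_prev)/(g - g_prev)
print('secant update:', x)     # reaches 3.0 in one step (exact for a quadratic)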

The L-BFGS algorithm is an approximation of the BFGS procedure that uses a limited amount of computer memory.
L-BFGS does not use bounds, as it was developed to solve unconstrained optimization problems. However, there is an extension of L-BFGS, called L-BFGS-B (Zhu et al., 1997), that can handle simple bounds on the variables. Both options are available in the cctbx.

Generally, gradient-based optimizers have a larger convergence radius but have a harder time approaching the minimum precisely.
Minimizers based on second derivatives have a much smaller convergence radius, but if the starting point is close enough to the minimum, they can reach it much more efficiently.
It is therefore best to use both: far from the minimum, use a gradient-based minimizer, and once close enough, switch to second derivatives. However, this combined option is not available in the cctbx.

Further reading:

  • Introduction to BFGS.
  • David M. Himmelblau, Applied Nonlinear Programming, 1972 (out of print, but can be found second-hand)

The Rosenbrock function

The Rosenbrock function is a test function for evaluating the performance of optimization algorithms, such as convergence rate, precision and robustness.
Simple functions like this are useful for debugging and testing new algorithms, as they are fast to implement and quick to execute. If an algorithm does not perform well on such a standard problem, it most likely won't work well on real-life problems either.
It is defined by:

\begin{equation} f(x,y) = (a-x)^2 +b(y-x^2)^2 \end{equation}

The partial derivatives are:

\begin{equation} \frac{\partial f(x,y)}{\partial x} = -2a + 2x - 4bxy + 4bx^3 \end{equation}

\begin{equation} \frac{\partial f(x,y)}{\partial y} = 2by - 2bx^2 \end{equation}

To find the global minimum, we set the partial derivatives to 0. Equation (5) gives the condition \( y = x^2 \). Inserting this into equation (4) makes the last two terms cancel, leaving \( 2x - 2a = 0 \). The global minimum is therefore at \( x = a \), \( y = a^2 \).
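
We can verify this numerically by evaluating equations (4) and (5) at \( (a, a^2) \); both derivatives vanish there (plain-Python check for the parameters used below):

a, b = 20.0, 10.0
x, y = a, a**2   # candidate minimum (20, 400)
dfdx = -2*a + 2*x - 4*b*x*y + 4*b*x**3
dfdy = 2*b*y - 2*b*x**2
print(dfdx, dfdy)  # 0.0 0.0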

Code

The code below will compute the minimum of the Rosenbrock function with parameters a = 20 and b = 10: \( f(x,y) = (20-x)^2 +10(y-x^2)^2 \)

from __future__ import division, print_function
from scitbx.array_family import flex
from libtbx import adopt_init_args
import scitbx.lbfgs
from scitbx import lbfgsb

class rosenbrock(object):
  def __init__(self, a, b, x):
    adopt_init_args(self, locals())
    assert self.x.size() == 2

  def update(self, x):
    self.x = x
    assert self.x.size() == 2

  def target(self):
    return (self.a-self.x[0])**2+self.b*(self.x[1]-self.x[0]**2)**2

  def gradients(self):
    g1 = 2*(self.x[0]-self.a) + 4*self.b*(self.x[0]**3-self.x[0]*self.x[1])
    g2 = 2*self.b*(self.x[1]-self.x[0]**2)
    return flex.double([g1,g2])

def lbfgs_run(target_evaluator, use_bounds, lower_bound, upper_bound):
  minimizer = lbfgsb.minimizer(
    n   = target_evaluator.n,
    l   = lower_bound, # lower bound
    u   = upper_bound, # upper bound
    nbd = flex.int(target_evaluator.n, use_bounds)) # flag to apply both bounds
  minimizer.error = None
  try:
    icall = 0
    while 1:
      icall += 1
      x, f, g = target_evaluator()
      have_request = minimizer.process(x, f, g)
      if(have_request):
        assert minimizer.requests_f_and_g()
        continue
      assert not minimizer.requests_f_and_g()
      if(minimizer.is_terminated()): break
  except RuntimeError as e:
    minimizer.error = str(e)
  minimizer.n_calls = icall
  return minimizer

class minimizer_bound(object):

  def __init__(self,
               calculator,
               use_bounds,
               lower_bound,
               upper_bound,
               initial_values):
    adopt_init_args(self, locals())
    self.x = initial_values
    self.n = self.x.size()

  def run(self):
    self.minimizer = lbfgs_run(
      target_evaluator=self,
      use_bounds=self.use_bounds,
      lower_bound = self.lower_bound,
      upper_bound = self.upper_bound)
    self()
    return self

  def __call__(self):
    self.calculator.update(x = self.x)
    self.f = self.calculator.target()
    self.g = self.calculator.gradients()
    return self.x, self.f, self.g

class minimizer_unbound(object):
  #
  def __init__(self, max_iterations, calculator):
    adopt_init_args(self, locals())
    self.x = self.calculator.x
    self.minimizer = scitbx.lbfgs.run(
      target_evaluator=self,
      termination_params=scitbx.lbfgs.termination_parameters(
        max_iterations=max_iterations))

  def compute_functional_and_gradients(self):
    self.calculator.update(x = self.x)
    t = self.calculator.target()
    g = self.calculator.gradients()
    return t,g

def run():
  # Instantiate rosenbrock class
  calculator = rosenbrock(a = 20, b = 10, x = flex.double([0,0]))
  #
  print('Run L-BFGS (no boundaries)')
  m_unbound = minimizer_unbound(max_iterations=100, calculator=calculator)
  print('\tMinimum: ', list(m_unbound.x))
  print()
  print('Run L-BFGS-B with boundaries')
  m_bound = minimizer_bound(
    calculator     = calculator,
    use_bounds     = 2,
    lower_bound    = flex.double([-10000,-10000]),
    upper_bound    = flex.double([10000,10000]),
    initial_values = flex.double([0,0])).run()
  print('\tMinimum: ', list(m_bound.x))

if (__name__ == "__main__"):
  run()

Overview

The script consists of several parts:

Imports: At the beginning of the script are import statements. They import the modules needed for the tasks below.

Main function: The run() function is used to call the minimizer and to print the result to standard output.

class rosenbrock: The minimizers use this class to compute values and gradients of the Rosenbrock function.

class minimizer_unbound: Class to perform minimization of the Rosenbrock function, without bounds.

class minimizer_bound: Class to perform minimization of the Rosenbrock function, with bounds.

function lbfgs_run: Helper function to run the minimizer with bounds. The code could be part of the minimizer_bound class, but moving it to a separate function helps keep the tasks apart.

The class rosenbrock()

The minimizer needs a calculator object that represents the function to be minimized. In our example, this is the class rosenbrock(). The class is initialized in the run() function with the two parameters a and b and a starting value \( x = (0,0) \):

calculator = rosenbrock(a = 20, b = 10, x = flex.double([0,0]))

Note that x is a two-dimensional array that represents the vector (x,y) in equation (3); so x[0] = x and x[1] = y, if we translate the notation in the code to the notation in the equation.
The update() method assigns the vector (x,y) to the attribute x of the rosenbrock class. The minimizer supplies the current value of x, which is used to calculate the next step of the minimization procedure. After enough iterations, this value should correspond to the minimum of the function.
The target() method returns the value of the Rosenbrock function in equation (3) calculated at (x,y).
The gradients() method returns the array of partial derivatives (equations 4 and 5).
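
The calculator can also be exercised on its own, without any minimizer (assuming the rosenbrock class from the listing above is in scope):

from scitbx.array_family import flex

calculator = rosenbrock(a = 20, b = 10, x = flex.double([0,0]))
print(calculator.target())           # f(0,0) = (20-0)**2 = 400.0
calculator.update(x = flex.double([20,400]))
print(calculator.target())           # 0.0 at the minimum
print(list(calculator.gradients()))  # [0.0, 0.0]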

The run() function

The run() function instantiates the rosenbrock() class with parameters a=20 and b=10:

calculator = rosenbrock(a = 20, b = 10, x = flex.double([0,0]))

Then it creates the minimizer_unbound() class:

m_unbound = minimizer_unbound(max_iterations=100, calculator=calculator)

and the minimizer_bound() class:

  m_bound = minimizer_bound(
    calculator     = calculator,
    use_bounds     = 2,
    lower_bound    = flex.double([-10000,-10000]),
    upper_bound    = flex.double([10000,10000]),
    initial_values = flex.double([0,0])).run()

In both cases, this launches the minimization. For both computations, we print the result to standard output.

Minimization without bounds

For minimization without bounds, we will use the minimizer from scitbx.lbfgs.
First, we initialize the minimizer: we set the starting value of (x,y) to the initial value supplied to the rosenbrock class.

self.x = self.calculator.x

In our example, this is \( x = (0,0) \).

Then, we start the minimizer:

self.minimizer = scitbx.lbfgs.run(
  target_evaluator=self,
  termination_params=scitbx.lbfgs.termination_parameters(
    max_iterations=max_iterations))

As the name suggests, the scitbx.lbfgs.termination_parameters object supplies parameters relevant for decision-making to terminate the minimization. The full set of termination parameters can be found in scitbx/lbfgs/__init__.py. In our example, we only change one parameter, the maximum number of iterations, which we set to 100 when we instantiated the minimizer_unbound() class in the run() function.
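
If more control is needed, the parameter object can be constructed separately and passed to scitbx.lbfgs.run(). A sketch, with the caveat that only max_iterations is exercised in this tutorial; min_iterations and traditional_convergence_test_eps are assumed parameter names, to be verified in scitbx/lbfgs/__init__.py:

import scitbx.lbfgs

# max_iterations is used in this tutorial; the other two names are
# assumptions, to be checked in scitbx/lbfgs/__init__.py
termination_params = scitbx.lbfgs.termination_parameters(
  max_iterations = 100,
  min_iterations = 0,
  traditional_convergence_test_eps = 1.e-8)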

To perform L-BFGS minimization, we need to calculate the value and the derivatives of our function at a point (x,y). This is done in the compute_functional_and_gradients() method:

def compute_functional_and_gradients(self):
  self.calculator.update(x = self.x)
  t = self.calculator.target()
  g = self.calculator.gradients()
  return t,g

This is all we need for minimization without boundaries. The attribute x of the minimizer_unbound() object will give us the minimum of the Rosenbrock function. Let's print the value in the run() function:

print('\tMinimum: ', list(m_unbound.x))
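
When writing a new calculator class, it is good practice to check the analytical gradients against finite differences before minimizing. A minimal sketch, assuming the rosenbrock class from the listing above:

from scitbx.array_family import flex

def finite_difference_gradients(calculator, eps=1.e-6):
  # central differences of the target with respect to each variable
  x0 = calculator.x.deep_copy()
  g = []
  for i in range(x0.size()):
    fs = []
    for signed_eps in [eps, -eps]:
      x = x0.deep_copy()
      x[i] += signed_eps
      calculator.update(x = x)
      fs.append(calculator.target())
    g.append((fs[0]-fs[1])/(2*eps))
  calculator.update(x = x0)  # restore the original point
  return flex.double(g)

calculator = rosenbrock(a = 20, b = 10, x = flex.double([3,5]))
print(list(calculator.gradients()))                    # analytical
print(list(finite_difference_gradients(calculator)))   # numerical, should agree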

Minimization with bounds

Sometimes, the variables of the function to be minimized are not expected to go above or below certain values. Then we can use this information and apply bounds to the minimization procedure. This means we will use the L-BFGS-B algorithm, which is available in scitbx:

from scitbx import lbfgsb

The lbfgsb procedure needs to be constructed a bit differently than the lbfgs minimizer, but the principle is the same.
First, we construct the minimizer_bound() class in the run() function:

m_bound = minimizer_bound(
  calculator     = calculator,
  use_bounds     = 2,
  lower_bound    = flex.double([-10000,-10000]),
  upper_bound    = flex.double([10000,10000]),
  initial_values = flex.double([0,0])).run()

By providing the parameters for use_bounds, lower_bound and upper_bound, we apply a minimum value of -10,000 and a maximum value of 10,000 for both x and y. Note that we could have chosen different values for each variable.
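
For example, to restrict x to the interval [0, 100] while keeping the wide limits on y, we could pass (a hypothetical variation of the call above; the true minimum (20, 400) still lies inside these bounds, so the result is unchanged):

m_bound = minimizer_bound(
  calculator     = calculator,
  use_bounds     = 2,
  lower_bound    = flex.double([0, -10000]),   # 0 <= x, -10000 <= y
  upper_bound    = flex.double([100, 10000]),  # x <= 100, y <= 10000
  initial_values = flex.double([0,0])).run()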

In the class, the minimizer is first initialized. As in the previous example (minimizer_unbound), the starting value is set to \( x = (0,0) \).

def __init__(self,
             calculator,
             use_bounds,
             lower_bound,
             upper_bound,
             initial_values):
  adopt_init_args(self, locals())
  self.x = initial_values
  self.n = self.x.size()

The run() method calls our helper function lbfgs_run().

def run(self):
  self.minimizer = lbfgs_run(
    target_evaluator=self,
    use_bounds=self.use_bounds,
    lower_bound = self.lower_bound,
    upper_bound = self.upper_bound)
  self()
  return self

The __call__() method calculates the function value and the gradients, similar to the compute_functional_and_gradients() method of the unbound minimizer.

def __call__(self):
  self.calculator.update(x = self.x)
  self.f = self.calculator.target()
  self.g = self.calculator.gradients()
  return self.x, self.f, self.g

Let's look at the lbfgs_run() function. We need to define this additional function because the lbfgsb minimizer does not take an evaluator class as input. Instead, we have to provide the variable, its function value and the gradient explicitly:

have_request = minimizer.process(x, f, g)

The rest of the code in the try-except part handles errors and makes sure the minimizer stops when appropriate, according to termination parameters.
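
Schematically, the interaction with the lbfgsb minimizer follows this pattern (a condensed sketch of the loop in lbfgs_run() above, without the error handling):

def process_loop(minimizer, target_evaluator):
  # condensed version of the while loop in lbfgs_run()
  while True:
    x, f, g = target_evaluator()    # evaluate f and g at the current x
    if minimizer.process(x, f, g):  # minimizer requests f and g at a new x
      continue                      # ...so evaluate again
    if minimizer.is_terminated():   # converged or iteration limit reached
      break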

Output

Let's have a look at the output from running the script. We know that the minimum of the Rosenbrock function is at \( x = a \), \( y = a^2 \), so in our example, it is at \( (x,y) = (20, 400) \). The value of the function at this point is 0. The lbfgs minimizer script with our parameters gives:

Run L-BFGS (no boundaries)
  Minimum:  [19.99999855596643, 399.99994289915264]

Run L-BFGS-B with boundaries
  Minimum:  [19.99999999985263, 399.9999999940399]

The values are very close to the analytical solution. Both algorithms converged to the minimum.

We can also quickly check how the minimization progresses. For example, for the L-BFGS-B method, adding a print statement inside the while loop of the lbfgs_run() function gives us the current value of f(x,y), i.e. the value of the Rosenbrock function at each iteration of the minimization.
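The added line could look like this (icall and f are the variables already defined in the loop):

# inside the while loop of lbfgs_run(), right after the call to
# target_evaluator():
print(icall, f)  # iteration number and current value of f(x,y)

We can then plot the target value as a function of iterations for each method: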

[Figure: target value of the Rosenbrock function as a function of iteration for L-BFGS (no bounds) and L-BFGS-B (with bounds).]

In our example, the L-BFGS minimizer without bounds converged after 59 iterations, while the L-BFGS-B minimizer finished after 64 iterations. For both methods, the minimization progresses in a similar fashion.