LBFGS minimizer with curvatures

This script shows how to use the LBFGS minimizer with curvatures.
Authors: Pavel V. Afonine, Dorothee Liebschner

See the LBFGS script without curvatures for some information about the LBFGS algorithm.

The Rosenbrock function

To illustrate the use of the minimizer, we will use the Rosenbrock function (see also the LBFGS script without curvatures).
It is defined by:

\begin{equation} f(x,y) = (a-x)^2 +b(y-x^2)^2 \end{equation}

We will use the coefficients a=1 and b=100:

\begin{equation} f(x,y) = (1-x)^2 +100(y-x^2)^2 \end{equation}

The partial first derivatives are:

\begin{equation} \frac{\partial f(x,y)}{\partial x} = -2(1-x) + 400(-xy + x^3) \end{equation}

\begin{equation} \frac{\partial f(x,y)}{\partial y} = 200(y - x^2) \end{equation}

The partial second derivatives are (the diagonal elements of the Hessian matrix):

\begin{equation} \frac{\partial^2 f(x,y)}{\partial x^2} = 2 + 400(-y + 3x^2) \end{equation}

\begin{equation} \frac{\partial^2 f(x,y)}{\partial y^2} = 200 \end{equation}

The global minimum is where the partial derivatives are equal to zero. For the Rosenbrock function with coefficients a=1 and b=100, this is at the coordinates: \( x = 1 \), \( y = 1 \). So the LBFGS minimizer should find coordinates equal (or very close) to this point. The function value of the Rosenbrock function at (1,1) is equal to 0.
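Substituting \( x = 1 \), \( y = 1 \) into the partial first derivatives confirms this:

\begin{equation} \frac{\partial f}{\partial x}\Big|_{(1,1)} = -2(1-1) + 400(-1 \cdot 1 + 1^3) = 0 \end{equation}

\begin{equation} \frac{\partial f}{\partial y}\Big|_{(1,1)} = 200(1 - 1^2) = 0 \end{equation}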

Code

from __future__ import division, print_function
from scitbx import lbfgs as scitbx_lbfgs
from scitbx.array_family import flex
from libtbx import adopt_init_args

# Rosenbrock's function, gradients and curvatures

def target(x,y):
  return (1-x)**2+100*(y-x**2)**2

def grad_x(x,y):
  return -2*(1-x) + 400*(-x*y+x**3)

def grad_y(x,y):
  return 2*100*(y-x**2)

def curv_xx(x,y):
  return 2 + 400*(-y+3*x**2)

def curv_yy(x,y):
  return 200

def lbfgs_run(target_evaluator,
              min_iterations=0,
              max_iterations=None,
              traditional_convergence_test=True,
              use_curvatures=False):
  ext = scitbx_lbfgs.ext
  minimizer = ext.minimizer(target_evaluator.n)
  minimizer.error = None
  is_converged = ext.traditional_convergence_test(target_evaluator.n)
  try:
    icall = 0
    requests_f_and_g = True
    requests_diag = use_curvatures
    while 1:
      if (requests_f_and_g):
        icall += 1
      x, f, g, d = target_evaluator(
        requests_f_and_g=requests_f_and_g,
        requests_diag=requests_diag)
      if (use_curvatures):
        if (d is None): d = flex.double(x.size())
        have_request = minimizer.run(x, f, g, d)
      else:
        have_request = minimizer.run(x, f, g)
      if (have_request):
        requests_f_and_g = minimizer.requests_f_and_g()
        requests_diag = minimizer.requests_diag()
        continue
      assert not minimizer.requests_f_and_g()
      assert not minimizer.requests_diag()
      if (minimizer.iter() >= min_iterations and is_converged(x, g)): break
      if (max_iterations is not None and minimizer.iter() >= max_iterations):
        break
      if (use_curvatures):
        have_request = minimizer.run(x, f, g, d)
      else:
        have_request = minimizer.run(x, f, g)
      if (not have_request): break
      requests_f_and_g = minimizer.requests_f_and_g()
      requests_diag = minimizer.requests_diag()
  except RuntimeError as e:
    minimizer.error = str(e)
  minimizer.n_calls = icall
  return minimizer

class minimizer:

  def __init__(self, xx=-3, yy=-4, min_iterations=0, max_iterations=10000):
    adopt_init_args(self, locals())
    self.x = flex.double([xx, yy])
    self.n = self.x.size()

  def run(self, use_curvatures=False):
    self.minimizer = lbfgs_run(
      target_evaluator=self,
      min_iterations=self.min_iterations,
      max_iterations=self.max_iterations,
      use_curvatures=use_curvatures)
    self(requests_f_and_g=True, requests_diag=False)
    return self

  def __call__(self, requests_f_and_g, requests_diag):
    self.xx, self.yy = self.x
    if (not requests_f_and_g and not requests_diag):
      requests_f_and_g = True
      requests_diag = True
    if (requests_f_and_g):
      self.f = target(self.xx,self.yy)
      self.g = flex.double(
        (grad_x(self.xx, self.yy),
         grad_y(self.xx, self.yy)))
      self.d = None
    if (requests_diag):
      self.d = flex.double(
        (curv_xx(self.xx, self.yy),
         curv_yy(self.xx, self.yy)))
      assert self.d.all_ne(0)
      self.d = 1 / self.d
    return self.x, self.f, self.g, self.d

def run():
  for use_curvatures in (False, True):
    print("use_curvatures:", use_curvatures)
    m = minimizer().run(use_curvatures=use_curvatures)
    print(tuple(m.x), "final")
    if (abs(m.x[0]-1) > 1.e-4 or abs(m.x[1]-1) > 1.e-4):
      print(tuple(m.x), "failure, use_curvatures="+str(use_curvatures))
    print("iter,exception:", m.minimizer.iter(), m.minimizer.error)
    print("n_calls:", m.minimizer.n_calls)
    print()
    assert m.minimizer.n_calls == m.minimizer.nfun()

if (__name__ == "__main__"):
  run()

Overview

The script consists of several parts:

Imports: At the beginning of the script are import statements. They are used to import the modules needed for our task.

Rosenbrock function: target returns the value of the Rosenbrock function, grad_x and grad_y return the partial first derivatives, and curv_xx and curv_yy return the diagonal elements of the Hessian matrix (the partial second derivatives); see the quick sanity check after this list.

Main function: The run() function is used to call the minimizer and to print the result to standard output.

Class minimizer: Class to perform minimization of the Rosenbrock function.

Function lbfgs_run: Helper function to run the scitbx minimizer.
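As a quick sanity check, the functions from the script can be evaluated at the known minimum; the target value and both gradients vanish at (1,1):

print(target(1, 1))  # 0
print(grad_x(1, 1))  # 0
print(grad_y(1, 1))  # 0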

The run() function

The run() function executes the run() method of the minimizer class, once without curvatures and once with curvatures:

m = minimizer().run(use_curvatures=use_curvatures)

This will launch the minimization. For both computations, we print some results to standard output.
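The constructor arguments of minimizer can also be overridden, for example to start the minimization from a different (arbitrarily chosen) point:

m = minimizer(xx=2, yy=2).run(use_curvatures=True)
print(tuple(m.x))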

The minimizer class

The minimizer class has three methods: __init__(), run() and __call__().

The __init__() method initializes our class with the starting point (-3, -4) (xx and yy), and sets the number of minimum and maximum iterations to 0 and 10,000, respectively.
x is a two-dimensional array that represents the x and y coordinates; x[0] = -3 and x[1] = -4 form the (arbitrarily chosen) starting point of the minimization.
self.n represents the length of the array x.
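The call adopt_init_args(self, locals()) is a libtbx convenience that stores every constructor argument as an attribute of the instance, so self.xx, self.yy, self.min_iterations and self.max_iterations become available without explicit assignments. A minimal illustration:

from libtbx import adopt_init_args

class point(object):
  def __init__(self, xx=-3, yy=-4):
    # xx and yy are stored as self.xx and self.yy automatically
    adopt_init_args(self, locals())

p = point()
print(p.xx, p.yy)  # -3 -4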

The run() method starts the L-BFGS minimizer:

self.minimizer = lbfgs_run(
  target_evaluator=self,
  min_iterations=self.min_iterations,
  max_iterations=self.max_iterations,
  use_curvatures=use_curvatures)

Then it executes the __call__() method:

self(requests_f_and_g=True, requests_diag=False)

The __call__() method allows an instance of the minimizer class to behave like a function, i.e. it can be called like a function; minimizer(requests_f_and_g=True, requests_diag=False) is then a shorthand for minimizer.__call__(requests_f_and_g=True, requests_diag=False).
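A tiny illustration of this Python mechanism (the class here is just a toy example, not part of the script):

class adder(object):
  def __call__(self, a, b):
    return a + b

add = adder()
print(add(1, 2))  # 3; equivalent to add.__call__(1, 2)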
Depending on the values of the boolean parameters requests_f_and_g and requests_diag, the __call__() method updates the target value and the gradients and/or the diagonal elements of the Hessian matrix (the second derivatives). Note that it actually stores the inverse of the diagonal elements: quasi-Newton methods such as LBFGS work with (an approximation of) the inverse of the Hessian matrix, so the minimizer expects the diagonal of the inverse Hessian rather than the raw curvatures:

assert self.d.all_ne(0)
self.d = 1 / self.d
(As we are calculating the inverse, we first assert that the diagonal elements are different from zero.)
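For example, at the starting point (-3, -4) the curvatures and the resulting inverse diagonal elements are:

print(curv_xx(-3, -4))  # 2 + 400*(4 + 27) = 12402
print(curv_yy(-3, -4))  # 200
print(1.0 / 12402, 1.0 / 200)  # ~8.06e-05 and 0.005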

The lbfgs_run function

Here, we get a handle to the lbfgs extension module (the underlying C++ implementation) from scitbx:

ext = scitbx_lbfgs.ext

Then we instantiate the minimizer with the dimension of our problem, i.e. the number of parameters (in our case: n=2):

minimizer = ext.minimizer(target_evaluator.n)

Then we get a function object that checks whether the minimization has converged. Its arguments are the current coordinates and the gradients:

is_converged = ext.traditional_convergence_test(target_evaluator.n)
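The test itself is implemented in C++. Conceptually, it corresponds to the classical LBFGS stopping rule, which compares the gradient norm against the parameter norm. A pure-Python sketch of the idea (the tolerance eps = 1.e-5 is an assumption here, not necessarily the scitbx default):

import math

def is_converged_sketch(x, g, eps=1.e-5):
  # Classical LBFGS rule: stop when ||g|| <= eps * max(1, ||x||).
  x_norm = math.sqrt(sum(xi * xi for xi in x))
  g_norm = math.sqrt(sum(gi * gi for gi in g))
  return g_norm <= eps * max(1.0, x_norm)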

Now we can start the minimization loop. Within the loop, the booleans requests_f_and_g and requests_diag are updated with the values requested by the scitbx minimizer; their starting values are True and the value of use_curvatures, respectively.

We get the coordinates, the function value, the gradients and the curvatures from our minimizer class:

x, f, g, d = target_evaluator(
  requests_f_and_g=requests_f_and_g,
  requests_diag=requests_diag)

Then we run the scitbx minimizer with the coordinates, the function value and the gradients as input parameters. If we are using curvatures, we also supply the inverted second derivatives. The minimizer then indicates whether it needs updated function values and gradients and/or updated curvatures:

requests_f_and_g = minimizer.requests_f_and_g()
requests_diag = minimizer.requests_diag()
If the minimizer requests new values, we continue with the next pass of the loop. Otherwise, we check whether the convergence criterion is met or the maximum number of iterations has been reached. Finally, the function returns the scitbx minimizer object.
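As an aside: if curvatures are not needed, the reverse-communication loop can be avoided altogether, since scitbx.lbfgs provides a higher-level run() helper that drives the loop internally. It expects an evaluator object with an attribute x and a method compute_functional_and_gradients(). A minimal sketch (the evaluator class name is our own):

from scitbx import lbfgs as scitbx_lbfgs
from scitbx.array_family import flex

class rosenbrock_evaluator(object):

  def __init__(self):
    self.x = flex.double([-3, -4])

  def compute_functional_and_gradients(self):
    x, y = self.x
    f = (1 - x)**2 + 100 * (y - x**2)**2
    g = flex.double((-2 * (1 - x) + 400 * (-x * y + x**3),
                     200 * (y - x**2)))
    return f, g

evaluator = rosenbrock_evaluator()
scitbx_lbfgs.run(target_evaluator=evaluator)
print(tuple(evaluator.x))  # close to (1, 1)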

Output

The run() function prints the final coordinates, the number of iterations performed by the scitbx minimizer, an exception message (if one occurred) and the number of calls to the target evaluator:

use_curvatures: False
(0.9999998308201578, 0.9999996829964546) final
iter,exception: 27 None
n_calls: 34

use_curvatures: True
(1.0000002135019253, 1.000000406037093) final
iter,exception: 43 None
n_calls: 60

Both methods converged to the minimum at (1,1).