Go to the first, previous, next, last section, table of contents.

Multidimensional Minimization

This chapter describes routines for finding minima of arbitrary multidimensional functions. The library provides low level components for a variety of iterative minimizers and convergence tests. These can be combined by the user to achieve the desired solution, while providing full access to the intermediate steps of the algorithms. Each class of methods uses the same framework, so that you can switch between minimizers at runtime without needing to recompile your program. Each instance of a minimizer keeps track of its own state, allowing the minimizers to be used in multi-threaded programs. The minimization algorithms can be used to maximize a function by inverting its sign.

The header file `gsl_multimin.h' contains prototypes for the minimization functions and related declarations.

Overview

The problem of multidimensional minimization requires finding a point x such that the scalar function,

f(x_1, ..., x_n)

takes a value which is lower than at any neighboring point. For smooth functions the gradient g = \nabla f vanishes at the minimum. In general there are no bracketing methods available for the minimization of n-dimensional functions. All algorithms proceed from an initial guess using a search algorithm which attempts to move in a downhill direction. A one-dimensional line minimisation is performed along this direction until the lowest point is found to a suitable tolerance. The search direction is then updated with local information from the function and its derivatives, and the whole process repeated until the true n-dimensional minimum is found.

Several minimization algorithms are available within a single framework. The user provides a high-level driver for the algorithms, and the library provides the individual functions necessary for each of the steps. There are three main phases of the iteration. The steps are,

initialize minimizer state, s, for algorithm T
update s using the iteration T
test s for convergence, and repeat iteration if necessary

Each iteration step consists either of an improvement to the line-mimisation in the current direction or an update to the search direction itself. The state for the minimizers is held in a gsl_multimin_fdfminimizer struct.

Caveats

Note that the minimization algorithms can only search for one local minimum at a time. When there are several local minima in the search area, the first minimum to be found will be returned; however it is difficult to predict which of the minima this will be. In most cases, no error will be reported if you try to find a local minimum in an area where there is more than one.

It is also important to note that the minimization algorithms find local minima; there is no way to determine whether a minimum is a global minimum of the function in question.

Initializing the Multidimensional Minimizer

The following function initializes a multidimensional minimizer. The minimizer itself depends only on the dimension of the problem and the algorithm and can be reused for different problems.

Function: gsl_multimin_fdfminimizer * gsl_multimin_fdfminimizer_alloc (const gsl_multimin_fdfminimizer_type *T, size_t n): This function returns a pointer to a a newly allocated instance of a minimizer of type T for an n-dimension function. If there is insufficient memory to create the minimizer then the function returns a null pointer and the error handler is invoked with an error code of GSL_ENOMEM.

Function: int gsl_multimin_fdfminimizer_set (gsl_multimin_fdfminimizer * s, gsl_multimin_function_fdf *fdf, const gsl_vector * x, double step_size, double tol): This function initializes the minimizer s to minimize the function fdf starting from the initial point x. The size of the first trial step is given by step_size. The accuracy of the line minimization is specified by tol. The precise meaning of this parameter depends on the method used. Typically the line minimization is considered successful if the gradient of the function g is orthogonal to the current search direction p to a relative accuracy of tol, where dot(p,g) < tol |p| |g|.

Function: void gsl_multimin_fdfminimizer_free (gsl_multimin_fdfminimizer *s): This function frees all the memory associated with the minimizer s.

Function: const char * gsl_multimin_fdfminimizer_name (const gsl_multimin_fdfminimizer * s)

This function returns a pointer to the name of the minimizer. For example,

printf("s is a '%s' minimizer\n", 
       gsl_multimin_fdfminimizer_name (s));

would print something like s is a 'conjugate_pr' minimizer.

Providing a function to minimize

You must provide a parametric function of n variables for the minimizers to operate on. You also need to provide a routine which calculates the gradient of the function and a third routine which calculates both the function value and the gradient together. In order to allow for general parameters the functions are defined by the following data type:

Data Type: gsl_multimin_function_fdf

This data type defines a general function of n variables with parameters and the corresponding gradient vector of derivatives,

double (* f) (const gsl_vector * x, void * params): this function should return the result f(x,params) for argument x and parameters params.
int (* df) (const gsl_vector * x, void * params, gsl_vector * g): this function should store the n-dimensional gradient g_i = d f(x,params) / d x_i in the vector g for argument x and parameters params, returning an appropriate error code if the function cannot be computed.
int (* fdf) (const gsl_vector * x, void * params, double * f, gsl_vector * g): This function should set the values of the f and g as above, for arguments x and parameters params. This function provides an optimization of the separate functions for f(x) and g(x) -- it is always faster to compute the function and its derivative at the same time.
size_t n: the dimension of the system, i.e. the number of components of the vectors x.
void * params: a pointer to the parameters of the function.

The following example function defines a simple paraboloid with two parameters,

/* Paraboloid centered on (dp[0],dp[1]) */

double
my_f (const gsl_vector *v, void *params)
{
  double x, y;
  double *dp = (double *)params;
  
  x = gsl_vector_get(v, 0);
  y = gsl_vector_get(v, 1);
 
  return 10.0 * (x - dp[0]) * (x - dp[0]) +
           20.0 * (y - dp[1]) * (y - dp[1]) + 30.0; 
}

/* The gradient of f, df = (df/dx, df/dy). */
void 
my_df (const gsl_vector *v, void *params, 
       gsl_vector *df)
{
  double x, y;
  double *dp = (double *)params;
  
  x = gsl_vector_get(v, 0);
  y = gsl_vector_get(v, 1);
 
  gsl_vector_set(df, 0, 20.0 * (x - dp[0]));
  gsl_vector_set(df, 1, 40.0 * (y - dp[1]));
}

/* Compute both f and df together. */
void 
my_fdf (const gsl_vector *x, void *params, 
        double *f, gsl_vector *df) 
{
  *f = my_f(x, params); 
  my_df(x, params, df);
}

The function can be initialized using the following code,

gsl_multimin_function_fdf my_func;

double p[2] = { 1.0, 2.0 }; /* center at (1,2) */

my_func.f = &my_f;
my_func.df = &my_df;
my_func.fdf = &my_fdf;
my_func.n = 2;
my_func.params = (void *)p;

Iteration

The following function drives the iteration of each algorithm. The function performs one iteration to update the state of the minimizer. The same function works for all minimizers so that different methods can be substituted at runtime without modifications to the code.

Function: int gsl_multimin_fdfminimizer_iterate (gsl_multimin_fdfminimizer *s): These functions perform a single iteration of the minimizer s. If the iteration encounters an unexpected problem then an error code will be returned.

The minimizer maintains a current best estimate of the minimum at all times. This information can be accessed with the following auxiliary functions,

Function: gsl_vector * gsl_multimin_fdfminimizer_x (const gsl_multimin_fdfminimizer * s)
Function: double gsl_multimin_fdfminimizer_minimum (const gsl_multimin_fdfminimizer * s)
Function: gsl_vector * gsl_multimin_fdfminimizer_gradient (const gsl_multimin_fdfminimizer * s): These functions return the current best estimate of the location of the minimum, the value of the function at that point and its gradient, for the minimizer s.

Function: int gsl_multimin_fdfminimizer_restart (gsl_multimin_fdfminimizer *s): This function resets the minimizer s to use the current point as a new starting point.

Stopping Criteria

A minimization procedure should stop when one of the following conditions is true:

A minimum has been found to within the user-specified precision.
A user-specified maximum number of iterations has been reached.
An error has occurred.

The handling of these conditions is under user control. The functions below allow the user to test the precision of the current result.

Function: int gsl_multimin_test_gradient (const gsl_vector * g,double epsabs)

This function tests the norm of the gradient g against the absolute tolerance epsabs. The gradient of a multidimensional function goes to zero at a minimum. The test returns GSL_SUCCESS if the following condition is achieved,

|g| < epsabs

and returns GSL_CONTINUE otherwise. A suitable choice of epsabs can be made from the desired accuracy in the function for small variations in x. The relationship between these quantities is given by \delta f = g \delta x.

Algorithms

There are several minimization methods available. The best choice of algorithm depends on the problem. Each of the algorithms uses the value of the function and its gradient at each evaluation point.

Minimizer: gsl_multimin_fdfminimizer_conjugate_fr: This is the Fletcher-Reeves conjugate gradient algorithm. The conjugate gradient algorithm proceeds as a succession of line minimizations. The sequence of search directions to build up an approximation to the curvature of the function in the neighborhood of the minimum. An initial search direction p is chosen using the gradient and line minimization is carried out in that direction. The accuracy of the line minimization is specified by the parameter tol. At the minimum along this line the function gradient g and the search direction p are orthogonal. The line minimization terminates when dot(p,g) < tol |p| |g|. The search direction is updated using the Fletcher-Reeves formula p' = g' - \beta g where \beta=-|g'|^2/|g|^2, and the line minimization is then repeated for the new search direction.

Minimizer: gsl_multimin_fdfminimizer_conjugate_pr: This is the Polak-Ribiere conjugate gradient algorithm. It is similar to the Fletcher-Reeves method, differing only in the choice of the coefficient \beta. Both methods work well when the evaluation point is close enough to the minimum of the objective function that it is well approximated by a quadratic hypersurface.

Minimizer: gsl_multimin_fdfminimizer_vector_bfgs: This is the vector Broyden-Fletcher-Goldfarb-Shanno conjugate gradient algorithm. It is a quasi-Newton method which builds up an approximation to the second derivatives of the function f using the difference between successive gradient vectors. By combining the first and second derivatives the algorithm is able to take Newton-type steps towards the function minimum, assuming quadratic behavior in that region.

Minimizer: gsl_multimin_fdfminimizer_steepest_descent: The steepest descent algorithm follows the downhill gradient of the function at each step. When a downhill step is successful the step-size is increased by factor of two. If the downhill step leads to a higher function value then the algorithm backtracks and the step size is decreased using the parameter tol. A suitable value of tol for most applications is 0.1. The steepest descent method is inefficient and is included only for demonstration purposes.

Examples

This example program finds the minimum of the paraboloid function defined earlier. The location of the minimum is offset from the origin in x and y, and the function value at the minimum is non-zero. The main program is given below, it requires the example function given earlier in this chapter.

int
main (void)
{
  size_t iter = 0;
  int status;

  const gsl_multimin_fdfminimizer_type *T;
  gsl_multimin_fdfminimizer *s;

  /* Position of the minimum (1,2). */
  double par[2] = { 1.0, 2.0 };

  gsl_vector *x;
  gsl_multimin_function_fdf my_func;

  my_func.f = &my_f;
  my_func.df = &my_df;
  my_func.fdf = &my_fdf;
  my_func.n = 2;
  my_func.params = &par;

  /* Starting point, x = (5,7) */

  x = gsl_vector_alloc (2);
  gsl_vector_set (x, 0, 5.0);
  gsl_vector_set (x, 1, 7.0);

  T = gsl_multimin_fdfminimizer_conjugate_fr;
  s = gsl_multimin_fdfminimizer_alloc (T, 2);

  gsl_multimin_fdfminimizer_set (s, &my_func, x, 0.01, 1e-4);

  do
    {
      iter++;
      status = gsl_multimin_fdfminimizer_iterate (s);

      if (status)
        break;

      status = gsl_multimin_test_gradient (s->gradient, 1e-3);

      if (status == GSL_SUCCESS)
        printf ("Minimum found at:\n");

      printf ("%5d %.5f %.5f %10.5f\n", iter,
              gsl_vector_get (s->x, 0), 
              gsl_vector_get (s->x, 1), 
              s->f);

    }
  while (status == GSL_CONTINUE && iter < 100);

  gsl_multimin_fdfminimizer_free (s);
  gsl_vector_free (x);

  return 0;
}

The initial step-size is chosen as 0.01, a conservative estimate in this case, and the line minimization parameter is set at 0.0001. The program terminates when the norm of the gradient has been reduced below 0.001. The output of the program is shown below,

         x       y         f
    1 4.99629 6.99072  687.84780
    2 4.98886 6.97215  683.55456
    3 4.97400 6.93501  675.01278
    4 4.94429 6.86073  658.10798
    5 4.88487 6.71217  625.01340
    6 4.76602 6.41506  561.68440
    7 4.52833 5.82083  446.46694
    8 4.05295 4.63238  261.79422
    9 3.10219 2.25548   75.49762
   10 2.85185 1.62963   67.03704
   11 2.19088 1.76182   45.31640
   12 0.86892 2.02622   30.18555
Minimum found at:
   13 1.00000 2.00000   30.00000

Note that the algorithm gradually increases the step size as it successfully moves downhill, as can be seen by plotting the successive points.

The conjugate gradient algorithm finds the minimum on its second direction because the function is purely quadratic. Additional iterations would be needed for a more complicated function.

References and Further Reading

A brief description of multidimensional minimization algorithms and further references can be found in the following book,

C.W. Ueberhuber, Numerical Computation (Volume 2), Chapter 14, Section 4.4 "Minimization Methods", p. 325--335, Springer (1997), ISBN 3-540-62057-5.

Go to the first, previous, next, last section, table of contents.