E04 Class
関数リスト一覧   NagLibrary Namespaceへ  ライブラリイントロダクション  本ヘルプドキュメントのchm形式版

An optimization problem involves minimizing a function (called the objective function) of several variables, possibly subject to restrictions on the values of the variables defined by a set of constraint functions. Most methods in the Library are concerned with function minimization only, since the problem of maximizing a given objective function F(x) is equivalent to minimizing -Fx. Some methods allow you to specify whether you are solving a minimization or maximization problem, carrying out the required transformation of the objective function in the latter case.
In general methods in this chapter find a local minimum of a function f, that is a point x* s.t. for all x near x*fxfx*.
The E05 class contains methods to find the global minima of a function f. At a global minimum x* fxfx* for all x.
The H (not in this release) contains methods typically regarded as belonging to the field of operations research.
This introduction is only a brief guide to the subject of optimization designed for the casual user. Anyone with a difficult or protracted problem to solve will find it beneficial to consult a more detailed text, such as Gill et al. (1981) or Fletcher (1987).
If you are unfamiliar with the mathematics of the subject you may find some sections difficult at first reading; if so, you should concentrate on [Types of Optimization Problems], [Geometric Representation and Terminology], [Scaling], [Analysis of Computed Results] and [Recommendations on Choice and Use of Available Methods].

Syntax

C#
public static class E04
Visual Basic (Declaration)
Public NotInheritable Class E04
Visual C++
public ref class E04 abstract sealed
F#
[<AbstractClassAttribute>]
[<SealedAttribute>]
type E04 =  class end

Background to the Problems

Types of Optimization Problems

The solution of optimization problems by a single, all-purpose, method is cumbersome and inefficient. Optimization problems are therefore classified into particular categories, where each category is defined by the properties of the objective and constraint functions, as illustrated by some examples below.
Properties of Objective Function Properties of Constraints
Nonlinear Nonlinear
Sums of squares of nonlinear functions Sparse linear
Quadratic Linear
Sums of squares of linear functions Bounds
Linear None
For instance, a specific problem category involves the minimization of a nonlinear objective function subject to bounds on the variables. In the following sections we define the particular categories of problems that can be solved by methods contained in this chapter. Not every category is given special treatment in the current version of the Library; however, the long-term objective is to provide a comprehensive set of methods to solve problems in all such categories.

Unconstrained minimization

Nonlinear least-squares problems

Minimization subject to bounds on the variables

Minimization subject to linear constraints

Minimization subject to nonlinear constraints

Minimization subject to bounds on the objective function

Multi-objective optimization

Geometric Representation and Terminology

Figure 1 is a contour diagram of Fx. The contours labelled F0,F1,,F4 are isovalue contours, or lines along which the function Fx takes specific constant values. The point x*=12,-1T is a local unconstrained minimum, that is, the value of Fx* (=0) is less than at all the neighbouring points. A function may have several such minima. The lowest of the local minima is termed a global minimum. In the problem illustrated in Figure 1, x* is the only local minimum. The point x- is said to be a saddle point because it is a minimum along the line AB, but a maximum along CD.
If we add the constraint x10 (a simple bound) to the problem of minimizing Fx, the solution remains unaltered. In Figure 1 this constraint is represented by the straight line passing through x1=0, and the shading on the line indicates the unacceptable region (i.e., x1<0). The region in Rn satisfying the constraints of an optimization problem is termed the feasible region. A point satisfying the constraints is defined as a feasible point.
If we add the nonlinear constraint c1x:x1+x2-x1x2-320, represented by the curved shaded line in Figure 1, then x* is not a feasible point because c1x*<0. The solution of the new constrained problem is x^1.1825,-1.7397T, the feasible point with the smallest function value (where Fx^3.0607).

Gradient vector

The vector of first partial derivatives of Fx is called the gradient vector, and is denoted by gx, i.e.,
gx= Fx x1 , Fx x2 ,, Fx xn T.
For the function illustrated in
Figure 1,
gx= Fx+ex18x1+ 4x2 ex1 4x2+ 4x1+ 2 .
The gradient vector is of importance in optimization because it must be zero at an unconstrained minimum of any function with continuous first derivatives.

Hessian matrix

Jacobian matrix; matrix of constraint normals

Sufficient Conditions for a Solution

All nonlinear functions will be assumed to have continuous second derivatives in the neighbourhood of the solution.

Unconstrained minimization

Minimization subject to bounds on the variables

Linearly-constrained minimization

For the sake of simplicity, the following description does not include a specific treatment of bounds or range constraints, since the results for general linear inequality constraints can be applied directly to these cases.
At a solution x*, of a linearly-constrained problem, the constraints which hold as equalities are called the active or binding constraints. Assume that there are t active constraints at the solution x*, and let A^ denote the matrix whose columns are the columns of A corresponding to the active constraints, with b^ the vector similarly obtained from b; then
A^Tx*=b^.
The matrix Z is defined as an n×n-t matrix satisfying:
A^TZ=0; ZTZ=I.
The columns of Z form an orthogonal basis for the set of vectors orthogonal to the columns of A^.
Define
  • gZx=ZTgx, the projected gradient vector of Fx;
  • GZx=ZTGxZ, the projected Hessian matrix of Fx.
At the solution of a linearly-constrained problem, the projected gradient vector must be zero, which implies that the gradient vector gx* can be written as a linear combination of the columns of A^, i.e., gx*=i=1tλi*a^i=A^λ*. The scalar λi* is defined as the Lagrange multiplier corresponding to the ith active constraint. A simple interpretation of the ith Lagrange multiplier is that it gives the gradient of Fx along the ith active constraint normal; a convenient definition of the Lagrange multiplier vector (although not a recommended method for computation) is:
λ*=A^TA^-1A^Tgx*.
Sufficient conditions for x* to be the solution of a linearly-constrained problem are:
(i) x* is feasible, and A^Tx*=b^; and
(ii) gZx*=0, or equivalently, gx*=A^λ*; and
(iii) GZx* is positive-definite; and
(iv) λi*>0 if λi* corresponds to a constraint a^iT x* b^i ;
λi*<0 if λi* corresponds to a constraint a^iT x* b^i .
The sign of λi* is immaterial for equality constraints, which by definition are always active.

Nonlinearly-constrained minimization

For nonlinearly-constrained problems, much of the terminology is defined exactly as in the linearly-constrained case. The set of active constraints at x again means the set of constraints that hold as equalities at x, with corresponding definitions of c^ and A^: the vector c^x contains the active constraint functions, and the columns of A^x are the gradient vectors of the active constraints. As before, Z is defined in terms of A^x as a matrix such that:
A^TZ=0; ZTZ=I
where the dependence on x has been suppressed for compactness.
The projected gradient vector gZx is the vector ZTgx. At the solution x* of a nonlinearly-constrained problem, the projected gradient must be zero, which implies the existence of Lagrange multipliers corresponding to the active constraints, i.e., gx*=A^x*λ*.
The Lagrangian function is given by:
Lx,λ=Fx-λTc^x.
We define gLx as the gradient of the Lagrangian function; GLx as its Hessian matrix, and G^Lx as its projected Hessian matrix, i.e., G^L=ZTGLZ.
Sufficient conditions for x* to be the solution of a nonlinearly-constrained problem are:
(i) x* is feasible, and c^x*=0; and
(ii) gZx*=0, or, equivalently, gx*=A^x*λ*; and
(iii) G^Lx* is positive-definite; and
(iv) λi*>0 if λi* corresponds to a constraint of the form c^i0.
The sign of λi* is immaterial for equality constraints, which by definition are always active.
Note that condition (ii) implies that the projected gradient of the Lagrangian function must also be zero at x*, since the application of ZT annihilates the matrix A^x*.

Background to Optimization Methods

Most of the methods construct a sequence x k  satisfying:
x k+1 =x k +α k p k ,
where the vector p k  is termed the direction of search, and α k  is the steplength. The steplength α k  is chosen so that Fx k+1 <Fx k  and is computed using one of the techniques for one-dimensional optimization referred to in [One-dimensional optimization].

One-dimensional optimization

Methods for unconstrained optimization

The distinctions among methods arise primarily from the need to use varying levels of information about derivatives of Fx in defining the search direction. We describe three basic approaches to unconstrained problems, which may be extended to other problem categories. Since a full description of the methods would fill several volumes, the discussion here can do little more than allude to the processes involved, and direct you to other sources for a full explanation.
(a) Newton-type Methods (Modified Newton Methods)
Newton-type methods use the Hessian matrix Gx k , or a finite-difference approximation to Gx k , to define the search direction. The methods in the Library either require a method that computes the elements of Gx k  directly, or they approximate Gx k  by finite-differences.
Newton-type methods are the most powerful methods available for general problems and will find the minimum of a quadratic function in one iteration. See Sections 4.4 and 4.5.1 of Gill et al. (1981).
(b) Quasi-Newton Methods
Quasi-Newton methods approximate the Hessian Gxk by a matrix Bk which is modified at each iteration to include information obtained about the curvature of F along the current search direction pk. Although not as robust as Newton-type methods, quasi-Newton methods can be more efficient because Gxk is not computed directly, or approximated by finite-differences. Quasi-Newton methods minimize a quadratic function in n iterations, where n is the number of variables. See Section 4.5.2 of Gill et al. (1981).
(c) Conjugate-Gradient Methods
Unlike Newton-type and quasi-Newton methods, conjugate-gradient methods do not require the storage of an n by n matrix and so are ideally suited to solve large problems. Conjugate-gradient type methods are not usually as reliable or efficient as Newton-type, or quasi-Newton methods. See Section 4.8.3 of Gill et al. (1981).

Methods for nonlinear least-squares problems

Methods for handling constraints

Methods for handling multi-objective optimization

Suppose we have objective functions fxx, i>1, all of which we need to minimize at the same time. There are two main approaches to this problem:
(a) Combine the individual objectives into one composite objective. Typically this might be a weighted sum of the objectives, e.g.,
w1 f1x + w2 f2x + + wn fnx
Here you choose the weights to express the relative importance of the corresponding objective. Ideally each of the fxx should be of comparable size at a solution.
(b) Order the objectives in order of importance to you. Suppose fi are ordered s.t. fxx is more important than fi+1x, for i=1,,2,,n-1. Then in the lexicographical approach to multi-objective optimization a sequence of subproblems are solved. Firstly solve the problem for objective function f1x and denote by r1 the value of this minimum. If i-1 subproblems have been solved with results ri-1 then subproblem i becomes minfx,x subject to rkfkxrk, for k=1,2,,i-1 plus the other constraints.
Clearly the bounds on fk might be relaxed at your discretion.
In general, ifnAG methods from the E04 class are used then only local minima are found. This means that a better solution to an individual objective might be found without worsening the optimal solutions to the other objectives. Ideally you seek a Pareto solution; one in which an improvement in one objective can only be achieved by a worsening of another objective.
To obtain a Pareto solution methods from E05 class might be used or, alternatively, a pragmatic attempt to derive a global minimum might be tried. In this approach a variety of different minima are computed for each subproblem by starting from a range of different starting points. The best solution achieved is taken to be the global minimum. The more starting points chosen the greater confidence you might have in the computed global minimum.

Scaling

Scaling (in a broadly defined sense) often has a significant influence on the performance of optimization methods. Since convergence tolerances and other criteria are necessarily based on an implicit definition of ‘small’ and ‘large’, problems with unusual or unbalanced scaling may cause difficulties for some algorithms. Although there are currently no user-callable scaling methods in the Library, scaling is automatically performed by default in the methods which solve sparse LP, QP or NLP problems and in some newer dense solver methods. The following sections present some general comments on problem scaling.

Transformation of variables

One method of scaling is to transform the variables from their original representation, which may reflect the physical nature of the problem, to variables that have certain desirable properties in terms of optimization. It is generally helpful for the following conditions to be satisfied:
(i) the variables are all of similar magnitude in the region of interest;
(ii) a fixed change in any of the variables results in similar changes in Fx. Ideally, a unit change in any variable produces a unit change in Fx;
(iii) the variables are transformed so as to avoid cancellation error in the evaluation of Fx.
Normally, you should restrict yourself to linear transformations of variables, although occasionally nonlinear transformations are possible. The most common such transformation (and often the most appropriate) is of the form
xnew=Dxold,
where D is a diagonal matrix with constant coefficients. Our experience suggests that more use should be made of the transformation
xnew=Dxold+v,
where v is a constant vector.
Consider, for example, a problem in which the variable x3 represents the position of the peak of a Gaussian curve to be fitted to data for which the extreme values are 150 and 170; therefore x3 is known to lie in the range 150170. One possible scaling would be to define a new variable x-3, given by
x-3=x3170.
A better transformation, however, is given by defining x-3 as
x-3=x3-16010.
Frequently, an improvement in the accuracy of evaluation of Fx can result if the variables are scaled before the methods to evaluate Fx are coded. For instance, in the above problem just mentioned of Gaussian curve-fitting, x3 may always occur in terms of the form x3-xm, where xm is a constant representing the mean peak position.

Scaling the objective function

Scaling the constraints

Analysis of Computed Results

Convergence criteria

Checking results

Monitoring progress

Confidence intervals for least-squares solutions

When estimates of the parameters in a nonlinear least-squares problem have been found, it may be necessary to estimate the variances of the parameters and the fitted function. These can be calculated from the Hessian of Fx at the solution.
In many least-squares problems, the Hessian is adequately approximated at the solution by G=2JTJ (see [Methods for nonlinear least-squares problems]). The Jacobian, J, or a factorization of J is returned by all the comprehensive least-squares methods and, in addition, a method is available in the Library to estimate variances of the parameters following the use of most of the nonlinear least-squares methods, in the case that G=2JTJ is an adequate approximation.
Let H be the inverse of G, and S be the sum of squares, both calculated at the solution x-; an unbiased estimate of the variance of the ith parameter xi is
varx-i=2S m-n Hii
and an unbiased estimate of the covariance of x-i and x-j is
covarx-i,x-j=2S m-n Hij.
If x* is the true solution, then the 1001-β% confidence interval on x- is
x-i-varx-i. t1-β/2,m-n<xi*<x-i+varx-i.t1-β/2,m-n,  i=1,2,,n
where t1-β/2,m-n is the 1001-β/2 percentage point of the t-distribution with m-n degrees of freedom.
In the majority of problems, the residuals fi, for i=1,2,,m, contain the difference between the values of a model function ϕz,x calculated for m different values of the independent variable z, and the corresponding observed values at these points. The minimization process determines the parameters, or constants x, of the fitted function ϕz,x. For any value, z-, of the independent variable z, an unbiased estimate of the variance of ϕ is
varϕ=2S m-n i=1n j=1n ϕ xi z- ϕ xj z- Hij.
The 1001-β% confidence interval on F at the point z- is
ϕz-,x--varϕ.tβ/2,m-n<ϕz-,x*<ϕz-,x-+varϕ.tβ/2,m-n.
For further details on the analysis of least-squares solutions see Bard (1974) and Wolberg (1967).

Recommendations on Choice and Use of Available Methods

The choice of method depends on several factors: the type of problem (unconstrained, etc.); the level of derivative information available (function values only, etc.); your experience (there are easy-to-use versions of some methods); whether or not storage is a problem; whether or not the method is to be used in a multithreaded environment; and whether computational time has a high priority. Not all choices are catered for in the current version of the Library.

Easy-to-use and Comprehensive Methods

Reverse Communication Methods

Most of the methods in this chapter are called just once in order to compute the minimum of a given objective function subject to a set of constraints on the variables. The objective function and nonlinear constraints (if any) are specified by you and written as methods to a very rigid format described in the relevant method document. Such methods usually appear in the argument list of the minimization method.
For the majority of applications this is the simplest and most convenient usage. Sometimes however this approach can be restrictive:
(i) when the required format of the method does not allow useful information to be passed conveniently to and from your calling program;
(ii) when the minimization method is being called from another computer language, such as Visual Basic, which does not fully support procedure arguments in a way that is compatible with the Library.
A way around these problems is to utilize reverse communication methods. Instead of performing complete optimizations, these methods perform one step in the solution process before returning to the calling program with an appropriate flag (irevcm) set. The value of irevcm determines whether the minimization process has finished or whether fresh information is required. In the latter case you calculate this information (in the form of a vector or as a scalar, as appropriate) and re-enter the reverse communication method with the information contained in appropriate arguments. Thus you have the responsibility for providing the iterative loop in the minimization process, but as compensation, you have an extremely flexible and basic user-interface to the reverse communication method.
The only reverse communication methods in this chapter are e04uf, which solve dense NLP problems using a sequential quadratic programming method.

Service Methods

One of the most common errors in the use of optimization methods is that user-supplied delegates do not evaluate the relevant partial derivatives correctly. Because exact gradient information normally enhances efficiency in all areas of optimization, you are encouraged to provide analytical derivatives whenever possible. However, mistakes in the computation of derivatives can result in serious and obscure run-time errors. Consequently, service methods are provided to perform an elementary check on the gradients you supplied. These methods are inexpensive to use in terms of the number of calls they require to user-supplied delegates.
The appropriate checking methods are as follows:
Minimization method Checking method(s)
e04kd e04hc
e04lb e04hc and e04hd
(e04gb not in this release) e04ya
e04gd e04ya
e04he e04ya and e04yb
It should be noted that methods
e04uf, e04us, e04vh and e04wd each incorporate a check on the gradients being supplied. This involves verifying the gradients at the first point that satisfies the linear constraints and bounds. There is also an option to perform a more reliable (but more expensive) check on the individual gradient elements being supplied. Note that the checks are not infallible.
A second type of service method computes a set of finite-differences to be used when approximating first derivatives. Such differences are required as input parameters by some methods that use only function evaluations.
(e04yc not in this release) estimates selected elements of the variance-covariance matrix for the computed regression parameters following the use of a nonlinear least-squares method.
e04xa estimates the gradient and Hessian of a function at a point, given a method to calculate function values only, or estimates the Hessian of a function at a point, given a method to calculate function and gradient values.
e04zc checks that user-supplied delegates for evaluating an objective function, constraint functions and their first derivatives produce derivative values which are consistent with the function and constraint values calculated.

Function Evaluations at Infeasible Points

Choosing Between Variant Methods for Some Problems

As evidenced by the wide variety of methods available in E04 class, it is clear that no single algorithm can solve all optimization problems. It is important to try to match the problem to the most suitable method, and that is what the decision trees in [Decision Trees] help to do.
Sometimes in E04 class more than one method is available to solve precisely the same minimization problem. Thus, for example, the general nonlinear programming methods e04uc and e04wd have very similar argument lists and are based on similar methods. Experience shows that although both methods can usually solve the same problem and get similar results, sometimes one method will be faster, sometimes one might find a different local minimum to the other, or, in difficult cases, one method may obtain a solution when the other one fails.
After using one of these methods, if the results obtained are unacceptable for some reason, it may be worthwhile trying the other method instead. In the absence of any other information, in the first instance you are recommended to try using e04uc, and if that proves unsatisfactory, try using e04wd. Although the algorithms used are very similar, the two methods each have slightly different optional arguments which may allow the course of the computation to be altered in different ways.
Other pairs of methods which solve the same kind of problem are e04nk or e04nq, for sparse quadratic or linear programming problems, and e04ug or e04vh, for sparse nonlinear programming. In these cases the argument lists are not so similar as e04uc or e04wd, but the same considerations apply.

Decision Trees

Tree 1: Selection chart for unconstrained problems

Only one variable? _
yes
Are first derivatives available? _
yes
e04bb
| no
|
| e04ab
no
|
Does the function have many discontinuities? _
yes
e04cb
no
|
Is store size a problem? _
yes
e04dg
no
|
Is the function a sum of squares? _
yes
Are you an experienced user? _
yes
Are first derivatives available? _
yes
Are second derivatives available? _
yes
e04he
| | | no
|
| | | Are there more than ten variables? _
yes
(e04gb not in this release)
| | | no
|
| | | e04gd
| | no
|
| | e04fc
| no
|
| Are first derivatives available? _
yes
Are second derivatives available? _
yes
e04hy
| | no
|
| | Are there more than ten variables? _
yes
e04gy
| | no
|
| | e04gz
| no
|
| e04fy
no
|
Are you an experienced user? _
yes
Are first derivatives available? _
yes
Are second derivatives available? _
yes
e04lb
| | no
|
| | Is computational cost critical? _
yes
e04uc, e04uf, e04ug, e04vh or e04wd
| | no
|
| | e04kd
| no
|
| e04uc, e04uf, e04ug, e04vh or e04wd
no
|
Are first derivatives available? _
yes
Are second derivatives available? _
yes
e04ly
| no
|
| Is computational cost critical? _
yes
e04ky
| no
|
| e04kz
no
|
e04jy

Tree 2: Selection chart for bound-constrained, linearly-constrained and nonlinearly-constrained problems

Are there any nonlinear constraints? _
yes
Is the objective function a sum of squares? (A least-squares problem) _
yes
e04us
| no
|
| Are the constraints sparse? _
yes
e04ug or e04vh
| no
|
| e04uc, e04uf or e04wd
no
|
Is the objective function linear? (An LP problem) _
yes
See Tree 3
no
|
Is the objective function quadratic? (A QP or least-squares problem) _
yes
Is the problem a least-squares problem? _
yes
e04nc
| no
|
| See Tree 4
no
|
Is the objective function a sum of squares? (A least-squares problem) _
yes
e04us
no
|
Are the constraints simple bounds? _
yes
Are you an experienced user? _
yes
Are first derivatives available? _
yes
Are second derivatives available? _
yes
e04lb
| | | no
|
| | | e04kd, e04uc, e04uf, e04ug or e04vh
| | no
|
| | e04uc, e04uf, e04ug, e04vh or e04wd
| no
|
| Are first derivatives available? _
yes
Are second derivatives available? _
yes
e04ly
| | no
|
| | Is computational cost critical? _
yes
e04ky
| | no
|
| | e04kz
| no
|
| e04jy
no
|
e04uc, e04uf, e04ug or e04vh

Tree 3: Linear programming

Is the objective function linear (an LP problem) and is the linear constraint matrix sparse? _
yes
e04nk, e04nq, e04ug or e04vh
no
|
e04mf

Tree 4: Quadratic programming

Is the linear constraint matrix sparse? _
yes
e04nk, e04nq, e04ug or e04vh
no
|
Is the problem a convex QP problem? _
yes
e04nc
no
|
(e04nf not in this release)

References

Inheritance Hierarchy

See Also