Optimization is the process of finding parameters that minimize or maximize a given criterion; the convention in the optimization literature is to treat all problems as minimization problems (a maximization problem can easily be turned into a minimization problem by flipping the sign of the criterion).
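
To make the sign-flipping idea concrete, here is a minimal sketch in Python; the helper name `to_minimization` and the example objective are illustrative assumptions, not part of any particular library.

```python
def to_minimization(maximize_f):
    """Wrap a criterion we want to maximize so that a minimizer can be applied to it."""
    def minimize_f(x):
        return -maximize_f(x)  # flipping the sign turns maximization into minimization
    return minimize_f

# Example: maximizing f(x) = -(x - 2)^2 is equivalent to minimizing (x - 2)^2.
neg_f = to_minimization(lambda x: -(x - 2) ** 2)
print(neg_f(2.0), neg_f(5.0))  # 0.0 at the maximizer, larger values elsewhere
```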

There are several families of methods that perform optimization. One popular class consists of methods that use the gradient (or higher derivatives) of the criterion. Given that they have this information to work with, these methods tend to be quite effective. However, since the gradient provides only local information, these methods are only guaranteed to find local minima, not global minima. This is, of course, sufficient for convex functions, in which every local minimum is also a global minimum, and for some other classes of functions, where local minima are not significantly worse than the global minimum. In other cases, or if the gradient of the criterion is not available, one might need to resort to other approaches.
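
The simplest gradient-based method is gradient descent, which repeatedly steps against the gradient. The sketch below assumes the gradient is available as a Python function; the names (`grad`, `x0`, `lr`, `steps`) and the learning rate are illustrative choices, not a reference implementation.

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Minimal gradient descent: move against the gradient with a fixed step size."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

# Example: minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(x_min)  # converges towards 3, the global minimum of this convex criterion
```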

Another class of optimization methods is that of metaheuristic methods. These methods offer few theoretical guarantees, but they can often find reasonably good solutions in a reasonable amount of time. Each of them is based on some general heuristic principle, e.g. a biologically inspired concept such as survival of the fittest, genetics, or the behaviour of swarms. Many of them operate on populations of candidate solutions, which gives them a better chance of discovering the global minimum, while also making them good candidates for multi-objective optimization.
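
As a sketch of the population-based idea, the code below implements a very simple evolutionary search: the fittest half of the population survives and the rest is refilled with randomly mutated copies of the survivors. The population size, mutation scale, search range and example criterion are all illustrative assumptions rather than a specific named algorithm.

```python
import math
import random

def evolutionary_search(f, dim, pop_size=20, generations=100, sigma=0.5):
    """Minimal population-based metaheuristic: keep the fittest candidates and
    create the next generation by randomly perturbing (mutating) them."""
    population = [[random.uniform(-5, 5) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        # Rank by the criterion (lower is better, following the minimization convention).
        population.sort(key=f)
        parents = population[:pop_size // 2]
        # Refill the population with mutated copies of the surviving parents.
        children = [[x + random.gauss(0, sigma) for x in random.choice(parents)]
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return min(population, key=f)

# Example: a multimodal criterion on which a purely local method may get stuck.
criterion = lambda v: sum(x * x - 2 * math.cos(3 * x) for x in v)
best = evolutionary_search(criterion, dim=2)
print(best, criterion(best))
```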

Finally, there is a large number of other optimization methods relevant to artificial intelligence and machine learning, such as the entire sub-area of discrete optimization, which deals with problems where the parameters take discrete values.