\part{Nonlinear programming}

\chapter{Unconstrained}

For LPs, using the theory of convex analysis and linear algebra, we have the simplex method; an algorithm that derives \emph{the} solution for that optimization problem. We can them preoccupy ourselves about reducing time complexity and understanding how 'continuously' changing the LP changes feasible regions and optimal solutions (sensitivity analysis).

A Nonlinear Program (NLP) is an optimization problem that is not an LP; maybe it has a nonlinear OF, or one, some or all constraints are nonlinear.  Basically, such a problem is not an LP. 

As one can imagine, this complicates matters imensely. LPs don't vary much in behaviour, but NLPs cover an extremely general range of problems; this variation between types of NLPs means that a 'one-size-fits-all' solution like the simplex method is not going to happen.

Real analysis, convex analysis, vector analysis, linear algebra and many other branches offer some critical theory in developing methods to solve NLPs, but alas, since generic functions are so diverse in behaviour we often resort to numerical methods to approximate the solution. Even so, this all assumes that the functions we're dealing with have some 'nice' properties like continuity and differentiability!

So we concede a little bit; we're going to conduct analysis on NLPs that are \emph{unconstrained}, that is, their feasible regions are intervals, rectangles, entire spaces, and other types of nice convex sets.

- define NLPs as optimization problems as $(x,f,g)$ Where $X = \{ \mathbf{x} : \forall i \in \mathbb{N}\cap [1,n] a_i (\mathbf{x}) \leq \mathbf{0} \} $
Where at least one of the $a_i$ or $f$ isa nonlinear function. Actually, a formal definition for NLPs isn't really necessary; it's just an optimization problem that isn't an LP.

This chapter will look specifically at problems $(X,f,g)$ Where $X = \mathbb{R}^n$ and $f : \mathbb{R}^n \to \mathbb{R}$ is a twice (totally) differentiable function


\section{Theoretical basis}


Like with LPs, we draw our theory from branches of mathematical analysis. For single variate real functions, the familiar derivative test provides the conditions when a differentiable function is at a local minimum.

\begin{proposition}[Derivative test]
If $f'(x)=0$ and $f''(x) > 0$, then $x$ is a local minimum of $f$.
\end{proposition}


A similar test exists for multivariate real functions.

Recall the positive definite property of a matrix; note that we can verify that a matrix is positive definite by assuring that all eigenvalues are positive.

\begin{proposition}[Multivariate derivative test]
If a function $f$ with continuous second derivatives has $\nabla f(\mathbf{x}_0)=\mathbf{0}$ and $\mathbf{H}_f (\mathbf{x}_0)$ is positive definite, then $\mathrm{x}_0$ is a local minimum of $f$.
\end{proposition}

Some results from convex analysis can help us deduce which local minima are also global minima.

\begin{proposition}
If $f$ is convex and $\mathrm{dom}(f)$ is a convex set, any local minimum of $f$ is a global minimum.
\end{proposition}


\section{Derivative tests}

\begin{proposition}[Derivative test]
If a real function $f$ with continuous second derivatives has $f'(x_0)=0$ and $f''(x_0) > 0$, then $x_0$ is a local minimum of $f$.
\end{proposition}


\begin{proposition}[Multivariate derivative test]
If a real multivariate function $f$ with continuous second derivatives has $\nabla f(\mathbf{x}_0)=\mathbf{0}$ and $\mathbf{H}_f (\mathbf{x}_0)$ is positive definite, then $\mathrm{x}_0$ is a local minimum of $f$.
\end{proposition}


\begin{proposition}
Let $f$ have continuous second-order partial derivatives on $U$. Then $f$ is convex on $U$ iff its Hessian matrix $\mathbf{H}_{f}$ is positive semi-definite.
\end{proposition}

\begin{proposition}
Let $f$ have continuous second-order partial derivatives on $U$. If the Hessian matrix $\mathbf{H}_{f}$ is positive definite on $U$, then $f$ is convex on $U$.
\end{proposition}


\section{Gradient methods}

We now have sufficient theory to develop the following numerical methods that are applicable to many differentiable real functions.

\subsection{Steepest descent method}


\begin{itemize}
	\item $\varepsilon$ is the tolerance
	\item $f$ is the multivariate differentiable function to be minimized
\end{itemize}
\begin{algorithm}
\begin{algorithmic}
	\WHILE{$ \| \nabla f(\mathbf{x}) \| < \varepsilon $}
	\STATE $\mathbf{d} \gets -\nabla f(\mathbf{x})$
	\STATE $t_0 \gets \mathrm{argmin}_{t > 0} f(\mathbf{x}+ t \mathbf{d})$
	\STATE $\mathbf{x} \gets \mathbf{x} + t_0 \mathbf{d}$
	\ENDWHILE
\end{algorithmic}
\end{algorithm}


\subsection{Newton's method}

\begin{itemize}
	\item $\varepsilon$ is the tolerance
	\item $f$ is the multivariate differentiable function to be minimized
\end{itemize}
\begin{algorithm}
\begin{algorithmic}
	\WHILE{$\| \nabla f(\mathbf{x}) \| < \varepsilon$}
	\STATE $\mathbf{x} \gets \mathbf{x} - \mathbf{H}(f(\mathbf{x}))^{-1} ( \nabla f(\mathbf{x}))$
	\ENDWHILE
\end{algorithmic}
\end{algorithm}


\chapter{Constrained}

This chapter will instead stuy problems of the form $(X,f,g)$, where $f : \mathbb{R}^n \to \mathbb{R}$ is a twice (totally) differentiable function, although $X=\{ \mathbf{x} \in \mathbb{R}^n : \mathbf{A}\mathbf{x} = \mathbf{b} \}$

\section{Reduced function}

These types of problems can be converted to unconstrained NLPs.


\subsection{Single linear condition}
\subsection{Generalization to multiple linear conditions}
\[ \mathbf{x} = \mathbf{x}_{\mathbf{b}} + \mathbf{x}_{\mathbf{N}}\]


\[ \mathbf{x} = \mathbf{B}^{-1} ( \mathbf{x}_{\mathbf{B}} + \mathbf{x}_{\mathbf{N}} ) \]
\[ \mathbf{x} = \overline{\mathbf{x}} +\mathbf{Z}\mathbf{x}_{\mathbf{N}} \]

\[ \phi (\mathbf{x}_{\mathbf{N}})= f(\overline{\mathbf{x}} +\mathbf{Z}\mathbf{x}_{\mathbf{N}}) \]
This is the \emph{reduced function of $f$}, notice that $\mathbf{Z}$ is a basis for the kernel of $\mathbf{A}$.
We realize that the chain rule permits a simplified form of the gradient and Hessian of the reduced function in terms of the original function. 

\[ \nabla \phi( \mathbf{x}_{\mathbf{N}}) = \mathbf{Z}^{\intercal} \nabla f(\mathbf{x}) \]


\[ \mathbf{H}( \phi( \mathbf{x}_{\mathbf{N}})) = \mathbf{Z}^{\intercal} \mathbf{H}( f(\mathbf{x})) \mathbf{Z} \]

Using these, so long as we can calulate $\mathbf{Z}$, our constrained NLP becomes an unconstrained NLP; from here we can apply numerical methods and derivative tests.
\section{Lagrangian relaxation method}

\[\mathcal{L}_{f}(\mathbf{x},\lambda) = f(\mathbf{x}) + \sum^{m}_{i=1} \lambda_i (b_i - g_i (\mathbf{x})) \]

One can prove that optimizing the Lagrangian $\mathcal{L}_{f}$ also optimizes $f$, hence we have again mapped a constrained NLP to an unconstrained NLP.

Since stationary points occur when $\nabla \mathcal{L}_{f} (\mathbf{x}) = \mathbf{0}$, stationary points of the Lagrangian occur on the following condition.

\[ \nabla f(\mathbf{x} = - \sum^{m}_{i=1} \lambda_i \nabla g_i (\mathbf{x}) \]

The following proposition helps us determine the nature of stationary points when using the Lagrangian on a convex function.

\begin{proposition}
If $f$ is convex, then the optimal values of $\mathcal{L}_{f}$ are minima.
\end{proposition}


\section{Karush-Kuhn-Tucker conditions}