\part{Fundamentals}

\chapter{Real Multivariate and vector functions}


\section{Multivariate functions}

The goal of this chapter is to translate knowledge from real analysis to higher-dimensional Euclidean spaces. Real analysis studies functions of the form $f : \mathbb{R} \to \mathbb{R}$; we now want to consider functions $f : \mathbb{R}^m \to \mathbb{R}^n$ between real Euclidean spaces, possibly of different dimensions. One consequence is that the absolute value, used as a metric on $\mathbb{R}$, generalizes to the norm.

We will draw extensively from ideas in linear algebra, and we will consider topological consequences of considering functions in multidimensional spaces.

It's worth mentioning that many of the ideas addressed in this book are standard in any multidimensional generalization of analysis. We first look at three types of functions that are used across all types of multidimensional analysis.

\begin{definition}[vector valued function]
A \emph{vector valued function}  is a function $\mathbf{r} : X^n \to \mathbb{R}^m$ that maps elements of a space to a Euclidean space.
\end{definition}

Two types of vector valued functions are particularly important because they are commonplace in physics.

\begin{definition}[Vector field]
A \emph{vector field} is a vector valued function $\mathbf{f} : X^n \to \mathbb{R}^n$ that assigns to each point of some space $X^n$ a vector in a real space of the same dimension; that is, the domain and codomain have the same dimension.
\end{definition}


\begin{definition}[Scalar field]
A \emph{scalar field} is a vector valued function $f : X^n \to \mathbb{R}$ that assigns a real number to each point in some space $X^n$.
\end{definition}


Though these definitions apply to functions whose domains are arbitrary spaces $X$, this book will only consider \emph{vector valued functions with Euclidean domains} $f : \mathbb{R}^m \to \mathbb{R}^n$, so the following variants of these 3 definitions will be frequent objects of interest within this book.

\begin{itemize}
\item Real multivariate functions $f : \mathbb{R}^n \to \mathbb{R}, n > 1$; scalar fields with Euclidean domain.
\item Real vector functions $\mathbf{f} : \mathbb{R}^m \to \mathbb{R}^n, m,n > 1$; vector valued functions with Euclidean domain.
\item Real vector fields $\mathbf{f} : \mathbb{R}^n \to \mathbb{R}^n, n > 1$; vector fields with Euclidean domain.
\end{itemize}

There are special classes within these functions, such as \emph{differentiable curves and surfaces}; we'll delve into this theory soon.


\begin{example}
The following function $f : \mathbb{R}^2 \to \mathbb{R}$ is an example of a real multivariate function.
\[ f(x,y)=  x^2 -y^2 \]
\end{example}
\begin{example}
The following function $\mathbf{r} : [0,6\pi] \to \mathbb{R}^3$ is an example of a real vector function (it's also a 'differentiable curve') that maps a single variable to a vector in $\mathbb{R}^3$. This is a helix that makes 3 revolutions.
\[ \mathbf{r}(t)= \begin{bmatrix} \cos (t) \\ \sin (t) \\ t \end{bmatrix}\]
\end{example}
\begin{example}
The following function $\mathbf{f} : \mathbb{R}^2 \to \mathbb{R}^2$ is an example of a real vector field. It returns a vector that is orthogonal to the input vector.
\[ \mathbf{f}(\mathbf{x})= \begin{bmatrix} \mathbf{x}_2 \\ -\mathbf{x}_1  \end{bmatrix} = \mathbf{x}_2 \mathbf{i} -\mathbf{x}_1 \mathbf{j} \]
\end{example}


\section{Multivariate limits}
In terms of theory and rigor, limits are defined in much the same way as in real analysis. The primary difference is that in real analysis it suffices to check that the left and right limits exist and are equal, whereas in higher dimensions there are uncountably many directions from which a point may be approached.


When going from $\mathbb{R}$ to $\mathbb{R}^n$, we have an analogous definition of a limit.

\begin{definition}[Limit in $\mathbb{R}^n$]
For a real vector function $\mathbf{f}$, its \emph{limit at $\mathbf{p}$} is a vector $\mathbf{L}$ such that for any positive $\varepsilon$, we can find a positive $\delta$ so that whenever $0 < \|\mathbf{x}-\mathbf{p} \| < \delta$ we have $\| \mathbf{f}(\mathbf{x}) -\mathbf{L} \| < \varepsilon$. Informally, as $\mathbf{x}$ converges to $\mathbf{p}$, $\mathbf{f}(\mathbf{x})$ converges to $\mathbf{L}$.

\[ \lim_{\mathbf{x} \to \mathbf{p}} \mathbf{f}(\mathbf{x}) = \mathbf{L} \iff \forall \varepsilon \in (0,\infty) ( \exists \delta \in (0,\infty) [ 0 < \| \mathbf{x}-\mathbf{p}\| < \delta \implies \| \mathbf{f}(\mathbf{x})-\mathbf{L} \| < \varepsilon ] )\]
\end{definition}

The difference is the use of the norm rather than the absolute value, which makes sense now that we need to deal with vectors rather than scalars. The real question is how does this change the way we need to think about limits in multidimensional spaces?

Given some $x \in \mathbb{R}$ and $c > 0$, there are only 2 values $y$ such that $|x-y|=c$, namely $y=x \pm c$. However, given some $\mathbf{x} \in \mathbb{R}^n$ with $n \geq 2$, there are uncountably many $\mathbf{y}$ such that $\|\mathbf{x}-\mathbf{y}\|=c$. In $\mathbb{R}$ we have the luxury of considering limits from the left and from the right, and the function has a limit at a point when both agree. This doesn't generalize to $\mathbb{R}^n$, since there are uncountably many directions (and paths) along which a point may be approached; given a point, one must consider all elements within a given radius of that point.


A useful special case for the limit of $\mathbf{f}$ at $\mathbf{p}$ arises when, after writing $\mathbf{x} - \mathbf{p}$ in polar (or spherical) coordinates, $\mathbf{f}$ depends only on the radius $r = \|\mathbf{x} - \mathbf{p}\|$. Since $\|\mathbf{x} - \mathbf{p}\| < \delta$ holds exactly when $|r| < \delta$, the question reduces to a one-variable limit as $r \to 0$, which can be handled with real analysis!

When this isn't the case, we must resort to other means of bounding $\|\mathbf{f}(\mathbf{x}) - \mathbf{L}\|$.
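
As a brief illustration of the radial special case, consider the following worked example; the two functions are chosen here for convenience and are only a sketch of the technique.

\begin{example}
Take $f(x,y) = \frac{\sin(x^2+y^2)}{x^2+y^2}$ and $\mathbf{p} = \mathbf{0}$. Writing $x = r\cos\theta$ and $y = r\sin\theta$ gives an expression that depends on $r$ alone, so the limit reduces to a one-variable limit.
\[ \lim_{(x,y) \to (0,0)} \frac{\sin(x^2+y^2)}{x^2+y^2} = \lim_{r \to 0} \frac{\sin(r^2)}{r^2} = 1 \]
By contrast, $g(x,y) = \frac{xy}{x^2+y^2}$ becomes $\cos\theta \sin\theta$ in polar coordinates, which depends on the direction $\theta$ rather than on $r$. Approaching the origin along $\theta = 0$ gives $0$ while $\theta = \pi/4$ gives $\frac{1}{2}$, so the limit of $g$ at the origin does not exist.
\end{example}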


\begin{proposition}
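Let $\mathbf{f}$ and $\mathbf{g}$ be real vector functions whose limits at $\mathbf{p}$ exist. Limits are closed under the following operations; that is, the limit of each combination exists and equals the corresponding combination of the limits.
\begin{itemize}
\item addition
\item subtraction
\item scalar multiplication
\item dot product
\item cross product (for functions into $\mathbb{R}^3$)
\item norm
\end{itemize}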
\end{proposition}


\section{Multivariate continuity}

Once we establish limits, we can define continuity in an analogous way to single variate real functions.

\begin{proposition}[Continuous vector valued function]
\[ \mathbf{f} \text{ is continuous at } \mathbf{x}_0 \iff \lim_{\mathbf{x} \to\mathbf{x}_0} \mathbf{f}(\mathbf{x}) = \mathbf{f}(\mathbf{x}_0)  \]
\end{proposition}

\begin{proposition}
Let $\mathbf{f} : \mathbb{R}^m \to \mathbb{R}^n$. Then $\mathbf{f}$ is continuous at $\mathbf{x}_0$ iff each component function $\mathbf{f}_{i}$ is continuous at $\mathbf{x}_{0}$.
\end{proposition}


\begin{proposition}
For real vector functions, continuity is closed under the following operations.
\begin{itemize}
\item addition
\item subtraction
\item scalar multiplication
\item composition
\end{itemize}
\end{proposition}


\section{Multivariate differentiation}


In real analysis, the derivative is a tool that describes the 'tangent' or 'best linear approximator' at a point of a function.

With one variable, linear functions are of the form $T(x)=ax$, whereas for multivariate functions linear transforms take the form $T(\mathbf{x}) = \mathbf{A} \mathbf{x}$ for some matrix $\mathbf{A}$; ultimately we'd like our notion of a derivative to revolve around the existence of such a matrix serving as the best linear approximant.

Before tackling this goal, let's look at some simpler notions.
For multivariate functions, there are many directions from which to approach a point of the domain, and the function may change at different rates along different directions; a tangent in one direction may not be a tangent in another.

One idea is to only consider the derivative along a specific direction at a domain element; this is what \emph{directional derivatives and partial derivatives} strive to do, and it will prove an adequate starting point for now.


\subsection{Directional derivative}


As we have seen, limits in $\mathbb{R}^n$ behave differently topologically: there are infinitely many ways of approaching a point rather than just 2 as in $\mathbb{R}$. Fortunately, the structure of the linear space gives us a way to codify direction by means of a vector.


For scalar fields, we can translate the derivative from real analysis by considering the derivative at a point along a specific direction given by a unit vector. These are the \emph{directional derivatives}.
\begin{definition}[Directional derivative of $\mathbf{f}$]
Let $\mathbf{f}: \mathbb{R}^m \to \mathbb{R}$ be a real multivariate function and let $\mathbf{u}$ be a unit vector. The \emph{directional derivative of $\mathbf{f}$ at $\mathbf{x}_0$ along $\mathbf{u}$} is defined as follows, provided the limit exists. It represents the derivative, in the sense of real analysis, along $\mathbf{u}$.
\[ \nabla_{\mathbf{u}} \mathbf{f} (\mathbf{x}_0) = \lim_{h \to 0} \frac{\mathbf{f}(\mathbf{x}_0 + h \mathbf{u}) - \mathbf{f}(\mathbf{x}_0) }{ h }\]
\end{definition}
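
To make the definition concrete, here is a small worked example computed directly from the limit; the function, point and direction are chosen here purely for illustration.

\begin{example}
Let $f(x,y) = x^2 + y^2$, $\mathbf{x}_0 = (1,2)$ and $\mathbf{u} = (\tfrac{3}{5}, \tfrac{4}{5})$, which is a unit vector. Expanding the numerator of the difference quotient gives the following.
\[ f(\mathbf{x}_0 + h\mathbf{u}) = \left(1 + \tfrac{3}{5}h\right)^2 + \left(2 + \tfrac{4}{5}h\right)^2 = 5 + \tfrac{22}{5}h + h^2 \]
\[ \nabla_{\mathbf{u}} f(\mathbf{x}_0) = \lim_{h \to 0} \frac{5 + \tfrac{22}{5}h + h^2 - 5}{h} = \lim_{h \to 0} \left( \tfrac{22}{5} + h \right) = \tfrac{22}{5} \]
\end{example}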


%Recall how a partial derivative finds the derivative of the univariate projection of the function onto one variable, however what about an arbitrary direction?
%Partial derivatives will serve as fundamental for linking our understanding of differentiation of multivariate functions to those of univariate functions. By themselves, they are quite limited however.

%The partial derivatives allow us to apply diffeentiation from real analysis to differentiate with respect to the direction of increase of some variable. What about arbitrary direction?

%As stated, partial derivatives are our fundamentals, and if we can specif


%Partial derivatives can take the derivative with respect to any variable, however does there exist a way of taking the derivative of a multivariate function with respect to any arbitrary 'direction', rather than the directions dictated by the variables (usually axis')? If one represents their direction by $\mathbf{u}$, the \emph{directional derivative} acts as a framework to take the derivative along $\mathbf{u}$.

%To be precise, by direction we are refering to a unit vector decomposed into the standard basis


%The idea is to interpret the directional derivative as a total derivative with respect to some $u$ and use the multivariate chain rule.


\subsection{Partial derivative}

Ideally, we would like powerful methods for evaluating the directional derivative. In the spirit of linear algebra, it is perhaps easier to first work with the directional derivatives along basis elements.

If we consider the standard basis spanning $\mathbb{R}^n$, the directional derivatives taken with respect to these basis elements are called \emph{partial derivatives}.

Indeed, partial derivatives find extensive use in vector analysis.


%As a starting point, we can consider when the function is only changing with respect to one of the variables and apply our definition of differentiation from real analysis. These are the \emph{partial derivatives}; the derivative of a multivariate function with respect to a single variable.

\begin{definition}[Partial derivative]
The \emph{partial derivative of $\mathbf{f}$ at $\mathbf{x}_0$ with respect to the standard basis element $\mathbf{e}_k$} is defined as the following.
%\[ \mathbf{f}_{\mathbf{x}_k} (\begin{bmatrix}\mathbf{x}_1 \\ \vdots \\ \mathbf{x}_k \\ \vdots \\ \mathbf{x}_m \end{bmatrix}) = \lim_{h \to 0} \frac{\mathbf{f}( \begin{bmatrix}\mathbf{x}_1 \\ \vdots \\ \mathbf{x}_k +h \\ \vdots \\ \mathbf{x}_m \end{bmatrix} ) - f(\begin{bmatrix}\mathbf{x}_1 \\ \vdots \\ \mathbf{x}_k \\ \vdots \\ \mathbf{x}_m \end{bmatrix}) }{ h }\]

	\[ \mathbf{f}_{\mathbf{x}_k} ( \mathbf{x}_0 ) = \nabla_{\mathbf{e}_k}\mathbf{f}(\mathbf{x}_0)  = \lim_{h \to 0} \frac{\mathbf{f}(  \mathbf{x}_0 + h \mathbf{e}_{k} ) - \mathbf{f}( \mathbf{x}_0 ) }{ h }\]
\end{definition}

Due to our choice of the standard basis, this is essentially differentiation in the sense of real analysis where other variables behave as constants. We will soon see ways to extend this notion to other coordinate systems.
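
As a quick sketch of what 'other variables behave as constants' means in practice, consider the following example; the function is chosen here for illustration only.

\begin{example}
Let $f(x,y) = x^2 y + y^3$. Differentiating with respect to $x$ while treating $y$ as a constant, and then with respect to $y$ while treating $x$ as a constant, gives the two partial derivatives.
\[ f_x(x,y) = 2xy \]
\[ f_y(x,y) = x^2 + 3y^2 \]
\end{example}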

We've used the notation $f_{x}$ to describe the partial derivative of $f$ with respect to $x$; most notations of analysis have a counterpart for partial derivatives.
\emph{Leibniz notation} expresses partial derivatives in the following way.
\[ \frac{\partial f}{\partial x} \]
\[ \frac{\partial^2 f}{\partial x \partial y}\]

\emph{Lagrange notation} expresses partial derivatives in the following way.
\[ f_x \]
\[ f_{xy} \]

\emph{Euler-Arbogast notation} expresses partial derivatives in the following way.
\[ \partial_x f \]
\[\partial_{xy}  f \]

\emph{Newton's notation} does not support partial derivatives.

The following proposition allows us to calculate the directional derivative by means of the partial derivatives.

\begin{proposition}
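Let $\mathbf{f}$ be differentiable at $\mathbf{x}_0$ (in the sense of the total derivative defined later in this chapter) and let $\mathbf{u}$ be a unit vector. Then the following holds.
\[ \nabla_{\mathbf{u}} \mathbf{f} (\mathbf{x}_0) = \sum^{n}_{i=1} \mathbf{f}_{\mathbf{x}_i}(\mathbf{x}_0) \mathbf{u}_i \]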
\end{proposition}
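
The following worked example sketches how the formula is used; the function, point and direction are assumptions made here for illustration.

\begin{example}
Let $f(x,y) = x^2 y + y^3$, which has partial derivatives $f_x = 2xy$ and $f_y = x^2 + 3y^2$. At $\mathbf{x}_0 = (1,2)$ we have $f_x(\mathbf{x}_0) = 4$ and $f_y(\mathbf{x}_0) = 13$. For the unit vector $\mathbf{u} = (\tfrac{3}{5}, \tfrac{4}{5})$ the directional derivative is a weighted sum of the partial derivatives.
\[ \nabla_{\mathbf{u}} f(\mathbf{x}_0) = 4 \cdot \tfrac{3}{5} + 13 \cdot \tfrac{4}{5} = \tfrac{64}{5} \]
\end{example}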


\subsection{The Jacobian matrix and Hessian matrix}

Now that the notion of a partial derivative is available, we discuss 2 interesting matrices; they will be useful tools later on.


\begin{definition}[Hessian matrix]
The \emph{Hessian matrix of $f$} is the matrix of second-order partial derivatives of a real multivariate function.
\[ \mathbf{H}_{f} \]
\[ (\mathbf{H}_{f})_{ij} = \frac{\partial^2 f}{\partial \mathbf{x}_{i} \partial \mathbf{x}_{j}} \]
\end{definition}

\begin{definition}[Jacobian matrix]
The \emph{Jacobian matrix of $\mathbf{f}$} is the matrix of first-order partial derivatives of each component function of a real vector function.
\[ \mathbf{J}_{\mathbf{f}} \]
\[ (\mathbf{J}_{\mathbf{f}})_{ij} = \frac{\partial f_{i}}{\partial \mathbf{x}_j} \]
\end{definition}
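
As a small illustration of both definitions, the following example computes one Hessian and one Jacobian; the functions are chosen here for convenience.

\begin{example}
For the real multivariate function $f(x,y) = x^2 y + y^3$, the second-order partial derivatives are $f_{xx} = 2y$, $f_{xy} = f_{yx} = 2x$ and $f_{yy} = 6y$, so the Hessian matrix is the following.
\[ \mathbf{H}_{f}(x,y) = \begin{bmatrix} 2y & 2x \\ 2x & 6y \end{bmatrix} \]
For the real vector function $\mathbf{g}(x,y) = \begin{bmatrix} x^2 y \\ x + \sin (y) \end{bmatrix}$, each row of the Jacobian matrix collects the partial derivatives of one component function.
\[ \mathbf{J}_{\mathbf{g}}(x,y) = \begin{bmatrix} 2xy & x^2 \\ 1 & \cos (y) \end{bmatrix} \]
\end{example}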

\subsubsection{Properties of Jacobian matrix}
We will see later in this chapter that the Jacobian matrix embodies the total differential in a Euclidean space, so it is fitting that we study this structure more closely.

\begin{proposition}
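	Let $\mathbf{f}, \mathbf{g} : \mathbb{R}^m \to \mathbb{R}^n$ and the scalar field $w : \mathbb{R}^m \to \mathbb{R}$ have first-order partial derivatives. Then the following hold, where $\mathbf{f}\mathbf{J}_{w}$ is the product of the column vector $\mathbf{f}$ with the $1 \times m$ matrix $\mathbf{J}_{w}$.
	\[\mathbf{J}_{\mathbf{f}+\mathbf{g}} = \mathbf{J}_{\mathbf{f}} + \mathbf{J}_{\mathbf{g}} \]
	\[\mathbf{J}_{w \mathbf{f}} = w \mathbf{J}_{\mathbf{f}} + \mathbf{f}\mathbf{J}_{w} \]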
\end{proposition}


\section{Total derivative}

Though a function may have directional derivatives defined in all directions at a point, it is still possible for the function to be discontinuous at that point! Directional derivatives only probe the function along straight lines through the point; the function may be well behaved along each such line and yet behave badly along a curved path approaching the same point.
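
A standard counterexample makes this concrete; it is sketched here for illustration and is not taken from the surrounding text.

\begin{example}
Define $f : \mathbb{R}^2 \to \mathbb{R}$ as follows.
\[ f(x,y) = \begin{cases} \dfrac{x^2 y}{x^4 + y^2} & (x,y) \neq (0,0) \\ 0 & (x,y) = (0,0) \end{cases} \]
For a unit vector $\mathbf{u} = (a,b)$ with $b \neq 0$, the difference quotient at the origin is
\[ \frac{f(ha,hb) - f(0,0)}{h} = \frac{a^2 b}{h^2 a^4 + b^2} \to \frac{a^2}{b} \quad \text{as } h \to 0, \]
and for $b = 0$ we have $f(ha,0) = 0$, so the directional derivative is $0$. Hence every directional derivative exists at the origin. However, along the parabola $y = x^2$ we get $f(x,x^2) = \frac{x^4}{2x^4} = \frac{1}{2}$ for all $x \neq 0$, while $f(0,0) = 0$, so $f$ is not continuous at the origin.
\end{example}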

A more holistic approach that captures the geometry of multidimensional space is needed for deeper insight into the differentiability of such functions. Specifically, differentiability should capture the idea that at that point, the function has a \emph{best linear approximator}; informally, if we 'zoom in' infinitely on the function at that point, it looks like a linear function.

Let's try and formulate this in the sense of real analysis.

\[ f(x_0+h) \approx f(x_0) + f'(x_0)h \]

\[ \lim_{h \to 0} \frac{ f(x_0+h) -f(x_0) }{h} =f'(x_0)\]
\[ \lim_{h \to 0} \frac{ f(x_0+h) -f(x_0) }{h} - f'(x_0) = 0 \]
\[ \lim_{h \to 0} \frac{ f(x_0+h) -f(x_0) - f'(x_0)h }{h} = 0 \]
\[ \lim_{h \to 0} \frac{ f(x_0+h) - [ f(x_0) + f'(x_0)h] }{h} = 0 \]

This shows that a differentiable function is one for which $f(x_0 +h)$ can be written as $f(x_0) + f'(x_0)h +r(h)$, a linear approximation plus some remainder function $r(h)$ that is dominated by $h$ in the sense that $\lim_{h \to 0}\frac{r(h)}{h}=0$; the remainder is 'negligible' in the limit.

Differentiation from this perspective is easier to generalize; we don't necessarily need to define differentiability by expressing a direct Newton quotient, but by specifying that some linear function \emph{exists} that can approximate the function in this specific way. All we need to do now is generalize the idea of a real linear function from real analysis to a linear function between real vector spaces!

Linear transforms are the generalization to higher-dimensional spaces, and these can be represented by matrices; the analogue of our $f'(x_0)h$ will be $\mathbf{J}\mathbf{h}$, where $\mathbf{J}$ is some matrix that represents such a linear transform. In this vein, we can let our linear approximator be $\mathbf{f}(\mathbf{x}_0) + \mathbf{J}\mathbf{h}$.


The derivative characterizes the best linear approximator of a function around a point.

From our studies in linear algebra, we know that linear maps between finite-dimensional linear spaces can be specified by a matrix. This means that our notion of a 'derivative' will be a matrix that specifies the linear map, rather than a scalar that represents the slope.


\begin{definition}[Differentiable real vector function]
A real vector function $\mathbf{f} : \mathbb{R}^m \to \mathbb{R}^n$ is \emph{differentiable at $\mathbf{x}_0$} iff there exists some $n \times m$ matrix (linear map) $\mathbf{J}$ for which the following limit holds. The matrix $\mathbf{J}$ is the \emph{derivative of $\mathbf{f}$ at $\mathbf{x}_0$}, sometimes written $\mathbf{f}'(\mathbf{x}_0)$.
\[  \lim_{\mathbf{h} \to \mathbf{0}} \frac{\| \mathbf{f}(\mathbf{x}_0 +\mathbf{h}) - ( \mathbf{f}(\mathbf{x}_0) + \mathbf{J}\mathbf{h} )  \|}{\| \mathbf{h} \|} = 0  \]

\[ \mathbf{f} \text{ is differentiable at }\mathbf{x}_0 \iff  \exists \mathbf{J} [ \lim_{\mathbf{h} \to \mathbf{0}} \frac{\| \mathbf{f}(\mathbf{x}_0 +\mathbf{h}) - ( \mathbf{f}(\mathbf{x}_0) + \mathbf{J}\mathbf{h} )  \|}{\| \mathbf{h} \|} = 0 ] \]
\end{definition}

This is the way to generalize differentiability for any real vector function, and it is backwards compatible with the definition of a differentiable real function (the Jacobian is a $1\times 1$ matrix containing the derivative).
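
To see the definition at work, the following example verifies differentiability directly from the limit; the function is chosen here for illustration, and the candidate matrix is simply a guess that is then checked.

\begin{example}
Let $f(x,y) = xy$ and $\mathbf{x}_0 = (a,b)$. Take the candidate $\mathbf{J} = \begin{bmatrix} b & a \end{bmatrix}$ and $\mathbf{h} = (h_1, h_2)$. Then
\[ f(\mathbf{x}_0 + \mathbf{h}) - ( f(\mathbf{x}_0) + \mathbf{J}\mathbf{h} ) = (a+h_1)(b+h_2) - ab - b h_1 - a h_2 = h_1 h_2 . \]
Since $|h_1 h_2| \leq \tfrac{1}{2}(h_1^2 + h_2^2) = \tfrac{1}{2}\|\mathbf{h}\|^2$, the quotient is bounded as follows.
\[ \frac{| h_1 h_2 |}{\| \mathbf{h} \|} \leq \tfrac{1}{2} \| \mathbf{h} \| \to 0 \quad \text{as } \mathbf{h} \to \mathbf{0} \]
Hence $f$ is differentiable at $\mathbf{x}_0$ with derivative $\begin{bmatrix} b & a \end{bmatrix}$.
\end{example}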

When considering multidimensional Euclidean spaces, this condition is stronger than the notions of directional and partial derivatives; it entirely encapsulates the idea that the function can locally be approximated by a single linear transform.

We introduce a second, equivalent characterization of total differentiability that uses a remainder function; this is extremely useful in proofs.

\begin{proposition}[Replace this using matrix notation]
A real vector function $\mathbf{f} : \mathbb{R}^m \to \mathbb{R}^n$ is differentiable at $\mathbf{x}_0$ iff there exist some linear transform $d\mathbf{f}_{\mathbf{x}_0}$ and real vector function $\mathbf{r}$ for which the following hold.
	\[ \mathbf{f}(\mathbf{x}_0 + \mathbf{h} ) = \mathbf{f}(\mathbf{x}_0)+ d\mathbf{f}_{\mathbf{x}_0}(\mathbf{h}) + \mathbf{r}(\mathbf{h}) \]
\[  \lim_{\mathbf{h} \to \mathbf{0}} \frac{ \| \mathbf{r}(\mathbf{h}) \| }{ \| \mathbf{h} \|} = 0  \]
\end{proposition}

We now delve into the properties of totally differentiable functions.


\begin{proposition}
Let $\mathbf{f}$ be differentiable at $\mathbf{x}_0$, then $\mathbf{f}$ is continuous at $\mathbf{x}_0$.
\end{proposition}


\begin{proposition}
Let $\mathbf{f}$ be differentiable at $\mathbf{x}_0$, then the directional derivative of $\mathbf{f}$ at $\mathbf{x}_0$ exists in all directions.
\end{proposition}


The following lemma will be used to prove a theorem that tells us exactly how to calculate the total differential; the lemma also has some standalone value, since it links the total differential to the directional derivative.

\begin{lemma}
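	Let $\mathbf{f}$ be a function differentiable at $\mathbf{x}_0$ and let $d\mathbf{f}(\mathbf{x})=\mathbf{M}\mathbf{x}$ be its total differential at $\mathbf{x}_0$. Then $\nabla_{\mathbf{u}}\mathbf{f}(\mathbf{x}_0) =  \mathbf{M} \mathbf{u}$ for every unit vector $\mathbf{u}$. In particular, $\nabla_{\mathbf{e}_i}\mathbf{f}(\mathbf{x}_0) = \mathbf{M}^i$, where $\mathbf{M}^i$ is the $i$th column of $\mathbf{M}$.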
\end{lemma}

We now prove the following revelation.

\begin{theorem}
	Let $\mathbf{f}$ be differentiable at $\mathbf{x}_0$, then the total differential at $\mathbf{x}_0$ is given by the Jacobian matrix evaluated at $\mathbf{x}_0$; that is, $d\mathbf{f}(\mathbf{u}) = \mathbf{J}_{\mathbf{f}}(\mathbf{x}_0) \mathbf{u}$.
\end{theorem}

It was our Jacobian matrix all along; the Jacobian matrix represents this best linear transform at a point! If for a real function $f(x_0+h) \approx f(x_0) + f'(x_0)h$, then for a real vector function we have $\mathbf{f}(\mathbf{x}_0 + \mathbf{h}) \approx \mathbf{f}(\mathbf{x}_0) + \mathbf{J}_{\mathbf{f}}(\mathbf{x}_0)\mathbf{h}$!

Using this fact, we can prove the following.
\begin{corollary}
	\[ \nabla_{\mathbf{u}} \mathbf{f} = \sum^{n}_{i=1} \mathbf{u}_{i} \mathbf{f}_{\mathbf{x}_i} \]
\end{corollary}

We will have an even more elegant representation for this when we study differential operators.


\begin{proposition}
\[ ( \mathbf{f} \circ \mathbf{g} )'(\mathbf{x}) = ( \mathbf{f}'\circ \mathbf{g})(\mathbf{x}) \mathbf{g}'(\mathbf{x})  \]
\[ ( \mathbf{f} + \mathbf{g} )'(\mathbf{x}) =  \mathbf{f}'(\mathbf{x}) + \mathbf{g}'(\mathbf{x})  \]
\end{proposition}


\subsection{Multivariate chain rule}

We've already seen that compositions of continuous functions are continuous in any space; are compositions of differentiable functions differentiable as well? We've seen this to be true in real analysis, where the chain rule gives a neat formula for the derivative of a composition in terms of the composed functions and their derivatives.

In higher dimensions the chain rule holds similarly in terms of the Jacobian matrix.

\begin{theorem}[Multivariate chain rule]
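Let $\mathbf{g} : \mathbb{R}^k \to \mathbb{R}^m$ be differentiable at $\mathbf{x}$ and $\mathbf{f} : \mathbb{R}^m \to \mathbb{R}^n$ be differentiable at $\mathbf{g}(\mathbf{x})$. Then $\mathbf{f} \circ \mathbf{g}$ is differentiable at $\mathbf{x}$ and the following holds.
\[ ( \mathbf{f} \circ \mathbf{g} )'(\mathbf{x}) = ( \mathbf{f}'\circ \mathbf{g})(\mathbf{x}) \mathbf{g}'(\mathbf{x})  \]
\[ \mathbf{J}_{\mathbf{f} \circ \mathbf{g}}(\mathbf{x}) = \mathbf{J}_{\mathbf{f}}(\mathbf{g}(\mathbf{x}))  \mathbf{J}_{\mathbf{g}}(\mathbf{x}) \]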
\end{theorem}
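
The following worked example sketches the matrix form of the chain rule and checks it against a direct computation; the functions are chosen here for illustration.

\begin{example}
Let $f(x,y) = xy$ and $\mathbf{g}(s,t) = \begin{bmatrix} s+t \\ s-t \end{bmatrix}$. The Jacobian matrices are
\[ \mathbf{J}_{f}(x,y) = \begin{bmatrix} y & x \end{bmatrix}, \qquad \mathbf{J}_{\mathbf{g}}(s,t) = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} . \]
Evaluating $\mathbf{J}_{f}$ at $\mathbf{g}(s,t)$ and multiplying gives the following.
\[ \mathbf{J}_{f \circ \mathbf{g}}(s,t) = \begin{bmatrix} s-t & s+t \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} = \begin{bmatrix} 2s & -2t \end{bmatrix} \]
This agrees with differentiating the composition directly, since $(f \circ \mathbf{g})(s,t) = (s+t)(s-t) = s^2 - t^2$.
\end{example}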


A frequently used special case is the composition of a real vector function with a parametrized curve.
\begin{proposition}
Let $\mathbf{f} : \mathbb{R}^m \to \mathbb{R}^n $ and $\mathbf{x} : \mathbb{R} \to \mathbb{R}^m $ (parametrized by $u$) be real vector functions, then the following holds.
\[ \frac{\partial \mathbf{f} }{\partial u } = \sum^{m}_{j=1} \frac{\partial \mathbf{f}}{\partial \mathbf{x}_j } \frac{\partial \mathbf{x}_j }{\partial u}\]
\end{proposition}

\subsection{Clairaut's theorem}


\begin{theorem}[Clairaut's theorem; Schwarz' theorem; Young's theorem]
	Let $\mathbf{f} : \mathbb{R}^m \to \mathbb{R}^n $ be a real vector function with continuous second order partial derivatives at $\mathbf{p}$, then the following holds.
\[ \frac{\partial^2 \mathbf{f}}{\partial \mathbf{x}_i \partial \mathbf{x}_j }(\mathbf{p}) =  \frac{\partial^2 \mathbf{f}}{\partial \mathbf{x}_j \partial \mathbf{x}_i }(\mathbf{p}) \]
\end{theorem}

This theorem has many names: \emph{Clairaut's theorem, Young's theorem, Schwarz' theorem,} or simply the \emph{symmetry of second derivatives}.
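
As a quick check of the symmetry of second derivatives, consider the following example; the function is chosen here for illustration.

\begin{example}
Let $f(x,y) = x e^{xy}$. The first-order partial derivatives are $f_x = (1 + xy) e^{xy}$ and $f_y = x^2 e^{xy}$. Differentiating each with respect to the other variable gives the same result.
\[ \frac{\partial}{\partial y} f_x = x e^{xy} + (1 + xy) x e^{xy} = (2x + x^2 y) e^{xy} \]
\[ \frac{\partial}{\partial x} f_y = 2x e^{xy} + x^2 y e^{xy} = (2x + x^2 y) e^{xy} \]
\end{example}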

\subsection{Multivariate inverse function theorem}


\begin{theorem}[Multivariate inverse function theorem]
	Let $\mathbf{f} : S \subseteq \mathbb{R}^n \to \mathbb{R}^n$ be a continuously differentiable real vector function and $\mathbf{x}_0 \in S$. If $\mathbf{J}_{\mathbf{f}}(\mathbf{x}_0)$ is an invertible matrix, then there exist neighborhoods $U$ of $\mathbf{x}_0$ and $V$ of $\mathbf{f}(\mathbf{x}_0)$ such that $\mathbf{f}$ maps $U$ bijectively onto $V$, and the Jacobian matrix of the local inverse satisfies the following.
	\[ \mathbf{J}_{\mathbf{f}^{-1}}(\mathbf{f}(\mathbf{x}_0)) = [\mathbf{J}_{\mathbf{f}}(\mathbf{x}_0)]^{-1}\]
\end{theorem}


\begin{theorem}[Inverse function theorem]
	Let $\mathbf{f} : D \subseteq \mathbb{R}^n \to \mathbb{R}^n$ be a continuously differentiable real vector function and $\mathbf{x}_0 \in D$. If $\mathbf{J}_{\mathbf{f}}(\mathbf{x}_0)$ is an invertible matrix, then there exist open sets $U$ containing $\mathbf{x}_0$ and $V$ containing $\mathbf{f}(\mathbf{x}_0)$ such that $\mathbf{f} : U \to V$ is invertible and $\mathbf{f}^{-1} : V \to U$ is continuously differentiable.
Moreover, the following holds for all $\mathbf{y} \in V$.
\[ \mathbf{J}_{\mathbf{f}^{-1}}(\mathbf{y}) = [\mathbf{J}_{\mathbf{f}}(\mathbf{f}^{-1}(\mathbf{y}))]^{-1}\]
\end{theorem}
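
The polar coordinate map gives a familiar illustration of the theorem; it is sketched here as an example and is not part of the statement above.

\begin{example}
Let $\mathbf{f}(r,\theta) = \begin{bmatrix} r\cos (\theta) \\ r\sin (\theta) \end{bmatrix}$ on the set where $r > 0$. Its Jacobian matrix and determinant are
\[ \mathbf{J}_{\mathbf{f}}(r,\theta) = \begin{bmatrix} \cos (\theta) & -r\sin (\theta) \\ \sin (\theta) & r\cos (\theta) \end{bmatrix}, \qquad \det \mathbf{J}_{\mathbf{f}}(r,\theta) = r . \]
Since $r \neq 0$, the Jacobian matrix is invertible at every point, so $\mathbf{f}$ is locally invertible there and the Jacobian matrix of the local inverse at $\mathbf{y} = \mathbf{f}(r,\theta)$ is the following.
\[ \mathbf{J}_{\mathbf{f}^{-1}}(\mathbf{y}) = \frac{1}{r} \begin{bmatrix} r\cos (\theta) & r\sin (\theta) \\ -\sin (\theta) & \cos (\theta) \end{bmatrix} \]
Note that the conclusion is only local: $\mathbf{f}(r,\theta) = \mathbf{f}(r,\theta + 2\pi)$, so $\mathbf{f}$ is not globally invertible.
\end{example}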


\subsubsection{Multivariate inverse function theorem}
\subsubsection{Multivariate implicit function theorem}


\subsection{Multivariate Taylor's theorem}

Taylor's theorem generalizes to higher dimensions seamlessly.
\subsubsection{Multivariate Taylor's theorem}
\subsubsection{Multivariate Taylor series}

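\[T(f,\mathbf{x}_0,n;\mathbf{x}) = \sum_{|\alpha| \leq n} \frac{\partial^{\alpha} f(\mathbf{x}_0)}{\alpha!}(\mathbf{x} - \mathbf{x}_0)^{\alpha}\]

Here $n$ is the order of the polynomial and the sum ranges over multi-indices $\alpha = (\alpha_1, \ldots, \alpha_m)$ of non-negative integers, with $|\alpha| = \alpha_1 + \cdots + \alpha_m$, $\alpha! = \alpha_1! \cdots \alpha_m!$, $\partial^{\alpha} f = \frac{\partial^{|\alpha|} f}{\partial \mathbf{x}_1^{\alpha_1} \cdots \partial \mathbf{x}_m^{\alpha_m}}$ and $(\mathbf{x} - \mathbf{x}_0)^{\alpha} = \prod^{m}_{i=1} (\mathbf{x}_i - (\mathbf{x}_0)_i)^{\alpha_i}$.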
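
For a real multivariate function with continuous second-order partial derivatives, the second-order Taylor polynomial can be written with the Hessian matrix. The following is a sketch under that assumption, with $\mathbf{h} = \mathbf{x} - \mathbf{x}_0$; the example function is chosen here for illustration.
\[ f(\mathbf{x}_0 + \mathbf{h}) \approx f(\mathbf{x}_0) + \sum^{m}_{i=1} f_{\mathbf{x}_i}(\mathbf{x}_0) \mathbf{h}_i + \frac{1}{2} \mathbf{h}^{T} \mathbf{H}_{f}(\mathbf{x}_0) \mathbf{h} \]

\begin{example}
Let $f(x,y) = e^{x} \cos (y)$ and $\mathbf{x}_0 = (0,0)$. At the origin we have $f = 1$, $f_x = 1$, $f_y = 0$, $f_{xx} = 1$, $f_{xy} = 0$ and $f_{yy} = -1$, so the second-order Taylor polynomial is the following.
\[ f(x,y) \approx 1 + x + \frac{1}{2}(x^2 - y^2) \]
\end{example}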

\subsection{Derivative tests}
\subsubsection{Properties of the Hessian matrix}

\begin{proposition}[Multivariate derivative test]
If a real multivariate function $f$ with continuous second derivatives has $\nabla f(\mathbf{x}_0)=\mathbf{0}$ and $\mathbf{H}_f (\mathbf{x}_0)$ is positive definite, then $\mathbf{x}_0$ is a local minimum of $f$.
\end{proposition}
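
The following example sketches how the test is applied; the function is chosen here for illustration.

\begin{example}
Let $f(x,y) = x^2 + xy + y^2$. The partial derivatives $f_x = 2x + y$ and $f_y = x + 2y$ both vanish only at $\mathbf{x}_0 = (0,0)$. The Hessian matrix is constant.
\[ \mathbf{H}_{f} = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} \]
Its leading principal minors are $2 > 0$ and $2 \cdot 2 - 1 \cdot 1 = 3 > 0$, so $\mathbf{H}_{f}$ is positive definite and the origin is a local minimum of $f$.
\end{example}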


\begin{proposition}
Let $f$ have continuous second-order partial derivatives on an open convex set $U$. Then $f$ is convex on $U$ iff its Hessian matrix $\mathbf{H}_{f}$ is positive semi-definite at every point of $U$.
\end{proposition}

\begin{proposition}
Let $f$ have continuous second-order partial derivatives on an open convex set $U$. If the Hessian matrix $\mathbf{H}_{f}$ is positive definite at every point of $U$, then $f$ is strictly convex on $U$.
\end{proposition}

\subsubsection{Lagrangian multipliers}


\section{Iterated integration of scalar fields}

By fixing 


\section{Multiple integration of scalar fields}


In real analysis, (Riemann) integration represents the area bounded by a function and a closed interval.

We desire an analogue for general scalar fields on $\mathbb{R}^n$: a type of integration that represents the volume bounded by a function and a hyperrectangle.

However sets in $\mathbb{R}^n$ 


\[\int (\hdots ( \int  f(\mathbf{x}) d[\mathbf{x}]_n ) \hdots ) d[\mathbf{x}]_{m}\]

One can define multiple integrals in a Riemann style, essentially defining the Riemann multiple integral in a partition-like fashion for hyperrectangles and then covering the domain with finitely many hyperrectangles.

Alternatively, one can define multiple integrals in a Lebesgue style, which employs the Lebesgue measure to 


Though one can learn to calculate multiple integrals in the same way regardless of how multiple integration is defined, and even check whether the order of integration may be swapped by using Fubini's theorem, the rigorous foundations of multiple integrals are rooted in measure theory; a thorough understanding of the subject is inaccessible without it, and it is indeed the framework in which the Fubini-Tonelli theorem was developed.


\subsection{Fubini's theorem and Tonelli's theorem}

The theorems of Fubini and Tonelli assist immensely in the calculation and interpretation of multivariate integrals. They provide two different sufficient conditions for when one can swap the order of integral nesting.  It turns out we can do this for a relatively rich class of functions.

\begin{theorem}[Fubini's theorem]
Let $f$ be Lebesgue integrable (that is, absolutely integrable) on $X \times Y$, then we have the following.
\[ \iint_{X\times Y} f(x,y) d(x,y) =\int_{X} ( \int_{Y} f(x,y) dy )dx = \int_{Y} ( \int_{X} f(x,y) dx )dy \]
\end{theorem}


Tonelli discovered that a different assumption, which does not require integrability, still permits the swapping of multiple integrals. Tonelli's theorem therefore complements Fubini's theorem, though it requires more background in measure theory to comprehend.


\begin{theorem}[Tonelli's theorem]
Let $f$ be a nonnegative measurable function on $X \times Y$, then we have the following.
\[ \iint_{X\times Y} f(x,y) d(x,y) =\int_{X} ( \int_{Y} f(x,y) dy )dx = \int_{Y} ( \int_{X} f(x,y) dx )dy \]
\end{theorem}
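
As a small sanity check of swapping the order of integration, consider the following worked example; the integrand and domain are chosen here for illustration, and the integrand is both nonnegative and integrable on the rectangle.

\begin{example}
Let $f(x,y) = x y^2$ on $X \times Y = [0,1] \times [0,2]$. Integrating over $y$ first gives
\[ \int_{0}^{1} \left( \int_{0}^{2} x y^2 \, dy \right) dx = \int_{0}^{1} \frac{8}{3} x \, dx = \frac{4}{3} , \]
and integrating over $x$ first gives the same value.
\[ \int_{0}^{2} \left( \int_{0}^{1} x y^2 \, dx \right) dy = \int_{0}^{2} \frac{1}{2} y^2 \, dy = \frac{4}{3} \]
\end{example}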


Usually functions whose multiple integrals are not swappable are not of interest to integrate, so we usually use the following notation on the right to compactly denote multiple integrals.
\[\int \hdots \int_{D}  f(x_1, \hdots, x_n) dx_1 \hdots dx_n = \int_{D} f(\mathbf{x}) d^n \mathbf{x} \]