\chapter{Matrix theory}


\section{Matrixes and Vectors}


Matrixes and vectors introduce a notation that simplifies systems of linear equations into one 'matrix equation'.

Study on the properties of matrixes completes our understanding about solutions to systems of linear equations, and allows us to develop a deeper theory about them; they will become indispensable tools when we study \emph{linear spaces}. The study of linear spaces (and particularly linear maps) also motivates much of matrix theory, hence this chapter serves only as an introduction to the notion of matrixes.

However before we begin, we must acknowledge the existence of an algebraic structure called a \emph{field}.


We mostly deal with the field $\mathbb{R}$, however the sets $\mathbb{Q},\mathbb{C}$ also form fields; indeed there are many more fields out there. We don't require deep knowledge about fields (ring theory) to do the linear algebra of this book.

\begin{definition}
A \emph{scalar} is an element of a field
\end{definition}

\begin{definition}[Matrix]
A \emph{$n \times m$ matrix} is a grid with $n$ rows (horizontal) and $m$ columns (vertical) that indexes mathematical objects. WHen all its indexes are members of the same field $F$, we call it a \emph{matrix over the field $F$}
\[ \mathbf{A} = \begin{bmatrix} \mathbf{A}_{11} & \hdots & \mathbf{A}_{1m} \\ \vdots & \ddots & \vdots \\ \mathbf{A}_{n1} & \hdots & \mathbf{A}_{nm} \end{bmatrix} \]
\end{definition}


Usually the objects indexed in a matrix are numbers, however they could be anything from functions, random variables, sets, etc.
In this book, we will mostly be using matrixes whose entries are elements of $\mathbb{R}$ (matrixes over the field $\mathbb{R}$).

\begin{definition}[Column vector]
A \emph{$n$-coordinate vector} is a $n \times 1$ matrix.
\end{definition}

put in 3.tex
Column vectors are often used for finite dimension linear spaces to represent vectors with respect to their standard basis. $\begin{bmatrix} c_1 \\ \vdots \\ c_n \end{bmatrix} = \sum^{n}_{k=1} c_k \mathbf{e}_{k}$


\begin{definition}[Square matrix]
A \emph{square matrix} a $n \times n$ matrix. In other words,it is a matrix such that the amount of rows equals the amount of columns.
\end{definition}


\section{Matrix algebra}


Just like numbers, there are operations defined on matrixes. We can add two matrixes, multiply two matrixes, and scale a matrix by a number
In primary school, one learns numbers, and then algorithms to add and multiply them. When one learns 'algebra' in highschool, they learn manipulation laws for solving algebraic equations.

When dealing with matrixes, we will follow a similar program; we will discuss what matrixes are, provide algorithms to define what it means to add and multiply (and scale; more on this later) matrixes, and finally discuss the manipulation laws of 'matrix algebra'. 


It is important to remember that in these definitions, we are assuming that entries are elements of $\mathbb{R}$. When matrixes were defined earlier, we allowed the possibility of its entries to contain any type of mathematical object; in this chapter we are defining our matrix algebra only for when entries are elements of $\mathbb{R}$! 

\begin{definition}[Identity matrix]
The \emph{identity matrix of $n$ dimensions} is the square matrix $\mathbf{I}_n$ with dimensions $n \times n$ with the following entries.
Using matrix indexing notation, we have the folliwing.
\[(\mathbf{I}_n)_{ij}= \begin{cases} 1 & i=j \\ 0 i\neq j \end{cases}\]
When context regarding the dimension of the ideneity matrix is clear, one may write the identity matrix as $\mathbf{I}$.
\end{definition}

\begin{definition}[Zero matrix]
A \emph{zero matrix of $n \times m$ dimensions} is the $n \times m$ matrix where all entries zero.
\end{definition}

\subsection{Matrix addition and scalar multiplication}
Matrix addition works 'component wise'; the entries of the same index of both matrixes is added together to make that index for the resultant matrix. That means only matrixes of the same dimension may be added together.
\[ (\mathbf{A}+\mathbf{B})_{ij} = \mathbf{A}_{ij} + \mathbf{B}_{ij} \]

Matrix scalar multiplication works similarly
\[ (k\mathbf{A})_{ij} = k\mathbf{A}_{ij} \]


\subsection{Laws of matrix addition and scalar multiplication}
Matrix addition and scalar multiplication is faithful to regular real number addition and multuplication.
\[\mathbf{A}+\mathbf{B}=\mathbf{B}+\mathbf{A}\]
\[\mathbf{A}+(\mathbf{B}+\mathbf{C})=(\mathbf{A}+\mathbf{B})+\mathbf{C}\]
\[\mathbf{A}+\mathbf{0}= \mathbf{B}+\mathbf{A}=\mathbf{A}\]
\[k(\mathbf{A}+\mathbf{B})= k\mathbf{A}+\mathbf{B}=\mathbf{A}\]
\[\exists \mathbf{A}, \mathbf{B} ( \mathbf{A}\mathbf{B} \neq \mathbf{B}\mathbf{A} )\]
\[\forall \mathbf{A} (\exists -\mathbf{A} [ \mathbf{A}+(-\mathbf{A}=\mathbf{0}])\]

This last law states that for any matrix, there is some 'inverse matrix' you can add to it to get the zero matrix. This can be used to define matrix subtraction.
\[\mathbf{A}-\mathbf{B}=\mathbf{A}+(-\mathbf{B})\]
\subsection{Matrix multiplication}


\subsection{Laws of matrix multiplication}

Matrix multiplication unfortunately isn't as liberal as real number multiplication.
Matrix multiplication multiplies entries of the first matrix' row by entries of the second matrix's columns, and takes their sum.
\[ (\mathbf{A} \mathbf{B})_{ij} = \sum^{n}_{k=1} \mathbf{A}_{ik}\mathbf{B}_{kj} \]

\begin{itemize}
\item Matrix multiplication by the identity matrix returns the original matrix
\item Matrix multiplication doesn't necessarily commute (order of matrixes may change result)
\end{itemize}
\[\mathbf{A}\mathbf{I}=\mathbf{I}\mathbf{A}=\mathbf{A}\]
\[\exists \mathbf{A}, \mathbf{B} ( \mathbf{A}\mathbf{B} \neq \mathbf{B}\mathbf{A} )\]
\[\mathbf{A}(\mathbf{B}\mathbf{C})=(\mathbf{A}\mathbf{B})\mathbf{C}\]

Are there 'inverse matrixes' with respect to matrix multiplication? Yes, however the conditions for this are not so straightfoward in comparison to matrix addition. 

In the case of real numbers, the only real number that cannot be multiplied to make 1 is 0. For square matrixes, that cannot be matrix mutliplied by anything to make the identity matrix have a \emph{determinant} of 0; we require new theory to understand this.


\subsection{Relationship to linear equations}

We have developed a whole new type of mathematical object and defined how operations act upon them. Perhaps it is starting to take shape why all these definitions are being made.

A system of linear equations can be reformulated in terms of in terms of matrixes and vectors; we can treat a system of linear equations as a simgle matrix equation.

Matrix multiplication is designed in such a way that given a vector $\mathbf{x}$ filled with variables, a matrix $\mathbf{A}$ filled with coefficients (where the first row is for the first equation's coefficients, second row is for the second equation's coefficients etc.), and a vector of right hand side numbers $\mathbf{b}$, our entire system of linear equations can be written simply as $\mathbf{A}\mathbf{x} = \mathbf{b}$.

What's more is that we can think of the problem as such; given a 'function' $T(\mathbf{x}) = \mathbf{A}\mathbf{x}$, when do we have $T(\mathbf{x})=\mathbf{b}$? We are familiar with finding the $x$ such that $f(x)=y$; systems of linear equations now have the analogue of finding the $\mathbf{x}$ such that $T(\mathbf{x})=\mathbf{b}$! This kind of function is called a 'linear map'; we will study these in this book.

In a basic sense, mathematicians can use matrixes like 'arrays'; notational 'boxes' employed as a way to conveniently index elements, but the true power of matrixes comes from their ability to define a unique 'linear map'. Even though at first matrixes just look like boxes that hold numbers, they can also linear functions that map vectors to vectors! Matrix multipliciation can even be thought of as composition of the linear maps induced by these matrixes, but again, we'll savor the details for later.

From now on, we will treat systems of linear equations in terms of matrix theory, that is, we will write systems of linear equations as $\mathbf{A}\mathbf{x}=\mathbf{b}$.


Although we are discussing matrixes 

\section{Determinant}


For polynomial functions, the discriminant tells us whether there exist any real solutions to the polynomial.
For systems of linear equations (matrixes), the determinant tells us whether a unique solution exists.

\begin{definition}[Determinant function]
\end{definition}

The discriminant and determinant are similar in this aspect, and this was the original purpose of the determinant; some 'invertibility' statistic.

\subsection{Invertible matrix}

Reciprocating a number is a 'multiplicative inverse', division is essentially multiplying by such an 'inverse element' since $a \cdot \frac{1}{b} = a \div b$. The only number that isn't 'invertible' is 0; $\frac{1}{0}$ is undefined.
Can we develop a similar system for matrixes? Is every matrix invertible?
\begin{definition}[Invertible matrix]
An \emph{invertible matrix} is a square matrix $\mathbf{A}$ such that there exists some $\mathbf{A}^{-1}$ such that the following holds.
\[ \mathbf{A} \mathbf{A}^{-1}  = \mathbf{A}^{-1} \mathbf{A} = \mathbf{I}\]
Such an $\mathbf{A}^{-1}$ is called the \emph{inverse matrix of $\mathrm{A}$}
\end{definition}


\subsection{Invertible matrix theorem}


Unfortunately, not evey matrix is invertible. $\mathbf{0}$ is a counterexample, however there turn out to be many more; is there a 'test' or condition we can use to check invertibility?

\begin{theorem}[Invertible matrix theorem]
\[\mathbf{A} \text{ is invertible } \iff \mathrm{det}(\mathbf{A}) \neq 0\]
\end{theorem}


\subsection{Properties of the determinant}
Although we've been using the determinant as a 'statistic' to ensure the existence of a unique solution to a system of linear equations, the determinant is actually much more.

First of all, let's note the properties that the determinant has.

\begin{proposition}
\[\mathrm{det}(\mathbf{A}\mathbf{B}) =  \mathrm{det}(\mathbf{A})\mathrm{det}(\mathbf{B})\]
	\[\mathrm{det}(\mathbf{A}^{\intercal}) =  \mathrm{det}(\mathbf{A}) \]
	\[\mathrm{det}(\mathbf{A}^{-1}) =  \frac{1}{\mathrm{det}(\mathbf{A})}\]
	\[\mathrm{det}(\mathbf{A}^{-1}) =  \frac{1}{\mathrm{det}(\mathbf{A})}\]
\end{proposition}

The determinant sends many matrixes to one scalar; the determinant is a way of alikening matrix multiplication to scalar multiplication (the technical name for this is a homomorphism).


\subsection{Laplace expansion}
\subsection{Leibniz' formula}
\subsection{Cramer's rule}
\subsection{Principle minors}
\subsection{Matrix determinant lemma}


\section{Transposition}

\begin{definition}[Transposition]
\[(\mathbf{A}^{\intercal})_{ij} = (\mathbf{A})_{ji}\]
\end{definition}

\begin{definition}[Conjugate transposition]
\[(\mathbf{A}^{*})_{ij} = (\mathbf{A}^{\intercal})^{*}_{ij}\]
\end{definition}

\subsection{Symmetric matrix}
\begin{definition}[Symmetric matrix]
A \emph{symmetric matrix} is a square matrix such that it is equal to its transpose.
\[\mathbf{A} \text{ is symmetric } \iff \mathbf{A}=\mathbf{A}^{\intercal}\]
\end{definition}


\section{Miscellaneous formulae}
trace
Sherman-Morrison formula


Matrixes of the form $\mathbf{A}+\mathbf{u}\mathbf{v}^{\intercal}$ are relatively frequent in mathematics and its applications, so it would be useful to have an algebraic formula for its inverse.


\begin{theorem}[Sherman-Morrison formula]
\[ ( \mathbf{A} + \mathbf{u}\mathbf{v}^{\intercal})^{-1} = \mathbf{A}^{-1} - \frac{\mathbf{A}^{-1}\mathbf{u}\mathbf{v}^{\intercal}\mathbf{A}^{-1}}{1+\mathbf{v}^{\intercal}\mathbf{A}^{-1}\mathbf{u}}\]
\end{theorem}

This can be seen by considering $( \mathbf{A} + \mathbf{u}\mathbf{v}^{\intercal})\mathbf{x} = \mathbf{y}$.

Theory of eigenequations also proves this (for $\mathbf{I}+\mathbf{u}\mathbf{v}^{\intercal}$, whcih can easily be transformed back), since we have the following.

\[(\mathbf{I} + \mathbf{u}\mathbf{v}^{\intercal})\mathbf{u}=(1+\mathbf{v}^{\intercal}\mathbf{u})\mathbf{u}\]

The only other eigenvalue is $1$ in this eigenequation. Indeed any $\mathbf{I}+g\mathbf{u}\mathbf{v}^{\intercal}$  has such a spectrum.

A common result in theory of eigenequations tells us that the inverse matrix has a reciprocal eigenvalue linked to $\mathbf{u}$. Therefore the inverse matrix must be of form $\mathbf{I}+g\mathbf{u}\mathbf{v}^{\intercal}$ too, since $\frac{1}{1}=1$.
\[(\mathbf{I} + \mathbf{u}\mathbf{v}^{\intercal})^{-1}\mathbf{u}=(1+\mathbf{v}^{\intercal}\mathbf{u})^{-1}\mathbf{u}\]


By solving what $\mathbf{I}+g\mathbf{u}\mathbf{v}^{\intercal}$ would be the inverse we find that
\[ ( \mathbf{I} + \mathbf{u}\mathbf{v}^{\intercal})^{-1} = \mathbf{I} \frac{\mathbf{u}\mathbf{v}^{\intercal}}{1+\mathbf{v}^{\intercal}\mathbf{u}}\]

Tranforming $\mathbf{u} \to \mathbf{A}^{-1}\mathbf{u}$ and right-mutliplying both sides by $\mathbf{A}^{-1}$ gives us the Sherman-Morrsion formula.