\part{Fundamentals}

\chapter{???}

Information theory seeks to give a mathematical description and treatment of `information'. To this end it draws on many fields of mathematics to describe the various ways information manifests itself, the most prominent being probability theory, statistics, combinatorics, and Fourier analysis.

Its applications range from data compression to signal processing.


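\section{Information unit}
A unit measuring the information capacity of a system, such as a data standard, that can be in any one of $n$ possible states. The units below differ only in the base of the logarithm.

\subsection{Shannon}
\( \log_{2}(n) \) where:
\begin{itemize}
	\item \(n\) is the number of possible states
\end{itemize}

\subsection{Nat}
\( \ln(n) \) where:
\begin{itemize}
	\item \(n\) is the number of possible states
\end{itemize}

\subsection{Hartley}
\( \log_{10}(n) \) where:
\begin{itemize}
	\item \(n\) is the number of possible states
\end{itemize}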
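As a minimal sketch, the capacity of a system with $n$ states can be computed in all three units at once. The function name \texttt{information\_capacity} is hypothetical and Python is used only for illustration.

\begin{verbatim}
import math

def information_capacity(n: int) -> dict:
    """Capacity of a system with n possible states in shannons, nats and hartleys."""
    if n < 1:
        raise ValueError("a system needs at least one state")
    return {
        "shannon": math.log2(n),   # base-2 logarithm
        "nat": math.log(n),        # natural logarithm
        "hartley": math.log10(n),  # base-10 logarithm
    }

# A byte has 2**8 = 256 possible states.
byte = information_capacity(256)
print(byte["shannon"])  # 8.0
print(byte["nat"])      # 8 * ln(2)
print(byte["hartley"])  # 8 * log10(2)
\end{verbatim}

Because the units differ only in the logarithm base, converting between them is a multiplication by a constant. For example, one nat equals $1/\ln 2 \approx 1.443$ shannons.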

\section{Communication system}
A communication system is modelled as the following chain:
\[ W \to X^{n} \to Y^{n} \to \hat{W} \]
where:
\begin{itemize}
	\item \(W\) is the original message
	\item \(f_{n}\) is the encoder, which maps \(W\) to a codeword \(X^{n}\) of \(n\) channel input symbols
	\item \(X\) is a channel input symbol
	\item \(\mathrm{Pr}(Y|X)\) is the channel, i.e.\ the conditional distribution of the output symbol given the input symbol
	\item \(Y\) is a channel output symbol
	\item \(g_{n}\) is the decoder, which maps the received sequence \(Y^{n}\) to an estimate of the message
	\item \(\hat{W}\) is the estimated message
\end{itemize}
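To make the roles of $f_{n}$, the channel and $g_{n}$ concrete, here is a minimal Python sketch under assumptions that are not part of the model above. The message is a list of bits and the encoder is a 3-fold repetition code. The channel is a binary symmetric channel that flips each symbol with probability $p$, and the decoder takes a majority vote. The function names \texttt{encode}, \texttt{channel} and \texttt{decode} are hypothetical.

\begin{verbatim}
import random

def encode(message: list[int]) -> list[int]:
    """Encoder f_n (assumed): repeat each message bit three times."""
    return [bit for bit in message for _ in range(3)]

def channel(x: list[int], p: float, rng: random.Random) -> list[int]:
    """Binary symmetric channel (assumed): flip each symbol with probability p."""
    return [bit ^ 1 if rng.random() < p else bit for bit in x]

def decode(y: list[int]) -> list[int]:
    """Decoder g_n (assumed): majority vote over each block of three symbols."""
    return [1 if sum(y[i:i + 3]) >= 2 else 0 for i in range(0, len(y), 3)]

rng = random.Random(0)          # fixed seed so the run is reproducible
w = [1, 0, 1, 1, 0]             # original message W
x = encode(w)                   # codeword X^n, here n = 15
y = channel(x, p=0.1, rng=rng)  # received sequence Y^n
w_hat = decode(y)               # estimate of W; wrong only if 2+ flips hit a block
print(w, w_hat)
\end{verbatim}

A single flipped symbol within any block of three is corrected by the vote. The price is that $n$ is three times the message length.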

\section{Capacity}
The capacity of a channel is the maximum rate at which information can be transmitted over it. For a discrete noiseless channel it is
\[ C = \lim_{t \to \infty} \frac{\log N(t)}{t} \]
where:
\begin{itemize}
	\item \(C\) is the channel's capacity
	\item \(t\) is the duration of the transmission
	\item \(N(t)\) is the number of allowed sequences of symbols that can be transmitted in time \(t\)
	\item \(\log\) is taken in the base of the chosen information unit
\end{itemize}
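As a sketch of the limit in practice, consider a hypothetical binary channel where each symbol takes one unit of time and two 1s may never be sent in a row. Both the timing and the constraint are assumptions chosen for illustration. Counting $N(t)$ with a recurrence shows $\log_{2} N(t)/t$ approaching $\log_{2} \frac{1+\sqrt{5}}{2} \approx 0.694$ shannons per unit time, since here $N(t)$ follows the Fibonacci numbers. The names \texttt{count\_sequences} and \texttt{rate} are hypothetical.

\begin{verbatim}
import math

def count_sequences(t: int) -> int:
    """N(t): number of binary strings of length t with no two consecutive 1s."""
    # ends0 / ends1 count valid strings of the current length ending in 0 / 1.
    # The empty string is counted in ends0 so that a string may start with 1.
    ends0, ends1 = 1, 0
    for _ in range(t):
        # Appending 0 is always allowed; appending 1 only after a 0.
        ends0, ends1 = ends0 + ends1, ends0
    return ends0 + ends1

def rate(t: int) -> float:
    """log2(N(t)) / t, the quantity whose limit defines the capacity."""
    return math.log2(count_sequences(t)) / t

for t in (10, 100, 1000):
    print(t, rate(t))

golden_ratio = (1 + math.sqrt(5)) / 2
print("limit:", math.log2(golden_ratio))  # about 0.694 shannons per unit time
\end{verbatim}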


\begin{definition}[Information content]
The \emph{information content of an outcome} $\omega$ quantifies how informative it is to learn that $\omega$ has occurred; the less probable the outcome, the more informative it is.
\[ I(\omega) = - \log (\mathrm{Pr}(\omega)) \]
Furthermore, the \emph{information content of a discrete RV} $X$ at a value $x$ is the information content of the event $X = x$.
\[ I_{X}(x) = - \log (\mathrm{Pr}(X=x)) \]
\end{definition}
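A minimal Python sketch of the definition, assuming base 2 (shannons) by default; the function name \texttt{information\_content} is hypothetical. Less probable events carry more information, and a certain event carries none.

\begin{verbatim}
import math

def information_content(p: float, base: float = 2) -> float:
    """I = -log(p) for an event of probability p; base 2 gives shannons."""
    if not 0 < p <= 1:
        raise ValueError("probability must lie in (0, 1]")
    # -log(p) is written as log(1/p) so a certain event gives 0.0, not -0.0.
    return math.log(1 / p, base)

print(information_content(0.5))  # fair coin landing heads: 1 shannon
print(information_content(1.0))  # a certain event: no information

# For a discrete RV, I_X(x) is the information content of the event X = x.
die = {face: 1 / 6 for face in range(1, 7)}  # fair die as a value -> Pr map
print(information_content(die[3]))           # I_X(3), about 2.585 shannons
\end{verbatim}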


Information content is defined for events, but it also applies to the outputs of a discrete RV, since each $\{X = x\}$ is itself an event. For continuous RVs, however, any single output generally has probability $0$ (unless the distribution has point masses, e.g.\ a density involving Dirac deltas), so $-\log(0)$ would make the information content of every value infinite. Information content therefore does not extend well to continuous RVs.

We can use this to define entropy for discrete RVs.

\begin{definition}[Entropy for discrete RVs]
Let $X$ be a discrete RV. Its \emph{entropy} is the expected information content of $X$:
\[ \Eta (X) = \mathrm{E}[I_{X}(X)] \]
\end{definition}
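A minimal Python sketch computing the entropy of a discrete RV from its probability mass function, given here as a dictionary from values to probabilities (an assumed representation; the function name \texttt{entropy} is hypothetical). It is the probability-weighted average of the information content of each value, in shannons by default.

\begin{verbatim}
import math

def entropy(pmf: dict, base: float = 2) -> float:
    """H(X) = E[I_X(X)]: the probability-weighted mean of -log Pr(X = x)."""
    total = 0.0
    for p in pmf.values():
        if p > 0:  # zero-probability values contribute nothing (0 log 0 = 0)
            total += p * math.log(1 / p, base)
    return total

print(entropy({"H": 0.5, "T": 0.5}))                   # fair coin: 1 shannon
print(entropy({"H": 0.9, "T": 0.1}))                   # biased coin: below 1
print(entropy({face: 1 / 6 for face in range(1, 7)}))  # fair die: log2(6)
\end{verbatim}

The biased coin is more predictable, so on average each toss is less informative than a toss of the fair coin.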


\begin{definition}[Differential Entropy]
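Let $X$ be a continuous RV with probability density function $f_{X}$. Its \emph{differential entropy} is defined by analogy with the discrete entropy, with the density in place of the probability (and the convention $0 \log 0 = 0$):
\[ \Eta (X) = \mathrm{E}[-\log f_{X}(X)] = -\int_{-\infty}^{\infty} f_{X}(x) \log f_{X}(x) \, dx \]
Since $f_{X}(x)$ is a density rather than a probability, this is not an expected information content in the sense of the discrete case.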
\end{definition}
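A density can exceed $1$, so differential entropy can be negative. The following minimal Python sketch checks this numerically. It assumes base-2 logarithms and approximates the integral with the midpoint rule over an interval covering essentially all of the probability mass; both choices are for illustration only, and all function names are hypothetical. It evaluates a uniform density on $[0, \tfrac{1}{2}]$ and compares a standard normal against its known closed form $\tfrac{1}{2}\log_{2}(2\pi e)$.

\begin{verbatim}
import math

def differential_entropy(f, a: float, b: float, steps: int = 100_000) -> float:
    """Approximate -integral of f(x) * log2(f(x)) over [a, b] by the midpoint rule."""
    dx = (b - a) / steps
    total = 0.0
    for i in range(steps):
        fx = f(a + (i + 0.5) * dx)
        if fx > 0:  # points outside the support contribute nothing
            total -= fx * math.log2(fx) * dx
    return total

def uniform_half(x: float) -> float:
    """Density of the uniform distribution on [0, 1/2]."""
    return 2.0 if 0.0 <= x <= 0.5 else 0.0

def standard_normal(x: float) -> float:
    """Density of the standard normal distribution."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

print(differential_entropy(uniform_half, 0.0, 0.5))       # about -1: negative
print(differential_entropy(standard_normal, -10.0, 10.0))
print(0.5 * math.log2(2 * math.pi * math.e))              # closed form, about 2.047
\end{verbatim}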