Students who learn about matrix multiplication before linear map composition often think the definition seems arbitrary and unmotivated. (Why not just multiply each component separately, the way addition and scalar multiplication work?) Linear maps are the major reason we use matrices, and matrix multiplication really only makes sense once you figure this out.

But still, any grid of numbers is by definition a matrix, whether or not those numbers came from the co-ordinates of images of basis vectors under a linear map. Whatever context the data in the grid came from, we can do matrix multiplication. The matrix may even be interpreted post hoc as linear map. What's surprising is how often this bonus linear algebraic structure ends up being useful and elegant.

A few examples:

• A graph or network has two natural representations as a grid of numbers; an adjacency matrix $A$ gives the connections between nodes and other nodes, and an incidence matrix $B$ gives the two endpoints of each edge. $(A^n)_{i,j}$ happens to give the number of routes of length $n$ between nodes $i$ and $j$, and $B$ has a post hoc interpretation as the boundary map, which transforms a vector $\mathbf{e}$ of edges into a vector $B\mathbf{e}$ of nodes such that $\mathbf{e}$ is a disjoint union of paths with distinct endpoints $B\mathbf{e}$.
• Markov chain is a random process with a set of states, and each state has a probability distribution for determining the next state. If $P_{i,j}$ is the probability that state $i$ will evolve into state $j$, then it turns out that $(P^n)_{i,j}$ is the probability that state $i$ will evolve into state $j$ after $n$ steps.
• A multi-variable function $f(\mathbf{x})$ has a Hessian matrix $$H_{i,j} = \frac{\partial f}{\partial x_i \, \partial x_j}$$ which appears as a quadratic form $\newcommand\transp{^{\mathsf{T}}} \frac{1}{2}\mathbf{x}\transp H \mathbf{x}$ in $f$'s Taylor series, and its eigenvalues determine the nature of $f$'s stationary points.
• A set of random variables $X_i$ has a covariance matrix $C_{i,j} = \operatorname{Cov}(X_i, X_j)$. Given a linear combination $Y = \sum y_j X_j$, matrix multiplication strikes again: $(C \mathbf{y})_i = \operatorname{Cov}(Y, X_i)$.

How about Pascal's triangle? $$P = \begin{pmatrix} 1 & 0 & 0 & 0 & \cdots \\ 1 & 1 & 0 & 0 & \cdots \\ 1 & 2 & 1 & 0 & \cdots \\ 1 & 3 & 3 & 1 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \ddots \end{pmatrix}$$ Of course it's a matrix. The numbers are determined by the first row, and the formula $P_{i+1,j+1}$ $= P_{i,j} + P_{i,j+1}$. We also know this matrix as $\newcommand\ncr[2]{{}^{#1} \mathrm{C}_{#2}} P_{n,r} = \ncr{n}{r}$.

Alternatively we can write it as a symmetric matrix: $$S = \begin{pmatrix} 1 & 1 & 1 & 1 & \cdots \\ 1 & 2 & 3 & 4 & \cdots \\ 1 & 3 & 6 & 10 & \cdots \\ 1 & 4 & 10 & 20 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \ddots \end{pmatrix}$$ $P$ and $S$ are just grids of numbers; they're defined by a simple combinatorial rule, and there is no linear algebra in sight. Surely matrix multiplication is meaningless here? But in fact, $PP\transp = S$: $$\left(PP\transp\right) {}_{\!i,j} = \sum P_{i,k} \left(P^{\mathsf{T}}\right) {}_{\!k,j} = \underbrace{\sum \ncr{i}{k} \, \ncr{j}{k} = \ncr{i+j}{i}} = S_{i,j}$$ as required.

Let's take a look at that third step more closely - what we're saying is, the number of ways to choose $i$ items from a set of $i+j$ is the same as the number of ways to choose some number $k$ items from a set of $i$ and another $k$ from a set of $j$. There is indeed a natural correspondence between such choices; for example, how many ways are there of choosing $3$ of the symbols A,B,C,X,Y?

 $k = 0$ ABC $k = 1$ AABBCC XYXYXY BCXBCYACXACYABXABY $k = 2$ ABACBC XYXYXY CXYBXYAXY

This yields all $10 = \ncr{3+2}{3}$ ways. The correspondence is simple; choose $k$ of the first set A,B,C not to use, and choose $k$ of the second set X,Y to use.

A consequence of this fact is that if we truncate $S$ so that it is a finite square matrix, then its determinant is $1$. This is not difficult to show: $S = PP\transp$ is a product of a lower-triangular and upper-triangular matrix, each of which has only $1$s on the diagonal.

(This insight is lost on MATLAB:

>> det(pascal(20))ans = 1007.44031415979>> det(sym(pascal(30)))ans = 209

Thanks, IEEE 754.)

Can $P$ be interpreted as a linear map? In fact, it can. $P$ is the matrix of binomial coefficients, so binomial expansions are a good context to consider. If we identify polynomials $f(x) = a_0 + a_1 x + \ldots + a_n x^n$ with row vectors $[f(x)] = (a_0~a_1~\cdots~a_n)$, then $[1]\,P = [1]$, $[x]\,P = [1 + x]$, $[x^2]\,P$ $= [1 + 2x + x^2]$ $= [(1+x)^2]$, and in general $$[x^n]\,P = \sum P_{n,k} \left[x^k\right] = \left[\sum \ncr{n}{k} \, x^k\right] = \left[(1+x)^n\right]$$ Since matrix multiplication is linear, it follows immediately that $[f(x)] \,P = [f(1+x)]$. Therefore the matrix $P$ represents a linear transformation on polynomials given by the substitution $x \mapsto x+1$. Then we see that $P^a$ represents the substitution $x \mapsto x+a$, and so $(P^a)_{n,r}$ is the coefficient of $x^r$ in the binomial expansion $(x+a)^n$.

Somehow, wherever a grid of numbers comes from, linear algebra finds something to say about it.