The Distributive Law (a + b)c = ac + bc

Zarankiewicz Algorithm 1: Pruning the Search Tree

This post is the fourth in a series of posts about my computational approach to the Zarankiewicz problem.

So far we have a rudimentary backtracking algorithm which beats brute force, but not much else. However, it's still embarrassingly easy to improve: the algorithm wastes a lot of time investigating partial solutions which

  1. Are unlikely to lead to an optimal solution, meaning we should try them later when we have a better lower bound on the solution;
  2. Cannot lead to an optimal solution, meaning we should just reject them immediately; or
  3. Are symmetric to other partial solutions we considered earlier, so they can't lead to a better solution than one we already found. We can reject these too.

That means there are large branches of the search tree which we're exploring unnecessarily. We want to prune these branches; let's consider our options.


Number 1 is the order we navigate the search tree in. At first glance, this seems irrelevant: visiting the same nodes in any order should take the same total time. However, once we notice that finding a solution quickly gives us a lower bound which can be matched against an upper bound, we realise that the order does matter, because we might end up visiting fewer nodes.

Considering the rows as binary numbers, we're trying from $00000$ to $11111$ by incrementing. This means we're first trying rows without many $1$s; but we want as many $1$s as possible. Perhaps we should start at $11111$ and decrement; on the other hand, putting a lot of $1$s into the first row right away will severely constrain how many $1$s we can fit in the remaining rows, since there will be many more potential rectangles to be avoided.

These options are mutually exclusive; we can only pick one order to navigate the tree in, so we'll want to compare which option is best.


Number 2 is how we reject partial grids. If the upper bound on the potential weight of a partial grid is less than the lower bound on the optimal solution, then the partial grid can be rejected. The lower bound is given by whatever solution we've already found, so what we need is upper bounds. A simple way to bound the weight of a potential solution would be to total the weights of the filled rows, and add $n$ (the number of columns) per unfilled row. We'll call this the naïve bound, and it works for the trivial reason that a row can't have more $1$s than it has columns.

There are other ways to get an upper bound, and they're complementary; we can compute as many as we want, and having more probably allows us to prune more of the tree. We don't need to choose one "best" method.

In "post 0" of this series, we counted pairs of columns to get an upper bound. Let's revisit that argument in the context of our backtracking search algorithm, where grids have some rows already filled in:

$\begin{pmatrix} 0 & \bbox[2pt,#FF8080]{1} & \bbox[2pt,#FF8080]{1} & 1 \\ 1 & \bbox[2pt,#80FF80]{0} & \bbox[2pt,#80FF80]{0} & 1 \\ * & \bbox[2pt,#80FF80]{*} & \bbox[2pt,#80FF80]{*} & * \\ * & \bbox[2pt,#80FF80]{*} & \bbox[2pt,#80FF80]{*} & * \end{pmatrix}$ $\begin{pmatrix} 0 & \bbox[2pt,#FF8080]{1} & 1 & \bbox[2pt,#FF8080]{1} \\ 1 & \bbox[2pt,#80FF80]{0} & 0 & \bbox[2pt,#80FF80]{1} \\ * & \bbox[2pt,#80FF80]{*} & * & \bbox[2pt,#80FF80]{*} \\ * & \bbox[2pt,#80FF80]{*} & * & \bbox[2pt,#80FF80]{*} \end{pmatrix}$
$\begin{pmatrix} 0 & 1 & \bbox[2pt,#FF8080]{1} & \bbox[2pt,#FF8080]{1} \\ 1 & 0 & \bbox[2pt,#80FF80]{0} & \bbox[2pt,#80FF80]{1} \\ * & * & \bbox[2pt,#80FF80]{*} & \bbox[2pt,#80FF80]{*} \\ * & * & \bbox[2pt,#80FF80]{*} & \bbox[2pt,#80FF80]{*} \end{pmatrix}$ $\begin{pmatrix} \bbox[2pt,#80FF80]{0} & 1 & 1 & \bbox[2pt,#80FF80]{1} \\ \bbox[2pt,#FF8080]{1} & 0 & 0 & \bbox[2pt,#FF8080]{1} \\ \bbox[2pt,#80FF80]{*} & * &* & \bbox[2pt,#80FF80]{*} \\ \bbox[2pt,#80FF80]{*} & * &* & \bbox[2pt,#80FF80]{*} \end{pmatrix}$

For each highlighted pair of columns, adding a row with $1$s in both columns would create a rectangle. In effect, each completed row "claims" some pairs of columns, forbidding rows below it to have $1$s in both. Importantly, the number of pairs claimed by a row is only determined by its weight, not where the $1$s are: if a row has weight $w$, it claims ${w \choose 2} = \frac{1}{2}w(w-1)$ pairs of columns. In this example, the first row claims three pairs and the second row claims only one.

Since there are only finitely many pairs of columns in total, only finitely many pairs remain to be claimed by the rows yet to be filled in. If the number of unclaimed pairs is $T$, the remaining rows must have weights $w_i$ satisfying $\sum \frac{1}{2}w_i(w_i-1) \le T$, putting an upper bound on $\sum w_i$ and hence on the weight of a completion. We'll call this the convex optimisation bound, because as we identified earlier, maximising $\sum w_i$ in this expression is a convex optimisation problem.*


Number 3 is called symmetry reduction. Permuting the rows and columns of a grid preserves its weight and the rectangle-free property, so there are many optimal solutions, and we only need to find one. To take advantage of this, we can search only for grids in a particular form, and ignore any partial grids not in that form; this is valid if symmetries guarantee at least one optimal grid in that form.

It's not yet obvious what demands we can make about the form of the solution, but a simple one is requiring the rows to be in order. It's easy to guarantee there is an optimal solution with its rows in order, and this is intuitive when you think about the kinds of solution that our algorithm already finds:

$$\begin{pmatrix} \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & 1 & 1 \\ \cdot & \cdot & \cdot & \cdot & 1 & 1 & \cdot & 1 \\ \cdot & \cdot & \cdot & 1 & \cdot & 1 & 1 & \cdot \\ \cdot & \cdot & 1 & \cdot & 1 & \cdot & 1 & \cdot \\ \cdot & 1 & \cdot & 1 & 1 & \cdot & \cdot & \cdot \\ \cdot & 1 & 1 & \cdot & \cdot & 1 & \cdot & \cdot \\ 1 & \cdot & 1 & 1 & \cdot & \cdot & \cdot & 1 \\ 1 & 1 & \cdot & \cdot & \cdot & \cdot & 1 & \cdot \end{pmatrix}$$

(I've replaced each $0$ with a $\cdot$ to make this easier to see.) There are $8! = 40{,}320$ solutions with these same rows in various orders, but this is the first one the plain backtracking algorithm found, and its rows are in order (as binary numbers), even though we didn't ask for them to be! This is no coincidence: grids with their rows in order appear earlier in the search tree than their symmetric counterparts, so of course we find them first. The issue is that we don't then need to find the other $40{,}319$ permutations.**

We could implement symmetry reduction by making the rejectNode() method reject out-of-order rows. However, this is wasted effort; if the current row is $01001$, for example, then we know we're going to reject any new row earlier than $01001$; why even try them? We could simply skip over these nodes when we're navigating the tree: instead of starting new rows at $00000$, we can make the descend() method copy the state of the current row.*** Let's call this strategy lexicographic order, or lex order for short, because the rows are in "alphabetical order" as strings. (This is only true because we used the "increment" strategy; if we decrement instead, it will still be "alphabetical order" in a sense, but our "alphabet" will have $1$ before $0$.)

Different symmetry reduction rules may be complementary or mutually exclusive, depending on what they are. This is going to be the main source of complexity in the algorithm we're working towards; we might have $20$ different demands, and we'll need a way to decide whether they can all be demanded simultaneously.


There are three dimensions of variation - the order we navigate the tree, the choice of upper bounds, and whether we use symmetry reduction - so there are many combinations to compare. To keep it manageable, let's not compare options where we know what will be faster:

  • Symmetry reduction is free, since we're skipping nodes rather than adding extra checks (which would take CPU-time).
  • The convex optimisation bound will reject all of the same nodes as the naïve bound, and more: the naïve bound says an unfilled row might be full of $1$s, whereas the convex optimisation bound says that would claim every pair of columns. The naïve bound should be faster to compute, so we'll still try it, but there's no need to use both at once.

To enable such varied combinations of strategies in our code, a Solver has a Navigator and a list of Rejectors, where the Navigator is responsible for defining the descend()canAdvance()advance() and ascend() methods which navigate the search tree, and each Rejector defines a rejectNode() method. (A Bound is a kind of Rejector.)

Enough talk; let's see the results:


$+$/$-$: increment/decrement, N: naïve bound, CO: convex optimisation bound. 

Some observations:

  • Lex order makes a huge improvement: all of these options beat plain backtracking, by an even bigger margin than backtracking beats brute force.
  • Decrementing and the convex optimisation bound are the clear winner; both are improvements by orders of magnitude. They complement each other well, because the convex optimisation bound is good at rejecting the early rows that have too many $1$s, so we get to the right number of $1$s fairly quickly.
  • The naïve bound is useless; it rejected so few nodes that you can't see the difference on the graph. Going by running time, the algorithm was actually faster without it, because computing the bound isn't free.
  • For some sizes like $7{\times}7$ and $9{\times}9$, BT lex$-$ CO gets lucky, and finishes sooner than a best-fit curve would predict. This doesn't happen for either BT lex$+$ CO or BT lex$-$, so it's probably because the convex optimisation bound does much more work when it can be matched against a good lower bound, and lex$-$ sometimes sporadically finds the optimal solution faster than usual, whereas lex$+$ consistently wastes time trying empty rows before it gets to the solution.

From here on we'll forget about lex$+$, and just refer to lex$-$ as lex. This implies our "alphabet" has $1$ before $0$, so to avoid cognitive dissonance we'll stop writing the $0$s.


Let's go further. To improve on this strategy, we might learn something from the kind of solutions it finds:

$$\begin{pmatrix} 1 & 1 & 1 & 1 & \cdot & \cdot & \cdot & \cdot \\ 1 & \cdot & \cdot & \cdot & 1 & 1 & \cdot & \cdot \\ 1 & \cdot & \cdot & \cdot & \cdot & \cdot & 1 & 1 \\ \cdot & 1 & \cdot & \cdot & 1 & \cdot & 1 & \cdot \\ \cdot & 1 & \cdot & \cdot & \cdot & 1 & \cdot & 1 \\ \cdot & \cdot & 1 & \cdot & 1 & \cdot & \cdot & 1 \\ \cdot & \cdot & 1 & \cdot & \cdot & 1 & 1 & \cdot \\ \cdot & \cdot & \cdot & 1 & 1 & \cdot & \cdot & \cdot \end{pmatrix}$$

If you have a keen eye, you'll notice that not only are the rows in lex order, they're also in descending weight order; and the columns are in lex order too. Is this a coincidence, or can we make all three demands?

Proposition. The following grid has no symmetric counterpart whose rows are in both lex order and descending weight order:

$$\begin{pmatrix} 1 & 1 & 1 & 1 & \cdot & \cdot & \cdot & \cdot \\ 1 & \cdot & \cdot & \cdot & 1 & 1 & 1 & \cdot \\ 1 & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & 1 \\ \cdot & 1 & \cdot & \cdot & 1 & \cdot & \cdot & 1 \\ \cdot & \cdot & 1 & \cdot & \cdot & 1 & \cdot & 1 \\ \cdot & \cdot & \cdot & 1 & \cdot & \cdot & 1 & 1 \end{pmatrix}$$

Proof. By exhaustion. (There are only $6! \times 8!$ counterparts to try…)

That's a "no", then: symmetry doesn't guarantee a solution meeting these two demands. However, other combinations are compatible:

Theorem. Every grid has a symmetric counterpart whose rows and columns are in lex order.****

Proof. We'll achieve this by sorting the rows, then the columns, then rows, then columns, and so on; when the grid stops changing, the rows and columns must both be in order. It remains to be shown that the grid does eventually stop changing.

Consider our algorithm with the "decrement" strategy but no symmetry reduction. As we noticed above, if we sort the rows of a grid in lex order, then this grid will be visited earlier by the algorithm.

In fact, sorting the columns results in a grid the algorithm visits earlier, too:

$$\newcommand{\zerodot}{\llap{\vphantom{1}}{\,\cdot\,}} \begin{pmatrix} 1 & \cdot & \bbox[2pt,#FF8080]{\zerodot} & \bbox[2pt,#80FF80]{\zerodot} & \cdot \\ 1 & 1 & \bbox[2pt,#FF8080]{\zerodot} & \bbox[2pt,#80FF80]{1} & \cdot \\ \cdot & 1 & \bbox[2pt,#FF8080]{1} & \bbox[2pt,#80FF80]{\zerodot} & 1 \\ & & \vdots & & \end{pmatrix}$$

Consider the first pair of columns out of order: swapping them will not affect any row with $\bbox[2pt,#FF8080]{1}~\bbox[2pt,#80FF80]{1}$ or $\bbox[2pt,#FF8080]{\zerodot}~\bbox[2pt,#80FF80]{\zerodot}$ in these two columns, so the first row affected has $\bbox[2pt,#FF8080]{\zerodot}~\bbox[2pt,#80FF80]{1}$, as in the example. (If it was $\bbox[2pt,#FF8080]{1}~\bbox[2pt,#80FF80]{\zerodot}$ then these columns would already be in lex order.) Swapping them therefore leaves this row earlier in lex order, and the rows above it unchanged; so the algorithm would visit the resulting grid earlier than the original one.

The result follows because we can't keep going earlier and earlier indefinitely. $\square$

This also explains why the solution above had its columns in lex order; that wasn't a coincidence.

Descending row weight order is compatible with column lex order, too, and the proof is almost the same: but instead of considering the "decrement" navigation strategy, the algorithm in the proof must try rows in descending weight order, and rows of equal weight in lex order. Let's call this weightlex order; it's interesting that by proving that these two demands are compatible, we get - as a byproduct - instructions for how to do it.

As before, we can simply skip nodes whose rows are out of order. Enforcing column lex order isn't free, though: we'll need another Rejector to reject grids whose columns are out of order.*****

By demanding a solution in weightlex order, we get another upper bound: the weights of the remaining rows must be less than or equal to the weight of the most recently added row. Let's call this the descending row weight bound.

We have more strategies to try! Let's try them:


ColLex: column lex order, WLex: weightlex order and descending row weight bound.

The story continues: for all the orders of magnitude we've pruned from the search tree, lex still only gets us to $8{\times}8$, decrementing and the convex optimisation bound together get us to $9{\times}9$, weightlex gets us to $10{\times}10$, and column lex then gets us to $12{\times}12$. It's a long way to $31{\times}31$…


*By the same argument as before, the maximum of $\sum w_i$ is achieved when the $w_i$ differ by at most $1$, so this maximum can be found by a direct calculation.

**Unfortunately, this won't make the algorithm $m!$ times faster; we still have to explore some branches of the search tree which contain symmetric counterparts, because they don't only contain symmetric counterparts. In fact, an asymptotic speed-up by a factor of $O(m!)$ can never be possible, because that's higher than the complexity of the algorithm itself, which should be something like $2^{O(mn)}$, if we keep $n$ fixed.

***Although not visiting nodes is faster than visiting and rejecting them, this strategy does commit us to using the same ordering for symmetry reduction as we use for navigation; in principle, these need not be the same. We want our ordering for symmetry reduction to prune more of the tree and enable better upper bounds (as weightlex does), but this may be incompatible with the desire to find good solutions early.

****Notice that we made no mention of which alphabet the symbols in the grid come from, or the rectangle-free property of the grid. This is a very general result, so unsurprisingly it's already known; this conference paper by Flener et al. (2002) gives essentially the same proof. Interestingly, their example (on page 3) is the Zarankiewicz problem, though they don't identify it by name!

*****Actually, I did write an algorithm which gives column lex ordering for free: by grouping the columns into sections which "can still be permuted", we only need to generate rows which, within each section, have all of their $1$s on the left hand side. The first row has only one section, and each section defines one or two sections in the row below it, forming a binary tree structure. Getting it to work with weightlex is tricky, but possible, and faster; I won't explain it here, though, because it doesn't advance the narrative towards the even faster algorithms.

Zarankiewicz Algorithm 1: Backtracking Search

This post is the third in a series of posts about my computational approach to the Zarankiewicz problem.

A quick summary of where we left off:

  • We're looking for the maximum possible number of $1$s in a grid with $m$ rows and $n$ columns, with the constraint that no four $1$s form a rectangle. This is called the Zarankiewicz problem.
  • Rather than try every possible grid, we're going to build up grids row-by-row, and reject partial grids which definitely can't be completed (because they already contain a rectangle).

A grid can only contain $0$s and $1$s, which computers are good at manipulating, so we'll represent each row as a binary number; a grid will be an array of rows.


In the previous post we discussed how partial grids form a tree structure, with completed solutions at the ends of each branch. Let's elaborate on that a bit more. Each node in the tree represents either a partial or complete grid. There are a very large number of possible grids, so it's hard to think about the whole tree at once. It's also unnecessary; the power of algorithms is that they solve problems by accumulating the effects of small, manageable steps, and at each step we only need to consider a small part of the tree.

What does part of the tree look like?

$\bbox[border:1px solid black]{\begin{matrix} 011 \\ \bbox[2pt,#8080FF]{101} \\ *  \\ * \end{matrix}}$
$\swarrow$ $\swarrow$ $\swarrow$ $\swarrow$ $\searrow$ $\searrow$ $\searrow$ $\searrow$
$\bbox[border:1px solid black]{\begin{matrix} 011 \\ 101 \\ \bbox[2pt,#80FF80]{000} \\ * \end{matrix}}$ $\bbox[border:1px solid black]{\begin{matrix} 011 \\ 101 \\ \bbox[2pt,#80FF80]{001} \\ * \end{matrix}}$ $\bbox[border:1px solid black]{\begin{matrix} 011 \\ 101 \\ \bbox[2pt,#80FF80]{010} \\ * \end{matrix}}$ $\bbox[border:1px solid black]{\begin{matrix} 011 \\ 101 \\ \bbox[2pt,#FF8080]{011} \\ * \end{matrix}}$ $\bbox[border:1px solid black]{\begin{matrix} 011 \\ 101 \\ \bbox[2pt,#80FF80]{100} \\ * \end{matrix}}$ $\bbox[border:1px solid black]{\begin{matrix} 011 \\ 101 \\ \bbox[2pt,#FF8080]{101} \\ * \end{matrix}}$ $\bbox[border:1px solid black]{\begin{matrix} 011 \\ 101 \\ \bbox[2pt,#80FF80]{110} \\ * \end{matrix}}$ $\bbox[border:1px solid black]{\begin{matrix} 011 \\ 101 \\ \bbox[2pt,#FF8080]{111} \\ * \end{matrix}}$
$\vdots$ $\vdots$ $\vdots$ $\vdots$ $\vdots$ $\vdots$ $\vdots$ $\vdots$

This picture contains enough information to think about one step in the algorithm. This is what happens when we "visit" the current node, which has just been reached by adding the blue row, $\bbox[2pt,#8080FF]{101}$:

  1. Since the partial grid at this node doesn't contain any rectangle so far, we can continue.
  2. Descend to the next layer of the tree, by inserting $\bbox[2pt,#80FF80]{000}$.
  3. "Visit" the node with the partial grid we just created.
  4. Once we're done with that, advance to the next sibling in the tree, and do step 3 again.
  5. Once we're done with $\bbox[2pt,#FF8080]{111}$, we can't advance any further, so we've finished visiting the blue node. Delete the child, and go back up to the previous layer in the tree.

This is a recursive algorithm: "visiting" a node requires "visiting" its children.

Note that three of the new partial solutions, in red, have rectangles; these won't need to be explored any further. There are some more special cases to handle: what happens when we reach the bottom of the tree (i.e. there are no more rows to add), and what happens when we "go back up one" from the top of the tree (i.e. there are no rows to delete). These occur when we find a solution, and when the algorithm terminates.


The fact that we can sometimes reject nodes without having to visit all of their children is what distinguishes this algorithm from brute force. However, it's not quite the same as the algorithm for solving (weak) combination locks: in that problem, once a digit worked, we knew it was correct, so we could simply continue solving by moving onto the next digit. That's a greedy algorithm. For this problem, there are two subtle but important differences:

  1. Completing a grid doesn't mean we've finished, because other grids might turn out to be better solutions, with more $1$s.
  2. We don't immediately know whether a partial grid should be rejected or not: a grid might be rejected immediately (e.g. because it already contains a rectangle), but it also might be rejected later due to all of its children being rejected. This is like a dead end in a maze: a corridor is a dead end if it's blocked right in front of you, but it could also be a "dead end" in the sense that each way forward eventually leads to a dead end itself.

Both are addressed by the instruction to "go back up" in step 5; like solving a maze, once we we've finished exploring everywhere we can get to from this node, we must backtrack and explore somewhere else. We've designed a backtracking algorithm.


We'll write in Java. I've put the full code in a git repository, but let's look at the important part:

// solve using a backtracking algorithm
public int solve() {
    grid = new Grid(width, height);
    // count how many nodes we visit
    nodesVisited = 0;
    // this method does all of the work.
    // after visiting the root node, we're done!
    return bestWeight;

// visit one node in the search tree
private void visitCurrentNode() {
    // we are visiting another node, so increment this counter
    if(rejectNode()) { return; }
    if(grid.isComplete()) {
        // we reached the bottom of the search tree, so the grid is complete.
        int newWeight = grid.weight();
        // check if this solution is better than what we had before
        if(newWeight > bestWeight) {
            bestWeight = newWeight;
            // uncomment this line to see the solutions found.
    // move down the tree to visit this node's children
    // visit every child node
    while(true) {
        // visit this child node
        // stop if there are no more siblings to visit
        if(!canAdvance()) { break; }
        // otherwise move on to the next sibling
    // after visiting all of this node's children, backtrack

Notice that this is written fairly abstractly: this makes it easier to understand the code, and also to improve the algorithm later.*


To fill in some of the missing details, we're representing a row as a Row object, which represents the current state of the row as a binary number; for now, descend() adds a new Row like $00000$ (represented by the number 0) to gridadvance() increments this number, and ascend() deletes it once we've tried all of its possibilities.

The only remaining non-trivial detail is the code to check whether the grid has a rectangle. To do this, we take advantage of the binary number representation in the Row class:

// determines whether any rectangle occurs across these two rows
boolean hasRectangleWith(Row other) {
   int t = this.asNumber & other.asNumber;
   return (t&(t-1)) != 0;

A rectangle occurs when two rows have two or more $1$ bits in common in their binary numbers. The bitwise operation & finds the common $1$ bits. The next line determines whether the result t has two or more $1$ bits; it's non-obvious why this works, but rather than explain it here, I think it's a good puzzle to try yourself.**


At last, we have an algorithm! It's certainly not efficient, but it beats brute force:***


By one measure, we only beat brute force by $1$: our backtracking algorithm can solve a $7{\times}7$ grid before the search tree explodes beyond feasibility, whereas brute force can get up to $6{\times}6$. By another measure, we beat brute force by a factor of about $400$: for a $7{\times}7$ grid, our algorithm visits about $1/400$th the number of nodes that brute force would. This will be a recurring theme as we continue: we can improve the algorithm by orders of magnitude, again and again, but it only gets us a little further each time.

In general, there's not going to be much point drawing graphs like this; every algorithm is going to cling to the $n$ axis until it explodes. If we squash the vertical axis logarithmically, we can see more detail:


This is comparing by the number of nodes visited. The efficiency of the algorithm is mostly determined by how the number of nodes visited grows with the size of the grid, but of course this doesn't correspond exactly with running time. It might still be worth visiting more nodes if we can visit each node faster. That said, the time taken to visit each node is not going to make a big difference in what inputs are feasible or infeasible; what matters is how long we can stave off the explosion. We're not trying to squeeze every millisecond out of the algorithm.


Where to go from here? There are many avenues for improvement, and we've identified a few already.

Since we want to find good solutions as soon as possible, perhaps we should start with a row like $11111$, and make advance() decrement this number. We're trying to maximise the number of $1$s, so perhaps we should try bigger numbers first, as they tend to have more $1$s in binary.

We are rejecting partial grids which have rectangles; however, we could also reject partial grids if we can know they won't lead to an optimal solution. To do this, we need an upper bound on how many $1$s the completion of a partial grid can have, and if this is too low, the node can be rejected by the rejectNode() method.

How about symmetry? We're free to permute the rows and columns in a solution, while preserving the rectangle-free property and the number of $1$s in the grid. A large proportion of the nodes are symmetrical to other nodes which we'll already have visited; if we can identify which nodes, we don't need to visit them.

We'll explore these options in the next post in this series.


*Of course, since I'm writing these posts long after writing the algorithms, I can say with the benefit of hindsight that we will indeed be able to improve substantially on the algorithms described here.

**Even if you won't find any use for them, bit manipulation hacks like this are at least good logic puzzles. Of course, because this is Java we could have written Integer.bitCount(t) > 1; but that's no fun.

***We can implement the brute force algorithm by making rejectNode() always return false for partial grids; this forces it to visit the whole search tree. That said, brute force is far too slow and I really don't want to wait around for it; instead, we'll use the exact formula$$\sum_{i=0}^{m} 2^{in} = \frac{2^{(m+1)n} - 1}{2^{n} - 1}$$to calculate exactly how many nodes the brute force algorithm would visit; $2^{in}$ is the number of nodes in layer $i$ of the tree.

How to beat Brute Force

This post is the second in a series of posts about my computational approach to the Zarankiewicz problem.

Let's begin with a quick summary of what the problem is. You have a grid with $m$ rows and $n$ columns, and in each cell of that grid you must write either a $1$ or a $0$. The objective is to fit as many $1$s as you possibly can, but no four $1$s may lie on the corners of a rectangle.*

$$\begin{pmatrix}\bbox[3pt,white]{1}&1&\bbox[3pt,white]{0} \\ 1&0&1 \\ \bbox[3pt,white]{0}&1&\bbox[3pt,white]{1}\end{pmatrix}~~~\begin{pmatrix} \bbox[3pt,#FF8080]{1}&0&\bbox[3pt,#FF8080]{1} \\ 1&1&0 \\ \bbox[3pt,#FF8080]{1}&0& \bbox[3pt,#FF8080]{1} \end{pmatrix}$$Left: a solution. Right: an invalid grid.

If you want to solve a discrete optimisation problem like this, the simplest thing to do is: try everything, throw away the things that aren't valid, measure the ones that are, and keep the best one. This is the "brute force" algorithm, though really it's a class of algorithms, because the definitions of "everything", "valid" and "measure" will vary depending on the problem being solved. Formally, we have a set of candidate solutions $X$, a constraint on $X$, and a profit function $\pi : X \to \mathbb{R}$.

Brute force is notoriously slow for combinatorial problems; $X$ is usually extremely large, even when the numbers in the problem are small. In the Zarankiewicz problem, we have $mn$ cells in the grid, and $2$ choices of what to write in each cell, so there are $2^{mn}$ possible ways of completing the grid. For a $5{\times}5$ grid, this number is about 34 million, for a $7{\times}7$ grid it's about 563 trillion. By the time we get to $31{\times}31$, the number of things to try is 290 digits long; we'll never be able to try that many things.**


But if we don't try everything, how can we be sure to find the best one? This is the challenge.

In fact, sometimes we really do have to try everything. We need problems like this in security: consider a combination lock, for example.


(Image modified from this one.)

To open this lock, you need all three cylinders in their correct orientations, so that the internal gaps align with where the rod needs to move. If the lock is well-designed, the only way to discover the correct combination is to try them all until you find the one which works. This is the brute force algorithm, where $X$ is "all sequences of three digits, from 000 to 999", there is no constraint, and $\pi$ is $1$ or $0$ depending on whether the lock opens or not.*** This takes up to $1000$ tries.

If there is an algorithm faster than brute-force, the combination lock has a design flaw. The image above suggests one, which is indeed a common weakness in cheap locks: if the rod can move through the first cylinder, we know the first digit is correct; if it also moves through the second cylinder, we know the second digit is also correct. This represents a structure in the problem - if we "measure" a combination $\renewcommand{\vec}{\mathbf} \vec{a} = (a_1,a_2,a_3)$ by the number of cylinders $\pi(\vec{a})$ the rod can move through, then whenever $\pi(\vec{a}) = 0$ and $b_1 = a_1$, we also know $\pi(\vec{b}) = 0$, for example. Importantly, we know this without trying $\vec{b}$.

This can indeed be turned into an efficient algorithm: for $0 \le a_1 \le 9$ try each possible $(a_1,0,0)$ until we find one with $\pi(\vec{a}) \ge 1$. Then for $0 \le a_2 \le 9$ try each possible $(a_1,a_2,0)$ until we find one with $\pi(\vec{a}) \ge 2$. Finally for $0 \le a_3 \le 9$ try each possible $(a_1,a_2,a_3)$ until we find one with $\pi(\vec{a}) = 3$. This takes at most $30$ tries: a significant improvement. That's bad for security, but good for computer science.


On an abstract level, this works because inspecting one candidate solution gives information about many others. In this example we were able to reject up to $100$ candidates per inspection. This points to a general (but vague) principle of combinatorial optimisation: efficient search algorithms exploit structure within the problem, to reject as many candidates per inspection as possible.

As with the combination lock, this usually requires decomposing the candidate solutions into components - in this case, the individual digits in the sequence. It may be more convenient to represent "unfilled" components explicitly, rather than choosing a default. The result can be interpreted as a tree, in which e.g. $(a_1,a_2,*)$ is "descended from" $(a_1,*,*)$. The presence of a $*$ means the candidate is a partial solution; a non-partial descendant is a completion.

If this is possible, we might be able to find logical relationships between the components and the validity or measurement of the full candidate. These can take various forms:

  1. If $\vec{a}$ is invalid, and $\vec{b}$ is a completion of $\vec{a}$, then $\vec{b}$ is also invalid.
  2. For some function $U$, $\pi(\vec{b}) \le U(\vec{a})$ whenever $\vec{b}$ is a completion of $\vec{a}$.

These allow us to instantly reject all completions of $\vec{a}$, whenever $\vec{a}$ is invalid or $U(\vec{a})$ is too small. There is another kind of structure which many problems have: symmetry.

  1. Candidates $\vec{a}$ can be transformed into other candidates $\sigma\vec{a}$, where $\sigma\vec{a}$ is valid exactly when $\vec{a}$ is, and $\pi(\sigma\vec{a}) = \pi(\vec{a})$.

This allows us to reject $\vec{a}$ whenever $\sigma\vec{a}$ has already been considered.****


Returning to the Zarankiewicz problem, one way to decompose the grid is as a sequence of rows. A row itself is a sequence of $1$s and $0$s, of a fixed length, which is convenient for representation on a computer. The rectangle-free constraint can be phrased as "no two rows have more than one $1$ bit in common", which can be tested by bitwise operations.

Most importantly, this decomposition is compatible with the structure of the problem: if a partial solution has a rectangle, then any completion also has a rectangle, as in (1.). Furthermore, by considering the weights (number of $1$s) of the rows in a partial solution, we can give an upper bound on the weight of its completions, as in (2.), based on the convex optimisation bound described in a previous post.

We can also exploit symmetry. The problem has lots of transformations as in (3.): any permutation of rows and columns will do, because swapping rows (or columns) cannot introduce or remove a rectangle, and does not change the weight. As we'll see in later posts, there are many ways that all this symmetry can be exploited, and some are much better than others.


*Formally, for $i \ne k$ and $j \ne l$, not all of $A_{i,j}$, $A_{i,l}$, $A_{k,j}$, $A_{k,l}$ can equal $1$. For pedants, only axis-aligned rectangles are forbidden.

**Combinatorial explosion is, for example, why IBM's Deep Blue and Google's AlphaGo can't just be replaced with about 15 lines of code.

***We could instead have defined a combination as "valid" if it opens the lock, but that would be less convenient for the analogy.

****Alternatively, we can reject $\vec{a}$ when $\sigma\vec{a}$ will be considered later, so long as we don't later reject $\sigma\vec{a}$ by circular reasoning. We can make this consistent by comparing $\vec{a}$ and $\sigma\vec{a}$ in some ordering, which need not be the order that candidates are considered in.

The Zarankiewicz Rollercoaster

A lot has happened since my last post on the Zarankiewicz problem, so it's storytime, and will be for several posts to come. Let's be honest: storytime is better than mathstime, even in the opinion of many mathematicians.

  1. Keyboards and Projective Geometry
  2. The Zarankiewicz Rollercoaster
  3. How to beat Brute Force
  4. Zarankiewicz Algorithm 1: Backtracking Search
  5. Zarankiewicz Algorithm 1: Pruning the Search Tree

I do think I should spoil the ending, though, so you (dear reader) will know where it's going.

Long story short, as of 9pm last night I have $z(n,n)$ for every $n \le 31$, and this would have been an improvement on the previous record of $n \le 21$, except here's the punchline, from the arXiv this month:


Appendix: Small Zarankiewicz Numbers.


Well played, Narjess Afzaly and Professor Brendan McKay. Well played.


The good news is, my results are in complete agreement. Also, given the apparent interest in the problem, I thought it would be a good idea to start an online database for results on this and related problems, so I did.

Card Games and Latin Squares

In a previous post, we saw that a card game with some particular properties is essentially the same thing as a finite projective plane (FPP). For some $n \ge 2$,

  • Each card has $n+1$ symbols,
  • Each symbol appears on $n+1$ cards,
  • Any two cards share exactly one common symbol, and
  • Any two symbols appear together on exactly one card.

$n$ is the order of the FPP. All manner of elusive combinatorial structures turn out to be equivalent to, constructible from, or otherwise intimately connected with FPPs. This post is about one particular case: complete sets of mutually orthogonal Latin squares.


Backtrack for a moment. A Latin square of order $n$ is an $n{\times}n$ grid containing the numbers $1,2,\ldots,n$ such that each row and each column contains all $n$ numbers. By the pigeonhole principle this is the same as saying no number appears twice in any row or any column. (If a number appears twice in a row, then there is not enough space left to fit the rest of the numbers in that row.) Here are three Latin squares with $n=4$: $$\mathcal{A} = \begin{bmatrix} 1&2&3&4 \\ 2&1&4&3 \\ 3&4&1&2 \\ 4&3&2&1 \end{bmatrix} \hspace{.5cm} \mathcal{B} = \begin{bmatrix} 1&2&3&4 \\ 3&4&1&2 \\ 4&3&2&1 \\ 2&1&4&3 \end{bmatrix} \hspace{.5cm} \mathcal{C} = \begin{bmatrix} 1&2&3&4 \\ 4&3&2&1 \\ 2&1&4&3 \\ 3&4&1&2 \end{bmatrix}$$

Latin squares are not remotely rare. In fact, they are easy to construct from blank grids: if you do one row at a time, you will never reach a dead end. But orthogonal Latin squares are harder to come by. $\mathcal{A}$, $\mathcal{B}$ and $\mathcal{C}$ are not just any $4{\times}4$ Latin squares: together, they have the particularly elusive property that when any two are superimposed, every ordered pair of numbers is present. Here's $\mathcal{A}$ and $\mathcal{B}$ together:$$\begin{bmatrix}1,1&2,2&3,3&4,4 \\ 2,3&1,4&4,1&3,2 \\ 3,4&4,3&1,2&2,1 \\ 4,2&3,1&2,4&1,3\end{bmatrix}$$I.e. $\mathcal{A}$ and $\mathcal{B}$ are orthogonal. The same can be easily verified for $\mathcal{A}$ and $\mathcal{C}$, and for $\mathcal{B}$ and $\mathcal{C}$. Of course, by the pigeonhole principle, "every ordered pair of numbers appears once" is the same as saying "no ordered pair of numbers appears twice".


A natural* question to ask is how many mutually orthogonal Latin squares can be found.

Definition. Latin squares (of the same order) $\mathcal{A}_1,\ldots,\mathcal{A}_t$ are mutually orthogonal if for $i \ne j$, $\mathcal{A}_i$ and $\mathcal{A}_j$ are orthogonal.

Definition. $N(n)$ is the maximum size of a set of mutually orthogonal Latin squares (MOLS) of order $n$.

We have seen already that $N(4) \ge 3$. In fact, $N(4) = 3$.

Proposition. for $n \ge 2$, $N(n) \le n-1$.

Proof. Note that permuting the numbers of a Latin square (e.g. exchanging $1$s for $2$s and vice versa) preserves orthogonality. Therefore without loss of generality, each $\mathcal{A}_k$ has $[1~2~\cdots~n]$ in the first row.

Now look at, say, the second row** in the first column; let $s_k$ be the number there in $\mathcal{A}_k$. Every $s_k \ne 1$, since each $\mathcal{A}_k$ has a $1$ in the first row of the same column, and the number $1$ cannot appear twice in a column of a Latin square.

Furthermore, the $s_k$ are distinct, otherwise some ordered pair $s,s$ appears in this position and in the first row in the $s$th column, contradicting orthogonality.

So there are at most $n-1$ numbers which can appear in this position, and each $\mathcal{A}_k$ has a different number there. $\square$

The following definition is therefore fitting:

Definition. A set of MOLS of order $n \ge 2$ is complete if the size of the set is $n-1$.

So $\mathcal{A},\mathcal{B},\mathcal{C}$ are a complete set for $n=4$.


The main purpose of this post is to give a proof of the following fantastic theorem,*** due to R. C. Bose in 1938:

Theorem. There is a complete set of mutually orthogonal Latin squares of order $n$ if and only if there is a finite projective plane of order $n$.

The proof is constructive. We'll consider complete Card & Symbol Games (CSGs) with $n+1$ symbols per card, which for $n \ge 2$ are the same thing as FPPs, but the notation and intuition lends itself to a nice visual proof which accomplishes both the "if" and the "only if" in one fell swoop.

Before we attempt a general proof, it's worth trying an example. If we use our complete set $\mathcal{A},\mathcal{B},\mathcal{C}$ for $n=4$, there will be $4^2 + 4 + 1$ cards, each with $4+1$ symbols.

$5$ "special" cards are given below:

$I$ $J$
$\Omega~\sigma_1~\sigma_2~\sigma_3~\sigma_4$ $\Omega~\tau_1~\tau_2~\tau_3~\tau_4$
$z_{\mathcal{A}}$ $z_{\mathcal{B}}$ $z_{\mathcal{C}}$
$\Omega~\alpha_1~\alpha_2~\alpha_3~\alpha_4$ $\Omega~\beta_1~\beta_2~\beta_3~\beta_4$ $\Omega~\gamma_1~\gamma_2~\gamma_3~\gamma_4$

The other $4^2$ cards can be arranged in a $4{\times}4$ grid:

$\sigma_1~\tau_1~\alpha_1~\beta_1~\gamma_1$ $\sigma_1~\tau_2~\alpha_2~\beta_2~\gamma_2$ $\sigma_1~\tau_3~\alpha_3~\beta_3~\gamma_3$ $\sigma_1~\tau_4~\alpha_4~\beta_4~\gamma_4$
$\sigma_2~\tau_1~\alpha_2~\beta_3~\gamma_4$ $\sigma_2~\tau_2~\alpha_1~\beta_4~\gamma_3$ $\sigma_2~\tau_3~\alpha_4~\beta_1~\gamma_2$ $\sigma_2~\tau_4~\alpha_3~\beta_2~\gamma_1$
$\sigma_3~\tau_1~\alpha_3~\beta_4~\gamma_2$ $\sigma_3~\tau_2~\alpha_4~\beta_3~\gamma_1$ $\sigma_3~\tau_3~\alpha_1~\beta_2~\gamma_4$ $\sigma_3~\tau_4~\alpha_2~\beta_1~\gamma_3$
$\sigma_4~\tau_1~\alpha_4~\beta_2~\gamma_3$ $\sigma_4~\tau_2~\alpha_3~\beta_1~\gamma_4$ $\sigma_4~\tau_3~\alpha_2~\beta_4~\gamma_1$ $\sigma_4~\tau_4~\alpha_1~\beta_3~\gamma_2$

It is slow but simple to verify that this is indeed a complete CSG. Notice:

  • Each "cell" card must have one common symbol with each "special" card; since we can't have any more than $5$ cards with the symbol $\Omega$, this means each "cell" card must have exactly one $\sigma$, $\tau$, $\alpha$, $\beta$ and $\gamma$ symbol.
  • The subscripts on the $\sigma$ and $\tau$ symbols are used to determine the row and column of the cell.
  • The subscripts on the $\alpha$, $\beta$ and $\gamma$ symbols are given by the numbers in the Latin squares $\mathcal{A}$, $\mathcal{B}$ and $\mathcal{C}$.
  • The "each number appears once in each row and column" rule for Latin squares, and the "each ordered pair appears once" rule for orthogonal Latin squares, both correspond with the "each pair of symbols appears together on one card" rule for complete CSGs.


The example contains almost all of the insight; the rest is formalism, except for the following lemma. Notice that e.g. the two cards $\newcommand{\card}[1]{\boxed{{\strut}~#1~}} \card{\sigma_2~\tau_3~\alpha_4~\beta_1~\gamma_2}$ and $\card{\sigma_4~\tau_2~\alpha_3~\beta_1~\gamma_4}$ have a common symbol $\beta_1$ - this is a requirement for a complete CSG - but do complete sets of MOLS always satisfy this? It means that given two grid positions not in the same row or column, there must be one Latin square from the complete set of MOLS with the same number in both positions.$$\begin{bmatrix} 1&2&3&4 \\ 2&1&\boxed{4}&3 \\ 3&4&1&2 \\ 4&\boxed{3}&2&1 \end{bmatrix} \hspace{.5cm} \begin{bmatrix} 1&2&3&4 \\ 3&4&\boxed{1}&2 \\ 4&3&2&1 \\ 2&\boxed{1}&4&3 \end{bmatrix} \hspace{.5cm} \begin{bmatrix} 1&2&3&4 \\ 4&3&\boxed{2}&1 \\ 2&1&4&3 \\ 3&\boxed{4}&1&2 \end{bmatrix}$$It's not obvious why this has to happen.

Lemma. If $\mathcal{A}_1,\ldots,\mathcal{A}_{n-1}$ is a complete set of MOLS, $i \ne i'$ and $j \ne j'$, then exactly one $\mathcal{A}_k$ has $(\mathcal{A}_k)_{i,j} = (\mathcal{A}_k)_{i',j'}$.

Proof. Without loss of generality, we can permute symbols so that every $(\mathcal{A}_k)_{i,j} = 1$. Let $l_k$ be the row containing the $1$ in column $j'$ of $\mathcal{A}_k$.

Since they are Latin squares, every $l_k \ne i$. Also, the $l_k$ are distinct: every pair of the Latin squares each has a $1$ in the same cell in column $j$, so by orthogonality they cannot also each have a $1$ in the same cell of column $j'$.

Therefore the $n-1$ distinct values of $l_k$ are each of the $n-1$ rows other than $i$, so one $l_k = i'$. $\square$


OK, now the proof of Bose's theorem. To avoid drowning in a sea of unnecessary variables, the subscript $-$ indicates an unknown number between $1$ and $n$.

Proof. Fix a distinguished symbol $\Omega$, and let $I = \card{\Omega~\sigma_1~\cdots~\sigma_n}$, $J = \card{\Omega~\tau_1~\cdots~\tau_n}$ and $n-1$ cards $z_{\mathcal{A}},z_{\mathcal{B}},\ldots,z_{\mathcal{H}}$ be the cards with symbol $\Omega$, with $z_{\mathcal{A}} = \card{\Omega~\alpha_1~\cdots~\alpha_n}$, $z_{\mathcal{B}} = \card{\Omega~\beta_1~\cdots~\beta_n}$, and so on up to $z_{\mathcal{H}} = \card{\Omega~\eta_1~\cdots~\eta_n}$. Denote these as the special cards.

Note that $\sigma_{-},\tau_{-},\alpha_{-},\ldots,\eta_{-}$ must be distinct: each can only appear once alongside $\Omega$, and there are $n+1$ such distinct cards.

Note that we're just choosing names, not imposing restrictions; the construction works either starting with a complete CSG, or starting from a blank slate in an attempt to construct one.

Proof (continued). We have accounted for $n+1$ cards and all $n^2 + n + 1$ symbols, so there are $n^2$ more cards; denote these as the cell cards. Any cell card $c$ must have exactly one symbol in common with each special card, and by assumption $c$ does not have the symbol $\Omega$. It follows that $c = \card{\sigma_{-}~\tau_{-}~\alpha_{-}~\cdots~\eta_{-}}$.

There must be $n$ cell cards with each $\sigma_{-}$, and $n$ cell cards with each $\tau_{-}$. Moreover, each ordered pair $\sigma_{-},\tau_{-}$ must appear on exactly one card. This gives an enumeration of the cell cards as $c_{i,j}$ for $1 \le i,j \le n$:

$\sigma_1~\tau_1~\alpha_{-}~\cdots~\eta_{-}$ $\cdots$ $\sigma_1~\tau_n~\alpha_{-}~\cdots~\eta_{-}$
$\vdots$ $\ddots$ $\vdots$
$\sigma_n~\tau_1~\alpha_{-}~\cdots~\eta_{-}$ $\cdots$ $\sigma_n~\tau_n~\alpha_{-}~\cdots~\eta_{-}$

Now let $c_{i,j}$'s subscripts $\alpha_{-},\ldots,\eta_{-}$ be the numbers $\mathcal{A}_{i,j},\ldots,\mathcal{H}_{i,j}$ respectively: if the CSG is given, use the subscripts to determine the Latin squares, and vice versa.

We now check that from a complete set of MOLS, the resulting incidence structure is a complete CSG:

  • By construction each card has $n+1$ symbols.
  • $\Omega$ appears on $n+1$ cards. Each $\sigma_{-}$ and each $\tau_{-}$ appear on one special card and on $n$ cell cards, making $n+1$. Each $\alpha_{-}$ appears on one special card and $n$ cell cards since each number appears $n$ times in $\mathcal{A}$, making $n+1$; $\beta_{-},\ldots,\eta_{-}$ are similar.
  • By construction each two special cards have exactly $\Omega$ in common, and each special card has one symbol in common with each cell card. Two cell cards with the same $\sigma_{-}$ do not have a common $\tau_{-}$ and vice versa, and either case has no common $\alpha_{-},\ldots,\eta_{-}$ since each number appears only once in each row and column of $\mathcal{A},\ldots,\mathcal{H}$. Two cell cards with the same $\alpha_{-}$ do not have a common $\beta_{-}$, as no ordered pair appears twice when $\mathcal{A}$ and $\mathcal{B}$ are superimposed; other pairs are similar. The tricky case of showing that two cell cards with no common $\sigma_{-}$ or $\tau_{-}$ do have a common symbol is handled by the lemma.
  • Each two of the same $\sigma_{-},\tau_{-},\alpha_{-},\ldots,\eta_{-}$ appear together on one special card. Each $\sigma_{-},\tau_{-}$ appear together on one cell card by construction. Each $\sigma_{-},\alpha_{-}$ and each $\tau_{-},\alpha_{-}$ appear together on one cell card because each number appears once in each row and each column of $\mathcal{A}$; $\beta,\ldots,\eta$ are similar. Each $\alpha_{-},\beta_{-}$ appear together on one cell card because each ordered pair of numbers appears once when $\mathcal{A}$ and $\mathcal{B}$ are superimposed; other pairs are similar.

Next, we check that from a complete CSG, the result is a complete set of MOLS:

  • Each $\sigma_{-}$ and $\alpha_{-}$ appear together on one card, so each number appears in one row of $\mathcal{A}$; columns and $\mathcal{B},\ldots,\mathcal{H}$ are similar. So the squares are indeed Latin.
  • Each $\alpha_{-}$ and $\beta_{-}$ appear together on one card, so each ordered pair appears once when $\mathcal{A}$ and $\mathcal{B}$ are superimposed; other pairs are similar.
  • The set of MOLS is complete since there are $n-1$ special cards $z_{\mathcal{A}},\ldots,z_{\mathcal{H}}$.

The result follows because for $n \ge 2$, a complete CSG is equivalent to a finite projective plane. $\square$

So as with FPPs, we can systematically construct a complete set of MOLS when $n$ is a power of a prime, it's impossible when $n=6$,**** and $n=12$ is an open problem.


*Mathematicians are greedy, so of course we want as many MOLS as possible.

**This proposition is often stated without the condition that $n \ge 2$; but the proof requires there to be a second row. Indeed, when $n=1$ there is a counterexample in the set containing the unique Latin square of order $1$, all pairs of distinct Latin squares from which are orthogonal.

***The theorem is well known to combinatorists, but proofs seem to be thin on the ground - even Wikipedia (as of writing) says "a set of n - 1 MOLS is equivalent to a finite projective plane of order n" with no citation.

****In fact, $N(6) = 1$.

Card Games and Projective Geometry

A correspondent writes:

A few times, over the past few weeks, I've seen a particular friend of mine, and he's had a game with him, a game called Spot It. It takes the form of a bunch of cards, each of which has eight different symbols on it; the game mechanic depends on the fact that each pair of cards has exactly one symbol in common.


It turns out there are 57 distinct symbols. 42 of them appear on eight cards each, 14 appear on seven cards each, and one appears on six. This looks suspiciously like "57 symbols, 57 cards, each card has eight symbols, each symbol appears on eight cards" with two cards missing.


It's trivial to deduce what the 'missing' cards are, and, with them added, it is then a 57-card deck with the same property; that any two cards have exactly one symbol in common.

Then I noticed that 57 equals 57. That is, that the number of cards in the 'complete' deck equals the number of distinct symbols. Then, I drew up a 57x57 matrix, where one axis is card identity and the other is symbol identity. It looks symmetric enough, and, on checking, it turns out that the "one bit in common" property holds in the other dimension too (which means, mapped back into cards, that, for every pair of symbols, there is exactly one card that has both of them).

Then a few questions (paraphrased):

  1. Why $57$; is there anything special about this number?
  2. If we choose a different number of symbols per card, can we always construct a card game like this? How many cards would there be?
  3. Are there always the same number of cards and symbols in a complete game? If a game has fewer cards than symbols, is it always possible to complete it?
  4. Is it possible to add more cards even if none are "missing"?

I added a question of my own.

  1. Is the incidence matrix actually symmetric (for some permutation of rows and columns)? If so, do all such card games have symmetric incidence matrices?

I found question E interesting because it implies that there is a bijection between cards and symbols with nice properties;* in particular it relates these card games to certain regular graphs of diameter $2$. More on this later.


Here's an example of what we're talking about. The triangle game has cards $a,b,c$ and symbols $\alpha,\beta,\gamma$:

$a$ $b$ $c$
$\beta,\gamma$ $\alpha,\gamma$ $\alpha,\beta$

There are $3$ cards and $3$ symbols, each card has $2$ symbols, each symbol appears on $2$ cards, each pair of cards has one symbol in common, and each pair of symbols appears together on one card. There is a pleasing duality between cards and symbols, and we can write down the dual of this game:

$\alpha$ $\beta$ $\gamma$
$b,c$ $a,c$ $a,b$

In the dual game, the card $\alpha$ has symbols $b,c$ because those were the cards with the symbol $\alpha$ in the original game. Notice that in this case, the dual is essentially the same game: there's a bijection $$\varphi = \{~a \mapsto \alpha,~b \mapsto \beta,~c \mapsto \gamma~\}$$ such that a card $x$ has a symbol $\varphi(y)$ if and only if the card $\varphi(x)$ has the symbol $y$ in the dual game. This relates to the fact that the incidence matrix is symmetric:

  $\begin{matrix} \alpha&\beta&\gamma \end{matrix}$
$\begin{matrix} a \\ b \\ c \end{matrix}$ $\begin{pmatrix} 0&1&1 \\ 1&0&1 \\ 1&1&0 \end{pmatrix}$


It's usually a good idea to come up with some notation.

Definition. An incidence structure is $(C,S,\in)$ where $C$ and $S$ are disjoint finite non-empty sets, and $\in$ is a relation with domain $C$ and range $S$.**

Interpret $C$ as the set of cards, $S$ as the set of symbols, and $x \in \sigma$ as "card $x$ has symbol $\sigma$". The bit about domains and ranges means there are no "blank cards" or "phantom symbols" - every card has a symbol, and every symbol appears on some card.

Definition. For $x : C$, $S_x = \{~\sigma : S~|~x \in \sigma~\}$, and for $\sigma : S$, $C_{\sigma} = \{~x : C~|~x \in \sigma~\}$.

$S_x$ is the set of symbols on a card $x$, and $C_\sigma$ is the set of cards with symbol $\sigma$.

Definition. A Card & Symbol Game (or CSG) is an incidence structure such that

$\newcommand\prop[1]{[\textsf{#1}]}\prop{UNIFORM}$ For some $n \ge 1$, every card has $n+1$ symbols (i.e. $\exists n \ge 1 \bullet \forall x : C \bullet \#S_x = n+1$)

$\prop{1-COMMON}$ Any two distinct cards have a unique common symbol (i.e. $\forall x,y : C~|~x \ne y \bullet \#(S_x \cap S_y) = 1$)

Call this $\prop{CSG} \equiv \prop{UNIFORM} \land \prop{1-COMMON}$.

The use of $n+1$ rather than $n$ is because mathematicians like to go back and change their notation in order to make things easier later, once they know what will make things easier.


Some more definitions will be useful for our investigation:

Definition. An incidence structure is square if there are the same number of cards and symbols (i.e. $\#C = \#S$). Call this $\prop{SQUARE}$.

Definition. An incidence structure is regular if the number of symbols on each card equals the number of cards with each symbol, and this number is at least $2$. Call this $\prop{REGULAR}$.

Definition. Incidence structures $\mathcal{G} = (C,S,\in)$ and $\mathcal{G}' = (C',S',\in')$ are isomorphic (or $\mathcal{G} \cong \mathcal{G}'$) if there are bijections $\varphi : C \to C'$ and $\psi : S \to S'$ such that $x \in \sigma$ if and only if $\varphi(x) \mathrel{\in'} \psi(\sigma)$.

Obviously if $\mathcal{G}$ satisfies a property $\prop{P}$ and $\mathcal{G} \cong \mathcal{G}'$, then $\mathcal{G}'$ also satisfies $\prop{P}$.


We'd like to talk about duals:

Definition. The dual of an incidence structure $\mathcal{G} = (C,S,\in)$ is $\mathcal{G}^\star = (S,C,\ni)$, where $\sigma \ni x$ if and only if $x \in \sigma$.

Definition. $\mathcal{G}$ is self-dual if $\mathcal{G} \cong \mathcal{G}^\star$. Call this $\prop{SELF-DUAL}$.

Obviously $\mathcal{G}^{\star\star} = \mathcal{G}$. It will be useful to have a notation for dual properties:

Meta-definition. For a property $\prop{P}$, an incidence structure $\mathcal{G}$ satisfies $\newcommand\propd[1]{[\textsf{#1}^\star]}\propd{P}$ if and only if $\mathcal{G}^\star$ satisfies $\prop{P}$. For example,

$\propd{UNIFORM}$ For some $n \ge 1$, every symbol appears on $n+1$ cards

$\propd{1-COMMON}$ Any two distinct symbols appear together on a unique card

Obviously $[\mathsf{P}^{\star\star}] \equiv \prop{P}$. Due to the symmetry between cards and symbols in their definitions, $\prop{SQUARE}$, $\prop{REGULAR}$ and $\prop{SELF-DUAL}$ are each equivalent to their dual properties.


We'll be interested in finding more logical connections between these properties; let's write $\newcommand\infer{\vDash} \prop{P} \infer \prop{Q}$ to mean that $\prop{Q}$ can be inferred from $\prop{P}$.

Proposition. $\prop{REGULAR} \equiv \prop{SQUARE} \land \prop{UNIFORM} \land \propd{UNIFORM}$.

Proof. Obviously $\prop{REGULAR} \equiv \prop{UNIFORM} \land \propd{UNIFORM}$ with the same $n$. Consider $\in$ as a set of pairs $x \mapsto \sigma$. Then $\#{\in} = \#C \times \#S_x$ for each $x : C$, and also $\#{\in} = \#S \times \#C_\sigma$ for each $\sigma : S$. Since $\#C_\sigma = \#S_x$, we have $\#C = \#S$.

Conversely, suppose $\#C = \#S$, then $\#C \times \#S_x = \#{\in} = \#S \times \#C_\sigma$ for each $x:C$, $\sigma:S$, so $\#C_\sigma = \#S_x$. $\square$

So if $\mathcal{G}$ is square, uniform and dual-uniform, then the values of $n$ are the same. This is not necessarily the case if $\mathcal{G}$ is not square: for example, the incidence structure $\mathcal{B}_3$ defined by the incidence matrix $$\begin{pmatrix} 0&0&0&~&1&1&1 \\ 0&0&1&~&1&1&0 \\ 0&1&0&~&1&0&1 \\ 0&1&1&~&1&0&0 \\ 1&0&0&~&0&1&1 \\ 1&0&1&~&0&1&0 \\ 1&1&0&~&0&0&1 \\ 1&1&1&~&0&0&0 \end{pmatrix}$$ is uniform with each $\#S_x = 3$ and dual-uniform with each $\#C_\sigma = 4$. (Exercise: generalise this construction to $\mathcal{B}_n$ with $2^n$ cards, $2n$ symbols, each card having $n$ symbols and each symbol appearing on $2^{n-1}$ cards.) This isn't a CSG, as it doesn't satisfy $\prop{1-COMMON}$; another construction gives $\mathcal{C}_4$ with incidence matrix $$\begin{pmatrix} 1&1&1&0&0&0 \\ 1&0&0&1&1&0 \\ 0&1&0&1&0&1 \\ 0&0&1&0&1&1 \end{pmatrix}$$ which generalises to $\mathcal{C}_n$ with $n$ cards, $\frac{1}{2}n(n-1)$ symbols, each card having $n-1$ symbols and each symbol appearing on $2$ cards. (The details of this construction are left as another exercise.) This is a CSG satisfying $\propd{UNIFORM}$ but not $\propd{1-COMMON}$.


Some second-order theorems:

Meta-theorem. $(\prop{P} \infer \prop{Q}) \infer (\propd{P} \infer \propd{Q})$.

Proof. Suppose there is an incidence structure $\mathcal{G}$ satisfying $\propd{P} \land \neg \propd{Q}$. Then $\mathcal{G}^\star$ satisfies $\prop{P} \land \neg \prop{Q}$, a contradiction. $\square$

Meta-theorem. $\prop{SELF-DUAL} \land \prop{P} \infer \propd{P}$.

Proof. If $\mathcal{G}$ satisfies $\prop{P}$ then $\mathcal{G}^\star \cong \mathcal{G}$ satisfies $\propd{P}$. $\square$

These will be useful later.


We turn our attention now to projective geometry. The reason for this will be immediately clear:

Definition. A finite projective plane (or FPP) is an incidence structure $(P,L,\in)$ where $P$ is a set of points, $L$ is a set of lines, $p \in \lambda$ means "point $p$ is on line $\lambda$", and

$\prop{1-COMMON}$ Any two distinct points are on a unique common line

$\propd{1-COMMON}$ Any two distinct lines meet at a unique common point

$\prop{NON-DEGENERATE}$ There are four distinct points such that no three are on any line

Call this $\prop{FPP} \equiv \prop{1-COMMON} \land \propd{1-COMMON} \land \prop{NON-DEGENERATE}$.

Definition. $P_\lambda$ is the set of points on a line $\lambda$, and $L_p$ is the set of lines through a point $p$, as before.

The words "point" and "line" are deliberately undefined - we can interpret them in a geometric sense (if we're generous about how straight a line has to be):


This is the Fano plane, the smallest FPP. But we could instead define "point" to mean card, and "line" to mean symbol, and then we can draw the Fano plane as a card game diagram:

$a$ $b$ $c$ $d$ $e$ $f$ $g$
$\alpha,\beta,\gamma$ $\alpha,\delta,\epsilon$ $\alpha,\zeta,\eta$ $\beta,\delta,\eta$ $\beta,\epsilon,\zeta$ $\gamma,\epsilon,\eta$ $\gamma,\delta,\zeta$

It's easy to check that the Fano plane is a CSG, and isomorphic to its dual (which is also an FPP),

$\alpha$ $\beta$ $\gamma$ $\delta$ $\epsilon$ $\zeta$ $\eta$
$a,b,c$ $a,d,e$ $a,f,g$ $b,d,g$ $b,e,f$ $c,e,g$ $c,d,f$

via the isomorphism $\varphi = \{~a \mapsto \alpha,~b \mapsto \beta,~\ldots,~g \mapsto \eta~\}$ and $\psi = \varphi^{-1}$.

In fact, every FPP is a CSG, and its dual is also an FPP. Let's prove this.

Theorem. The dual of an FPP is an FPP (i.e. $\prop{FPP} \infer \propd{FPP}$).

Proof. We only need to check $\propd{NON-DEGENERATE}$.

Choose $p,q,r,s : P$ as in $\prop{NON-DEGENERATE}$. By $\prop{1-COMMON}$, there are lines $\lambda,\mu,\nu,\xi : L$ joining $p$ to $q$, $q$ to $r$, $r$ to $s$ and $s$ to $p$ respectively. Suppose, without loss of generality, that $\lambda,\mu,\nu$ meet at a single point: by $\propd{1-COMMON}$ the only point on both $\lambda$ and $\mu$ is $q$, and so $\nu$ contains all three of $q,r,s$, a contradiction. $\square$

By a meta-theorem, it follows that $(\prop{FPP} \infer \prop{P}) \infer (\prop{FPP} \infer \propd{P})$. I.e., if every FPP satisfies some property $\prop{P}$, then every FPP also satisfies $\propd{P}$.

Lemma. Every line in an FPP contains at least $3$ points.

Proof. Choose $p,q,r,s : P$ as in $\prop{NON-DEGENERATE}$, and without loss of generality let $p \notin \lambda$ and $q \notin \lambda$. By $\prop{1-COMMON}$ there are $\mu,\nu,\xi : L_p$ with $q \in \mu$, $r \in \nu$ and $s \in \xi$, which are distinct by $\prop{NON-DEGENERATE}$.

Then by $\propd{1-COMMON}$ there are $t,u,v : P_\lambda$ with $t \in \mu$, $u \in \nu$ and $v \in \xi$ which are distinct as each pair of $\mu,\nu,\xi$ already meet at $p$. $\square$

By the dual lemma, every point lies on at least $3$ lines.

Lemma. For any two points in an FPP, there is a line containing neither.

Proof. By $\prop{1-COMMON}$ choose $\mu : L$ with $p \in \mu$ and $q \in \mu$. By the lemma, there is a point $r \in P_\mu - p - q$. By $\prop{NON-DEGENERATE}$ there is a point $s : P - r$ with $s \notin \mu$, so choose $\lambda : L$ with $r \in \lambda$ and $s \in \lambda$. By $\propd{1-COMMON}$, $\lambda$ and $\mu$ only meet at $r$, so $p \notin \lambda$ and $q \notin \lambda$. $\square$

By the dual lemma, for any two lines, there is a point on neither.

Theorem. Every FPP is a CSG (i.e. $\prop{FPP} \infer \prop{CSG}$).

Proof. We have $\prop{1-COMMON}$ already, so we only need to check $\prop{UNIFORM}$: that every point is on the same number of lines.

We will show that given any point $p : P$ and line $\lambda : L$ with $p \notin \lambda$, the number of lines through $p$ equals the number of points on $\lambda$. Given $\mu : L_p$, $\mu \ne \lambda$ so by $\propd{1-COMMON}$ there is a unique point $q : P_\lambda$ with $q \in \mu$. Conversely, given $q : P_\lambda$, by $\prop{1-COMMON}$ there is a unique $\mu : L_p$ with $q \in \mu$. This gives a bijection $: L_p \to P_\lambda$.

Then given distinct points $p,q : P$, by the lemma there is a line $\lambda : L$ with $p \notin \lambda$ and $q \notin \lambda$, so $\#L_p = \#P_\lambda = \#L_q$. $\square$

Notice that the proof also gives us $\prop{FPP} \infer \prop{REGULAR}$, as $\#L_p = \#P_\lambda$.


The problem right now is that the dual of a CSG is not necessarily a CSG; $\mathcal{C}_n^\star$ satisfies $\prop{UNIFORM}$ but not $\prop{1-COMMON}$, for example. Worse still, call this one $\mathcal{L}_5$:

$a$ $b$ $c$ $d$ $e$
$\alpha,\Omega$ $\beta,\Omega$ $\gamma,\Omega$ $\delta,\Omega$ $\epsilon,\Omega$

This CSG is uninteresting because the common symbol is always $\Omega$. Nonetheless, it's a CSG by our definition, and the dual game

$\Omega$ $\alpha$ $\beta$ $\gamma$ $\delta$ $\epsilon$
$a,b,c,d,e$ $a$ $b$ $c$ $d$ $e$

satisfies neither $\prop{UNIFORM}$ nor $\prop{1-COMMON}$. Of course we could change our definition of CSGs to include $\propd{UNIFORM}$ and $\propd{1-COMMON}$, so that the dual of a CSG is always a CSG. But this is unsatisfying and may be too strict: it would disallow "incomplete" CSGs, as deleting cards from a CSG preserves $\prop{UNIFORM}$ and $\prop{1-COMMON}$ but neither $\propd{UNIFORM}$ nor $\propd{1-COMMON}$.

A better approach would be to try to isolate what it means for a CSG to have "missing" cards, and then relate this to "completeness". We'll consider a CSG to be "complete" if it satisfies $\propd{UNIFORM}$ and $\propd{1-COMMON}$, i.e. every symbol appears on the same number of cards, and every pair of symbols appears together, as in the completed original game. Conveniently, this is exactly $\propd{CSG}$.

Definition. A CSG is complete if its dual is a CSG. Call this $\prop{COMPLETE} \equiv \prop{CSG} \land \propd{CSG}$.

Definition. An extension of an incidence structure $\mathcal{G} = (C,S,\in)$ is an incidence structure $\mathcal{G}' = (C',S',\in')$ for which $C \subseteq C'$, $S \subseteq S'$, and $\in$ is the restriction of $\in'$ to $C$ and $S$ (i.e. ${\in} \subseteq {\in'}$ as sets of pairs). The extension is non-trivial if $\mathcal{G}' \ne \mathcal{G}$.

An expansion is an extension such that $\in$ is the restriction of $\in'$ to $C$. Write $\mathcal{G} \le \mathcal{G}'$ for an expansion.

A completion is an expansion which is a complete CSG.

Proposition. If $\mathcal{G} \le \mathcal{G}'$ is a non-trivial expansion, then $C \ne C'$.

Proof. Suppose $C = C'$. Then $\in$ is the restriction of $\in'$ to its own domain, so ${\in} = {\in'}$ and hence $S' =$ range of ${\in'} =$ range of ${\in} = S$, and the expansion is trivial; a contradiction. $\square$

Definition. An incidence structure is CSG-expansible if it has a non-trivial expansion which is a CSG. Call this $\prop{CSG-EXPANSIBLE}$.

An incidence structure is completable if it has a completion. Call this $\prop{COMPLETABLE}$.

So extensions and expansions may add more cards and more symbols, but an extension might add new symbols to existing cards, whereas an expansion cannot change any existing cards. An expansion is trivial if and only if it adds no cards. A completion need not be non-trivial, so $\prop{COMPLETE} \infer \prop{COMPLETABLE}$.


Now let's try to answer some of the questions.

Theorem. $\prop{1-COMMON} \land \prop{REGULAR} \infer \#C = n^2 + n + 1$.

Proof. Fix a card $x : C$. Given any other card $y : C - x$, by $\prop{1-COMMON}$ there is a unique $\sigma : S_x$ for which $y$ is in $C_{\sigma} - x$. Therefore $\{x\}$ and the $C_{\sigma} - x$ form a partition of $C$.

By $\prop{REGULAR}$, $\# S_x = n+1$, and each $\# (C_{\sigma} - x) = n$. Therefore $\#C = 1 + n(n+1)$. $\square$

This answers question A: $57 = 7^2 + 7 + 1$.

Theorem. An incidence structure is a complete CSG if and only if it is either an FPP or a triangle.

Proof. Since complete CSGs satisfy $\prop{1-COMMON}$ and $\propd{1-COMMON}$, we only need to consider complete CSGs which don't satisfy $\prop{NON-DEGENERATE}$. These are degenerate finite projective planes, all of which are classified:

  • The empty set. This is not an incidence structure by our definition.
  • A single line, with any number of points. This fails $\prop{UNIFORM}$, as we need $n \ge 1$.
  • As before, but one of the points is on any number of additional lines. This still fails $\prop{UNIFORM}$.
  • A single point, with any number of lines. This fails $\propd{UNIFORM}$, as we need $n \ge 1$.
  • As before, but one of the lines contains any number of additional points. This still fails $\propd{UNIFORM}$.
  • A line $\mu$ with $C_{\mu} = \{p_1,\ldots,p_{n+1}\}$, and an additional point $q$ with $S_q = \{\lambda_1,\ldots,\lambda_{n+1}\}$, where each $C_{\lambda_i} = \{p_i, q\}$. This satisfies $\prop{UNIFORM}$ only when $n=1$, giving the triangle.

Conversely, $\prop{FPP} \infer \prop{CSG}$ and by a meta-theorem $\prop{FPP} \infer \propd{FPP} \infer \propd{CSG}$, so every FPP is a complete CSG, as is the triangle. $\square$

This answers the first part of question B: all FPPs with $n \le 10$ are known, and there is no FPP with $n=6$, so there is no complete CSG with $7$ symbols per card, for example. $n$ is called the order of the FPP.

Corollary. Every complete CSG is regular.

Proof. $\prop{FPP} \infer \prop{REGULAR}$, and the triangle is regular. $\square$

This answers the second part of question B: if there is a complete CSG with $n+1$ symbols per card, then it has $n^2 + n + 1$ cards.

Corollary. Every complete CSG is square.***

Proof. $\prop{REGULAR} \infer \prop{SQUARE}$. $\square$

This answers question C: yes, a complete CSG must have equal numbers of cards and symbols; and no, there are CSGs which are not completable. For example, a CSG with $1$ card and $7$ symbols is not completable, though it has "missing cards" in the sense that it is CSG-expansible; however, it has a CSG-expansion which is not CSG-expansible.**** In contrast, $\mathcal{L}_n$ is not completable for $n \ge 3$, but $\mathcal{L}_n \le \mathcal{G}$ is a non-trivial CSG-expansion exactly when $\mathcal{G} \cong \mathcal{L}_{n'}$ for some $n' > n$; so however many cards are added, with whatever new symbols, more can still be added!

Some things are known about completability: for example, Dow showed in 1982 that, in other words, any CSG with $\#S = n^2 + n + 1$ and $\#C > n^2 - 2\sqrt{n+3} + 6$ is completable, and if $\#C > n^2 - n + 1$ then the completion is unique (up to isomorphism).

Theorem. $\prop{COMPLETE} \infer \neg \prop{CSG-EXPANSIBLE}$.

Proof. Let $\mathcal{G} \le \mathcal{G}'$ be a non-trivial expansion, and fix $x : C$ and $y : C' \setminus C$. Let $\chi : S$ be the common symbol between $x$ and $y$. There is another symbol $\sigma : S_x - \chi$, and since $G$ satisfies $\propd{UNIFORM}$ there is another card $w : C_\sigma - x$.

Suppose $G'$ satisfies $\prop{1-COMMON}$. Let $\tau : S$ be the unique common symbol between $w$ and $y$. Since $\mathcal{G}$ satisfies $\propd{1-COMMON}$ there is a card $z : C_\chi - x$ with $z \in \tau$. Then $y$ and $z$ have two common symbols $\chi$ and $\tau$, a contradiction. Therefore $\mathcal{G}'$ is not a CSG. $\square$

This answers question D: no, if the CSG is complete then we can't add more cards and still have a CSG.


Question E asks if an incidence matrix is equal to its own transpose. This is apparently a stronger condition than $\prop{SELF-DUAL}$, which only requires the incidence matrix to be some permutation of its transpose.

Definition. An incidence structure is self-polar if there is a bijection $\varphi : C \to S$ such that $x \in \varphi(y)$ if and only $y \in \varphi(x)$. Call this $\prop{SELF-POLAR}$.

Proposition. $\prop{SELF-POLAR} \infer \prop{SELF-DUAL}$.

Proof. $\psi = \varphi^{-1}$ will do. $\square$

Proposition. Every self-dual CSG is complete.

Proof. By meta-theorem, $\prop{SELF-DUAL} \land \prop{CSG} \infer \propd{CSG}$. $\square$

FPPs can be systematically constructed using vector spaces over finite fields, yielding FPPs where $n$ can be any prime power. (It's an open problem whether there are any FPPs where $n$ is not a prime power.) FPPs constructed this way are always self-polar; FPPs constructed by other methods are not necessarily self-dual, and apparently it's another open problem as of 2009 whether every self-dual FPP is self-polar.

To answer question E: the particular card game introduced above is a complete CSG with $8$ symbols per card, so it is isomorphic to the unique FPP with $n=7$, which is self-polar. However, there are complete CSGs which are not self-dual, the smallest two of which are isomorphic to a dual pair of FPPs with $n=9$, so they have $91$ cards each with $10$ symbols.

For general incidence structures, $\prop{SELF-POLAR}$ is strictly stronger than $\prop{SELF-DUAL}$. In 2003, Brouwer et al. found the smallest possible self-dual incidence structure which is not self-polar:

$a$ $b$ $c$ $d$ $e$ $f$ $g$
$\alpha,\beta,\gamma,\epsilon$ $\alpha,\beta,\delta,\zeta$ $\alpha,\zeta,\eta$ $\beta,\epsilon,\eta$ $\alpha,\gamma$ $\beta,\delta$ $\gamma,\delta$

There are $7$ others of the same size, up to isomorphism.


Returning to the start of this post, our correspondent also noted that (in other words) there is a one-to-one correspondence between CSGs and constant-weight codes for which each codeword has Hamming weight $n+1$ and each pair has Hamming distance $2n$. We've shown that complete CSGs are almost exactly the same thing as finite projective planes; another closely related area is the study of symmetric block designs, which (for $\lambda = 1$) are exactly the same thing as complete CSGs, although this result is a theorem rather than by definition.

I said we'd come back to regular graphs with diameter $2$. If the incidence matrix of a CSG is symmetric (i.e. the CSG is self-polar), then we can interpret it as the adjacency matrix of a graph in which $x,y:C$ are joined by an edge if and only if $x \in \varphi(y)$. In this context,

  • $\prop{UNIFORM}$ means the graph is $n+1$-regular, i.e. each vertex has $n+1$ edges;
  • $\prop{1-COMMON}$ means each pair of distinct vertices $x,y:C$ are joined by a unique route of length $2$;
  • $\prop{SELF-POLAR}$ means the graph is undirected.

For $x \in \varphi(x)$, the graph has an edge from $x$ to itself, and self-edges may be used to form routes.***** This gives a correspondence between self-polar CSGs and graphs of this form. Here's the graph corresponding to the Fano plane:


However, the graph produced depends on which polarity $\varphi$ is used. Automorphisms of the graph correspond to conjugates of $\varphi$, but in general the polarities may not form a single conjugacy class in the group of automorphisms of the FPP.

Another thing you can do with CSGs is colour the edges of complete graphs such that the monochrome sets are themselves complete graphs. The colour of an edge joining two cards is given by their unique common symbol. Here's the Fano plane again, with the symbols $\alpha,\beta,\ldots,\eta$ in order of hue:


Here each monochrome set is complete on $3$ vertices, because the Fano plane is dual-uniform; and each pair of colours meets at one vertex, because the Fano plane is dual-1-common. So that's nice.


*Combinatorics would greatly benefit from a theorem which says "if two sets appear near each other, have the same cardinality, and this cardinality is an unusually mundane number like $57$, then there is a natural bijection with some nice properties which explains why this has happened."

**Combinatorists may recognise that the usual definition of incidence structure does not require that everything has some incidence. The relation symbol is $\in$ because we can use a set-theoretic model where $S \subseteq \mathbb{P} C$. Astute combinatorists may recognise that the usual definition of a uniform incidence structure is the dual of the definition given here.

***Ryser's theorem gives the converse: every square CSG is complete!

****This follows because Ryser's theorem gives a strict bound $\#C < n^2 + n + 1$, so it cannot be expanded indefinitely.

*****The friendship theorem implies that the only such graph with no self-edges is the triangle. This is a corollary of the fact that every polarity $\varphi$ of an FPP has an absolute point $x:P$ such that $x \in \varphi(x)$.