RSA is probably the most famous public key cryptosystem. (As is traditional in maths, RSA is named after three people who weren't the first to discover it.) RSA is usually explained with an example - for example:

  • Alice chooses two prime numbers, for example, $p = 7$ and $q = 11$.
  • Alice computes $n$ $= pq$ $= 7 \times 11$ $= 77$, and $t$ $= (p-1)(q-1)$ $= 6 \times 10$ $= 60$.
  • Alice finds a number $e = 13$ which is coprime to $t$, and its inverse $d$ $\equiv e^{-1}$ $\equiv 37 \pmod{t}$, e.g. using the Extended Euclidean algorithm.
  • Alice publishes $n$ and $e$, but keeps $d$ private.
  • Bob encrypts a message $m$ $= 24$ to send to Alice, by computing $c$ $\equiv m^e$ $\equiv 24^{13}$ $\equiv 52 \pmod{n}$. Bob sends the encrypted message $c$ to Alice.
  • Alice decrypts $c$ by computing $c^d$ $\equiv 52^{37}$ $\equiv 24$ $\equiv m \pmod{n}$. Therefore Alice has recovered Bob's message.

But this is not an explanation; it's a demonstration. The student will not learn why RSA works simply by considering examples, and could be forgiven for imagining that RSA is a "black box" which seems to work although nobody is really sure why it should.

One argument could be that since $d$ is "$1/e \pmod{t}$", it makes sense that $c^d \equiv c^{1/e}$ is the $e$th root of $c$, and of course this would be $(m^e)^{1/e} \equiv m$, recovering the original message. This isn't really satisfying by itself, though, and leaves plenty unanswered:

  1. Where did $t$ come from? Weren't we supposed to be working $\renewcommand{\mod}{\operatorname{mod} \,} \mod n$?
  2. Why is it safe to publish $n$ and $e$?
  3. How do we know that $x^{1/e} = {}^e \!\!\!\sqrt{x}$ still works in modular arithmetic, or that we'll even get the right $e$th root? There could be more than one, after all, and they'd each encrypt to the same $c$.

 

Actually, RSA is very well-understood, and the mathematics behind it is far from impenetrable. But to start with, let's consider the simplest possible cryptosystem: the Caesar cipher.

undefined

The picture above shows almost everything there is to know about how the Caesar cipher works; in this case, the encryption process is "go forward $3$ steps". We could say the encryption key used is $e = 3$. What it doesn't show is what to do when we get to the end of the alphabet:

undefined

There we go - the alphabet cycles around on itself. Another advantage of this picture over the first is that it shows the "go forward $e$ steps" operation as the more fundamental "go forward a step" operation iterated $e$ times. This might seem to be a pointless observation, but it means the algebraic structure of the system (or "where the arrows go") does not depend on $e$. This fact is hidden in the first picture, because its arrows were specific to the case $e = 3$.

 

There are a few (equivalent) methods to decrypt a message encrypted this way:

  1. Go backward $e$ steps.
  2. Take another $26 - e$ steps forward.
  3. Encrypt it another $25$ times.

Method C is a bit less obvious, but it follows from the fact that taking $e$ steps forward $26$ times is the same as taking $26$ steps forward $e$ times; of course, this is the same as doing nothing $e$ times. It's much more effort, so we'd use A or B in practice, but the fact that it works is important.

The Caesar cipher is the simplest cryptosystem to use, and it's also the simplest to attack. But it's even more unsuitable as a public key cryptosystem: the information necessary to encrypt a message is sufficient to decrypt one. So Alice must keep $e$ secret, and Bob cannot send encrypted messages unless he is in on the secret.

RSA has more in common with the Caesar cipher than one might think. It is based on the same algebraic structure: a directed cycle. In mathematical notation, we could relabel the letters A to Z as the numbers $0$ to $25$, and the "go forward a step" operation could be $x \mapsto x+1 \pmod{26}$ - or "add $1$, but wrap around back to $0$ when you reach $26$".

 

 

The modulus $n$ in the RSA algorithm is chosen to be the product of two prime numbers. A prime number $p$ can only be written as $p = ab$ if either $a=1$ or $b=1$ (but not both).* Because multiplication gives us the number of items in a grid, a visual interpretation of this is that if $p$ items are arranged in a grid, they must form either a single row or a single column:

undefined

$7$ and $5$ are prime numbers; they can only form "trivial" grids.

undefined

$12$ is not a prime number; it can be made into a grid in several ways.

For our purposes, it's important that when there is only one way to form a non-trivial grid out of a number $n$, (i.e. $n=pq$ for two primes $p,q$), then finding this grid (i.e. finding $p$ and $q$) gets more and more difficult as $p$ and $q$ get larger. We have a few methods which are better than trial and error, but for prime numbers with more than a few hundred digits, the problem is believed to be unsolvable in practice - and not for lack of trying.

 

Prime factors don't appear from nowhere: if $p$ is a factor of $ab$ then $p$ must be a factor of either $a$ or $b$ (or both).* They also don't disappear; if $p$ is a factor of $a$, then $p$ is a factor of $ab$. Working $\mod p$, these give us the familiar rule $ab \equiv 0$ if and only if $a \equiv 0$ or $b \equiv 0 \pmod{p}$ - i.e. we can't multiply non-zero residues and get zero.

To illustrate this, we can partition the residues $\mod p$ into two sets which are closed under multiplication: if you multiply two residues of the same colour, the result will have that colour. Green means "no factor of $p$": the product of two such residues cannot have a factor of $p$, so it must be green too. Red means "has a factor of $p$": the only such residue $\mod p$ is $0$.

undefined

E.g. $3$ and $6$ are green, so $3 \times 6 \equiv 4 \pmod{7}$ is also green.

We are no longer talking about addition, but multiplication; the cyclic structure defined by the "add $1$" operation is gone. It is perhaps surprising, then, that the green set has another cyclic structure. Whatever prime $p$ we choose, there is always a primitive element $\omega$ for which the operation $x \mapsto \omega x \pmod{p}$ generates a cycle visiting all of the green residues:

undefined

E.g. $\omega \equiv 3 \pmod{7}$ produces this cyclic structure. $\omega \equiv 5$ also works.

The arrows now show the result of multiplying by $\omega$, rather than adding $1$. Rearranging the residues makes this clearer:

undefined

Note that, for example, multiplying by $2$ (forward two steps) is the same as multiplying by $3$ (forward a step) twice, as $3^2 \equiv 2 \pmod{7}$. If our step was $x \mapsto 2x \pmod{7}$ instead, we'd get this picture:

undefined

This operation produces two separate cycles instead. Importantly, they have the same length; in the language of group theory, they are the cosets of the subgroup generated by $2$. The cosets are always the same size, because you can shift one along to get any other. So the size of the green set must be a multiple of the length of a cycle; in this case, $6$ is a multiple of $3$. (This is Lagrange's theorem.) This matters because our earlier observation still holds: if we "go forward $e$ steps" $6$ times then we'll get back where we started, whatever $e$ is - even if the cycle length is not $6$. "Go forward $e$ steps $6$ times" is the same as "go forward $e$ steps $3$ times" $2$ times, or "do nothing" twice.

 

Is multiplication $\mod p$ good enough for a public key cryptosystem? No. If a message is encrypted by "taking $e$ steps forward" (i.e. multiplying by $\omega^e \pmod{p}$), all three decryption methods still work:

  1. Going backward $e$ steps: to undo multiplication by $\omega^e$, we can "divide" by $\omega^e$. This means finding a multiplicative inverse $(\omega^e)^{-1} \pmod{p}$ which can be calculated e.g. by the Extended Euclidean algorithm.
  2. Taking another $p-1-e$ steps forward: because only one element is red, the length of the green cycle is $p-1$. In total we'll have made $e + (p-1-e) = p-1$ steps around the cycle, so we recover the original message.
  3. Encrypting again $p-2$ times: again, this is the less obvious one, but we get $m \omega^e \cdots \omega^e$ $\equiv m \omega^{(p-1)e}$ $\equiv m \times 1^e$ $\equiv m \pmod{p}$, as $\omega^{p-1} \equiv 1 \pmod{p}$. This is Fermat's little theorem, but really it's the same logic as before; Taking $e$ steps forward $p-1$ times is the same as taking $p-1$ steps forward $e$ times, which gets us back where we started.

Note that the information necessary to encrypt a message ($p$ and $\omega^e$) is again sufficient to decrypt it.

 

For RSA, we work $\mod pq$. In this context we have e.g. $pq \equiv 0$ but $p \not\equiv 0$ and $q \not\equiv 0 \pmod{pq}$, so the familiar rule about $0$ no longer applies; $p$ and $q$ (and their multiples) are zero divisors. Because prime factors can't just appear or disappear under multiplication, if residues $\mod pq$ have (or don't have) factors of $p$ (or $q$) then they can't lose or gain this factor by multiplication.** This means we can partition the residues into four sets closed under multiplication:

  • Those with no factor of $p$ or $q$ (green),
  • Those with a factor of $p$ but not $q$ (blue),
  • Those with a factor of $q$ but not $p$ (yellow),
  • Those with factors of both $p$ and $q$ - i.e. only $0$ (red).

In fact, by the Chinese remainder theorem, we can treat residues $\mod pq$ as pairs of independent residues $\mod p$ and $\mod q$; the zero divisors are the $\mod pq$ residues for which one of these is $0$ (blue and yellow). This can be demonstrated (but not explained) by arranging our $pq$ items into a grid, where multiplication can be done per row and per column:

undefined

For $p=7$, $q=11$. The residues $\mod 11$ form a cycle generated by $2$.

(As noted earlier, if $p$ and $q$ are large enough, then finding this grid is practically impossible for an attacker who only knows $n$.)

The residues $\mod p$ and $\mod q$ give co-ordinates in the space of residues $\mod pq$. When arranged according to these co-ordinates, the numbers seem random - but the Chinese remainder theorem guarantees us that multiplication acts on the co-ordinates independently.

Again, if two residues have the same colour, their product must have that colour. It's clear that the number of green residues is $t$ $= (p-1)(q-1)$, so that answers question 1. Let's restrict ourselves only to these - most of the residues are green***, and we can do as much multiplication as we want without leaving the green set. What does multiplication look like?

undefined

This picture shows the cycles generated by $24$ (left) and $65$ (right). $24 \equiv 3 \pmod{7}$ and $2 \pmod{11}$, so $x \mapsto 24x \pmod{77}$ has an effect $\mod 7$ of "take a step right", and the effect $\mod 11$ is "take a step down". The operation $x \mapsto 65x \pmod{77}$ is $2$ steps right and $5$ steps down. Because the structures $\mod 7$ and $\mod 11$ each cycle back around to $1$, taking a step right from the last column "wraps around" to the first column, and similarly once we reach the bottom of the grid, we wrap around to the top.

For clarity, only one cycle is shown in each picture (in dark green); the other residues (in light green) form their own cycles.**** Importantly, Lagrange's theorem still holds: the cycles are all shifted copies of each other, so they're all the same length, and this length must be a factor of the size of the green set, $t$.

 

This is almost good enough for a public key cryptosystem. The reason for this is we're doing multiplication $\mod n$, but the cycle length is not $n$ - it's some factor of $t$ $= (p-1)(q-1)$. We can publish $n = pq$ and keep $t$ secret; the attacker can no longer go all the way around the cycle to decrypt a message, because he doesn't know how far this is. Finding $p$ and $q$ is practically impossible, so the attacker can't use them to find $t$.

In fact, an attacker knowing only $n$ cannot practically find $t$ any other way, either: as $n = pq$ and $n-t+1$ $= p+q$, it follows that $p$ and $q$ are the solutions to the quadratic equation $x^2 - (n-t+1)x + n$ $= 0$, so they can be found by the quadratic formula - a feat which we assumed to be practically impossible. This answers question 2.

 

Still, choosing a step like $x \mapsto 24x$ and iterating it $e$ times is not enough to encrypt a message. Although decryption methods B and C are now unavailable to an attacker (they would need to know the cycle length in order to go all the way round), method A still works - $24^{-1} \pmod{n}$ can be found without knowing $p$ and $q$.

What we need is a "one step forward" operation which can't be undone without knowing some secret. Undoing addition or multiplication is straightforward - students are expected to subtract and divide without the use of calculators, and these are only a little harder to do $\mod n$. Exponentiation, on the other hand, is hard to undo - finding $k$th roots or logarithms (i.e. undoing $x \mapsto x^k$ or $x \mapsto k^x$) for real numbers generally requires a calculator, and it turns out these are even harder to do $\mod n$ when $n$ is very large.

 

So the encryption process for RSA is modular exponentiation: a message $m$ is encrypted as $c$ $\equiv m^e \pmod{n}$. This means we "go forward $e$ steps", but each step is $x \mapsto mx \pmod{n}$, where $m$ is the message, and rather than starting at $m$ we start at $1$. Taking the earlier cycle generated by $m = 24$, and taking $e = 13$ steps forward:

undefined

$c \equiv m^e \equiv 24^{13} \equiv 52 \pmod{77}$

Now we are multiplying by $m$ at each step rather than $e$, decryption method A is no longer available - we'd need to know $m$ in order to divide by it! So now the cryptosystem is secure - the attacker doesn't know $m$ so he can't find $m^{-1}$ to take steps backward; in fact, knowing $m$ is required even to do $x \mapsto mx$ to take a step forward. So an attacker cannot decrypt $c$ to find $m$.

 

There is a problem: how does the intended recipient do it? It appears to be impossible - without knowing $m$ already, taking steps in the cycle is impossible, even if we know the secrets $p$, $q$ and $t$.

However, this is not quite true. We don't know how to take one step, but $c \equiv m^e \pmod{n}$ is enough information to take $e$ steps; multiplying by $c$ takes us another $e$ steps forward. We could go $e$ steps backward from $c$ by dividing (i.e. multiplying by $c^{-1} \pmod{n}$), but this would recover the residue $1$, since that's where the cycle started - not $m$.

The solution is method C: take $e$ steps forward (i.e. multiply by $c$) again the correct number of times. What is the correct number? We don't know the cycle length, but we know $t = 60$ is a multiple of it; so $t$ steps forward is the same as doing nothing. We could do it another $t-1$ times as in the Caesar cipher, but this again gets us to the start of the cycle, $1$, rather than $m$ which is one step further.

If we go $e$ steps forward $d$ times, we'll arrive at the residue which is $ed$ steps into the cycle. Therefore, we'd like $ed$ $\equiv 1 \pmod{t}$, as taking $t$ steps forward some number of times does nothing, and then one more step gets us to $m$. Assuming $e = 13$ was a sensible choice (so this equation has a solution), we can find $d \equiv e^{-1} \pmod{t}$ by "division"; the Extended Euclidean algorithm yields $d = 37$. Therefore, if we start at $1$ and go $e$ steps forward (i.e. multiply by $c$) $d$ times, we'll get to $m$: this gives $m \equiv c^d$ $\equiv 52^{37}$ $\equiv 24 \pmod{n}$, recovering the original message.

 

Actually, there is another way of interpreting this. Multiplication and exponentiation are hard to think about $\mod n$; by taking logarithms, we can turn them into addition and multiplication. In fact, we already took logarithms earlier, when we rearranged the numbers $\mod 7$:

undefined

Our "take a step forward" operation $x \mapsto 3x \pmod{7}$ becomes the much simpler $a \mapsto a+1 \pmod{6}$, where $a \equiv \log_3 x$. Similarly $b \equiv \log_2 x$ turns $x \mapsto 2x \pmod{11}$ into $b \mapsto b+1 \pmod{10}$. What about the $(\mod p, \mod q)$ co-ordinates from the Chinese remainder theorem?

undefined

$x \pmod{77}$ is represented as a vector $\mathbf{x}$ $= (\log_3 x, \log_2 x) \pmod{\mathbf{t}}$, where $\mathbf{t}$ $= (p-1,q-1)$. To go the other way, a vector $(a,b)$ represents the residue $45^a \times 57^b \pmod{77}$, since $45$ and $57$ represent "one step right" and "one step down" respectively.

In this co-ordinate system, multiplication $\mod pq$ becomes vector addition $\mod \mathbf{t}$, and exponentiation becomes scalar multiplication $\mod \mathbf{t}$. In particular, the encryption process $m \mapsto m^e \pmod{n}$ becomes $\mathbf{m} \mapsto e \mathbf{m} \pmod{\mathbf{t}}$, and the decryption process $c \mapsto c^d \pmod{n}$ becomes $\mathbf{c} \mapsto d \mathbf{c} \pmod{\mathbf{t}}$: from this perspective, encryption is multiplication by $e$, so decryption is simply division by $e$. Multiplicative inverses are always unique; this answers question 3. Now you know!

 

*The first of these definitions of prime numbers is taught in schools, but the second is the proper definition; the first is actually the definition of irreducible. It so happens that prime and irreducible are equivalent in $\mathbb{Z}$; in some contexts they are not.

**This is not the case when $p,q$ are not factors of $n$: clearly, $2 \times 5 \equiv 3 \pmod{7}$ has neither a factor $2$ nor $5$.

***For large $p,q$, the message $m$ is extremely unlikely to fall outside the green set. Finding such an $m$ would allow an attacker to find $p$ and $q$ by computing the highest common factor of $m$ and $n$ (e.g. by Euclid's algorithm); but we assumed this to be practically impossible.

****For $p,q>2$, no residue $m$ can generate a cycle visiting the whole green set. This is because a direct product of cyclic groups is cyclic if and only if their orders are coprime, but $p-1$ and $q-1$ have a common factor $2$. Indeed, by parity considerations, the index of a cyclic subgroup must always be even, so not every factor of $t$ is a possible cycle length. This would justify choosing $d$ $\equiv e^{-1} \pmod{t/2}$ rather than $\mod t$, which would make decryption marginally faster; in our example, we would find $c^7$ instead of $c^{37}$, saving $3$ iterations in an efficient algorithm - but for large primes, on average only $1$ iteration would be saved. Using the Chinese remainder theorem to invert $e$ $\mod p-1$ and $\mod q-1$ is much better.