Efficiently Approximating tan x
There are usually many equivalent ways to define any given concept in mathematics; we usually try to choose a definition which makes things better or easier for ourselves. However, that depends on what "better or easier" means: for example, defining the scalar product as $\mathbf{u}\cdot\mathbf{v}$ $= \newcommand\abs[1]{\left|#1\right|} \abs{\mathbf{u}}\abs{\mathbf{v}} \cos \theta$ (where $\theta$ is the angle between the vectors) makes it easier to solve geometry problems on paper, whereas $\mathbf{u}\cdot\mathbf{v}$ $= \sum u_i v_i$ lets us compute using $+$ and $\times$ rather than the much slower $\abs{~}$ and $\cos$.
Case in point, $\tan x$ is usually defined as ${\sin x} / {\cos x}$, which is convenient for all sorts of things on pen and paper. However, it's also widely used in (programming) libraries for fast approximate maths: for example, fastapprox* has the following code as its fastest $\tan$ approximation:
static inline float fastertan (float x)
{
    return fastersin (x) / fastercos (x);
}
This clearly saves a lot of work for the mathematician, but since the goal is fast, accurate approximation by computer, there seem to be two problems:
- It's twice as slow as the approximations for $\sin$ and $\cos$. (We'll assume $\cos$ is about as slow as $\sin$, even if it isn't calculated as $\cos x$ $= \sin(\frac{\pi}{2} - x)$.)
- It's about twice** as inaccurate as the approximations for $\sin$ and $\cos$, because the errors from each call can accumulate.
Let's call this approximation $T_1(x)$ $\approx \tan x$. Concretely, $T_1$ requires $9$ multiplications, $5$ additions and $1$ division (plus $4$ fast bitwise operations to make it work for $x < 0$). Its absolute and relative errors are shown below:
Is there a better way?
Let's try to come up with a formula which approximates $\tan$ directly, instead of via approximations of $\sin$ and $\cos$. For our formula to be computationally efficient, we should use operations which a computer processor can do quickly: addition, subtraction, multiplication and division (so, no square roots or suchlike). We will go hunting through the space of all possible formulae, which sounds unnavigable - any sequence of operations in any order, using any numbers, produces such a formula. Fortunately, the space is well-understood mathematically: it's the rational functions $\mathbb{R}(x)$.
These functions form a field, which roughly means you can add, subtract, multiply or divide any two formulae and you get another formula. Every rational function can be written as a ratio of polynomials $f(x) = p(x)/q(x)$; intuitively this is because without division, we have the space of polynomials $\mathbb{R}[x]$, dividing polynomials gives us ratios, and adding, subtracting, multiplying, or dividing ratios gives us other ratios. (Conveniently, every formula can be evaluated with at most one division - the slowest of these operations on actual processors.)
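For instance, a sum or quotient of two ratios is again a single ratio, computable with just one final division: $$\frac{p_1}{q_1} + \frac{p_2}{q_2} = \frac{p_1 q_2 + p_2 q_1}{q_1 q_2}, \qquad \frac{p_1}{q_1} \Big/ \frac{p_2}{q_2} = \frac{p_1 q_2}{q_1 p_2}$$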
By the fundamental theorem of algebra, almost everything we would like to know about a rational function is determined by its zeroes (where $f(x) = 0$, so the graph $y = f(x)$ crosses the $x$ axis) and its poles (where $q(x) = 0$, so the graph $y = f(x)$ approaches a vertical asymptote).*** Specifically, if we know where the zeroes and poles should be, and their orders, then $f(x)$ is uniquely determined as a rational function (up to multiplication by a constant).
A reasonable domain for $T_2(x) \approx \tan x$ would be $-\frac{\pi}{2}$ $< x$ $< \frac{\pi}{2}$; $\tan$ is periodic, so any other $x$ can be mapped into this domain. A brief consideration of ${\sin x} / {\cos x}$ tells us that $T_2$ will need one simple zero at $x$ $=0$, and simple poles at $x$ $= \pm \frac{\pi}{2}$. By our earlier observation, the only such $T_2$ (up to multiplication by a constant) is $$T_2(x) = \frac{x}{1 - \tfrac{4}{\pi^2}x^2}$$and a reasonable choice for this constant would be $1$, so that $T_2'(0)$ $= \tan' 0$ $= 1$. $T_2$ requires only $2$ multiplications, $1$ addition, and $1$ division.
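In code, that's very little work indeed; a minimal sketch (the naming is mine, and negative $x$ needs nothing extra, since the formula is odd):

// T2: one simple zero at 0, simple poles at +-pi/2; valid on -pi/2 < x < pi/2
static inline float T2 (float x)
{
    static const float fourbypisq = 0.4052847345693511f; // 4/pi^2
    return x / (1.0f - fourbypisq * x * x);
}

How accurate is it?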
Not very accurate at all. Of course, this is hardly surprising since $T_2$ requires so few operations. Where can we go from here?
The error is generally positive: we could multiply by a constant $\lambda$ to make it negative for some $x$, lowering the average error. The error is also much greater for larger $x$: we could take advantage of the fact that $\tan x$ $= 1 / \tan(\frac{\pi}{2} - x)$, and branch. This gives $$T^A_2(x) = \begin{cases}\dfrac{\lambda x}{1 - \frac{4}{\pi^2}x^2} & x \le \frac{\pi}{4} \\ \dfrac{1 - \frac{4}{\pi^2}(\frac{\pi}{2} - x)^2}{\lambda (\frac{\pi}{2} - x)} & x > \frac{\pi}{4} \end{cases}$$These improvements cost $1$ comparison, $\frac{1}{2}$ an addition on average, and if we're clever, no additional multiplications or divisions.
The errors for $\lambda_1$ $= 1/T_2(\frac{\pi}{4})$ (blue) and $\lambda_2$ $= \sqrt{\lambda_1}$ (red) are shown; the relative error can be bounded to about $\pm 4.5\%$, or $\pm 2.3\%$ if we don't mind a discontinuous approximation (unfortunately, we usually do). $T^A_2$ is about $\frac{1}{3}$ or $\frac{2}{3}$ as accurate as $T_1$ depending on this choice.
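For the record, here is how that cleverness can play out: fold $\lambda$ into the constants, so that each branch still costs $2$ multiplications and $1$ division. A sketch using $\lambda_1$ $= 1/T_2(\frac{\pi}{4})$ $= \frac{3}{\pi}$, which is the continuous choice; the naming is mine, and it assumes $0 \le x < \frac{\pi}{2}$, with negative inputs handled via $\tan(-x)$ $= -\tan x$:

// TA2 with lambda = 3/pi folded into the constants
// (assumes 0 <= x < pi/2; use tan(-x) = -tan x for negative inputs)
static inline float TA2 (float x)
{
    static const float halfpi      = 1.5707963267948966f; // pi/2
    static const float quarterpi   = 0.7853981633974483f; // pi/4
    static const float invlambda   = 1.0471975511965976f; // 1/lambda = pi/3
    static const float fourbylpisq = 0.4244131815783876f; // (4/pi^2)/lambda = 4/(3 pi)
    if (x <= quarterpi)
        return x / (invlambda - fourbylpisq * x * x);
    float u = halfpi - x; // tan x = 1/tan(pi/2 - x)
    return (invlambda - fourbylpisq * u * u) / u;
}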
But $T_1$ also takes $4.5$ times as many multiplications. This is encouraging: the accuracy of a best rational approximation is better than linear in the allowed "size" of the denominator. Intuitively, doubling the number of operations we use will allow us to achieve far more than twice the accuracy, because there are far more than twice as many formulae to choose from.
So the next step must be allowing more operations; this means allowing the polynomials $p$ and $q$ to have higher degrees. If we do this, the problem becomes under-constrained - we already have simple zeroes/poles where we want them, and we even got to choose the gradient at $x=0$. Therefore we can choose even more constraints.
We could include more zeroes and poles of $\tan$, but really this means making $p$ a better approximation of $\sin$ and $q$ a better approximation of $\cos$, and we'd eventually recover $T_1$ (or something like it). The alternative is to constrain the gradient at more points: we already did at the zero; what if we do so at the poles?
It's not immediately clear this even makes sense; the gradient at a pole is undefined. However, the gradient of the reciprocal is defined, and this affects how the function approaches the pole. For all of our approximations so far, the relative error at the pole is non-zero, and the absolute error diverges: this means the reciprocal has the wrong gradient there.
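We can see this concretely for $T_2$: its reciprocal is $1/T_2(x)$ $= \frac{1}{x} - \frac{4}{\pi^2}x$, whose gradient at $x$ $= \frac{\pi}{2}$ is $-\frac{4}{\pi^2} - \frac{4}{\pi^2}$ $= -\frac{8}{\pi^2}$ $\approx -0.81$ rather than $\cot' \frac{\pi}{2}$ $= -1$; $T_2$ approaches its pole at the wrong rate.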
For our next attempt, let's write $$T_3(x) = \dfrac{p(x)}{q(x)} = \dfrac{\alpha x + \gamma x^3}{1 - \frac{4}{\pi^2}x^2}$$That is, we use the same $q$, but allow $p$ to be cubic; note that since $\tan$ is an odd function, $p$ has no quadratic term.**** Our constraints are $T_3'(0)$ $= 1$ and $(1 / T_3)'(\frac{\pi}{2})$ $= \cot' \frac{\pi}{2}$ $= -1$. (Note that this implies a similar constraint on the other pole.) A slightly-less-brief diversion via the quotient rule (sketched below) yields $\alpha$ $= 1$ and $\gamma$ $= \frac{32}{\pi^4} - \frac{4}{\pi^2}$.
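The key steps: near $x$ $= 0$ we have $T_3(x) \approx \alpha x$, so $T_3'(0)$ $= \alpha$ $= 1$. At the pole, the numerator of $1/T_3$ $= q/p$ vanishes, so the quotient rule collapses to $q'(\frac{\pi}{2}) / p(\frac{\pi}{2})$, giving $$\left(\frac{1}{T_3}\right)'\left(\frac{\pi}{2}\right) = \frac{-\frac{4}{\pi}}{\frac{\pi}{2} + \gamma\frac{\pi^3}{8}} = -1 \implies \gamma = \frac{8}{\pi^3}\left(\frac{4}{\pi} - \frac{\pi}{2}\right) = \frac{32}{\pi^4} - \frac{4}{\pi^2}$$How did we do?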
Very well! The worst of the noise is not actually error, but error in measuring the error - due to taking the difference or ratio between two very large (or very small) numbers. In fact, without floating point rounding errors, the absolute error is bounded! As for the relative error, $T_3$ is about $4.3$ times as accurate as $T_1$ - yet it requires $6$ fewer multiplications, $3$ fewer additions, and no bitwise operations to get it to work for $x<0$. Here's the C code:
// approximation of tan with bounded absolute error
static inline float T3 (float x)
{
    static const float pisqby4 = 2.4674011002723397f;          // pi^2/4
    static const float oneminus8bypisq = 0.1894305308612978f;  // 1 - 8/pi^2
    float xsq = x * x;
    return x * (pisqby4 - oneminus8bypisq * xsq) / (pisqby4 - xsq);
}
Again, because the relative error is positive, we can give another approximation $T^A_3$ with half the maximum relative error (making it about $8.6$ times as accurate as $T_1$), with the same cost as $T_3$, by introducing a constant factor $\lambda$ $= 1.001737577360199$ (found by numerical approximation):
// approximation of tan with lower relative error
static inline float TA3 (float x)
{
    static const float pisqby4 = 2.4674011002723397f;          // pi^2/4
    static const float adjpisqby4 = 2.471688400562703f;        // lambda * pi^2/4
    static const float adj1minus8bypisq = 0.189759681063053f;  // lambda * (1 - 8/pi^2)
    float xsq = x * x;
    return x * (adjpisqby4 - adj1minus8bypisq * xsq) / (pisqby4 - xsq);
}
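Since these approximations only cover $-\frac{\pi}{2}$ $< x$ $< \frac{\pi}{2}$, a caller with arbitrary $x$ needs the periodicity reduction mentioned earlier. A hypothetical wrapper of my own (not part of the approximations above):

#include <math.h> // fmodf

// reduce x into (-pi/2, pi/2) using tan's period of pi, then apply TA3
static inline float tan_approx (float x)
{
    static const float pi     = 3.1415926535897932f;
    static const float halfpi = 1.5707963267948966f;
    float r = fmodf (x, pi); // r is in (-pi, pi), with the sign of x
    if (r >  halfpi) r -= pi;
    if (r < -halfpi) r += pi;
    return TA3 (r);
}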
I post $T_3$ and $T^A_3$ here as public domain code. For comparison, here are the relative errors for $T_1$ (red) and $T^A_3$ (blue):
*This is not to pick on fastapprox in particular; it was just the first result on Google. Plenty of other implementations do this too.
**In the worst case, the maximum relative errors from each call might arise from the same input, producing twice the maximum error. In the average case, if the relative errors from the evaluations of $\sin$ and $\cos$ are independent then the error in $\tan$ will be something like $\sqrt{2}$ times as large. Either way, we're getting less accuracy even though we're doing more work.
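To first order, the relative errors simply combine additively: $$\frac{(1+\varepsilon_{\sin})\sin x}{(1+\varepsilon_{\cos})\cos x} \approx (1 + \varepsilon_{\sin} - \varepsilon_{\cos})\tan x$$which is worst when $\varepsilon_{\sin}$ and $\varepsilon_{\cos}$ have opposite signs.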
***There's a subtlety here because $f(x)$ might have complex roots and/or poles, which wouldn't appear as $x$-axis-crossings or vertical asymptotes on the graph of $y$ $= f(x)$ over the real numbers. They generally appear as stationary points; but $y$ $= \tan x$ doesn't have any, so let's not consider them.
****Of course, if we use bitwise operations to get the sign right, then $T(x)$ $\approx \tan x$ need not be an odd function. This could improve accuracy, but not for free - the extra $\beta x^2$ term would require another addition and another multiplication.