Fast inverse square root

Fast inverse square root, sometimes referred to as Fast InvSqrt() or by the hexadecimal constant 0x5F3759DF, is an algorithm that estimates ${\textstyle {\frac {1}{\sqrt {x}}}}$ , the reciprocal (or multiplicative inverse) of the square root of a 32-bit floating-point number $x$ in IEEE 754 floating-point format. The algorithm is best known for its implementation in 1999 in Quake III Arena, a first-person shooter video game heavily based on 3D graphics. With subsequent hardware advancements, especially the x86 SSE instruction rsqrtss, this algorithm is not generally the best choice for modern computers,^[1] though it remains an interesting historical example.^[2]

The algorithm accepts a 32-bit floating-point number as the input and stores a halved value for later use. Then, treating the bits representing the floating-point number as a 32-bit integer, a logical shift right by one bit is performed and the result subtracted from the number 0x5F3759DF, which is a floating-point representation of an approximation of ${\textstyle {\sqrt {2^{127}}}}$ .^[3] This results in the first approximation of the inverse square root of the input. Treating the bits again as a floating-point number, it runs one iteration of Newton's method, yielding a more precise approximation.

History

William Kahan and K.C. Ng at Berkeley wrote an unpublished paper in May 1986 describing how to calculate the square root using bit-fiddling techniques followed by Newton iterations.^[4] In the late 1980s, Cleve Moler at Ardent Computer learned about this technique^[5] and passed it along to his coworker Greg Walsh. Greg Walsh devised the now-famous constant and fast inverse square root algorithm. Gary Tarolli was consulting for Kubota, the company funding Ardent at the time, and likely brought the algorithm to 3dfx Interactive circa 1994.^[6]^[7]

Jim Blinn demonstrated a simple approximation of the inverse square root in a 1997 column for IEEE Computer Graphics and Applications.^[8] Reverse engineering of other contemporary 3D video games uncovered a variation of the algorithm in Activision's 1997 Interstate '76.^[9]

Quake III Arena, a first-person shooter video game, was released in 1999 by id Software and used the algorithm. Brian Hook may have brought the algorithm from 3dfx to id Software.^[6] A discussion of the code appeared on the Chinese developer forum CSDN in 2000,^[10] and Usenet and the gamedev.net forum spread the code widely in 2002 and 2003.^[11] Speculation arose as to who wrote the algorithm and how the constant was derived; some guessed John Carmack.^[7] Quake III's full source code was released at QuakeCon 2005, but provided no answers. The authorship question was answered in 2006 when Greg Walsh contacted Beyond3D as their speculation gained popularity on Slashdot.^[6]

In 2007 the algorithm was implemented in some dedicated hardware vertex shaders using field-programmable gate arrays (FPGA).^[12]^[13]

Motivation

The inverse square root of a floating point number is used in digital signal processing to normalize a vector, scaling it to length 1 to produce a unit vector.^[14] For example, computer graphics programs use inverse square roots to compute angles of incidence and reflection for lighting and shading. 3D graphics programs must perform millions of these calculations every second to simulate lighting. When the code was developed in the early 1990s, most floating point processing power lagged the speed of integer processing.^[7] This was troublesome for 3D graphics programs before the advent of specialized hardware to handle transform and lighting. Computation of square roots usually depends upon many division operations, which for floating point numbers are computationally expensive. The fast inverse square root generates a good approximation without any floating-point division.

The length of the vector is determined by calculating its Euclidean norm: the square root of the sum of squares of the vector components. When each component of the vector is divided by that length, the new vector will be a unit vector pointing in the same direction. In a 3D graphics program, all vectors are in three-dimensional space, so ${\boldsymbol {v}}$ would be a vector $(v_{1},v_{2},v_{3})$ .

\|{\boldsymbol {v}}\|={\sqrt {v_{1}^{2}+v_{2}^{2}+v_{3}^{2}}}

is the Euclidean norm of the vector.

{\boldsymbol {\hat {v}}}={\frac {\boldsymbol {v}}{\left\|{\boldsymbol {v}}\right\|}}

is the normalized (unit) vector. Substituting ${\textstyle \|{\boldsymbol {v}}\|}$ , the equation can also be written as:

{\boldsymbol {\hat {v}}}={\frac {\boldsymbol {v}}{\sqrt {v_{1}^{2}+v_{2}^{2}+v_{3}^{2}}}}

which relates the unit vector to the inverse square root of the sum of the squared components. The inverse square root can be used to compute ${\boldsymbol {\hat {v}}}$ because this equation is equivalent to

{\boldsymbol {\hat {v}}}={\boldsymbol {v}}\,{\frac {1}{\sqrt {v_{1}^{2}+v_{2}^{2}+v_{3}^{2}}}}

where the fraction term is the inverse square root of $v_{1}^{2}+v_{2}^{2}+v_{3}^{2}$ .

At the time, floating-point division was generally expensive compared to multiplication; the fast inverse square root algorithm bypassed the division step, giving it its performance advantage.
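The normalization described above can be sketched in C as follows. This is a minimal illustration, not code from any particular engine: the `vec3` type and the names `approx_rsqrt` and `normalize` are assumptions introduced here, and the helper reuses the bit trick from the union-based listing that appears later in this article. It assumes a 32-bit IEEE 754 `float` and a non-zero input vector.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 3-component vector type, introduced only for this sketch. */
typedef struct { float x, y, z; } vec3;

/* Approximate 1/sqrt(s) using the same bit trick as the union-based Q_rsqrt. */
float approx_rsqrt(float s)
{
    union { float f; uint32_t i; } conv = { .f = s };
    assert(s > 0.0f);                       /* sketch assumes a positive input */
    conv.i  = 0x5f3759df - (conv.i >> 1);   /* first approximation */
    conv.f *= 1.5f - (s * 0.5f * conv.f * conv.f);  /* one Newton iteration */
    return conv.f;
}

/* Scale v to approximately unit length:
   one inverse square root and three multiplications, no division. */
vec3 normalize(vec3 v)
{
    float inv_len = approx_rsqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    vec3 unit = { v.x * inv_len, v.y * inv_len, v.z * inv_len };
    return unit;
}
```

For the vector (3, 0, 4), whose length is 5, `normalize` returns components close to (0.6, 0, 0.8), slightly short of unit length because the approximation underestimates the true inverse square root.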

Overview of the code

The following code is the fast inverse square root implementation from Quake III Arena, stripped of C preprocessor directives, but including the exact original comment text:^[15]

float Q_rsqrt(float number)
{
  long i;
  float x2, y;
  const float threehalfs = 1.5F;

  x2 = number * 0.5F;
  y  = number;
  i  = * ( long * ) &y;                       // evil floating point bit level hacking
  i  = 0x5f3759df - ( i >> 1 );               // what the fuck?
  y  = * ( float * ) &i;
  y  = y * ( threehalfs - ( x2 * y * y ) );   // 1st iteration
  // y  = y * ( threehalfs - ( x2 * y * y ) );   // 2nd iteration, this can be removed

  return y;
}

At the time, the general method to compute the inverse square root was to calculate an approximation for ${\textstyle {\frac {1}{\sqrt {x}}}}$ , then revise that approximation via another method until it came within an acceptable error range of the actual result. Common software methods in the early 1990s drew approximations from a lookup table.^[16] The key to the fast inverse square root was to compute an approximation directly by exploiting the structure of floating-point numbers, which proved faster than table lookups. The algorithm was approximately four times faster than computing the square root with another method and calculating the reciprocal via floating-point division.^[17] The algorithm was designed with the IEEE 754-1985 32-bit floating-point specification in mind, but an investigation by Chris Lomont showed that it could be implemented in other floating-point specifications.^[18]

The advantages in speed offered by the fast inverse square root trick came from treating the 32-bit floating-point word^{[note 1]} as an integer, shifting it right by one bit, then subtracting the result from a "magic" constant, 0x5F3759DF.^[7]^[19]^[20]^[21] This bit shift and integer subtraction produce a bit pattern which, when reinterpreted as a floating-point number, is a rough approximation of the inverse square root of the input. One iteration of Newton's method is then performed to gain some accuracy, and the code is finished. The algorithm generates reasonably accurate results using a unique first approximation for Newton's method; however, it is much slower and less accurate than the SSE instruction rsqrtss, which was introduced on x86 processors in 1999, the same year the game was released.^[1]^[22]
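The Newton step in the listing, `y * ( threehalfs - ( x2 * y * y ) )`, can be recovered with a short standard derivation. The symbols $f$ and $y_{n}$ are notation assumed for this sketch and do not appear in the source code. The inverse square root of $x$ is the positive root of $f$:

```latex
\begin{aligned}
f(y) &= \frac{1}{y^{2}} - x, \qquad f'(y) = -\frac{2}{y^{3}} \\
y_{n+1} &= y_{n} - \frac{f(y_{n})}{f'(y_{n})}
         = y_{n} + \frac{y_{n}^{3}}{2}\left(\frac{1}{y_{n}^{2}} - x\right)
         = y_{n}\left(\frac{3}{2} - \frac{x}{2}\,y_{n}^{2}\right)
\end{aligned}
```

With `x2` holding $x/2$ and `threehalfs` holding $3/2$, the last expression matches the line marked "1st iteration". The update uses only multiplication and subtraction.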

Worked example

As an example, the number $x=0.15625$ can be used to calculate ${\textstyle {\frac {1}{\sqrt {x}}}\approx 2.52982}$ . The first steps of the algorithm are illustrated below:

0011_1110_0010_0000_0000_0000_0000_0000  Bit pattern of both x and i
0001_1111_0001_0000_0000_0000_0000_0000  Shift right one position: (i >> 1)
0101_1111_0011_0111_0101_1001_1101_1111  The magic number 0x5F3759DF
0100_0000_0010_0111_0101_1001_1101_1111  The result of 0x5F3759DF - (i >> 1)

Interpreting as IEEE 32-bit representation:

0_01111100_01000000000000000000000  1.25 × 2⁻³
0_00111110_00100000000000000000000  1.125 × 2⁻⁶⁵
0_10111110_01101110101100111011111  1.432430... × 2⁶³
0_10000000_01001110101100111011111  1.307430... × 2¹

Reinterpreting this last bit pattern as a floating point number gives the approximation $y=2.61486$ , which has an error of about 3.4%. After one single iteration of Newton's method, the final result is $y=2.52549$ , an error of only 0.17%.
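The intermediate values in this example can be reproduced with a small C helper. This is a sketch under stated assumptions: the `rsqrt_trace` type and the `trace_rsqrt` function are hypothetical names introduced here, `float` is assumed to be a 32-bit IEEE 754 value, and `memcpy` is used to read the bits so that the example itself avoids the pointer cast of the original code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record of the intermediate values, introduced for this sketch. */
typedef struct {
    uint32_t bits;     /* bit pattern of x                  */
    uint32_t shifted;  /* bits >> 1                         */
    uint32_t guess;    /* 0x5F3759DF - (bits >> 1)          */
    float    y0;       /* guess reinterpreted as a float    */
    float    y1;       /* y0 after one Newton iteration     */
} rsqrt_trace;

rsqrt_trace trace_rsqrt(float x)
{
    rsqrt_trace t;
    assert(sizeof(float) == sizeof(uint32_t));  /* assumes a 32-bit float */
    memcpy(&t.bits, &x, sizeof t.bits);         /* float -> integer bits  */
    t.shifted = t.bits >> 1;
    t.guess   = 0x5F3759DFu - t.shifted;
    memcpy(&t.y0, &t.guess, sizeof t.y0);       /* integer bits -> float  */
    t.y1 = t.y0 * (1.5f - 0.5f * x * t.y0 * t.y0);
    return t;
}
```

Calling `trace_rsqrt(0.15625f)` yields the bit patterns 0x3E200000, 0x1F100000 and 0x402759DF for the first three fields, which correspond to the first, second and fourth rows of the table above.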

Avoiding undefined behavior

According to the C standard, reinterpreting a floating-point value as an integer by casting a pointer to it and then dereferencing that pointer is not valid (undefined behavior). Another way is to place the floating-point value in a union that also contains a 32-bit unsigned integer member; accesses to that integer provide a bit-level view of the contents of the floating-point value. This is permitted in C, but type punning through a union is undefined behavior in C++.

#include <stdint.h> // uint32_t

float Q_rsqrt(float number)
{
  union {
    float    f;
    uint32_t i;
  } conv = { .f = number };
  conv.i  = 0x5f3759df - (conv.i >> 1);
  conv.f *= 1.5F - (number * 0.5F * conv.f * conv.f);
  return conv.f;
}
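A further option, well-defined in both C and C++, is to copy the object representation with `memcpy`. The following is a minimal sketch rather than code from the original game; the name `Q_rsqrt_memcpy` is an assumption made here, and the function assumes that `float` is a 32-bit IEEE 754 value and that the input is positive. Optimizing compilers typically reduce the two fixed-size copies to register moves.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

float Q_rsqrt_memcpy(float number)   /* hypothetical name for this sketch */
{
    uint32_t i;
    float y;

    assert(number > 0.0f);                  /* sketch assumes a positive input */
    memcpy(&i, &number, sizeof i);          /* read the float's bits           */
    i = 0x5f3759df - (i >> 1);              /* first approximation             */
    memcpy(&y, &i, sizeof y);               /* write the bits back as a float  */
    return y * (1.5f - (number * 0.5f * y * y));  /* one Newton iteration      */
}
```

For the input 0.15625 from the worked example, this returns approximately 2.52549.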

In modern C++, the usual method for implementing this function's casts is through C++20's std::bit_cast. This also allows the function to work in constexpr context:

#include <bit>
#include <limits>
#include <cstdint>

constexpr float Q_rsqrt(float number) noexcept
{
  static_assert(std::numeric_limits<float>::is_iec559); // (enable only on IEEE 754)

  float const y = std::bit_cast<float>(
    0x5f3759df - (std::bit_cast<std::uint32_t>(number) >> 1));
  return y * (1.5f - (number * 0.5f * y * y));
}

Algorithm

The algorithm computes ${\textstyle {\frac {1}{\sqrt {x}}}}$ by performing the following steps:

Alias the argument $x$ to an integer as a way to compute an approximation of the binary logarithm ${\textstyle \log _{2}(x)}$
Use this approximation to compute an approximation of ${\textstyle \log _{2}\left({\frac {1}{\sqrt {x}}}\right)=-{\frac {1}{2}}\log _{2}(x)}$
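The first step can be illustrated with a short C sketch. A positive normal 32-bit float with biased exponent field $E$ and mantissa field $M$ represents $2^{E-127}(1+M/2^{23})$, and reading the same bits as an integer gives $2^{23}E+M$. Dividing that integer by $2^{23}$ and subtracting 127 therefore gives $(E-127)+M/2^{23}$, which approximates ${\textstyle \log _{2}(x)}$ because $\log _{2}(1+m)\approx m$ for $m$ in $[0,1)$. The function name `approx_log2` is an assumption introduced here, and the sketch omits the small correction term that a tuned constant would add.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Approximate log2(x) for a positive normal x by reading its bits as an integer.
   Assumes float is a 32-bit IEEE 754 value. */
double approx_log2(float x)
{
    uint32_t bits;
    assert(x > 0.0f);
    memcpy(&bits, &x, sizeof bits);
    /* bits = (E << 23) | M, so bits / 2^23 - 127 = (E - 127) + M / 2^23. */
    return (double)bits / 8388608.0 - 127.0;
}
```

For a power of two the result is exact: `approx_log2(8.0f)` gives 3. For 0.15625 it gives -2.75, while the true logarithm is about -2.678. Halving and negating this approximate logarithm, as in the second step, is what the shift and subtraction perform directly on the integer.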


[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

[10]

[11]

[12]

[13]

[14]

[15]

[16]

[17]

[18]

[note 1]

[19]

[20]

[21]

[22]