TeraFLOPS

Floating-point operations per second (FLOPS, flops or flop/s) is a measure of computer performance, useful in fields of scientific computation that require floating-point calculations.^[1]

For such cases, it is a more accurate measure than instructions per second.^[citation needed]

Floating-point arithmetic

Floating-point arithmetic is needed for very large or very small real numbers, or for computations that require a large dynamic range. Floating-point representation is similar to scientific notation, except that everything is carried out in base two rather than base ten. The encoding scheme stores the sign, the exponent (in base two for Cray and VAX, base two or ten for IEEE floating-point formats, and base 16 for IBM Floating Point Architecture) and the significand (number after the radix point). While several similar formats are in use, the most common is ANSI/IEEE Std. 754-1985. This standard defines the format for 32-bit numbers called single precision, as well as 64-bit numbers called double precision and longer numbers called extended precision (used for intermediate results). Floating-point representations can support a much wider range of values than fixed-point, with the ability to represent both very small and very large numbers.^[2]
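As an illustration of the encoding described above, the following minimal Python sketch splits a double-precision value into its sign, exponent and significand fields. It assumes the IEEE 754 binary64 layout (1 sign bit, 11 exponent bits with a bias of 1023, 52 stored significand bits). The names `decompose_double` and `reconstruct_normal` are hypothetical and chosen only for this example.

```python
import struct
from typing import Tuple

EXPONENT_BIAS = 1023   # binary64 exponent bias (assumed layout)
FRACTION_BITS = 52     # explicitly stored significand bits


def decompose_double(x: float) -> Tuple[int, int, int]:
    """Return (sign, biased_exponent, fraction) of x as raw bit fields."""
    # Reinterpret the 8 bytes of the double as one unsigned 64-bit integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63
    biased_exponent = (bits >> FRACTION_BITS) & 0x7FF
    fraction = bits & ((1 << FRACTION_BITS) - 1)
    return sign, biased_exponent, fraction


def reconstruct_normal(sign: int, biased_exponent: int, fraction: int) -> float:
    """Rebuild a normal value (not zero, subnormal, infinite or NaN)."""
    if not 0 < biased_exponent < 0x7FF:
        raise ValueError("only normal numbers are handled in this sketch")
    significand = 1 + fraction / (1 << FRACTION_BITS)  # implicit leading 1
    return (-1) ** sign * significand * 2.0 ** (biased_exponent - EXPONENT_BIAS)


if __name__ == "__main__":
    for value in (1.0, -2.0, 6.25):
        print(value, decompose_double(value))
```

The reconstruction mirrors scientific notation in base two: a sign, a significand between 1 and 2, and a power of two. Special values are deliberately left out to keep the sketch short.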

Dynamic range and precision

The exponentiation inherent in floating-point computation assures a much larger dynamic range – the largest and smallest numbers that can be represented – which is especially important when processing data sets where some of the data may have an extremely large range of numerical values or where the range may be unpredictable. As such, floating-point processors are ideally suited for computationally intensive applications.^[3]
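To make the difference in dynamic range concrete, the sketch below compares double precision with a fixed-point format, counting the orders of magnitude between the largest and the smallest positive representable value. The signed Q16.16 fixed-point layout (32 bits, 16 of them fractional) is an assumption chosen for illustration and is not taken from the text. The helper name `decades_of_range` is likewise hypothetical.

```python
import math
import sys


def decades_of_range(largest: float, smallest: float) -> float:
    """Dynamic range expressed as orders of magnitude (powers of ten)."""
    # Subtract logarithms instead of dividing: the ratio itself would
    # overflow a double in the floating-point case.
    return math.log10(largest) - math.log10(smallest)


# Double precision: largest finite value and smallest positive normal value.
double_range = decades_of_range(sys.float_info.max, sys.float_info.min)

# Assumed signed Q16.16 fixed-point format stored in 32 bits:
# 16 fractional bits, so the smallest positive value (the step) is 2**-16.
q16_largest = (2**31 - 1) / 2**16
q16_smallest = 1 / 2**16
fixed_range = decades_of_range(q16_largest, q16_smallest)

if __name__ == "__main__":
    print(f"double precision:   about {double_range:.0f} decades")
    print(f"Q16.16 fixed point: about {fixed_range:.0f} decades")
```

Under these assumptions the floating-point format spans several hundred decades, while the 32-bit fixed-point format spans fewer than ten.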

Computational performance

FLOPS and MIPS are units of measure for the numerical computing performance of a computer. Floating-point operations are typically used in fields such as scientific computational research, as well as in machine learning. However, before the late 1980s floating-point hardware was typically an optional feature (floating-point arithmetic can be implemented in software on any integer hardware), and computers that had it were said to be "scientific computers", or to have "scientific computation" capability. Thus the unit MIPS was useful for measuring the integer performance of any computer, including those without such a capability; to account for architecture differences, a similar unit, MOPS (million operations per second), was also used as early as 1970.^[4] Besides integer (or fixed-point) arithmetic, examples of integer operations include data movement (A to B) and value testing (If A = B, then C). This is why MIPS is an adequate performance benchmark when a computer is used for database queries, word processing, spreadsheets, or running multiple virtual operating systems.^[5]^[6]

In 1974 David Kuck coined the terms flops and megaflops to describe the supercomputer performance of the day by the number of floating-point calculations performed per second.^[7] This was much better than using the prevalent MIPS to compare computers, as that statistic usually had little bearing on the arithmetic capability of the machine on scientific tasks.

FLOPS on an HPC system can be calculated using this equation:^[8]

{\text{FLOPS}}={\text{racks}}\times {\frac {\text{nodes}}{\text{rack}}}\times {\frac {\text{sockets}}{\text{node}}}\times {\frac {\text{cores}}{\text{socket}}}\times {\frac {\text{cycles}}{\text{second}}}\times {\frac {\text{FLOPs}}{\text{cycle}}}.

This can be simplified to the most common case, a computer that has exactly one CPU:

{\text{FLOPS}}={\text{cores}}\times {\frac {\text{cycles}}{\text{second}}}\times {\frac {\text{FLOPs}}{\text{cycle}}}.
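The two equations above translate directly into code. The following Python sketch computes a theoretical peak only; real workloads usually achieve less. The function name `peak_flops` and all hardware figures in the example (an 8-core CPU at 3 GHz performing 16 FP64 operations per cycle per core, and a small two-rack cluster built from it) are hypothetical values chosen for illustration.

```python
def peak_flops(
    cores_per_socket: int,
    cycles_per_second: int,
    flops_per_cycle: int,
    sockets_per_node: int = 1,
    nodes_per_rack: int = 1,
    racks: int = 1,
) -> int:
    """Theoretical peak FLOPS as the product of the factors in the equation.

    With the defaults (one socket, one node, one rack) this reduces to the
    single-CPU form: cores * cycles/second * FLOPs/cycle.
    """
    return (
        racks
        * nodes_per_rack
        * sockets_per_node
        * cores_per_socket
        * cycles_per_second
        * flops_per_cycle
    )


# Hypothetical single CPU: 8 cores, 3 GHz, 16 FP64 FLOPs per cycle per core.
single_cpu = peak_flops(8, 3_000_000_000, 16)

# Hypothetical cluster: 2 racks, 10 nodes per rack, 2 sockets per node.
cluster = peak_flops(8, 3_000_000_000, 16,
                     sockets_per_node=2, nodes_per_rack=10, racks=2)

if __name__ == "__main__":
    print(f"single CPU: {single_cpu / 1e9:.0f} GFLOPS")
    print(f"cluster:    {cluster / 1e12:.2f} TFLOPS")
```

For the assumed single CPU this gives 8 × 3×10^9 × 16 = 3.84×10^11 FLOPS, that is 384 GFLOPS; the assumed cluster has 40 such sockets and therefore 40 times that figure.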

FLOPS can be recorded at different levels of precision. For example, the TOP500 supercomputer list ranks computers by 64-bit (double-precision floating-point format) operations per second, abbreviated to FP64.^[9] Similar measures are available for 32-bit (FP32) and 16-bit (FP16) operations.
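The practical difference between FP64, FP32 and FP16 can be seen by storing the same value at each width. The sketch below uses Python's `struct` module, assuming its `d`, `f` and `e` format codes map to the IEEE 754 64-bit, 32-bit and 16-bit formats; the helper names `round_to` and `storage_bytes` are hypothetical.

```python
import struct

# struct format codes assumed to correspond to FP64, FP32 and FP16.
FORMATS = {"FP64": "<d", "FP32": "<f", "FP16": "<e"}


def round_to(precision: str, value: float) -> float:
    """Round value to the named precision and return it as a Python float."""
    fmt = FORMATS[precision]
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


def storage_bytes(precision: str) -> int:
    """Number of bytes one value occupies at the named precision."""
    return struct.calcsize(FORMATS[precision])


if __name__ == "__main__":
    for name in FORMATS:
        stored = round_to(name, 0.1)
        print(f"{name}: {storage_bytes(name)} bytes, "
              f"0.1 stored as {stored!r}, error {abs(stored - 0.1):.3e}")
```

Narrower formats take less storage per value but represent it with a larger rounding error, which is why a FLOPS figure is only meaningful together with the precision it was measured at.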

Floating-point operations per clock cycle for various processors

Floating-point operations per clock cycle per core^[10]
Microarchitecture	Instruction set architecture	FP64	FP32	FP16
Intel CPU
Intel 80486	x87 (32-bit)	?	0.128^[11]	?
Intel P5 Pentium Intel P6 Pentium Pro	x87 (32-bit)	?	0.5^[11]	?
Intel P5 Pentium MMX Intel P6 Pentium II	MMX (64-bit)	?	1^[12]	?
Intel P6 Pentium III	SSE (64-bit)	?	2^[12]	?
Intel NetBurst Pentium 4 (Willamette, Northwood)	SSE2 (64-bit)	2	4	?
Intel P6 Pentium M	SSE2 (64-bit)	1	2	?
Intel NetBurst Pentium 4 (Prescott, Cedar Mill) Intel NetBurst Pentium D (Smithfield, Presler) Intel P6 Core (Yonah)	SSE3 (64-bit)	2	4	?
Intel Core (Merom, Penryn) Intel Nehalem^[13] (Nehalem, Westmere)	SSSE3 (128-bit) SSE4 (128-bit)	4	8	?
Intel Atom (Bonnell, Saltwell, Silvermont and Goldmont)	SSE3 (128-bit)	2	4	?
Intel Sandy Bridge (Sandy Bridge, Ivy Bridge)	AVX (256-bit)	8	16	0
Intel Haswell^[13] (Haswell, Devil's Canyon, Broadwell) Intel Skylake (Skylake, Kaby Lake, Coffee Lake, Comet Lake, Whiskey Lake, Amber Lake)	AVX2 & FMA (256-bit)	16	32	0
Intel Xeon Phi (Knights Corner)	IMCI (512-bit)	16	32	0
Intel Skylake-X (Skylake-X, Cascade Lake) Intel Xeon Phi (Knights Landing, Knights Mill) Intel Ice Lake, Tiger Lake and Rocket Lake	AVX-512 & FMA (512-bit)	32	64	0
AMD CPU
AMD Bobcat	AMD64 (64-bit)	2	4	0
AMD Jaguar AMD Puma		4	8	0
AMD K10	SSE4/4a (128-bit)	4	8	0
AMD Bulldozer^[13] (Piledriver, Steamroller, Excavator)	AVX (128-bit) (Bulldozer, Steamroller) AVX2 (128-bit) (Excavator) FMA3 (Bulldozer)^[14] FMA3/4 (Piledriver, Excavator)	4	8	0
AMD Zen (Ryzen 1000 series, Threadripper 1000 series, Epyc Naples) AMD Zen+^[13]^[15]^[16]^[17] (Ryzen 2000 series, Threadripper 2000 series)	AVX2 & FMA (128-bit, 256-bit decoding)^[18]	8	16	0
AMD Zen 2^[19] (Ryzen 3000 series, Threadripper 3000 series, Epyc Rome) AMD Zen 3 (Ryzen 5000 series, Epyc Milan)	AVX2 & FMA (256-bit)	16	32	0
ARM CPU
ARM Cortex-A7, A9, A15	ARMv7	1	8	0
ARM Cortex-A32, A35	ARMv8	2	8	0
ARM Cortex-A53, A55, A57,^[13] A72, A73, A75	ARMv8	4	8	0
ARM Cortex-A76, A77, A78	ARMv8	8	16	0
ARM Cortex-X1	ARMv8	16	32	?
Qualcomm Krait	ARMv7	1	8	0
Qualcomm Kryo (1xx - 3xx)	ARMv8	2	8	0
Qualcomm Kryo (4xx - 5xx)	ARMv8	8	16	0
Samsung Exynos M1 and M2	ARMv8	2	8	0
Samsung Exynos M3 and M4	ARMv8	3	12	0
IBM PowerPC A2 (Blue Gene/Q)	?	8	8 (as FP64)	0
Hitachi SH-4^[20]^[21]	SH-4	1	7	0
Nvidia GPU
Nvidia Curie (GeForce 6 series and GeForce 7 series)	PTX	?	8	?
Nvidia Tesla 2.0 (GeForce GTX 260–295)	PTX	?	2	?
Nvidia Fermi (only GeForce GTX 465–480, 560 Ti, 570–590)	PTX	1/4 (locked by driver, 1 in hardware)	2	0
Nvidia Fermi (only Quadro 600–2000)	PTX	1/8	2	0
Nvidia Fermi (only Quadro 4000–7000, Tesla)	PTX	1	2	0
Nvidia Kepler (GeForce (except Titan and Titan Black), Quadro (except K6000), Tesla K10)	PTX	1/12 (for GK110: locked by driver, 2/3 in hardware)	2	0
Nvidia Kepler (GeForce GTX Titan and Titan Black, Quadro K6000, Tesla (except K10))	PTX	2/3	2	0
Nvidia Maxwell Nvidia Pascal (all except Quadro GP100 and Tesla P100)	PTX	1/16	2	1/32
Nvidia Pascal (only Quadro GP100 and Tesla P100)	PTX	1	2	4
Nvidia Volta^[22]	PTX	1	2 (FP32) + 2 (INT32)	16
Nvidia Turing (only GeForce 16XX)	PTX	1/16	2 (FP32) + 2 (INT32)	4
Nvidia Turing (all except GeForce 16XX)	PTX	1/16	2 (FP32) + 2 (INT32)	16
Nvidia Ampere^[23]^[24] (only Tesla A100/A30)	PTX	2	2 (FP32) + 2 (INT32)	32
Nvidia Ampere (all GeForce and Quadro, Tesla A40/A10)	PTX	1/32	2 (FP32) + 0 (INT32) or 1 (FP32) + 1 (INT32)	8
AMD GPU
AMD TeraScale 1 (Radeon HD 4000 series)	TeraScale 1	0.4	2	?
AMD TeraScale 2 (Radeon HD 5000 series)	TeraScale 2	1	2	?
AMD TeraScale 3 (Radeon HD 6000 series)	TeraScale 3	1	4	?
AMD GCN (only Radeon Pro W 8100–9100)	GCN	1	2	?
AMD GCN (all except Radeon Pro W 8100–9100, Vega 10–20)	GCN	1/8	2	4
AMD GCN Vega 10	GCN	1/8	2	4
AMD GCN Vega 20 (only Radeon VII)	GCN	1/2 (locked by driver, 1 in hardware)	2	4
AMD GCN Vega 20 (only Radeon Instinct MI50 / MI60 and Radeon Pro VII)	GCN	1	2	4
AMD RDNA^[25]^[26] AMD RDNA 2	RDNA	1/8	2	4
AMD RDNA3	RDNA	1/8?	4	8?
AMD CDNA	CDNA	1	4 (Tensor)^[27]	16
AMD CDNA 2	CDNA 2	4 (Tensor)	4 (Tensor)	16
Intel GPU
Intel Xe-LP (Iris Xe MAX)^[28]	Xe	1/2?	2	4
Intel Xe-HPG (Arc Alchemist)^[28]	Xe	0	2	16
Intel Xe-HPC (Ponte Vecchio)^[29]	Xe	2	2	32
Qualcomm GPU
Qualcomm Adreno 5x0	Adreno 5xx	1	2	4
Qualcomm Adreno 6x0	Adreno 6xx	1	2	4
Graphcore
Graphcore Colossus GC2^[30]^[31]	?	0	16	64
Graphcore Colossus GC200 Mk2^[32] Graphcore Bow-2000^[33]	?	0	32	128
Supercomputer
ENIAC @ 100 kHz in 1945

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

[10]

[11]

[12]

[13]

[14]

[15]

[16]

[17]

[18]

[19]

[20]

[21]

[22]

[23]

[24]

[25]

[26]

[27]

[28]

[29]

[30]

[31]

[32]

[33]