Intrinsics Overview

02/01/2013

Microsoft Specific

3DNow! technology provides up to 26 additional instructions to support high-performance 3D graphics and audio processing. 3DNow! instructions are vector instructions that operate on 64-bit registers. 3DNow! instructions are SIMD; each instruction operates on pairs of 32-bit values. See 3DNow! Intrinsics for the reference documentation for the AMD intrinsics.

Vector instructions operate in parallel on two sets of 32-bit single-precision, floating-point words. Scalar instructions operate on a single set of 32-bit operands (from the low halves of the two 64-bit operands).
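The difference between the vector and scalar forms can be sketched in portable C. This is only a model: the `packed2f` type and both function names are hypothetical and are not part of any intrinsics header.

```c
/* Hypothetical model of one 64-bit 3DNow! operand: two packed floats. */
typedef struct {
    float lo;  /* bits 31..0  */
    float hi;  /* bits 63..32 */
} packed2f;

/* Vector form: both 32-bit lanes are processed (modeled after PFADD). */
static packed2f model_packed_add(packed2f a, packed2f b)
{
    packed2f r;
    r.lo = a.lo + b.lo;
    r.hi = a.hi + b.hi;
    return r;
}

/* Scalar form: only the low halves of the two operands take part.
   The multiply is an arbitrary illustrative operation. */
static float model_scalar_low_op(packed2f a, packed2f b)
{
    return a.lo * b.lo;
}
```

With `a = {1, 2}` and `b = {3, 4}`, the vector model produces both sums, whereas the scalar model reads only `a.lo` and `b.lo`.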

The 3DNow! single-precision, floating-point format is compatible with the IEEE 754 single-precision format. This format comprises a 1-bit sign, an 8-bit biased exponent, and a 23-bit significand with one hidden integer bit for a total of 24 bits in the significand. The bias of the exponent is 127, consistent with the IEEE single-precision standard. The significands are normalized to be within the range of [1,2).
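The field layout described above can be inspected with a small portable C helper. This is a minimal sketch that assumes `float` is an IEEE 754 single-precision value on the host; `f32_fields` and `decode_f32` are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

/* The three fields of a single-precision value, plus the unbiased exponent. */
typedef struct {
    uint32_t sign;         /* 1 bit                                   */
    uint32_t biased_exp;   /* 8 bits, bias 127                        */
    uint32_t fraction;     /* 23 stored bits; the hidden bit is not stored */
    int      unbiased_exp; /* biased_exp - 127                        */
} f32_fields;

static f32_fields decode_f32(float x)
{
    uint32_t bits;
    f32_fields f;
    memcpy(&bits, &x, sizeof bits);   /* reinterpret without aliasing issues */
    f.sign = bits >> 31;
    f.biased_exp = (bits >> 23) & 0xFFu;
    f.fraction = bits & 0x7FFFFFu;
    f.unbiased_exp = (int)f.biased_exp - 127;
    return f;
}
```

For example, 1.0f decodes to a biased exponent of 127 with a zero fraction, and 1.5f sets only the top fraction bit (400000h).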

Whereas the IEEE standard specifies four rounding modes, 3DNow! technology supports a single rounding mode, which is either round-to-nearest or round-to-zero (truncation). The hardware implementation of 3DNow! technology determines which one is used. The AMD processors implement round-to-nearest mode. Regardless of the rounding mode used, the floating-point-to-integer and integer-to-floating-point conversion instructions, PF2ID and PI2FD, always use the round-to-zero (truncation) mode.
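The round-to-zero behavior described for the conversion instructions can be illustrated in portable C. This is a sketch of the rounding rule only, not of the instructions themselves; both function names are hypothetical, and out-of-range inputs are not handled.

```c
#include <stdint.h>

/* Float to integer with truncation: a C cast already rounds toward zero.
   The caller must keep x within the int32_t range. */
static int32_t model_f2i_truncate(float x)
{
    return (int32_t)x;
}

/* Integer to float with truncation: drop the low bits that do not fit
   in the 24-bit significand, so the final conversion is exact. */
static float model_i2f_truncate(int32_t v)
{
    uint32_t mag = v < 0 ? 0u - (uint32_t)v : (uint32_t)v;
    int top = 31;
    float f;

    while (top > 0 && !(mag >> top)) {
        top--;                                 /* index of highest set bit */
    }
    if (top > 23) {
        mag &= ~((1u << (top - 23)) - 1u);     /* discard excess low bits  */
    }
    f = (float)mag;                            /* exact: at most 24 significant bits */
    return v < 0 ? -f : f;
}
```

The value 16777219 (2^24 + 3) needs 25 significant bits; truncation yields 16777218, whereas round-to-nearest would yield 16777220.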

The largest representable normal number in magnitude for this precision in hexadecimal has an exponent of FEh and a significand of 7FFFFFh, with a numerical value of 2^127 × (2 - 2^-23). All results that overflow above the maximum representable positive value are saturated to either this maximum representable normal number or to positive infinity. Similarly, all results that overflow below the minimum representable negative value are saturated to either this minimum representable normal number or to negative infinity.

The implementation of 3DNow! technology determines how arithmetic overflow is handled, as either properly signed maximum or minimum representable normal numbers or properly signed infinities. The processor generates properly signed maximum or minimum representable normal numbers.
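The maximum normal value and the saturating overflow behavior can be modeled as follows. This is a minimal sketch assuming an IEEE 754 host `float`; the exact result is passed in as a `double` purely for illustration, and all function names are hypothetical.

```c
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Build a float from raw single-precision fields. */
static float f32_from_fields(uint32_t sign, uint32_t biased_exp, uint32_t fraction)
{
    uint32_t bits = ((sign & 1u) << 31)
                  | ((biased_exp & 0xFFu) << 23)
                  | (fraction & 0x7FFFFFu);
    float x;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Largest normal magnitude: exponent FEh, significand 7FFFFFh. */
static float model_max_normal(void)
{
    return f32_from_fields(0u, 0xFEu, 0x7FFFFFu);
}

/* Clamp an exact result to the properly signed maximum normal number
   instead of letting it become an infinity. */
static float model_saturate(double exact)
{
    const double limit = (double)model_max_normal();
    if (exact > limit)  return model_max_normal();
    if (exact < -limit) return -model_max_normal();
    return (float)exact;
}
```

On such a host `model_max_normal()` equals `FLT_MAX`, and a result such as 1e39 is held at that value rather than becoming infinity.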

Infinities and NaNs are not supported as operands to 3DNow! instructions.

The smallest representable normal number in magnitude for this precision in hexadecimal has an exponent of 01h and a significand of 000000h, with a numerical value of 2^-126. Accordingly, all results below this minimum representable value in magnitude are held to zero. The following table shows the exponent ranges supported by the 3DNow! technology.

3DNow! Technology Exponent Ranges

Biased exponent	Description
FFh	Unsupported. Unsupported numbers can be used as operands. The results of operations with unsupported numbers are undefined.
00h	Zero.
00h<x<FFh	Normal.
01h	2^(1-127), the lowest possible exponent.
FEh	2^(254-127), the largest possible exponent.
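The exponent ranges in the table, together with the hold-to-zero rule for results smaller than the minimum normal number, can be sketched in C. The enum, the function names, and the choice to return positive zero are illustrative assumptions, not documented behavior.

```c
#include <float.h>
#include <stdint.h>

typedef enum { EXP_ZERO, EXP_NORMAL, EXP_UNSUPPORTED } exp_class;

/* Classify a raw 32-bit pattern by its biased exponent, as in the table. */
static exp_class classify_biased_exponent(uint32_t bits)
{
    uint32_t e = (bits >> 23) & 0xFFu;
    if (e == 0xFFu) return EXP_UNSUPPORTED;  /* infinities and NaNs  */
    if (e == 0x00u) return EXP_ZERO;         /* treated as zero      */
    return EXP_NORMAL;                       /* 01h through FEh      */
}

/* Hold results smaller in magnitude than 2^-126 (FLT_MIN) to zero. */
static float model_hold_to_zero(float x)
{
    return (x > -FLT_MIN && x < FLT_MIN) ? 0.0f : x;
}
```

For instance, the pattern 7F800000h (biased exponent FFh) is classified as unsupported, and a value such as 1e-40f, which lies below 2^-126, is held to zero.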

Like MMX instructions, 3DNow! instructions do not generate numeric exceptions or set any status flags. It is the user's responsibility to ensure that in-range data is provided to 3DNow! instructions and that all computations remain within valid ranges (or are held as expected).

The register operations of all 3DNow! floating-point instructions are executed by either the register X unit or the register Y unit. One operation can be issued to each register unit at each clock cycle for a maximum issue and execution rate of two 3DNow! operations per cycle.

In high-performance 3DNow! code, instructions are normally scheduled apart from each other to avoid delays caused by contention for execution resources. Scheduling must also take dependencies and execution latencies into account.

For further information regarding code optimization on the AMD-K6 processor, see the AMD-K6 Processor Code Optimization Application Note, order number 21924. This document provides in-depth discussions of code optimization techniques for the processor.

For execution resources information on the AMD Athlon processor, refer to the AMD Athlon Processor x86 Code Optimization Guide, order number 22007.

The 3DNow! performance enhancement instructions for AMD processors are summarized in the following tables.

AMD 3DNow! Floating-Point Instructions

Operation	Function	Opcode
PAVGUSB	Packed 8-bit unsigned integer averaging	BFh
PFADD	Packed floating-point addition	9Eh
PFSUB	Packed floating-point subtraction	9Ah
PFSUBR	Packed floating-point reverse subtraction	AAh
PFACC	Packed floating-point accumulate	AEh
PFCMPGE	Packed floating-point comparison, greater or equal	90h
PFCMPGT	Packed floating-point comparison, greater	A0h
PFCMPEQ	Packed floating-point comparison, equal	B0h
PFMIN	Packed floating-point minimum	94h
PFMAX	Packed floating-point maximum	A4h
PI2FD	Packed 32-bit integer to floating-point conversion	0Dh
PF2ID	Packed floating-point to 32-bit integer	1Dh
PFRCP	Packed floating-point reciprocal approximation	96h
PFRSQRT	Packed floating-point reciprocal square root approximation	97h
PFMUL	Packed floating-point multiplication	B4h
PFRCPIT1	Packed floating-point reciprocal first iteration step	A6h
PFRSQIT1	Packed floating-point reciprocal square root first iteration step	A7h
PFRCPIT2	Packed floating-point reciprocal/reciprocal square root second iteration step	B6h
PMULHRW	Packed 16-bit integer multiply with rounding	B7h
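The table lists a reciprocal approximation (PFRCP) followed by iteration-step instructions. As a rough illustration of why refinement steps follow an approximation, here is the textbook Newton-Raphson update for 1/b in plain C. It is an assumption that this captures the general idea; it does not model the intermediate formats or exact results of PFRCPIT1 and PFRCPIT2, and the function names are hypothetical.

```c
/* One Newton-Raphson refinement of an estimate x of 1/b:
   x_next = x * (2 - b * x). Each step roughly doubles the accurate bits. */
static float nr_recip_step(float b, float x)
{
    return x * (2.0f - b * x);
}

/* Refine a rough seed estimate of 1/b with two steps. */
static float nr_recip_two_steps(float b, float seed)
{
    float x = nr_recip_step(b, seed);
    return nr_recip_step(b, x);
}
```

Starting from a seed of 0.3 for b = 3, the first step gives about 0.33 and the second about 0.3333, which shows how quickly a coarse approximation converges.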

AMD 3DNow! Performance Enhancement Instructions

Operation	Function	Opcode second byte
FEMMS	Faster entry/exit of the MMX or floating-point state.	0Eh
PREFETCH/PREFETCHW	The function prefetches at least a 32-byte line into L1 data cache (Dcache). The AMD-K6-2 and AMD-K6-III processors execute the PREFETCHW instruction identically to the PREFETCH instruction. On the AMD Athlon processor, PREFETCHW can increase performance by providing a hint to the processor of an intent to modify the cache line.	0Dh

AMD Athlon Processor 3DNow! Technology DSP Extensions

Operation	Function	Opcode / imm8
PF2IW	Packed floating-point to integer word conversion with sign extend	0Fh 0Fh / 1Ch
PFNACC	Packed floating-point negative accumulate	0Fh 0Fh / 8Ah
PFPNACC	Packed floating-point mixed positive-negative accumulate	0Fh 0Fh / 8Eh
PI2FW	Packed integer word to floating-point conversion	0Fh 0Fh / 0Ch
PSWAPD	Packed swap doubleword	0Fh 0Fh / BBh
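The "0Fh 0Fh / imm8" notation in the table means that the two 0Fh bytes come first and the instruction-specific byte comes last. A minimal sketch of the register-to-register byte sequence follows. It assumes the standard x86 register-direct modR/M layout (mod = 11b, reg = destination, r/m = source), which the table does not state, and the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Emit the 4-byte register-to-register form: 0Fh 0Fh modR/M imm8.
   dst_mm and src_mm are MMX register numbers 0..7. Returns the length. */
static size_t encode_3dnow_reg_reg(uint8_t imm8, unsigned dst_mm,
                                   unsigned src_mm, uint8_t out[4])
{
    out[0] = 0x0F;
    out[1] = 0x0F;
    out[2] = (uint8_t)(0xC0u | ((dst_mm & 7u) << 3) | (src_mm & 7u));
    out[3] = imm8;  /* the instruction-specific byte from the table */
    return 4;
}
```

Under these assumptions, PSWAPD mm0, mm1 (imm8 BBh) encodes as 0Fh 0Fh C1h BBh.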

MMX Instruction Set Extensions Starting with the AMD Athlon Processor

Operation	Function	Opcode / imm8
MASKMOVQ	Streaming (cache bypass) store using byte mask	0Fh F7h
MOVNTQ	Streaming (cache bypass) store	0Fh E7h
PAVGB	Packed average of unsigned byte	0Fh E0h
PAVGW	Packed average of unsigned word	0Fh E3h
PEXTRW	Extract word into integer register	0Fh C5h
PINSRW	Insert word from integer register	0Fh C4h
PMAXSW	Packed maximum signed word	0Fh EEh
PMAXUB	Packed maximum unsigned byte	0Fh DEh
PMINSW	Packed minimum signed word	0Fh EAh
PMINUB	Packed minimum unsigned byte	0Fh DAh
PMOVMSKB	Move byte mask to integer register	0Fh D7h
PMULHUW	Packed multiply high unsigned word	0Fh E4h
PREFETCHNTA	Move data closer to the processor using the NTA reference	0Fh 18h 0*
PREFETCHT0	Move data closer to the processor using the T0 reference	0Fh 18h 1*
PREFETCHT1	Move data closer to the processor using the T1 reference	0Fh 18h 2*
PREFETCHT2	Move data closer to the processor using the T2 reference	0Fh 18h 3*
PSADBW	Packed sum of absolute byte differences	0Fh F6h
PSHUFW	Packed shuffle word	0Fh 70h
SFENCE	Store fence	0Fh AEh / 7h

*The number after the opcode indicates the different prefetch modes in the modR/M byte.
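The effect of a few of the integer entries in the table above can be modeled in portable C on 8-byte arrays that stand in for a 64-bit MMX register. This is a sketch of the arithmetic only; the function names are hypothetical and no intrinsics are used.

```c
#include <stdint.h>

/* PAVGB, one byte lane: unsigned average, rounded up. */
static uint8_t model_pavgb_lane(uint8_t a, uint8_t b)
{
    return (uint8_t)((a + b + 1) >> 1);
}

/* PSADBW: sum of absolute differences of the eight unsigned bytes. */
static uint16_t model_psadbw(const uint8_t a[8], const uint8_t b[8])
{
    unsigned sum = 0;
    int i;
    for (i = 0; i < 8; i++) {
        sum += (unsigned)(a[i] > b[i] ? a[i] - b[i] : b[i] - a[i]);
    }
    return (uint16_t)sum;  /* at most 8 * 255, so it fits in 16 bits */
}

/* PMOVMSKB: gather the most significant bit of each byte into bits 0..7. */
static unsigned model_pmovmskb(const uint8_t v[8])
{
    unsigned mask = 0;
    int i;
    for (i = 0; i < 8; i++) {
        mask |= (unsigned)(v[i] >> 7) << i;
    }
    return mask;
}
```

For example, averaging 1 and 2 yields 2 because of the upward rounding, and a register whose only byte with the top bit set is byte 7 produces the mask 80h.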

For further information regarding code optimization on the AMD-K6-2 processor, see the AMD-K6-2 Processor Code Optimization Application Note, order number 21924. This document provides in-depth discussions of code optimization techniques for the AMD-K6 family processor.

For execution resources information on the AMD Athlon processor, refer to the AMD Athlon Processor x86 Code Optimization Guide, order number 22007. This document provides in-depth discussions of code optimization techniques for the AMD Athlon processor.

See https://go.microsoft.com/fwlink/?LinkID=95131 for the online versions of these documents.
