| January 2003 Laboratory Notes: Computer Engineering II | ||
|---|---|---|
| | Appendix B. x86 Instruction Reference | |
ADDPS xmm1,xmm2/mem128 ; 0F 58 /r [KATMAI,SSE]
ADDPS performs addition on each of four packed single-precision FP value pairs:
dst[0-31] := dst[0-31] + src[0-31],
dst[32-63] := dst[32-63] + src[32-63],
dst[64-95] := dst[64-95] + src[64-95],
dst[96-127] := dst[96-127] + src[96-127].
The destination is an XMM register. The source operand can be either an XMM register or a 128-bit memory location.
ADDSS xmm1,xmm2/mem32 ; F3 0F 58 /r [KATMAI,SSE]
ADDSS adds the low single-precision FP values from the source and destination operands and stores the single-precision FP result in the destination operand.
dst[0-31] := dst[0-31] + src[0-31],
dst[32-127] remains unchanged.
The destination is an XMM register. The source operand can be either an XMM register or a 32-bit memory location.
ANDNPS xmm1,xmm2/mem128 ; 0F 55 /r [KATMAI,SSE]
ANDNPS inverts the bits of the four single-precision floating-point values in the destination register, and then performs a logical AND between the four single-precision floating-point values in the source operand and the temporary inverted result, storing the result in the destination register.
dst[0-31] := src[0-31] AND NOT dst[0-31],
dst[32-63] := src[32-63] AND NOT dst[32-63],
dst[64-95] := src[64-95] AND NOT dst[64-95],
dst[96-127] := src[96-127] AND NOT dst[96-127].
The destination is an XMM register. The source operand can be either an XMM register or a 128-bit memory location.
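As a rough illustration of the ANDNPS operation above, the following Python sketch models a 128-bit XMM register as a plain integer. The function name `andnps` and the integer representation are assumptions made for this example only; they are not part of any assembler or library.

```python
MASK128 = (1 << 128) - 1  # all 128 bits of an XMM register set


def andnps(dst: int, src: int) -> int:
    """Model ANDNPS: invert dst, then AND the inverted value with src."""
    return src & (~dst & MASK128)
```

One common use is clearing the sign bit of every lane: with dst holding 80000000h in each doubleword and src holding the values, `andnps(0x80000000800000008000000080000000, 0xBF8000003F800000C000000000000000)` gives `0x3F8000003F8000004000000000000000`.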
ANDPS xmm1,xmm2/mem128 ; 0F 54 /r [KATMAI,SSE]
ANDPS performs a bitwise logical AND of the four single-precision floating-point values in the source and destination operands, and stores the result in the destination register.
dst[0-31] := src[0-31] AND dst[0-31],
dst[32-63] := src[32-63] AND dst[32-63],
dst[64-95] := src[64-95] AND dst[64-95],
dst[96-127] := src[96-127] AND dst[96-127].
The destination is an XMM register. The source operand can be either an XMM register or a 128-bit memory location.
CMPPS xmm1,xmm2/mem128,imm8 ; 0F C2 /r ib [KATMAI,SSE]
CMPEQPS xmm1,xmm2/mem128 ; 0F C2 /r 00 [KATMAI,SSE]
CMPLTPS xmm1,xmm2/mem128 ; 0F C2 /r 01 [KATMAI,SSE]
CMPLEPS xmm1,xmm2/mem128 ; 0F C2 /r 02 [KATMAI,SSE]
CMPUNORDPS xmm1,xmm2/mem128 ; 0F C2 /r 03 [KATMAI,SSE]
CMPNEQPS xmm1,xmm2/mem128 ; 0F C2 /r 04 [KATMAI,SSE]
CMPNLTPS xmm1,xmm2/mem128 ; 0F C2 /r 05 [KATMAI,SSE]
CMPNLEPS xmm1,xmm2/mem128 ; 0F C2 /r 06 [KATMAI,SSE]
CMPORDPS xmm1,xmm2/mem128 ; 0F C2 /r 07 [KATMAI,SSE]
The CMPccPS instructions compare the four packed single-precision FP values in the source operand with those in the destination operand, and return the results of the comparisons in the destination register. The result of each comparison is a doubleword mask of all 1s (comparison true) or all 0s (comparison false).
The destination is an XMM register. The source can be either an XMM register or a 128-bit memory location.
The third operand is an 8-bit immediate value, of which the low 3 bits define the type of comparison. For ease of programming, the 8 two-operand pseudo-instructions are provided, with the third operand already filled in. The "Condition Predicates" are:
| EQ | 0 | Equal |
| LT | 1 | Less than |
| LE | 2 | Less than or equal |
| UNORD | 3 | Unordered |
| NEQ | 4 | Not equal |
| NLT | 5 | Not less than |
| NLE | 6 | Not less than or equal |
| ORD | 7 | Ordered |
For more details of the comparison predicates, and details of how to emulate the "greater than" equivalents, see Section B.2.3.
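The predicate table can be sketched in Python as follows. This is a model under stated assumptions, not an implementation: lanes are Python floats rather than true single-precision values, and the names `PREDICATES` and `cmpps` are invented for the example. The "not" predicates are written as negations so that they come out true when either input is a NaN.

```python
import math

# Low 3 bits of imm8 -> comparison applied to (destination lane, source lane).
PREDICATES = {
    0: lambda a, b: a == b,                                # EQ
    1: lambda a, b: a < b,                                 # LT
    2: lambda a, b: a <= b,                                # LE
    3: lambda a, b: math.isnan(a) or math.isnan(b),        # UNORD
    4: lambda a, b: not (a == b),                          # NEQ
    5: lambda a, b: not (a < b),                           # NLT
    6: lambda a, b: not (a <= b),                          # NLE
    7: lambda a, b: not (math.isnan(a) or math.isnan(b)),  # ORD
}


def cmpps(dst, src, imm8):
    """Return four doubleword masks, one per lane (index 0 = low lane)."""
    predicate = PREDICATES[imm8 & 7]  # only the low 3 bits are used
    return [0xFFFFFFFF if predicate(d, s) else 0 for d, s in zip(dst, src)]
```

In this model a "greater than" test can be obtained by swapping the operands and using LT, since a > b holds exactly when b < a: `cmpps(src, dst, 1)`.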
COMISS xmm1,xmm2/mem32 ; 0F 2F /r [KATMAI,SSE]
COMISS compares the low-order single-precision FP values in the two operands. ZF, PF, and CF are set according to the result. OF, SF, and AF are cleared. The unordered result is returned if either operand is a NaN (QNaN or SNaN).
The destination operand is an XMM register. The source can be either an XMM register or a 32-bit memory location.
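The flags are set according to the following rules:
If (UNORDERED) ZF,PF,CF := 111;
If (GREATER_THAN) ZF,PF,CF := 000;
If (LESS_THAN) ZF,PF,CF := 001;
If (EQUAL) ZF,PF,CF := 100.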
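The flag settings for COMISS, as Intel documents them, can be modelled with a short Python sketch. The function name `comiss_flags` and the dictionary result are assumptions for illustration; "greater than" here means the first (destination) operand is greater than the second (source) operand.

```python
import math


def comiss_flags(a: float, b: float) -> dict:
    """Model the EFLAGS result of comparing a (destination) with b (source)."""
    if math.isnan(a) or math.isnan(b):
        zf, pf, cf = 1, 1, 1  # unordered
    elif a > b:
        zf, pf, cf = 0, 0, 0  # greater than
    elif a < b:
        zf, pf, cf = 0, 0, 1  # less than
    else:
        zf, pf, cf = 1, 0, 0  # equal
    # OF, SF and AF are always cleared.
    return {"ZF": zf, "PF": pf, "CF": cf, "OF": 0, "SF": 0, "AF": 0}
```

Because the result uses ZF and CF in the same way as an unsigned integer compare, code following COMISS typically branches with the unsigned conditions (JA, JB, JE) after first checking PF for the unordered case.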
CVTPI2PS xmm,mm/mem64 ; 0F 2A /r [KATMAI,SSE]
CVTPI2PS converts two packed signed doublewords from the source operand to two packed single-precision FP values in the low quadword of the destination operand. The high quadword of the destination remains unchanged.
The destination operand is an XMM register. The source can be either an MMX register or a 64-bit memory location.
For more details of this instruction, see the Intel Processor manuals.
CVTPS2PI mm,xmm/mem64 ; 0F 2D /r [KATMAI,SSE]
CVTPS2PI converts two packed single-precision FP values from the source operand to two packed signed doublewords in the destination operand.
The destination operand is an MMX register. The source can be either an XMM register or a 64-bit memory location. If the source is a register, the input values are in the low quadword.
For more details of this instruction, see the Intel Processor manuals.
CVTSD2SS xmm1,xmm2/mem64 ; F2 0F 5A /r [WILLAMETTE,SSE2]
CVTSD2SS converts a double-precision FP value from the source operand to a single-precision FP value in the low doubleword of the destination operand. The upper 3 doublewords are left unchanged.
The destination operand is an XMM register. The source can be either an XMM register or a 64-bit memory location. If the source is a register, the input value is in the low quadword.
For more details of this instruction, see the Intel Processor manuals.
CVTSI2SS xmm,r/m32 ; F3 0F 2A /r [KATMAI,SSE]
CVTSI2SS converts a signed doubleword from the source operand to a single-precision FP value in the low doubleword of the destination operand. The upper 3 doublewords are left unchanged.
The destination operand is an XMM register. The source can be either a general purpose register or a 32-bit memory location.
For more details of this instruction, see the Intel Processor manuals.
CVTSS2SI reg32,xmm/mem32 ; F3 0F 2D /r [KATMAI,SSE]
CVTSS2SI converts a single-precision FP value from the source operand to a signed doubleword in the destination operand.
The destination operand is a general purpose register. The source can be either an XMM register or a 32-bit memory location. If the source is a register, the input value is in the low doubleword.
For more details of this instruction, see the Intel Processor manuals.
CVTTPS2PI mm,xmm/mem64 ; 0F 2C /r [KATMAI,SSE]
CVTTPS2PI converts two packed single-precision FP values in the source operand to two packed signed doublewords in the destination operand. If the result is inexact, it is truncated (rounded toward zero).
The destination operand is an MMX register. The source can be either an XMM register or a 64-bit memory location. If the source is a register, the input values are in the low quadword.
For more details of this instruction, see the Intel Processor manuals.
CVTTSS2SI reg32,xmm/mem32 ; F3 0F 2C /r [KATMAI,SSE]
CVTTSS2SI converts a single-precision FP value in the source operand to a signed doubleword in the destination operand. If the result is inexact, it is truncated (rounded toward zero).
The destination operand is a general purpose register. The source can be either an XMM register or a 32-bit memory location. If the source is a register, the input value is in the low doubleword.
For more details of this instruction, see the Intel Processor manuals.
DIVPS xmm1,xmm2/mem128 ; 0F 5E /r [KATMAI,SSE]
DIVPS divides the four packed single-precision FP values in the destination operand by the four packed single-precision FP values in the source operand, and stores the packed single-precision results in the destination register.
The destination is an XMM register. The source operand can be either an XMM register or a 128-bit memory location.
dst[0-31] := dst[0-31] / src[0-31],
dst[32-63] := dst[32-63] / src[32-63],
dst[64-95] := dst[64-95] / src[64-95],
dst[96-127] := dst[96-127] / src[96-127].
DIVSS xmm1,xmm2/mem32 ; F3 0F 5E /r [KATMAI,SSE]
DIVSS divides the low-order single-precision FP value in the destination operand by the low-order single-precision FP value in the source operand, and stores the single-precision result in the destination register.
The destination is an XMM register. The source operand can be either an XMM register or a 32-bit memory location.
dst[0-31] := dst[0-31] / src[0-31],
dst[32-127] remains unchanged.
LDMXCSR mem32 ; 0F AE /2 [KATMAI,SSE]
LDMXCSR loads 32 bits of data from the specified memory location into the MXCSR control/status register. MXCSR is used to enable masked/unmasked exception handling, to set rounding modes, to set flush-to-zero mode, and to view exception status flags.
For details of the MXCSR register, see the Intel processor docs.
See also STMXCSR (Section B.5.72).
MASKMOVQ mm1,mm2 ; 0F F7 /r [KATMAI,MMX]
MASKMOVQ stores data from mm1 to the location specified by DS:EDI (or DS:DI); the address-size attribute determines whether EDI or DI is used. The most significant bit in each byte of the mask register mm2 is used to selectively write the data (0 = no write, 1 = write) on a per-byte basis.
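The per-byte write selection can be sketched in Python. Memory is modelled as a `bytearray` and the registers as 64-bit integers; the function name `maskmovq` and its parameters are hypothetical and exist only for this example.

```python
def maskmovq(memory: bytearray, addr: int, data: int, mask: int) -> None:
    """Write byte i of data to memory[addr + i] when bit 7 of mask byte i is set."""
    for i in range(8):
        if (mask >> (8 * i + 7)) & 1:  # most significant bit of mask byte i
            memory[addr + i] = (data >> (8 * i)) & 0xFF
```

For example, with a mask of 80000000000000FFh only bytes 0 and 7 of the data register reach memory; the other six memory bytes keep their previous contents.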
MAXPS xmm1,xmm2/m128 ; 0F 5F /r [KATMAI,SSE]
MAXPS performs a SIMD compare of the packed single-precision FP numbers from xmm1 and xmm2/mem, and stores the maximum values of each pair of values in xmm1. If the values being compared are both zeroes, source2 (xmm2/m128) would be returned. If source2 (xmm2/m128) is an SNaN, this SNaN is forwarded unchanged to the destination (i.e., a QNaN version of the SNaN is not returned).
MAXSS xmm1,xmm2/m32 ; F3 0F 5F /r [KATMAI,SSE]
MAXSS compares the low-order single-precision FP numbers from xmm1 and xmm2/mem, and stores the maximum value in xmm1. If the values being compared are both zeroes, source2 (xmm2/m32) would be returned. If source2 (xmm2/m32) is an SNaN, this SNaN is forwarded unchanged to the destination (i.e., a QNaN version of the SNaN is not returned). The high three doublewords of the destination are left unchanged.
MINPS xmm1,xmm2/m128 ; 0F 5D /r [KATMAI,SSE]
MINPS performs a SIMD compare of the packed single-precision FP numbers from xmm1 and xmm2/mem, and stores the minimum values of each pair of values in xmm1. If the values being compared are both zeroes, source2 (xmm2/m128) would be returned. If source2 (xmm2/m128) is an SNaN, this SNaN is forwarded unchanged to the destination (i.e., a QNaN version of the SNaN is not returned).
MINSS xmm1,xmm2/m32 ; F3 0F 5D /r [KATMAI,SSE]
MINSS compares the low-order single-precision FP numbers from xmm1 and xmm2/mem, and stores the minimum value in xmm1. If the values being compared are both zeroes, source2 (xmm2/m32) would be returned. If source2 (xmm2/m32) is an SNaN, this SNaN is forwarded unchanged to the destination (i.e., a QNaN version of the SNaN is not returned). The high three doublewords of the destination are left unchanged.
MOVAPS xmm1,xmm2/mem128 ; 0F 28 /r [KATMAI,SSE]
MOVAPS xmm1/mem128,xmm2 ; 0F 29 /r [KATMAI,SSE]
MOVAPS moves a double quadword containing 4 packed single-precision FP values from the source operand to the destination. When the source or destination operand is a memory location, it must be aligned on a 16-byte boundary.
To move data in and out of memory locations that are not known to be on 16-byte boundaries, use the MOVUPS instruction (Section B.5.33).
MOVD mm,r/m32 ; 0F 6E /r [PENT,MMX]
MOVD r/m32,mm ; 0F 7E /r [PENT,MMX]
MOVD copies 32 bits from its source (second) operand into its destination (first) operand. The input value is zero-extended to fill the destination register.
MOVHLPS xmm1,xmm2 ; 0F 12 /r [KATMAI,SSE]
MOVHLPS moves the two packed single-precision FP values from the high quadword of the source register xmm2 to the low quadword of the destination register, xmm1. The upper quadword of xmm1 is left unchanged.
The operation of this instruction is:
dst[0-63] := src[64-127],
dst[64-127] remains unchanged.
MOVHPS xmm,m64 ; 0F 16 /r [KATMAI,SSE]
MOVHPS m64,xmm ; 0F 17 /r [KATMAI,SSE]
MOVHPS moves two packed single-precision FP values between the source and destination operands. One of the operands is a 64-bit memory location, the other is the high quadword of an XMM register.
The operation of this instruction is:
mem[0-63] := xmm[64-127];
or
xmm[0-63] remains unchanged;
xmm[64-127] := mem[0-63].
MOVLHPS xmm1,xmm2 ; 0F 16 /r [KATMAI,SSE]
MOVLHPS moves the two packed single-precision FP values from the low quadword of the source register xmm2 to the high quadword of the destination register, xmm1. The low quadword of xmm1 is left unchanged.
The operation of this instruction is:
dst[0-63] remains unchanged;
dst[64-127] := src[0-63].
MOVLPS xmm,m64 ; 0F 12 /r [KATMAI,SSE]
MOVLPS m64,xmm ; 0F 13 /r [KATMAI,SSE]
MOVLPS moves two packed single-precision FP values between the source and destination operands. One of the operands is a 64-bit memory location, the other is the low quadword of an XMM register.
The operation of this instruction is:
mem[0-63] := xmm[0-63];
or
xmm[0-63] := mem[0-63];
xmm[64-127] remains unchanged.
MOVMSKPS reg32,xmm ; 0F 50 /r [KATMAI,SSE]
MOVMSKPS inserts a 4-bit mask in r32, formed of the most significant bits of each single-precision FP number of the source operand.
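A minimal Python sketch of the mask extraction, assuming the XMM register is represented as a 128-bit integer (the name `movmskps` is used here only as a label for the model):

```python
def movmskps(xmm: int) -> int:
    """Collect the sign bit of each of the four 32-bit lanes into bits 0-3."""
    mask = 0
    for lane in range(4):
        sign = (xmm >> (32 * lane + 31)) & 1  # top bit of this lane
        mask |= sign << lane
    return mask
```

Applied to the result of a CMPccPS comparison, the returned value is 0 when no lane matched and 0Fh when all four lanes matched, which makes it convenient to test with an ordinary integer compare.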
MOVNTPS m128,xmm ; 0F 2B /r [KATMAI,SSE]
MOVNTPS moves the double quadword from the XMM source register to the destination memory location, using a non-temporal hint. This store instruction minimizes cache pollution. The memory location must be aligned to a 16-byte boundary.
MOVNTQ m64,mm ; 0F E7 /r [KATMAI,MMX]
MOVNTQ moves the quadword in the MMX source register to the destination memory location, using a non-temporal hint. This store instruction minimizes cache pollution.
MOVQ mm1,mm2/m64 ; 0F 6F /r [PENT,MMX]
MOVQ mm1/m64,mm2 ; 0F 7F /r [PENT,MMX]
MOVQ copies 64 bits from its source (second) operand into its destination (first) operand.
MOVSS xmm1,xmm2/m32 ; F3 0F 10 /r [KATMAI,SSE]
MOVSS xmm1/m32,xmm2 ; F3 0F 11 /r [KATMAI,SSE]
MOVSS moves a single-precision FP value from the source operand to the destination operand. When the source or destination is a register, the low-order FP value is read or written.
MOVUPS xmm1,xmm2/mem128 ; 0F 10 /r [KATMAI,SSE]
MOVUPS xmm1/mem128,xmm2 ; 0F 11 /r [KATMAI,SSE]
MOVUPS moves a double quadword containing 4 packed single-precision FP values from the source operand to the destination. This instruction makes no assumptions about alignment of memory operands.
To move data in and out of memory locations that are known to be on 16-byte boundaries, use the MOVAPS instruction (Section B.5.22).
MULPS xmm1,xmm2/mem128 ; 0F 59 /r [KATMAI,SSE]
MULPS performs a SIMD multiply of the packed single-precision FP values in both operands, and stores the results in the destination register.
MULSS xmm1,xmm2/mem32 ; F3 0F 59 /r [KATMAI,SSE]
MULSS multiplies the lowest single-precision FP values of both operands, and stores the result in the low doubleword of xmm1.
ORPS xmm1,xmm2/m128 ; 0F 56 /r [KATMAI,SSE]
ORPS returns a bit-wise logical OR between xmm1 and xmm2/mem, and stores the result in xmm1. If the source operand is a memory location, it must be aligned to a 16-byte boundary.
PACKSSDW mm1,mm2/m64 ; 0F 6B /r [PENT,MMX]
PACKSSWB mm1,mm2/m64 ; 0F 63 /r [PENT,MMX]
PACKUSWB mm1,mm2/m64 ; 0F 67 /r [PENT,MMX]
All these instructions start by combining the source and destination operands, then split the result into smaller sections, which they pack into the destination register. The two 64-bit operands are packed into one 64-bit register.
PACKSSWB splits the combined value into words, and then reduces the words to bytes, using signed saturation. It then packs the bytes into the destination register in the same order the words were in.
PACKSSDW performs the same operation as PACKSSWB, except that it reduces doublewords to words, then packs them into the destination register.
PACKUSWB performs the same operation as PACKSSWB, except that it uses unsigned saturation when reducing the size of the elements.
To perform signed saturation on a number, it is replaced by the largest signed number (7FFFh or 7Fh) that will fit if it is too large, and by the smallest signed number (8000h or 80h) that will fit if it is too small. To perform unsigned saturation, the signed input is clamped to the unsigned range of the result: if it is too large it is replaced by the largest unsigned number that will fit (FFh), and if it is negative it is replaced by 00h.
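The pack-with-saturation behaviour can be modelled in Python as below. This is a sketch, not a definitive implementation: registers are 64-bit integers, all function names are invented for the example, and the element order (destination elements in the low half of the result, source elements in the high half) follows Intel's description of these instructions rather than anything stated in this section.

```python
def _signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def _pack(dst: int, src: int, in_bits: int, lo: int, hi: int) -> int:
    """Clamp every input element to [lo, hi] and pack at half the width."""
    count = 64 // in_bits
    out_bits = in_bits // 2
    # Destination elements first (low half of result), then source elements.
    elems = [_signed(dst >> (in_bits * i), in_bits) for i in range(count)]
    elems += [_signed(src >> (in_bits * i), in_bits) for i in range(count)]
    result = 0
    for i, elem in enumerate(elems):
        clamped = max(lo, min(hi, elem))
        result |= (clamped & ((1 << out_bits) - 1)) << (out_bits * i)
    return result


def packsswb(dst: int, src: int) -> int:
    return _pack(dst, src, 16, -0x80, 0x7F)      # words -> signed bytes


def packssdw(dst: int, src: int) -> int:
    return _pack(dst, src, 32, -0x8000, 0x7FFF)  # doublewords -> signed words


def packuswb(dst: int, src: int) -> int:
    return _pack(dst, src, 16, 0, 0xFF)          # words -> unsigned bytes
```

For instance, `packuswb(0x0001FFFF01008000, 0)` yields `0x000000000100FF00`: the words 8000h and FFFFh are negative and clamp to 00h, 0100h clamps to FFh, and 0001h passes through.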
PADDB mm1,mm2/m64 ; 0F FC /r [PENT,MMX]
PADDW mm1,mm2/m64 ; 0F FD /r [PENT,MMX]
PADDD mm1,mm2/m64 ; 0F FE /r [PENT,MMX]
PADDx performs packed addition of the two operands, storing the result in the destination (first) operand.
PADDB treats the operands as packed bytes, and adds each byte individually;
PADDW treats the operands as packed words;
PADDD treats its operands as packed doublewords.
When an individual result is too large to fit in its destination, it is wrapped around and the low bits are stored, with the carry bit discarded.
PADDQ mm1,mm2/m64 ; 0F D4 /r [WILLAMETTE,SSE2]
PADDQ adds the quadwords in the source and destination operands, and stores the result in the destination register.
When an individual result is too large to fit in its destination, it is wrapped around and the low bits are stored, with the carry bit discarded.
PADDSB mm1,mm2/m64 ; 0F EC /r [PENT,MMX]
PADDSW mm1,mm2/m64 ; 0F ED /r [PENT,MMX]
PADDSx performs packed addition of the two operands, storing the result in the destination (first) operand. PADDSB treats the operands as packed bytes, and adds each byte individually; and PADDSW treats the operands as packed words.
When an individual result is too large to fit in its destination, a saturated value is stored. The resulting value is the value with the largest magnitude of the same sign as the result which will fit in the available space.
PADDUSB mm1,mm2/m64 ; 0F DC /r [PENT,MMX]
PADDUSW mm1,mm2/m64 ; 0F DD /r [PENT,MMX]
PADDUSx performs packed addition of the two operands, storing the result in the destination (first) operand. PADDUSB treats the operands as packed bytes, and adds each byte individually; and PADDUSW treats the operands as packed words.
When an individual result is too large to fit in its destination, a saturated value is stored. The resulting value is the maximum value that will fit in the available space.
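The difference between wraparound (PADDB), signed saturation (PADDSB) and unsigned saturation (PADDUSB) is easiest to see side by side. The Python sketch below models MMX registers as 64-bit integers; the helper and function names are assumptions made for illustration.

```python
def _bytes(value: int) -> list:
    """Split a 64-bit integer into eight byte lanes, lowest lane first."""
    return [(value >> (8 * i)) & 0xFF for i in range(8)]


def _join(lanes: list) -> int:
    """Reassemble eight byte lanes (lowest first) into a 64-bit integer."""
    return sum((lane & 0xFF) << (8 * i) for i, lane in enumerate(lanes))


def _signed(byte: int) -> int:
    """Reinterpret an unsigned byte as a two's-complement value."""
    return byte - 0x100 if byte & 0x80 else byte


def paddb(dst: int, src: int) -> int:
    # Wraparound: keep the low 8 bits, discard the carry.
    return _join([(a + b) & 0xFF for a, b in zip(_bytes(dst), _bytes(src))])


def paddsb(dst: int, src: int) -> int:
    # Signed saturation: clamp each sum to -128..127.
    return _join([max(-128, min(127, _signed(a) + _signed(b)))
                  for a, b in zip(_bytes(dst), _bytes(src))])


def paddusb(dst: int, src: int) -> int:
    # Unsigned saturation: clamp each sum to 0..255.
    return _join([min(0xFF, a + b) for a, b in zip(_bytes(dst), _bytes(src))])
```

With the inputs 7FFF8001h and 0101FF01h, the three models return 80007F02h, 7F008002h and 80FFFF02h respectively.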
PAND mm1,mm2/m64 ; 0F DB /r [PENT,MMX]
PANDN mm1,mm2/m64 ; 0F DF /r [PENT,MMX]
PAND performs a bitwise AND operation between its two operands (i.e. each bit of the result is 1 if and only if the corresponding bits of the two inputs were both 1), and stores the result in the destination (first) operand.
PANDN performs the same operation, but performs a one's complement operation on the destination (first) operand first.
PAVGB mm1,mm2/m64 ; 0F E0 /r [KATMAI,MMX]
PAVGW mm1,mm2/m64 ; 0F E3 /r [KATMAI,MMX,SM]
PAVGB and PAVGW add the unsigned data elements of the source operand to the unsigned data elements of the destination register, then add 1 to the temporary results. The results of the add are then each independently right-shifted by one bit position. The high order bits of each element are filled with the carry bits of the corresponding sum.
PAVGB operates on packed unsigned bytes.
PAVGW operates on packed unsigned words.
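The rounding average can be sketched in Python for the byte case. Python integers do not overflow, so the carry out of the 8-bit sum is kept automatically before the shift; the function name `pavgb` and the 64-bit integer representation are assumptions for this example.

```python
def pavgb(dst: int, src: int) -> int:
    """Model PAVGB: per-byte (a + b + 1) >> 1 on unsigned bytes."""
    result = 0
    for i in range(8):
        a = (dst >> (8 * i)) & 0xFF
        b = (src >> (8 * i)) & 0xFF
        result |= ((a + b + 1) >> 1) << (8 * i)  # 9-bit sum, then shift
    return result
```

The added 1 means that averages round up: the average of 0 and 1 is 1, and the average of 1 and 2 is 2.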
PCMPEQB mm1,mm2/m64 ; 0F 74 /r [PENT,MMX]
PCMPEQW mm1,mm2/m64 ; 0F 75 /r [PENT,MMX]
PCMPEQD mm1,mm2/m64 ; 0F 76 /r [PENT,MMX]
PCMPGTB mm1,mm2/m64 ; 0F 64 /r [PENT,MMX]
PCMPGTW mm1,mm2/m64 ; 0F 65 /r [PENT,MMX]
PCMPGTD mm1,mm2/m64 ; 0F 66 /r [PENT,MMX]
The PCMPxx instructions all treat their operands as vectors of bytes, words, or doublewords; corresponding elements of the source and destination are compared, and the corresponding element of the destination (first) operand is set to all zeros or all ones depending on the result of the comparison.
PCMPxxB treats the operands as vectors of bytes.
PCMPxxW treats the operands as vectors of words.
PCMPxxD treats the operands as vectors of doublewords.
PCMPEQx sets the corresponding element of the destination operand to all ones if the two elements compared are equal.
PCMPGTx sets the destination element to all ones if the element of the first (destination) operand is greater (treated as a signed integer) than that of the second (source) operand.
PEXTRW reg32,mm,imm8 ; 0F C5 /r ib [KATMAI,MMX]
PEXTRW moves the word in the source register (second operand) that is pointed to by the count operand (third operand), into the lower half of a 32-bit general purpose register. The upper half of the register is cleared to all 0s.
The two least significant bits of the count specify the source word.
PINSRW mm,r16/r32/m16,imm8 ; 0F C4 /r ib [KATMAI,MMX]
PINSRW loads a word from a 16-bit register (or the low half of a 32-bit register), or from memory, and loads it to the word position in the destination register, pointed at by the count operand (third operand). The low two bits of the count byte are used. The insertion is done in such a way that the other words from the destination register are left untouched.
PMADDWD mm1,mm2/m64 ; 0F F5 /r [PENT,MMX]
PMADDWD treats its two inputs as vectors of signed words. It multiplies corresponding elements of the two operands, giving doubleword results. These are then added together in pairs and stored in the destination operand.
The operation of this instruction is:
dst[0-31] := (dst[0-15] * src[0-15])
+ (dst[16-31] * src[16-31]);
dst[32-63] := (dst[32-47] * src[32-47])
+ (dst[48-63] * src[48-63]);
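The multiply-and-add operation above can be checked with a small Python model. Registers are 64-bit integers and the name `pmaddwd` is only a label for the sketch; each doubleword result is truncated to 32 bits, which is an assumption about how the one overflow case (all four inputs equal to 8000h) is represented.

```python
def pmaddwd(dst: int, src: int) -> int:
    """Model PMADDWD: signed word products summed in pairs into doublewords."""
    def word(value: int, index: int) -> int:
        w = (value >> (16 * index)) & 0xFFFF
        return w - 0x10000 if w & 0x8000 else w  # sign-extend the word

    result = 0
    for i in range(2):
        total = (word(dst, 2 * i) * word(src, 2 * i)
                 + word(dst, 2 * i + 1) * word(src, 2 * i + 1))
        result |= (total & 0xFFFFFFFF) << (32 * i)
    return result
```

With destination words (2, 3, -1, 4) and source words (5, 6, 7, 8), listed from low to high, the low doubleword is 2*5 + 3*6 = 28 and the high doubleword is -1*7 + 4*8 = 25.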
PMAXSW mm1,mm2/m64 ; 0F EE /r [KATMAI,MMX]
PMAXSW compares each pair of words in the two source operands, and for each pair it stores the maximum value in the destination register.
PMAXUB mm1,mm2/m64 ; 0F DE /r [KATMAI,MMX]
PMAXUB compares each pair of bytes in the two source operands, and for each pair it stores the maximum value in the destination register.
PMINSW mm1,mm2/m64 ; 0F EA /r [KATMAI,MMX]
PMINSW compares each pair of words in the two source operands, and for each pair it stores the minimum value in the destination register.
PMINUB mm1,mm2/m64 ; 0F DA /r [KATMAI,MMX]
PMINUB compares each pair of bytes in the two source operands, and for each pair it stores the minimum value in the destination register.
PMOVMSKB reg32,mm ; 0F D7 /r [KATMAI,MMX]
PMOVMSKB returns an 8-bit mask formed of the most significant bits of each byte of the source operand.
PMULHUW mm1,mm2/m64 ; 0F E4 /r [KATMAI,MMX]
PMULHUW takes two packed unsigned 16-bit integer inputs, multiplies the values in the inputs, then stores bits 16-31 of each result to the corresponding position of the destination register.
PMULHW mm1,mm2/m64 ; 0F E5 /r [PENT,MMX]
PMULLW mm1,mm2/m64 ; 0F D5 /r [PENT,MMX]
PMULxW takes two packed signed 16-bit integer inputs, and multiplies the values in the inputs, forming doubleword results.
PMULHW then stores the top 16 bits of each doubleword in the destination (first) operand;
PMULLW stores the bottom 16 bits of each doubleword in the destination operand.
POR mm1,mm2/m64 ; 0F EB /r [PENT,MMX]
POR performs a bitwise OR operation between its two operands (i.e. each bit of the result is 1 if and only if at least one of the corresponding bits of the two inputs was 1), and stores the result in the destination (first) operand.
PSADBW mm1,mm2/m64 ; 0F F6 /r [KATMAI,MMX]
The PSADBW instruction computes the absolute differences of the packed unsigned bytes in the two source operands. These differences are then summed to produce a word result in the lower 16-bit field of the destination register; the rest of the register is cleared. The destination operand is an MMX register. The source operand can either be a register or a memory operand.
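A short Python sketch of the sum of absolute differences, with the MMX registers modelled as 64-bit integers (the function name `psadbw` is an assumption for the example). The largest possible sum is 8 x 255 = 2040, so the result always fits in the low word.

```python
def psadbw(dst: int, src: int) -> int:
    """Model PSADBW: sum of |dst byte - src byte| over the eight byte lanes."""
    total = 0
    for i in range(8):
        a = (dst >> (8 * i)) & 0xFF
        b = (src >> (8 * i)) & 0xFF
        total += abs(a - b)
    return total  # occupies bits 0-15; bits 16-63 of the result are zero
```

For example, `psadbw(0x0000000000FF0010, 0x000000000001FF20)` sums 16 + 255 + 254 and returns 525.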
PSHUFW mm1,mm2/m64,imm8 ; 0F 70 /r ib [KATMAI,MMX]
PSHUFW shuffles the words in the source (second) operand according to the encoding specified by imm8, and stores the result in the destination (first) operand.
Bits 0 and 1 of imm8 encode the source position of the word to be copied to position 0 in the destination operand. Bits 2 and 3 encode for position 1, bits 4 and 5 encode for position 2, and bits 6 and 7 encode for position 3. For example, an encoding of 10b (binary) in bits 0 and 1 of imm8 indicates that the word at bits 32-47 of the source operand will be copied to bits 0-15 of the destination.
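The imm8 encoding can be illustrated with a Python sketch in which the source register is a 64-bit integer; the function name `pshufw` is an assumption for the example.

```python
def pshufw(src: int, imm8: int) -> int:
    """Model PSHUFW: each 2-bit field of imm8 picks the source word for one slot."""
    result = 0
    for pos in range(4):
        select = (imm8 >> (2 * pos)) & 3          # which source word to copy
        word = (src >> (16 * select)) & 0xFFFF
        result |= word << (16 * pos)
    return result
```

With a source of 3333222211110000h, an imm8 of 1Bh (00 01 10 11 in binary) reverses the word order and gives 0000111122223333h, while E4h (11 10 01 00) leaves the order unchanged.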
PSLLW mm1,mm2/m64 ; 0F F1 /r [PENT,MMX]
PSLLW mm,imm8 ; 0F 71 /6 ib [PENT,MMX]
PSLLD mm1,mm2/m64 ; 0F F2 /r [PENT,MMX]
PSLLD mm,imm8 ; 0F 72 /6 ib [PENT,MMX]
PSLLQ mm1,mm2/m64 ; 0F F3 /r [PENT,MMX]
PSLLQ mm,imm8 ; 0F 73 /6 ib [PENT,MMX]
PSLLx performs logical left shifts of the data elements in the destination (first) operand, moving each bit in the separate elements left by the number of bits specified in the source (second) operand, clearing the low-order bits as they are vacated.
PSLLW shifts word sized elements.
PSLLD shifts doubleword sized elements.
PSLLQ shifts quadword sized elements.
PSRAW mm1,mm2/m64 ; 0F E1 /r [PENT,MMX]
PSRAW mm,imm8 ; 0F 71 /4 ib [PENT,MMX]
PSRAD mm1,mm2/m64 ; 0F E2 /r [PENT,MMX]
PSRAD mm,imm8 ; 0F 72 /4 ib [PENT,MMX]
PSRAx performs arithmetic right shifts of the data elements in the destination (first) operand, moving each bit in the separate elements right by the number of bits specified in the source (second) operand, setting the high-order bits to the value of the original sign bit.
PSRAW shifts word sized elements.
PSRAD shifts doubleword sized elements.
PSRLW mm1,mm2/m64 ; 0F D1 /r [PENT,MMX]
PSRLW mm,imm8 ; 0F 71 /2 ib [PENT,MMX]
PSRLD mm1,mm2/m64 ; 0F D2 /r [PENT,MMX]
PSRLD mm,imm8 ; 0F 72 /2 ib [PENT,MMX]
PSRLQ mm1,mm2/m64 ; 0F D3 /r [PENT,MMX]
PSRLQ mm,imm8 ; 0F 73 /2 ib [PENT,MMX]
PSRLx performs logical right shifts of the data elements in the destination (first) operand, moving each bit in the separate elements right by the number of bits specified in the source (second) operand, clearing the high-order bits as they are vacated.
PSRLW shifts word sized elements.
PSRLD shifts doubleword sized elements.
PSRLQ shifts quadword sized elements.
PSUBB mm1,mm2/m64 ; 0F F8 /r [PENT,MMX]
PSUBW mm1,mm2/m64 ; 0F F9 /r [PENT,MMX]
PSUBD mm1,mm2/m64 ; 0F FA /r [PENT,MMX]
PSUBx subtracts packed integers in the source operand from those in the destination operand. It doesn't differentiate between signed and unsigned integers, and doesn't set any of the flags.
PSUBB operates on byte sized elements.
PSUBW operates on word sized elements.
PSUBD operates on doubleword sized elements.
PSUBSB mm1,mm2/m64 ; 0F E8 /r [PENT,MMX]
PSUBSW mm1,mm2/m64 ; 0F E9 /r [PENT,MMX]
PSUBUSB mm1,mm2/m64 ; 0F D8 /r [PENT,MMX]
PSUBUSW mm1,mm2/m64 ; 0F D9 /r [PENT,MMX]
PSUBSx and PSUBUSx subtract packed integers in the source operand from those in the destination operand, and use saturation for results that are outside the range supported by the destination operand.
PSUBSB operates on signed bytes, and uses signed saturation on the results.
PSUBSW operates on signed words, and uses signed saturation on the results.
PSUBUSB operates on unsigned bytes, and uses unsigned saturation on the results.
PSUBUSW operates on unsigned words, and uses unsigned saturation on the results.
PUNPCKHBW mm1,mm2/m64 ; 0F 68 /r [PENT,MMX]
PUNPCKHWD mm1,mm2/m64 ; 0F 69 /r [PENT,MMX]
PUNPCKHDQ mm1,mm2/m64 ; 0F 6A /r [PENT,MMX]
PUNPCKLBW mm1,mm2/m32 ; 0F 60 /r [PENT,MMX]
PUNPCKLWD mm1,mm2/m32 ; 0F 61 /r [PENT,MMX]
PUNPCKLDQ mm1,mm2/m32 ; 0F 62 /r [PENT,MMX]
PUNPCKxx all treat their operands as vectors, and produce a new vector generated by interleaving elements from the two inputs. The PUNPCKHxx instructions start by throwing away the bottom half of each input operand, and the PUNPCKLxx instructions throw away the top half.
The remaining elements are then interleaved into the destination, alternating elements from the second (source) operand and the first (destination) operand: so the leftmost part of each element in the result always comes from the second operand, and the rightmost from the destination.
PUNPCKxBW works a byte at a time, producing word sized output elements.
PUNPCKxWD works a word at a time, producing doubleword sized output elements.
PUNPCKxDQ works a doubleword at a time, producing quadword sized output elements.
So, for example, for MMX operands, if the first operand held 0x7A6A5A4A3A2A1A0A and the second held 0x7B6B5B4B3B2B1B0B, then:
| PUNPCKHBW would return 0x7B7A6B6A5B5A4B4A. |
| PUNPCKHWD would return 0x7B6B7A6A5B4B5A4A. |
| PUNPCKHDQ would return 0x7B6B5B4B7A6A5A4A. |
| PUNPCKLBW would return 0x3B3A2B2A1B1A0B0A. |
| PUNPCKLWD would return 0x3B2B3A2A1B0B1A0A. |
| PUNPCKLDQ would return 0x3B2B1B0B3A2A1A0A. |
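The interleaving rule and the six results above can be reproduced with one small Python model. Operands are 64-bit integers; the function `punpck` and its parameters `elem_bits` and `high` are hypothetical names chosen for this sketch.

```python
def punpck(dst: int, src: int, elem_bits: int, high: bool) -> int:
    """Interleave elements from the high or low half of dst and src.

    elem_bits is 8 (xBW), 16 (xWD) or 32 (xDQ).
    """
    count = 64 // elem_bits // 2      # elements kept from each operand
    mask = (1 << elem_bits) - 1
    base = count if high else 0       # index of the first element kept
    result = 0
    for i in range(count):
        d = (dst >> (elem_bits * (base + i))) & mask
        s = (src >> (elem_bits * (base + i))) & mask
        result |= d << (elem_bits * (2 * i))      # destination element: right
        result |= s << (elem_bits * (2 * i + 1))  # source element: left
    return result
```

Calling `punpck(0x7A6A5A4A3A2A1A0A, 0x7B6B5B4B3B2B1B0B, 8, True)` corresponds to PUNPCKHBW and returns 0x7B7A6B6A5B5A4B4A.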
PXOR mm1,mm2/m64 ; 0F EF /r [PENT,MMX]
PXOR performs a bitwise XOR operation between its two operands (i.e. each bit of the result is 1 if and only if exactly one of the corresponding bits of the two inputs was 1), and stores the result in the destination (first) operand.
RCPPS xmm1,xmm2/m128 ; 0F 53 /r [KATMAI,SSE]
RCPPS returns an approximation of the reciprocal of the packed single-precision FP values from xmm2/m128. The maximum error for this approximation is: |Error| <= 1.5 x 2^-12
RCPSS xmm1,xmm2/m32 ; F3 0F 53 /r [KATMAI,SSE]
RCPSS returns an approximation of the reciprocal of the lower single-precision FP value from xmm2/m32; the upper three fields are passed through from xmm1. The maximum error for this approximation is: |Error| <= 1.5 x 2^-12
RSQRTPS xmm1,xmm2/m128 ; 0F 52 /r [KATMAI,SSE]
RSQRTPS computes the approximate reciprocals of the square roots of the packed single-precision floating-point values in the source and stores the results in xmm1. The maximum error for this approximation is: |Error| <= 1.5 x 2^-12
RSQRTSS xmm1,xmm2/m32 ; F3 0F 52 /r [KATMAI,SSE]
RSQRTSS returns an approximation of the reciprocal of the square root of the lowest order single-precision FP value from the source, and stores it in the low doubleword of the destination register. The upper three fields of xmm1 are preserved. The maximum error for this approximation is: |Error| <= 1.5 x 2^-12
SHUFPS xmm1,xmm2/m128,imm8 ; 0F C6 /r ib [KATMAI,SSE]
SHUFPS moves two of the packed single-precision FP values from the destination operand into the low quadword of the destination operand; the upper quadword is generated by moving two of the single-precision FP values from the source operand into the destination. The select (third) operand selects which of the values are moved to the destination register.
The select operand is an 8-bit immediate: bits 0 and 1 select the value to be moved from the destination operand to the low doubleword of the result, bits 2 and 3 select the value to be moved from the destination operand to the second doubleword of the result, bits 4 and 5 select the value to be moved from the source operand to the third doubleword of the result, and bits 6 and 7 select the value to be moved from the source operand to the high doubleword of the result.
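The selection rule can be written as a few lines of Python. In this sketch each register is a list of four lanes with index 0 as the low doubleword; the function name `shufps` and the list representation are assumptions for illustration.

```python
def shufps(dst: list, src: list, imm8: int) -> list:
    """Model SHUFPS: two lanes chosen from dst, then two lanes chosen from src."""
    return [
        dst[imm8 & 3],         # bits 0-1 -> low doubleword, from destination
        dst[(imm8 >> 2) & 3],  # bits 2-3 -> second doubleword, from destination
        src[(imm8 >> 4) & 3],  # bits 4-5 -> third doubleword, from source
        src[(imm8 >> 6) & 3],  # bits 6-7 -> high doubleword, from source
    ]
```

Using the same register for both operands with an imm8 of 00h broadcasts lane 0 to all four positions, a common idiom for splatting a scalar.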
SQRTPS xmm1,xmm2/m128 ; 0F 51 /r [KATMAI,SSE]
SQRTPS calculates the square roots of the packed single-precision FP values from the source operand, and stores the single-precision results in the destination register.
SQRTSS xmm1,xmm2/m32 ; F3 0F 51 /r [KATMAI,SSE]
SQRTSS calculates the square root of the low-order single-precision FP value from the source operand, and stores the single-precision result in the destination register. The three high doublewords remain unchanged.
STMXCSR m32 ; 0F AE /3 [KATMAI,SSE]
STMXCSR stores the contents of the MXCSR control/status register to the specified memory location. MXCSR is used to enable masked/unmasked exception handling, to set rounding modes, to set flush-to-zero mode, and to view exception status flags. The reserved bits in the MXCSR register are stored as 0s.
For details of the MXCSR register, see the Intel processor docs.
See also LDMXCSR (Section B.5.16).
SUBPS xmm1,xmm2/m128 ; 0F 5C /r [KATMAI,SSE]
SUBPS subtracts the packed single-precision FP values of the source operand from those of the destination operand, and stores the result in the destination operand.
SUBSS xmm1,xmm2/m32 ; F3 0F 5C /r [KATMAI,SSE]
SUBSS subtracts the low-order single-precision FP value of the source operand from that of the destination operand, and stores the result in the destination operand. The three high doublewords are unchanged.
UCOMISS xmm1,xmm2/m32 ; 0F 2E /r [KATMAI,SSE]
UCOMISS compares the low-order single-precision FP numbers in the two operands, and sets the ZF, PF, and CF bits in the EFLAGS register. In addition, the OF, SF and AF bits in the EFLAGS register are zeroed out. The unordered predicate (ZF, PF, and CF all set) is returned if either source operand is a NaN (qNaN or sNaN).
UNPCKHPS xmm1,xmm2/m128 ; 0F 15 /r [KATMAI,SSE]
UNPCKHPS performs an interleaved unpack of the high-order data elements of the source and destination operands, saving the result in xmm1. It ignores the lower half of the sources.
The operation of this instruction is:
dst[31-0] := dst[95-64];
dst[63-32] := src[95-64];
dst[95-64] := dst[127-96];
dst[127-96] := src[127-96].
UNPCKLPS xmm1,xmm2/m128 ; 0F 14 /r [KATMAI,SSE]
UNPCKLPS performs an interleaved unpack of the low-order data elements of the source and destination operands, saving the result in xmm1. It ignores the upper half of the sources.
The operation of this instruction is:
dst[31-0] := dst[31-0];
dst[63-32] := src[31-0];
dst[95-64] := dst[63-32];
dst[127-96] := src[63-32].
XORPS xmm1,xmm2/m128 ; 0F 57 /r [KATMAI,SSE]
XORPS returns a bit-wise logical XOR between the source and destination operands, storing the result in the destination operand.