Appendix C Instruction Latencies 311
Software Optimization Guide for AMD64 Processors
25112 Rev. 3.06 September 2005
Table 15. x87 Floating-Point Instructions (Continued)

Syntax               Encoding                             Decode type  FPU pipe(s)       Latency  Note
                     First byte  Second byte  ModRM byte
FMUL [mem32real]     D8h                      mm-001-xxx  DirectPath   FMUL              6
FMUL [mem64real]     DCh                      mm-001-xxx  DirectPath   FMUL              6
FMULP ST(i), ST      DEh                      11-001-xxx  DirectPath   FMUL              4        1
FNCLEX               DBh         E2h                      VectorPath                     16
FNINIT               DBh         E3h                      VectorPath                     89
FNOP                 D9h                      11-010-000  DirectPath   FADD/FMUL/FSTORE  2        2
FPATAN               D9h                      11-110-011  VectorPath   -                 136
FPREM                D9h                      11-111-000  DirectPath   FMUL              9+e+n    4
FPREM1               D9h                      11-110-101  DirectPath   FMUL              9+e+n    4
FPTAN                D9h                      11-110-010  VectorPath   -                 107
FRNDINT              D9h                      11-111-100  VectorPath   -                 10
FRSTOR [mem94byte]   DDh                      mm-100-xxx  VectorPath   -                 138
FRSTOR [mem108byte]  DDh                      mm-100-xxx  VectorPath   -                 138
FSAVE [mem94byte]    DDh                      mm-110-xxx  VectorPath   -                 159
FSAVE [mem108byte]   DDh                      mm-110-xxx  VectorPath   -                 159
FSCALE               D9h                      11-111-101  VectorPath   -                 9
FSIN                 D9h                      11-111-110  VectorPath   -                 93
FSINCOS              D9h                      11-111-011  VectorPath   -                 104
FSQRT                D9h                      11-111-010  DirectPath   FMUL              35
FST [mem32real]      D9h                      mm-010-xxx  DirectPath   FSTORE            2
FST [mem64real]      DDh                      mm-010-xxx  DirectPath   FSTORE            2
FST ST(i)            DDh                      11-010-xxx  DirectPath   FADD/FMUL         2
FSTCW [mem16]        D9h                      mm-111-xxx  VectorPath   -                 4
FSTENV [mem14byte]   D9h                      mm-110-xxx  VectorPath   -                 89

Notes:
1. The last three bits of the ModRM byte select the stack entry ST(i).
2. These instructions have an effective latency as shown. However, these instructions generate an internal NOP
with a latency of two cycles but no related dependencies. These internal NOPs can be executed at a rate of
three per cycle and can use any of the three execution resources.
3. This is a VectorPath decoded operation that uses one execution pipe (one ROP).
4. There is additional latency associated with this instruction. “e” represents the difference between the exponents
of the divisor and the dividend. If “s” is the number of normalization shifts performed on the result, then
n = (s+1)/2 where (0 <= n <= 32).
5. The latency provided for this operation is the best-case latency.
6. The three latency numbers represent the latency values for precision control settings of single precision, double
precision, and extended precision, respectively.
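
Note 4 gives the FPREM and FPREM1 latency as a formula (9+e+n) rather than a fixed cycle count. The following is a minimal sketch of that arithmetic, written in Python purely for illustration. The function name fprem_latency is hypothetical, and two points are assumptions not stated in the note: that (s+1)/2 is an integer (truncating) division, and that e and s are supplied as non-negative integers.

```python
def fprem_latency(e: int, s: int) -> int:
    """Estimate FPREM/FPREM1 latency in cycles as 9 + e + n (see note 4).

    e: difference between the exponents of the divisor and the dividend.
    s: number of normalization shifts performed on the result.
    """
    if e < 0 or s < 0:
        # Assumption: both inputs are treated as non-negative counts.
        raise ValueError("e and s are assumed to be non-negative")

    # Assumption: (s+1)/2 is an integer division; the note does not
    # specify how a fractional result would be rounded.
    n = (s + 1) // 2

    # Note 4 bounds n to the range 0 <= n <= 32.
    if n > 32:
        raise ValueError("n = (s+1)/2 must not exceed 32")

    return 9 + e + n


if __name__ == "__main__":
    # Example values chosen for illustration only.
    print(fprem_latency(e=3, s=4))
```

With e = 3 and s = 4, the sketch computes n = 2 and an estimated latency of 9 + 3 + 2 = 14 cycles. With e = 0 and s = 0, it returns the base latency of 9 cycles.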