Inferring Wide Multipliers
Multiplication operations wider than 18x17 are decomposed into smaller operations. The software adds the partial products to form the final product. For example, a 32x32 multiplication is broken down into 4 smaller multiply operations and 3 summations. By default, these summations are implemented inside the DSP block using the post-adder feature and the cascade paths between DSP blocks.
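The arithmetic behind this decomposition can be sketched in Python. This is only an illustration of the 4-multiply, 3-addition structure for unsigned operands. The split point of 17 bits and the names SPLIT and wide_mult_32x32 are assumptions made for this example; the partitioning the software actually chooses may differ.

```python
SPLIT = 17                 # assumed split point, not taken from the tool
MASK = (1 << SPLIT) - 1    # selects the low 17 bits of an operand


def wide_mult_32x32(a: int, b: int) -> int:
    """Multiply two unsigned 32-bit values using four narrow multiplies."""
    assert 0 <= a < (1 << 32) and 0 <= b < (1 << 32)

    # Split each operand into a 17-bit low part and a 15-bit high part.
    a_lo, a_hi = a & MASK, a >> SPLIT
    b_lo, b_hi = b & MASK, b >> SPLIT

    # Four partial products, each narrow enough for an 18x17 multiplier.
    p0 = a_lo * b_lo
    p1 = a_lo * b_hi
    p2 = a_hi * b_lo
    p3 = a_hi * b_hi

    # Three additions combine the shifted partial products.
    middle = p1 + p2                      # addition 1
    lower = p0 + (middle << SPLIT)        # addition 2
    return lower + (p3 << (2 * SPLIT))    # addition 3
```

Each addition in the sketch corresponds to one of the summations that the post-adder and cascade paths would carry in hardware.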
Because long combinational cascade paths can lower fMAX, you can insert pipeline registers in the DSP operation to improve throughput (fMAX) at the cost of extra latency.
Warning: The effect of extra latency in the synthesized netlist must be accounted for in post-synthesis simulation. Otherwise, simulation mismatch may be observed between the RTL and synthesized netlists.
See the --mult-auto-pipeline option in Synthesis Options.
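The latency mismatch described in the warning can be illustrated with a small behavioral model in Python. This is a sketch only: the PipelinedMultiplier class and the latency of 2 cycles are hypothetical and do not reflect the number of registers the software actually inserts.

```python
from collections import deque


class PipelinedMultiplier:
    """Behavioral model of a multiplier with a fixed output latency."""

    def __init__(self, latency: int):
        # Each pipeline stage starts empty; None marks a not-yet-valid output.
        self.pipe = deque([None] * latency)

    def clock(self, a: int, b: int):
        """Advance one clock cycle and return the value at the output."""
        self.pipe.append(a * b)
        return self.pipe.popleft()


inputs = [(3, 5), (7, 11), (2, 9), (4, 4)]
EXTRA_LATENCY = 2  # hypothetical number of inserted pipeline stages

rtl = PipelinedMultiplier(0)                  # models the unpipelined RTL
netlist = PipelinedMultiplier(EXTRA_LATENCY)  # models the pipelined netlist

rtl_out = [rtl.clock(a, b) for a, b in inputs]
net_out = [netlist.clock(a, b) for a, b in inputs]

# A cycle-by-cycle comparison fails; shifting by the latency aligns the results.
naive_match = rtl_out == net_out
aligned_match = net_out[EXTRA_LATENCY:] == rtl_out[:-EXTRA_LATENCY]
```

In this model, naive_match is False and aligned_match is True, which is the kind of adjustment a testbench comparing RTL and post-synthesis results would need to make.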