Inferring Wide Multipliers

Multiplication operations wider than 18x17 are decomposed into smaller operations. The software adds the partial results to form the final product. For example, a 32x32 multiplication is broken down into 4 smaller multiplications and 3 additions. By default, these additions are implemented inside the DSP blocks using the post-adder feature and the cascade paths between DSP blocks.
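The arithmetic behind the 32x32 example can be sketched in a few lines of Python. This is only an illustration of the decomposition, not the tool's actual mapping: the function name mul32_decomposed is hypothetical, the operands are assumed to be unsigned, and the split into 16-bit halves is an assumption chosen because each half fits within an 18x17 multiplier. The software may choose different split points.

```python
MASK16 = 0xFFFF  # assumption: operands are split into 16-bit halves


def mul32_decomposed(a: int, b: int) -> int:
    """Unsigned 32x32 multiply built from four 16x16 partial products."""
    a_lo, a_hi = a & MASK16, (a >> 16) & MASK16
    b_lo, b_hi = b & MASK16, (b >> 16) & MASK16

    # Four smaller multiplications, each narrow enough for one 18x17 multiplier.
    p_ll = a_lo * b_lo
    p_lh = a_lo * b_hi
    p_hl = a_hi * b_lo
    p_hh = a_hi * b_hi

    # Three additions combine the shifted partial products.
    cross = p_lh + p_hl                       # addition 1
    low_and_cross = p_ll + (cross << 16)      # addition 2
    return low_and_cross + (p_hh << 32)       # addition 3
```

For any two unsigned 32-bit inputs, mul32_decomposed(a, b) should equal a * b. In hardware, the shifts are wiring only, so the cost is the four multipliers and the three adders.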

Because long combinational cascade paths can reduce f_MAX, you can insert pipeline registers in the DSP operation to improve throughput (f_MAX) at the cost of extra latency.

Warning: You must account for the extra latency of the synthesized netlist in post-synthesis simulation. Otherwise, the simulation results of the RTL and the synthesized netlist may not match.
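To see why the mismatch arises, consider a small behavioral model in Python. It is a sketch under stated assumptions, not a model of the actual DSP block: the function pipelined_multiplier is hypothetical, and the latency of 2 cycles is an arbitrary illustrative value, since the number of register stages the software inserts is not specified here.

```python
from collections import deque


def pipelined_multiplier(stream, latency):
    """Yield one result per cycle, delayed by `latency` register stages."""
    # None models a pipeline register that holds no valid product yet.
    pipe = deque([None] * latency)
    for a, b in stream:
        pipe.append(a * b)    # a new product enters the pipeline
        yield pipe.popleft()  # the oldest value leaves the pipeline


inputs = [(2, 3), (4, 5), (6, 7), (0, 0), (0, 0)]

# RTL as written: the product is available in the same cycle.
rtl = list(pipelined_multiplier(inputs, latency=0))

# Synthesized netlist with 2 added pipeline stages (illustrative value).
netlist = list(pipelined_multiplier(inputs, latency=2))
```

A cycle-by-cycle comparison of rtl and netlist fails, but the two agree once the netlist output is shifted by the added latency (netlist[2:] matches rtl[:3]). A post-synthesis testbench needs the same kind of offset in its checker.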

See the --mult-auto-pipeline option in Synthesis Options.