Below are some small lessons learned from experimenting with the abilities and limitations of synthesis tools. I won’t claim that they hold for every tool, but they hold for at least the one that I work with right now.
Muxing
Leading-edge synthesis is able to optimize multiple mathematical operations into carry-save logic, but it quickly breaks down. Adding a mux between two chains of arithmetic in order to fuse them into a single result will break the optimization. Prototype your data path design in a separately synthesized module to get the best quality of results before writing the actual RTL.
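A minimal sketch of what such a stand-alone data path prototype might look like. The module name dp_proto, the width parameter W and the ports are hypothetical; the inputs and outputs are registered only so that the timing of the arithmetic can be measured in isolation. Whether the tool actually merges the expression into carry-save logic has to be checked in the synthesis reports.

```systemverilog
// Hypothetical prototype module; names and widths are illustrative only.
module dp_proto #(
    parameter int W = 16
) (
    input  logic         clk,
    input  logic [W-1:0] a, b, c, d,
    output logic [2*W:0] sum_q      // 2*W+1 bits: two 2*W-bit products plus carry
);
    logic [W-1:0] a_q, b_q, c_q, d_q;

    always_ff @(posedge clk) begin
        // Register the inputs so the arithmetic sits between flops.
        a_q <= a;
        b_q <= b;
        c_q <= c;
        d_q <= d;
        // One unbroken arithmetic expression, with no mux inside the chain.
        sum_q <= a_q * b_q + c_q * d_q;
    end
endmodule
```

Synthesizing this module on its own and comparing area and timing across a few ways of writing the expression is cheap compared with restructuring finished RTL.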
Muxing a non-power-of-two array with result = array[idx]; will not only result in simulation vs synthesis mismatches, but also give you worse area than using an n-input mux such as
wire [N-1:0] onehot = N'(1) << idx;  // one-hot decode; all zeros when idx >= N
always_comb begin
  result = '0;                       // defined default instead of X
  for (int i = 0; i < N; i++) begin
    if (onehot[i]) result |= array[i];
  end
end
Create your own mux_n module. You will use it a ton. For loops with more than 256 iterations incur a pretty heavy runtime and memory penalty when the synthesis tool maps to generic logic cells, even when timing closure is a breeze. Synthesis tools have a cap on the maximum number of loop iterations (around 1000), partially for this reason. Other EDA tools, such as linters, have similar caps. Potentially large for loops, such as the one in a mux_n module, can be split across multiple modules through generate statements and recursion. Hold off on the generate and recursion until you need it. You will have to test it a lot.
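A minimal sketch of such a mux_n module, wrapping the one-hot AND-OR structure described above. The parameter names N and W and the port list are assumptions made for illustration, and the sketch assumes N >= 2 so that the index has at least one bit.

```systemverilog
// Hypothetical mux_n module; parameter and port names are illustrative.
module mux_n #(
    parameter int N = 5,   // number of inputs, need not be a power of two (N >= 2)
    parameter int W = 8    // width of each input
) (
    input  logic [$clog2(N)-1:0] idx,
    input  logic [W-1:0]         array [N],
    output logic [W-1:0]         result
);
    // One-hot decode of the index; an out-of-range idx shifts the bit out
    // and leaves onehot at all zeros.
    logic [N-1:0] onehot;
    assign onehot = N'(1) << idx;

    // AND-OR mux: an out-of-range idx yields zero in both simulation and
    // synthesis instead of an X in simulation only.
    always_comb begin
        result = '0;
        for (int i = 0; i < N; i++) begin
            if (onehot[i]) result |= array[i];
        end
    end
endmodule
```

Returning zero for an out-of-range index is a design choice of this sketch; it keeps simulation and synthesis in agreement, which is the point of avoiding the plain array index.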
Multi-dimensional arrays
Multi-dimensional arrays are great as long as they are sized in powers of two and you don’t need to map them into a linear space for access from a CPU. When a dimension is not a power of two, or when you need CPU access to the array, you have to map to a linear space. Fortunately, a simple mapping scheme such as
assign idx_z = SIZE_A*idx_b + idx_a;
is optimal in terms of area both when the array is sized in powers of two and when it isn’t. When the array is sized in powers of two, the result is exactly the same as
assign idx_z = {idx_b, idx_a};
There is no loss in quality of results. We only incur the overhead of an adder when SIZE_A isn’t a power of two.
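The mapping can be sketched as a small parameterized module. The module name idx_map, the SIZE_B parameter and the ZW width parameter are assumptions added for illustration; only SIZE_A, idx_a, idx_b and idx_z come from the text above.

```systemverilog
// Hypothetical index-mapping module; SIZE_B and ZW are illustrative additions.
module idx_map #(
    parameter  int SIZE_A = 6,                      // inner dimension
    parameter  int SIZE_B = 3,                      // outer dimension
    localparam int ZW     = $clog2(SIZE_A * SIZE_B) // width of the linear index
) (
    input  logic [$clog2(SIZE_A)-1:0] idx_a,
    input  logic [$clog2(SIZE_B)-1:0] idx_b,
    output logic [ZW-1:0]             idx_z
);
    // Multiplication by a parameter; for SIZE_A = 2**k this is the same
    // function as the concatenation {idx_b, idx_a}.
    assign idx_z = ZW'(SIZE_A * idx_b + idx_a);
endmodule
```

With the defaults, idx_b = 2 and idx_a = 4 map to 6*2 + 4 = 16. Setting SIZE_A to 8 instead gives 8*2 + 4 = 20, which is the same value as {2'd2, 3'd4}.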
Programmable Counters
Always count from N down to zero instead of counting up to N-1. It requires less logic and the counter stays functionally correct when it is reprogrammed while running. When counting down, you will only ever see two period lengths: the old N and the new N. When counting up, you can see three: the old N, an arbitrary length in between, and the new N. Granted, it is a bit tricky to get the trigger from a down counter to match an up counter exactly, but you only have to figure out how to do it once.
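A minimal sketch of a programmable down counter along these lines. The module name, the port names and the active-low asynchronous reset are assumptions. Note that counting from n down to zero gives a period of n+1 cycles, so matching an up counter exactly may require adjusting the programmed value.

```systemverilog
// Hypothetical programmable down counter; names and reset style are illustrative.
module down_counter #(
    parameter int W = 8
) (
    input  logic         clk,
    input  logic         rst_n,
    input  logic [W-1:0] n,     // programmable reload value, may change at any time
    output logic         tick   // high for one cycle while the count is zero
);
    logic [W-1:0] count;

    // The terminal compare is against the constant zero, not against n.
    assign tick = (count == '0);

    always_ff @(posedge clk or negedge rst_n) begin
        if (!rst_n)    count <= '0;
        else if (tick) count <= n;             // n is only sampled on reload
        else           count <= count - 1'b1;
    end
endmodule
```

Because n is only sampled when the count reaches zero, changing it mid-count lets the current period finish with the old value and starts the next period with the new one. After reset the count is zero, so tick is asserted immediately and the first reload follows on the next clock edge.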
Dividing by constants
A lot of people and EDA linting tools are scared of the division operator. Some are even scared of multiplication by a constant. The truth is that synthesis tools are surprisingly good at multiplication or division by a constant as long as there is only one constant. A mux between data and a constant will break the optimization.
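A minimal sketch of division by a single constant written with the plain operators. The module name, the W parameter and the DIVISOR parameter are hypothetical; the only point is that the divisor is one elaboration-time constant rather than a signal or a selection among several values. How well the tool reduces it should be confirmed in the area report.

```systemverilog
// Hypothetical constant divider; names are illustrative.
module div_const #(
    parameter int          W       = 16,
    parameter int unsigned DIVISOR = 10   // single elaboration-time constant, > 0
) (
    input  logic [W-1:0] data,
    output logic [W-1:0] quotient,
    output logic [W-1:0] remainder
);
    // Plain / and % with a constant right-hand operand; the synthesis tool
    // is left to pick the implementation.
    assign quotient  = W'(data / DIVISOR);
    assign remainder = W'(data % DIVISOR);
endmodule
```

With the defaults, data = 1234 gives a quotient of 123 and a remainder of 4.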