
Combined Binary and Decimal Floating-Point Unit

I have noticed that the university server which hosts my thesis has disappeared from the internet, thus making my thesis unavailable. I have also noticed that this thesis has 8 citations, so in the interest of preserving history, I am making it available again below. This thesis is a decade old, so newer research is most likely much better, but sections 2 and 3.1 still have some value.

Abstract

There has been renewed interest in a number system for integrated circuits called decimal floating-point. There are inherent problems in today’s binary floating-point numbers that require financial computations to be performed in software, which is 100-1000 times slower than a hardware-based equivalent. For example, adding a 5% tax to a $0.70 phone call will round off incorrectly in binary floating-point.

With the increase in transistor count surpassing the increase in clock speed on modern computer processors, manufacturers are looking into new strategies for speeding up computations, including the financial ones, and a commercial processor with decimal floating-point already exists. Despite the interest, the problems of decimal arithmetic being slower and requiring more area than binary remain.

This thesis explores the possibility of combining binary and decimal floating-point units, thereby saving area without sacrificing too much speed. In the course of this investigation a novel decimal fused multiply-add (FMA) based floating-point unit is developed and combined with a known binary FMA algorithm. This FMA primarily uses known components, but one major component – the decimal leading one predictor – is new.

Synthesis results show that the latencies of the binary and decimal paths are comparable to current solutions, but the area used is much larger than that of the individual units.

In order to make the FMA fulfill the goals of this thesis, further research should look into reducing the area of the FMA. Until the area is reduced significantly, a combined binary and decimal floating-point unit is not recommended.

Master-Thesis-Peter-Monsson-final-web

A do it yourself metal mask programmable ROM

Gate array ECO standard cells are a relatively new phenomenon that hasn’t gotten much attention in the world of digital RTL design. They are pretty neat in that you can change a filler or decap cell into a different standard cell by changing only the metallization of a chip, that is, the masks for the back end of line of chip production. This reduces the cost and time to market of a respin. With mask costs reaching multiple millions of dollars for a single tape-out, you can easily save a million dollars with the help of gate level arrays.

Mask programmable ROMs are similar to gate level arrays in that you only need to change the metallization of an existing chip to enable new functionality. The change is usually limited to just a via layer between two metal layers in order to keep costs at a minimum.

The insight here is that you can get the same benefit as a mask programmable ROM by using gate level arrays, without having to source an externally developed metal programmable ROM or build a custom one yourself. You also avoid the special BIST or production test requirements that you would otherwise have to spend engineering time on before tape-out. The downside is that the gate array standard cell is twice the size of a normal standard cell and probably larger than a single bit cell of a metal mask programmable ROM. Also, you will probably have to change metal1 and the contact layer, which increases the cost of a new mask set compared to the classical ROM version.

Pros

  • No external sourcing
  • No custom ROM development
  • No BIST or special production test requirements

Cons

  • Higher area
  • Higher metal mask cost if you need to change anything

How to create your own metal mask programmable ROM

Pick two gate array cells – one inverter and one buffer – of the same size, plus your standard N-input mux, and connect the output of the appropriate gate array buffer or inverter to each input of the N-input mux. Tie the inputs of the gate array cells to zero, or to the output of a flop (for example for test or debug purposes). You can also optimize this by merging the first stage of the mux with the buffer/inverter, using a nand2/nor2 gate array pair instead. You will need to fiddle around with some truth tables to make that work out. There are a couple of possible combinations, one of which may work better for your specific case, so you may have to play with it a little. A rough sketch of the simple buffer/inverter version is shown below.
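As a sketch only (GA_BUF and GA_INV are placeholders for whatever your library calls its gate array buffer and inverter cells), one output bit of a 4-word ROM might look like this:

// One output bit of a hypothetical 4-word metal programmable ROM.
// GA_BUF and GA_INV stand in for the library's gate array buffer and
// inverter cells; swapping one for the other in a metal ECO flips the bit.
module rom_bit_4w (
  input  logic [1:0] addr,
  output logic       data
);
  logic [3:0] word;

  GA_BUF u_w0 (.A(1'b0), .Z(word[0]));  // stores 0
  GA_INV u_w1 (.A(1'b0), .Z(word[1]));  // stores 1
  GA_INV u_w2 (.A(1'b0), .Z(word[2]));  // stores 1
  GA_BUF u_w3 (.A(1'b0), .Z(word[3]));  // stores 0

  assign data = word[addr];             // the N-input mux
endmodule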

 

What is the value of improving an engineer by 10%?

The value of improving an engineer is directly related to the fully loaded cost of an engineer, that is, salary and the overhead of employing said engineer.

Salary in 2018 for a low experience engineer doesn’t get much lower than $80,000/year, while a senior engineer with 8-12 years of experience goes for about $120,000/year on average.

The overhead of employing an engineer is often measured as a multiplier. Some things, such as office costs, don’t scale with salary, while other things, such as software licenses, scale faster than salary, since a more experienced and more valuable engineer is more productive and makes more effective use of software, thereby providing better returns for the employer. Thus even for an expensive engineer the overhead is at least a x1.5 multiplier for very lean companies, while most go by the x2 rule of thumb.

In total, this makes the fully loaded cost of an engineer between $120,000/year to $240,000/year or $10,000/month to $20,000/month. Improving the performance of an engineer by 10% thus saves the company $1,000/month to $2,000/month or $12,000/year to $24,000/year.

References

  1. https://cardinalpeak.com/blog/the-cost-of-an-engineer-hour/
  2. https://www.design-reuse.com/articles/9065/modeling-total-cost-of-ownership-for-semiconductor-ip.html

SystemVerilog Libraries

The standard library for SystemVerilog is tiny and inadequate. Fortunately, the industry has improved the state of the world a little. Below is a curated list of libraries for SystemVerilog design and verification.

Design

Verification

  • UVM The state of the art verification framework for simulation. The killer feature is the register layer. Replaces VMM and OVM. You want to be using this. Apache License.
  • svlib String and file handling functions that SystemVerilog is missing. Also regular expressions and YAML parsing. Apache License.
  • ClueLib String handling and collections. Also some networking related methods. MIT License.
  • OVL The forerunner to SystemVerilog Assertions. A bit dated today. Apache License.
  • SVUnit Unit testing for your testbench code. Apache License.
  • SVAUnit Unit testing for your SystemVerilog Assertions. Apache License.

A SystemVerilog Assertions Checklist and Cheat Sheet

I have been meaning to write a SystemVerilog Assertions (SVA) Cheat Sheet for about 5 years. It was one of those things that I kept putting off because I didn’t know how I wanted to approach it. There are many features in SVA and most of them aren’t used much. I couldn’t get myself to list all the features of the IEEE 1800-2012 standard in a condensed format, because it would serve me no purpose. The simple assertions are the easiest to understand, the easiest to get right and the fastest to simulate, so a cheat sheet should focus on those.

I created a SystemVerilog Assertions Checklist in the spring of 2015 and thought that it would be useful to combine the checklist and a cheat sheet into a useful whole. I have been using this combination since then and have tested it on a few coworkers. So far it appears to be useful. I think that it would be a good tool for relatively inexperienced SVA practitioners, as it makes low level verification a more mechanical than intellectual effort. Please let me know if this is the case or if it falls short of that.

The assertions in the checklist are all valid SystemVerilog 2005. They should work in both simulators and formal verification engines, but with an emphasis on simulation performance. Download it, print it out and put it next to you on your desk. Happy coding:

SystemVerilog Assertions Checklist Cheat Sheet v0.3

A Checklist for SystemVerilog Assertions

SystemVerilog Assertions (SVAs) are effective for low level verification. UVM is effective for high level verification. Unfortunately, the high level concerns cloud my critical thinking and I forget which low level issues I should be concerned about. Fortunately, many SVAs are mostly mechanical applications of a few simple concepts. I have listed my checklist for SVAs when writing RTL below. I look at it for inspiration when I feel that I should add another assertion.

  • Registers and outputs on reset
  • No unknown control signals: input and output
  • No unknown data in transactions: input and output
  • Restricted data ranges: e.g. address ranges – math functions – data relationships
  • Restricted enumerations: enum and control signal combinations
  • Valid handshake protocols: every interface – spurious ack – cause has effect – data stable
  • Time: clock periods – setup/hold time – glitches – CDC – rate control
  • Invariants: data integrity – overflow/underflow – FSM transitions

The first assertions in the list are the easiest to write. The last ones are the hardest to write.
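To make the checklist concrete, here are two assertions of the kind it is meant to prompt; the signal names (valid, ack, data) are made up for illustration:

// Checklist: no unknown control signals on an interface input.
a_valid_known: assert property (
  @(posedge clk) disable iff (!rst_n)
  !$isunknown(valid)
);

// Checklist: valid handshake protocol - data must remain stable
// until the transfer has been acknowledged.
a_data_stable: assert property (
  @(posedge clk) disable iff (!rst_n)
  valid && !ack |=> $stable(data)
);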

Fighting my parser fear

I admit it. I have parser fear. My parser fear is rooted in multiple fears that together form a big, indescribable monster that I can’t put aside. I think this is a shame, because I see value in external domain specific languages (DSLs) beyond what embedded DSLs can provide, in terms of succinctness and robustness for the end user. So in order to overcome my parser fear, I have written down all my fears and concerns related to parsers and DSLs so that I can tackle them individually. This post is in part a description of those fears and in part my current approach to overcoming them.

It has been a long time

I have two experiences with writing parsers, from university courses back in 2005 and 2006. The first experience was not good at all. It was a mess of spaghetti code and I didn’t understand what was really going on. The second experience was much better. There I learned how to write a lexer and a parser and put the result into an abstract syntax tree (AST) so that the code was easily evaluated. Unfortunately, the second one involved a lot of hand-holding, so I am afraid that my own code is going to devolve into an unmaintainable mess of spaghetti code now that I am on my own.

I realize that much of the hand-holding came from having a good model in the AST. I also realize that if I create a good old-fashioned model, then I will have a much easier time creating good, maintainable software. There is nothing fancy in this at all. Some call the model a semantic model, but I think of it as the M in MVC, even when I am writing command line programs. Specifically, the structure of the command line program is as follows in pseudo code:

main:
  command = parse_cmd_line(argv)          # options plus the DSL input files
  model = parse(tokenize(command.files))  # build the semantic model
  model.validate()                        # catch inconsistent input early
  for each generator:
    generate(command, model)              # every output is derived from the model

This is about as straightforward as it can be and it gives a decent amount of structure to the program. Different tasks are split into different sections, and there is a clear split between getting all the required information into the model and generating output based on the model.

The pseudo code leaves out error handling, which is a sore spot on my part, and it is not clear whether there is room for sharing code between the generators. I suspect that there is, but I haven’t gotten into that yet. I feel that I could use a good template language to overcome some of the problems inherent in my current generators. So far this hub and spoke design looks reasonable.

Designing DSLs

I don’t have any experience with designing domain specific languages. Specifically, I am afraid that I will not be able to extend a DSL in a reasonable fashion if I design a succinct language.

I haven’t solved this problem at all, but here is what I am going to do to see if I can’t overcome it. My plan is to write down the EBNF for the language that I am about to design and create examples that fit the EBNF, in order to see if they are succinct. I can then annotate the EBNF with my thoughts on where and how it would make sense to extend the language if needed. I can look at the EBNF and the examples to see if I need a shorthand/longhand distinction in some of the code and whether that is feasible with the current language design and the parser that I have at hand.

Selecting a parser library

Most programming languages have multiple parser libraries to choose from, so I fear that I will end up choosing the wrong parser for the job. Ideally I want a well documented library, where problems are easy to resolve when I invariably get myself into trouble. Being able to go onto Stack Overflow to get help would be nice. Furthermore, it would be nice if the error messages were useful to the end user. Finally, the parser must be powerful enough to express the languages that I intend to create. Here I am at a loss. Is recursive descent, a.k.a. LL(*), good enough? Will I need LR or GLR, and what is PEG? What if I select the wrong parser? How screwed am I?

I don’t think that there is any other way than to pick a library and just try it out. If I already have the EBNF and the model, then the damage that a wrong choice can do is fairly limited. I am going to have to invest some time in any parser library, so I don’t see why I shouldn’t just sink some of that time into finding a qualified library first.

Wasting my time

My final big fear is: what if this isn’t worth it? What if I spend too much time fighting the parser library? What if the time invested in creating the DSL, the program and the tests exceeds the time that I save by using the program? If I want others to benefit from the program, then I will need to get management buy-in. If I can’t show break-even for myself, then it gets hard to justify the time spent educating others, writing documentation and supporting the code base.

I think that this problem is solved in part by experience and in part through standard project management. Experience will make me better at creating DSLs and programs that parse them. It will reduce the time I need to complete a project, or at least get it into a viable state from where I can reap the benefits. Plain old project management will hopefully resolve the initial question of whether it is worth doing. I don’t see a problem with sitting down for a few hours to write up a short project proposal, estimate the time required and estimate the value of the benefit. If I can’t convince myself of an actual benefit at that point, then there is no reason to go ahead with the project and waste more time on it.

The project proposal is basically the introduction page of the documentation. It answers the questions

  • What is it?
  • What does it do?
  • What are the goals?
  • What are the benefits?
  • What does it not do?
  • What is future development which is not included in the first release?

Examples of benefits for the kind of work I do are

  • Correct by construction
  • Seamlessly works with all our tools
  • Abstraction over implementation details on …

After having written the proposal I can create a work breakdown and estimate the time required, like I do on every project. An initial breakdown will probably have the following parts:

  • Model
  • EBNF development
  • Lexer
  • Parser
  • Validation
  • Internal algorithms
  • An entry for each generator
  • Unit testing
  • System testing
  • Documentation
  • Examples
  • Presentations

Finally, I will need to estimate the return on the investment. How much time do I currently spend on this per year? How much time will I spend in the future? Usually there is no positive return on investment in the projects that I consider when looking at the initial effort alone. The return usually kicks in during maintenance and in testing. Quantifying these two scenarios will be tricky. One way to get around this is to quantify the conditions under which the ROI will be positive.

Summary

I think that I have tackled my parser fear to a reasonable degree. It was a great exercise to write down my fear, divide it into individual fears and tackle them one by one. I believe that I have found some methods to become a better engineer.

Linting in 2015

Linting tools don’t get much love from design engineers. The reasons for this are valid. Linting is often an afterthought after verification is essentially complete. The rule set is often the default rules from the vendor or an old unmaintained one that was put together by a junior engineer many years ago. The tools don’t get much attention from vendors and language support is often lacking. All in all linting becomes a tedious process of weeding out irrelevant warnings in the search for that one bug which can make the last weeks of pain and suffering worthwhile.

This is a shame. Linting or static analysis should be a fast and effective bug hunting tool. Lint should be the second tool to be run on RTL, just after elaboration. I believe that I can find a way to make lint find the simple and stupid bugs that I write in a matter of seconds, with a low false positive rate. My personal goal for 2015 is to find that way.

For 2015 I envision that my verification work flow will look like this:

  1. Elaboration
  2. High impact lint
  3. Quick formal check of embedded SystemVerilog Assertions
  4. Personal code review
  5. UVM testbench and simulation
  6. Exhaustive lint

When it is time for the personal code review, I will kick off an elaboration in our synthesis tool. Full synthesis and LEC will probably start while developing the UVM testbench. There is no reason to start before this time as buggy RTL often results in completely incorrect synthesis reports when big chunks get optimized away.

By structuring my workflow this way, I believe that I can get the highest possible code quality at the earliest possible moment throughout a project.

High impact lint

High impact lint is normal linting with a personal rule set that actually matters. I am going to take a standard rule set and simply remove the rules that I find noisy. I see five levels of warnings that a linter can emit:

  1. Definitely a bug (Latches, combinatorial loops)
  2. Almost always a bug
  3. Could be a bug, I don’t know
  4. Almost never a bug
  5. Not a bug, why is this a rule?

Linters are usually faster when there are fewer rules, so I am inclined to keep only the first level, but I am hesitant to leave out level 2. I should leave out level 3 and below, but I am not sure that I will manage to.

I know for sure that the rule set that I will use is for my use only. I don’t expect anyone else to use it. I want full control over the rule set so that I can maintain a high impact from the linting. Other people make different mistakes, have different coding styles and want different rules to apply in order to detect problems. I don’t want to spend time discussing with others whether a specific rule should be present in my rule set.

I don’t know whether I should keep score on the rules. I am inclined to do so, but I don’t know if I will find it too tedious. The alternative is to keep two rule sets, put levels 2 and 3 in a candidate rule set and check the candidate set when it is time for the personal code review.

Formal verification

I am not sold on formal verification, yet. Formal verification has its place in base components and module interfaces, but it fails in the messy real world of chip design. Still, if I can verify simple SystemVerilog Assertions that I have embedded into my design before writing a testbench or running a single simulation, why should I not do that? The worst case is that I can’t prove or disprove any of the assertions and I have wasted a few minutes.

I have only a little experience with formal verification, but my initial findings are that I find a bug somewhere with low effort and that I write more assertions to check more things because it is faster that way. It feels a lot like the first time I run simulation with a new testbench, just that the feedback cycle is shorter as I write short assertions instead of testbench code.

Personal code review

Code reviews have been shown to have one of the highest bug detection rates per hour invested. There is no reason why I should not do code reviews. Some structured software development approaches place the code review just after compilation, but I don’t think that this is a pragmatic place for it. I prefer that the code review comes after the simple, automatically detectable problems have been fixed. This allows me to focus on logic problems. A personal code review is not as strong as a code review done by others, but it doesn’t require buy-in from others, so that is what I am going with for 2015.

UVM testbench

Constrained random verification (CRV) is the gold standard of functional verification. There is no way that I would sign off on a design without a UVM testbench with sufficient code coverage and a finished verification plan. Having said that, there are disadvantages to CRV. UVM testbenches with all the checks and tests take a significant amount of time to write and run. I expect CRV to be the tool that takes me to 100%, but I don’t see a reason why I should start with CRV at 0% and debug simple logic bugs with it. I want to hunt for the hard and nasty bugs with UVM, not the simple and stupid ones. If I can be at 50% with high impact lint, quick formal verification and code review by the time I write the UVM testbench, then I will be very happy.

Exhaustive lint

There is no way around it: someone is going to require that I wade through an endless number of lint warnings from an ineffective linting rule set at some point. The goal is to push this task to the end of the project, where it can do the least harm. The end of a project is heavier on computer cycles than on engineering cycles, so it is a good time to go through warnings while waiting for tools to complete. Exhaustive lint is also the lowest value task; if schedule pressure demands that something gets cut, then the exhaustive lint is the correct task to remove.

Wrapping up

I think that I have found a way to improve my productivity by incorporating linting into my verification work flow. I am starting on a new project in 2015 where I can test my new work flow. I will need to change the way I write RTL to suit the new tools that I am going to use, but I can live with that. I hope that linting first will give higher quality code at an earlier point in time.

SystemVerilog Assertion for thermometer code

A lot of analog circuits as well as a few digital ones use thermometer coding. For years I have had the suspicion that there was a really simple way to write an assertion that checked a bit-vector for thermometer coding. It hasn’t been a priority, so I didn’t think too much about it, but today it hit me: A bit vector is a thermometer code if the code plus 1 is a one-hot signal or power of two number. I feel stupid. This is simply the reverse of the bit-twiddling hack to check for a power of two number. So without further ado, the thermometer code checking assertion:

a_is_therm_code: assert property (
  @(posedge clk) disable iff (!rst_n)
  $onehot0(code+1)
);

And the reverse version, if the MSBs are ones (bit inversion plus one is negation):

a_is_rev_therm_code: assert property (
  @(posedge clk) disable iff (!rst_n)
  $onehot0(-code)
);
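As a quick sanity check (a hypothetical smoke test, not part of the original post), the same expression can be exercised on all the 8-bit thermometer codes with an immediate assertion:

module therm_code_check_tb;
  // Walk through every 8-bit thermometer code, from all zeros to all ones,
  // and confirm that the $onehot0 trick accepts each of them.
  logic [7:0] code;
  initial begin
    code = '0;
    repeat (9) begin
      assert ($onehot0(code + 1'b1))
        else $error("%b was not recognized as a thermometer code", code);
      code = {code[6:0], 1'b1};  // extend the thermometer by one bit
    end
  end
endmodule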

 

Small RTL notes

Below are some small lessons learned from experimenting with the abilities and limitations of synthesis tools. I won’t claim that they hold for all of them, but they work for at least the one that I work with right now.

Muxing

Leading edge synthesis is able to optimize multiple mathematical operations into carry-save logic, but the optimization quickly breaks down. Adding a mux between two chains of arithmetic in order to fuse them into a single result will break the optimization. Prototype your data path design in a separately synthesized module to get the best quality of results before writing the actual RTL.

Muxing a non power of two array with result = array[idx]; will not only result in simulation vs. synthesis mismatches, but will also give you worse area than using an n-input mux such as

wire [N-1:0] onehot = 1'b1 << idx;   // decode the index to a one-hot select
always_comb begin
  result = '0;                       // default value when idx is out of range
  for (int i = 0; i < N; i++) begin
    if (onehot[i]) result |= array[i];
  end
end

Create your own mux_n module. You will use it a ton. For loops with more than 256 iterations incur a pretty heavy runtime and memory penalty when the synthesis tool maps to generic logic cells, even when timing closure is a breeze. Synthesis tools have a cap on the maximum number of iterations in a loop (around 1000), partially for this reason. Other EDA tools, such as linters, have similar caps. Potentially large for loops, such as the one in a mux_n module, can be split across multiple modules through generate statements and recursion, as sketched below. Hold off on the generate and recursion until you need them; you will have to test it a lot.
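Here is a hypothetical sketch of such a recursive mux_n; it assumes N is a power of two (at least 2) and splits the array in half at each level so that no single loop or module grows too large:

// Recursive mux_n sketch: the top select bit picks between two half-size muxes.
module mux_n #(parameter int N = 256, parameter int W = 8) (
  input  logic [N-1:0][W-1:0]  data,
  input  logic [$clog2(N)-1:0] sel,
  output logic [W-1:0]         result
);
  if (N == 2) begin : g_leaf
    assign result = sel[0] ? data[1] : data[0];
  end else begin : g_split
    // Recurse on the lower and upper halves of the array.
    logic [W-1:0] lo, hi;
    mux_n #(.N(N/2), .W(W)) u_lo (.data(data[N/2-1:0]), .sel(sel[$clog2(N)-2:0]), .result(lo));
    mux_n #(.N(N/2), .W(W)) u_hi (.data(data[N-1:N/2]), .sel(sel[$clog2(N)-2:0]), .result(hi));
    assign result = sel[$clog2(N)-1] ? hi : lo;
  end
endmodule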

Multi-dimensional arrays

Multi-dimensional arrays are great as long as they are sized in powers of two and you don’t need to map them into a linear space for access from a CPU. When none of the dimensions are a power of two or when you need CPU access to the array then you have to map to a linear space. Fortunately, a simple mapping scheme such as

assign idx_z = SIZE_A*idx_b + idx_a;

is optimal in terms of area both when the array is sized in powers of two and when it isn’t. When the array is sized in powers of two, the result is exactly the same as

assign idx_z = {idx_b, idx_a};

There is no loss in quality of results. We only incur the overhead of an adder when SIZE_A isn’t a power of two.
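For context, a minimal sketch of the declarations around that mapping could look like this (sizes and names are made up for illustration):

// Hypothetical 3x5 array flattened into one linear index space, for example
// so a CPU can address it through a register interface.
localparam int SIZE_A = 5;
localparam int SIZE_B = 3;

logic [7:0] mem [SIZE_A*SIZE_B];
logic [2:0] idx_a;                    // 0..SIZE_A-1
logic [1:0] idx_b;                    // 0..SIZE_B-1
logic [3:0] idx_z;
logic [7:0] rdata;

assign idx_z = SIZE_A*idx_b + idx_a;  // costs one adder when SIZE_A isn't a power of two
assign rdata = mem[idx_z];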

Programmable Counters

Always count from N down to zero instead of counting up to N-1. It requires less logic and the counter remains functionally correct when reprogrammed. When counting down there will be two count lengths: old N and new N. When counting up there will be three: old N, a random one and new N. Granted, it is a bit tricky to get the trigger from a down counter to match an up counter exactly, but you only have to figure out how to do it once. A sketch is shown below.
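A minimal sketch of such a down counter, with made-up port names, might look like this; the new period is only picked up when the old one expires:

// Reprogrammable down counter: counts period, period-1, ..., 0, then reloads,
// so each wrap is period+1 cycles long and only the old and new periods ever
// appear as count lengths.
module down_counter #(parameter int W = 8) (
  input  logic         clk,
  input  logic         rst_n,
  input  logic [W-1:0] period,
  output logic         tick
);
  logic [W-1:0] count;

  assign tick = (count == '0);

  always_ff @(posedge clk or negedge rst_n) begin
    if (!rst_n)    count <= '0;
    else if (tick) count <= period;        // pick up the current period at the wrap
    else           count <= count - 1'b1;
  end
endmodule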

Dividing by constants

A lot of people, and EDA linting tools, are scared of the division operator. Some are even scared of multiplication by a constant. The truth is that synthesis tools are surprisingly good at multiplication and division by a constant, as long as there is only one constant. A mux between data and a constant feeding the operator will break the optimization.
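As an illustration (signal names are made up), the first divide below has a single constant divisor and can be mapped to cheap shift and add logic, while muxing in a second value makes the divisor non-constant and forces a general divider:

// Division by a single constant: synthesis reduces this to shifts and adds.
assign quot_fast = data / 10;

// The divisor is no longer a lone constant, so the optimization is lost.
assign quot_slow = data / (use_alt ? 12 : 10);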