VLSI Physical Design: July 2016

Wednesday, July 27, 2016

Basic Static Timing Analysis: "Setup and Hold Time"

What is Setup and Hold Time?

To understand where the concepts of setup and hold time come from, first consider them for a complete system, as shown in the fig. An input DIN and an external clock CLK are buffered and passed through combinational logic before they reach the synchronous input (D) and the clock input (C) of a positive-edge-triggered D flip-flop. To capture the data correctly at the D flip-flop, the data must be present at D when the positive clock edge arrives at the C pin.
Note: Here we assume the D flip-flop is ideal, i.e. it has zero setup and hold time.

There are only two possible conditions.
Tpd DIN > Tpd Clk
- To capture the data at the moment the positive clock edge reaches pin C, you have to apply the input data at pin DIN "Ts(in) = (Tpd DIN - Tpd Clk)" before the positive clock edge at pin CLK.
- In other words, the data at the DIN pin must be stable "Ts(in)" before the positive clock edge at the CLK pin.
- This time "Ts(in)" is known as the setup time of the system.

Tpd DIN < Tpd Clk
- To capture the data at the moment the positive clock edge reaches pin C, the input data at pin DIN must not change until "Th(in) = Tpd Clk - Tpd DIN" after the positive clock edge at CLK. If it changes earlier, the new value reaches pin D before the delayed clock edge reaches pin C, and the flip-flop captures the next data instead.
- In other words, the data at the DIN pin must be stable "Th(in)" after the positive clock edge at the CLK pin.
- This time "Th(in)" is known as the hold time of the system.

From the above, it looks as if the two conditions cannot exist at the same time, and for a single fixed pair of delays that is true.
But we have to consider a few more things:
- Worst case and best case (max delay and min delay)
- Because of environmental conditions, i.e. PVT (process, voltage, temperature) variation, we have to do this analysis for both the worst case (max delay) and the best case (min delay).
- Shortest path and longest path (min delay and max delay)
- If the combinational logic has multiple paths, we have to do this analysis for both the shortest path (min delay) and the longest path (max delay).

So the above conditions become:
- Tpd DIN (max) > Tpd Clk (min)
- setup time = Tpd DIN (max) - Tpd Clk (min)
- Tpd DIN (min) < Tpd Clk (max)
- hold time = Tpd Clk (max) - Tpd DIN (min)

For example, if the combinational logic delays are:
data path (max, min) = (5ns, 4ns)
clock path (max, min) = (4.5ns, 4.1ns)
then setup time = 5 - 4.1 = 0.9ns
and hold time = 4.5 - 4 = 0.5ns
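The arithmetic above can be checked with a small Python sketch. The function name `system_setup_hold` is hypothetical; it simply applies the two formulas above and assumes, as in the note, an ideal flip-flop with zero setup and hold time of its own.

```python
# Sketch: system-level setup and hold time of the buffered DIN/CLK system,
# given (max, min) delays of the data path and the clock path.
# Assumes an ideal D flip-flop (zero setup and hold time).

def system_setup_hold(data_max, data_min, clk_max, clk_min):
    """Return (setup, hold) in the same time unit as the inputs."""
    # Setup: slowest data path against the fastest clock path.
    setup = data_max - clk_min
    # Hold: slowest clock path against the fastest data path.
    hold = clk_max - data_min
    return setup, hold

setup, hold = system_setup_hold(data_max=5.0, data_min=4.0,
                                clk_max=4.5, clk_min=4.1)
print(f"setup = {setup:.1f} ns, hold = {hold:.1f} ns")
# prints: setup = 0.9 ns, hold = 0.5 ns
```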

A similar explanation applies to the D flip-flop itself. Inside the flip-flop there is combinational logic between C and Q and between D and Q. These internal paths have different delays, and their max and min values determine the setup and hold time of the flip-flop. One circuit for a positive-edge-triggered D flip-flop is shown below.

There are different ways of building a D flip-flop, e.g. from a JK flip-flop, as a master-slave flip-flop, or from two D-type latches. Since the internal circuitry differs for each type, the setup and hold times differ from one flip-flop to another.

Definition:
Setup Time:
- Setup time is the minimum amount of time the data signal must be held steady before the clock event so that the data is reliably sampled by the clock. This applies to synchronous circuits such as flip-flops.
- In short, it is the amount of time the synchronous input (D) must be stable before the active edge of the clock.
- Put another way, setup time is the time for which the input data must be available and stable before the clock pulse is applied.

Hold Time:
- Hold time is the minimum amount of time the data signal must be held steady after the clock event so that the data is reliably sampled. This applies to synchronous circuits such as flip-flops.
- In short, it is the amount of time the synchronous input (D) must be stable after the active edge of the clock.
- Put another way, hold time is the time after the clock pulse during which the data input must be held stable.

Setup and hold violation:
In simple language:
If the setup time of a flip-flop is Ts and the data is not stable from Ts before the active clock edge, there is a setup violation at that flip-flop. So if the data changes in the non-shaded area (in the above figure) before the active clock edge, it is a setup violation.
If the hold time of a flip-flop is Th and the data is not stable until Th after the active clock edge, there is a hold violation at that flip-flop. So if the data changes in the non-shaded area (in the above figure) after the active clock edge, it is a hold violation.
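A minimal sketch of this rule, assuming a single active clock edge at `t_edge` and treating a transition exactly at the clock edge as a setup violation (a boundary choice made here for illustration, not something the text specifies). The function name and the numbers are hypothetical.

```python
def check_transition(t_change, t_edge, ts, th):
    """Classify one data transition relative to one active clock edge.

    The data must be stable from (t_edge - ts) until (t_edge + th).
    """
    if t_edge - ts < t_change <= t_edge:
        return "setup violation"   # changed inside the setup window
    if t_edge < t_change < t_edge + th:
        return "hold violation"    # changed inside the hold window
    return "ok"                    # changed outside the stability window

# Clock edge at 10 ns, Ts = 1 ns, Th = 0.5 ns (illustrative values).
for t in (8.5, 9.5, 10.2, 10.8):
    print(t, check_transition(t, t_edge=10.0, ts=1.0, th=0.5))
# 8.5 ok
# 9.5 setup violation
# 10.2 hold violation
# 10.8 ok
```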

Friday, July 8, 2016

Static Timing Analysis: Timing Paths (2)

Clock Gating Path:
A clock path may pass through a "gating element" to achieve additional advantages (for example, switching off the clock to save power). In this case, the characteristics and definitions of the clock change accordingly. We call this type of clock path a "gated clock path".
As you can see in the following fig,

The LD pin is not part of any clock, but it is used to gate the original clock signal. Such paths are neither part of the clock path nor of the data path, because their start point and end point do not match the definitions of those paths. Such paths are called clock gating paths.

Asynchronous Path:

A path from an input port to an asynchronous set or clear pin of a sequential element.

See the following fig for a clearer picture:

As you know, the set/reset pins work independently of the clock edge. They are level-triggered and can take effect at any time, regardless of the clock. In other words, such a path is not synchronous with the rest of the circuit, which is why it is called an asynchronous path.

Other Type of Paths:

There are a few more types of paths that we commonly use in timing analysis reports. They are subsets of the paths mentioned above, with some specific characteristics. Since we are discussing timing paths, it is worth covering them here as well.

A few names are:

critical path

false path

multi-cycle path

single cycle path

launch path

capture path

longest path (also known as Worst Path, Late Path, Max Path, Maximum Delay Path)

shortest path (also known as Best Path, Early Path, Min Path, Minimum Delay Path)

Critical Path:

In short, the path with the longest delay is the critical path.

- Critical paths are timing-sensitive functional paths. Because their timing is critical, no additional gates should be added to them, so that their delay does not increase.

- Timing-critical paths are those paths that do not meet your timing. Typically, after synthesis the tool reports a number of paths with negative slack. The first thing to do is to make sure those paths are not false or multi-cycle paths, since in that case you can ignore them.

Take a typical (simplified) example: the STA tool adds up the delay of all the logic connecting the Q output of one flop to the D input of the next (including the CLK->Q delay of the first flop), and compares it against the defined clock period of the CLK pins (assuming both flops are on the same clock, and taking into account the setup time of the second flop and the clock skew). This total should be less than the clock period defined for that clock. If it is, the 'path meets timing'; if it is greater, the 'path fails timing'. The 'critical path' is the path, out of all possible paths, that either exceeds its constraint by the largest amount or, if all paths pass, comes closest to failing.
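The comparison described above can be sketched in Python as follows. This is an illustrative model only: the `RegToRegPath` class, its field names, the sign convention for skew (positive when the capture clock arrives later than the launch clock) and all delay values are assumptions, not output from any STA tool.

```python
from dataclasses import dataclass

@dataclass
class RegToRegPath:
    name: str
    clk_to_q: float     # CLK->Q delay of the launch flop (ns)
    logic_delay: float  # combinational delay from Q to the next D (ns)
    setup: float        # setup time of the capture flop (ns)
    skew: float = 0.0   # capture clock arrival minus launch clock arrival (ns)

def setup_slack(path: RegToRegPath, period: float) -> float:
    """Time available minus time used; negative slack means the path fails."""
    available = period + path.skew - path.setup
    used = path.clk_to_q + path.logic_delay
    return available - used

def critical_path(paths, period):
    """Path with the smallest slack: fails by the most, or comes closest to failing."""
    return min(paths, key=lambda p: setup_slack(p, period))

paths = [
    RegToRegPath("A", clk_to_q=0.5, logic_delay=6.0, setup=0.3),
    RegToRegPath("B", clk_to_q=0.5, logic_delay=9.3, setup=0.3, skew=-0.2),
    RegToRegPath("C", clk_to_q=0.4, logic_delay=7.5, setup=0.3),
]
for p in paths:
    print(p.name, round(setup_slack(p, period=10.0), 2))
print("critical path:", critical_path(paths, period=10.0).name)
# critical path: B
```

Path B fails (negative slack) partly because its negative skew shortens the time available, which is why skew appears in the constraint.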

False Path:

- False paths physically exist in the design but are logically/functionally impossible, meaning no data is ever transferred from the start point to the end point along them. There can be several reasons for such paths to be present in the design.

- Sometimes we have to explicitly define a few false paths in the design, e.g. to cut the timing relationship between two asynchronous clocks.

- The goal of static timing analysis is to analyze all "true" timing paths, so false paths are excluded from the analysis.

- Since false paths are not exercised during normal circuit operation, they often do not meet the timing specification. Considering them during timing closure can produce spurious timing violations, and fixing those would add unnecessary complexity to the design.

- There may be a few paths in your design that are not critical for timing, that mask other paths which are important for timing optimization, or that never occur in normal operation. In such cases, to reduce run time and improve the timing results, we sometimes declare such paths as false paths, so that the timing analysis tool ignores them and properly analyzes the other paths, and the optimizer does not spend effort on them. One example is a path between two multiplexed blocks that are never enabled at the same time. See the following picture.

Here you can see that False Path 1 and False Path 2 cannot occur at the same time, but during optimization one of them can affect the timing of the other. In such a scenario, we have to define one of the paths as a false path.

The same thing can be explained another way (Note: taken from a forum post). Not all paths that exist in a circuit are "real" timing paths. For example, assume that one of the primary inputs to the chip is a configuration input; on the board it must be tied either to VCC or to GND. Since this pin can never change, there are never any timing events on that signal. As a result, all STA paths that start at this particular start point are false. The STA tool (and the synthesis tool) cannot know that this pin is going to be tied off, so it needs to be told that these STA paths are false, which the designer does with a "false_path" directive. Once told that the paths are false, the STA tool will not analyze them (and hence will not compare them to a constraint, so these paths cannot fail), nor will the synthesis tool do any optimization on them to make them faster; synthesis tools try to improve paths until they "meet timing", and since the path is false, the synthesis tool has no work to do on it. Thus, a path should be declared false only if the designer KNOWS that the path in question is not a real timing path, even though it looks like one to the STA tool. One must be very careful when declaring a path false. If you declare a path false and there is ANY situation in which it is actually a real path, then you have created the potential for the circuit to fail, and for the most part you will not catch the error until the chip is on a board and (not) working. Typically, false paths exist:

- from configuration inputs like the one described above

- from "test" inputs, i.e. inputs that are only used for testing the chip and are tied off in normal mode (however, there may still be some static timing constraints for the test mode of the chip).

- from asynchronous inputs to the chip (and you must have some form of synchronizing circuit on such an input). This is not an exhaustive list, but it covers the majority of legitimate false paths.

So false paths should NOT be derived from running the STA tool (or synthesis tool); they should be known by the designer as part of the definition of the circuit, and constrained accordingly at the time of initial synthesis.
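As a rough illustration of what a false-path declaration does to the set of analyzed paths, here is a small Python sketch. In SDC-based flows this is typically written as a `set_false_path -from ...` constraint; the function below is a hypothetical stand-in for that behavior (it only handles false start points), and the port and pin names are made up.

```python
def analyzable_paths(paths, false_startpoints):
    """Keep only paths whose startpoint is not declared false.

    paths: list of (startpoint, endpoint) tuples.
    false_startpoints: startpoints the designer KNOWS are static,
    e.g. tied-off configuration or test-mode inputs.
    """
    return [p for p in paths if p[0] not in false_startpoints]

# Hypothetical port and pin names, for illustration only.
paths = [
    ("cfg_mode", "U1/D"),
    ("data_in", "U2/D"),
    ("cfg_mode", "U3/D"),
    ("U2/CK", "U4/D"),
]
kept = analyzable_paths(paths, false_startpoints={"cfg_mode"})
print(kept)
# [('data_in', 'U2/D'), ('U2/CK', 'U4/D')]
```

The designer supplies `false_startpoints` from knowledge of the circuit; nothing in the sketch derives it from timing results, matching the point made above.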

MultiCycle Path:

- A multi-cycle path is a timing path that is designed to take more than one clock cycle for the data to propagate from the startpoint to the endpoint.

A multi-cycle path is a path that is allowed multiple clock cycles for propagation. Again, it is a path that starts at a timing startpoint and ends at a timing endpoint. However, for a multi-cycle path, the normal constraint on the path is overridden to allow the propagation to take multiple clocks.

In the simplest example, the startpoint and endpoint are flops clocked by the same clock. The normal constraint is then given by the definition of the clock: the sum of all delays from the CLK arrival at the first flop to the arrival at the D input of the second flop should be no more than one clock period, minus the setup time of the second flop, adjusted for clock skew.

By defining the path as a multi-cycle path, you tell the synthesis or STA tool that the path has N clock cycles to propagate, so the timing check becomes "the propagation delay must be less than N x clock period, minus the setup time and adjusted for clock skew", where N can be any integer greater than 1.
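The N-cycle check can be written as a one-line formula. The Python sketch below is illustrative: the function name, the 25ns path delay and the 0.3ns setup time are assumed values, and the clock is the 60MHz clock used in the examples that follow. It shows a path that fails as a single-cycle path but passes with N = 2. Real tools usually also adjust the hold check when a multicycle setup is declared; that is not modeled here.

```python
def mcp_setup_slack(delay, period, setup, skew=0.0, n=1):
    """Setup slack of a path allowed n clock cycles (n = 1 is a single-cycle path).

    delay: CLK arrival at the launch flop to arrival at D of the capture flop (ns).
    skew:  capture clock arrival minus launch clock arrival (ns).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return n * period + skew - setup - delay

period_60mhz = 1000.0 / 60.0   # about 16.67 ns
for n in (1, 2):
    slack = mcp_setup_slack(delay=25.0, period=period_60mhz, setup=0.3, n=n)
    print(f"N = {n}: slack = {slack:.2f} ns")
# N = 1: slack = -8.63 ns
# N = 2: slack = 8.03 ns
```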

A few examples are:

- When you are crossing between two closely related clocks, e.g. from a 30MHz clock to a 60MHz clock,

- assuming the two clocks come from the same clock source (i.e. one is the divided clock of the other) and the two clocks are in phase.

- The normal constraint in this case is from the rising edge of the 30MHz clock to the nearest following edge of the 60MHz clock, which is one 60MHz period (about 16.7ns) later. However, if you have a signal in the 60MHz domain that indicates the phase of the 30MHz clock, you can design a circuit that allows the full 33ns (one 30MHz period) for the clock crossing; then the path from flop30 -> flop60 is an MCP (with N = 2).

- The generation of the signal 30MHz_is_low is not trivial, since it must come from a flop that is clocked by the 60MHz clock but shows the phase of the 30MHz clock.

- Another case is when different parts of the design run at different but related frequencies. Again, consider a circuit that has some logic running at 60MHz and some running on a divided clock at 30MHz.

- Instead of actually defining two clocks, you can use only the faster clock, with a clock enable that prevents the flops in the slower domain from updating on every other clock.

- Then all the paths from the "30MHz" flops to the "30MHz" flops can be MCPs.

- This is often done because it is usually a good idea to keep the number of different clock domains to a minimum.

Single Cycle Path:

A single cycle path is a timing path that is designed to take only one clock cycle for the data to propagate from the startpoint to endpoint.

Launch Path and Capture Path:

The two are inter-related, so I describe them together. When a flip-flop to flip-flop path such as UFF1 to UFF3 is considered, one flip-flop launches the data and the other captures it. Here UFF1 is referred to as the "launch flip-flop" and UFF3 as the "capture flip-flop".

The launch and capture terminology always refers to a particular flip-flop to flip-flop path. For the path UFF1 > UFF3, UFF1 is the launch flip-flop and UFF3 is the capture flip-flop. If there is another path that starts at UFF3 and ends at some other flip-flop (say UFF4), then for that path UFF3 becomes the launch flip-flop and UFF4 the capture flip-flop.

The name "launch path" refers to a part of the clock path: the launch path is the launch clock path, which is responsible for launching the data at the launch flip-flop.

Similarly, the capture path is also a part of the clock path: it is the capture clock path, which is responsible for capturing the data at the capture flip-flop.

This can be clearly understood from the following fig:

Here UFF0 is the launch flip-flop and UFF1 the capture flip-flop for the "data path" between UFF0 and UFF1. So the start point of this data path is UFF0/CK and the end point is UFF1/D.

One thing to add here:

- The launch path and the data path together constitute the arrival time of the data at the input of the capture flip-flop.

- The capture clock period and the capture path delay together constitute the required time of the data at the input of the capture flip-flop.
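The two bullets above can be turned into a small calculation. In this Python sketch the latencies, CLK->Q delay, data delay and setup time are illustrative values and the function names are hypothetical; the required time also subtracts the capture flop's setup time, as in a setup check.

```python
def arrival_time(launch_clk_latency, clk_to_q, data_delay):
    """Launch clock path + launch flop CLK->Q + data path delay."""
    return launch_clk_latency + clk_to_q + data_delay

def required_time(period, capture_clk_latency, setup):
    """Next capture edge (one period later, delayed by the capture clock path) minus setup."""
    return period + capture_clk_latency - setup

# Illustrative numbers in ns (not taken from the figure).
at = arrival_time(launch_clk_latency=1.2, clk_to_q=0.4, data_delay=6.0)
rt = required_time(period=10.0, capture_clk_latency=1.5, setup=0.3)
print(f"arrival = {at:.1f}, required = {rt:.1f}, slack = {rt - at:.1f}")
# arrival = 7.6, required = 11.2, slack = 3.6
```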

Note: Capture and launch paths are defined with respect to a data path. The same clock path can be a launch path for one data path and a capture path for another data path, as the following fig shows:

Here you can see that for Data Path1 the clock path through the BUF cell is a capture path, but for Data Path2 it is a launch path.

Longest and shortest Path:

Between any 2 points, there can be many paths.

The longest path is the one that takes the longest time; this is also called the worst path, late path or max path.

The shortest path is the one that takes the shortest time; this is also called the best path, early path or min path.

In the above fig, the longest path between the two flip-flops is through the cells UBUF1, UNOR2 and UAND3. The shortest path between the two flip-flops is through the cell UNAND3.
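To make the longest/shortest distinction concrete, the sketch below enumerates every path between the two flip-flops and sums the cell delays. The cell names come from the figure, but the connectivity and the delay values are assumed for illustration. Exhaustive enumeration is only practical for tiny examples; production STA tools propagate arrival times through the timing graph instead of listing paths one by one.

```python
# Hypothetical cell delays in ns (not from the figure).
DELAY = {"UBUF1": 0.30, "UNOR2": 0.50, "UAND3": 0.40, "UNAND3": 0.35}
# Assumed fan-out of each node; "FF1/Q" is the start, "FF2/D" the end.
FANOUT = {
    "FF1/Q": ["UBUF1", "UNOR2", "UNAND3"],
    "UBUF1": ["UNOR2"],
    "UNOR2": ["UAND3"],
    "UAND3": ["FF2/D"],
    "UNAND3": ["FF2/D"],
}

def all_paths(node, end):
    """Yield every node list from node to end (depth-first search on a DAG)."""
    if node == end:
        yield [node]
        return
    for nxt in FANOUT.get(node, []):
        for rest in all_paths(nxt, end):
            yield [node] + rest

def path_delay(path):
    """Sum of cell delays along the path (pins contribute zero)."""
    return sum(DELAY.get(n, 0.0) for n in path)

paths = list(all_paths("FF1/Q", "FF2/D"))
longest = max(paths, key=path_delay)
shortest = min(paths, key=path_delay)
print("longest: ", " -> ".join(longest), f"{path_delay(longest):.2f} ns")
print("shortest:", " -> ".join(shortest), f"{path_delay(shortest):.2f} ns")
```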

Static Timing Analysis: Timing Paths

Static Timing Analysis is a method of validating the timing performance of a design by checking all possible paths for timing violations under worst-case conditions. It considers the worst possible delay through each logic element, but not the logical operation of the circuit.

In comparison to circuit simulation, static timing analysis is
. faster -- It is faster because it does not need to simulate multiple test vectors.
. more thorough -- It is more thorough because it checks the worst-case timing for all possible logic conditions, not just those sensitized by a particular set of test vectors.

Note once again: Static Timing Analysis checks the design only for proper timing, not for correct logical functionality.

Static Timing Analysis seeks to answer the question, "Will the correct data be present at the data input of each synchronous device when the clock edge arrives, under all possible conditions?"

In static timing analysis, the word static alludes to the fact that the analysis is carried out in an input-independent manner. It locates the worst-case delay of the circuit over all possible input combinations. There are huge numbers of logic paths inside a chip of complex design, and the advantage of STA is that it performs timing analysis on all possible paths (whether they are real or potential false paths).
However, it is worth noting that STA is not suitable for all design styles; it has proven efficient only for fully synchronous designs. Since the majority of chip designs are synchronous, it has become a mainstay of chip design over the last few decades.

The Way STA is performed on a given circuit:
To check a design for violations, i.e. to perform STA, there are 3 main steps:
. The design is broken down into sets of timing paths.
. The signal propagation delay along each path is calculated.
. Violations of timing constraints are checked, both inside the design and at the input/output interface.

The STA tool analyzes ALL paths from each and every startpoint to each and every endpoint and compares each against the constraint that (should) exist for that path. All paths should be constrained; most paths are constrained by the definition of the clock period and by the timing characteristics of the primary inputs and outputs of the circuit.

Before we start, we should know a few key concepts of the STA method:
timing path, arrival time, required time, slack and critical path.
Let's talk about these one by one in detail.

Timing Paths:
Timing paths can be classified by the type of signal (e.g. clock signal, data signal, etc.).

Types of paths for timing analysis:
- Data Path
- Clock Path
- Clock Gating Path
- Asynchronous Path

Each timing path has a "start point" and an "end point". The definitions of start point and end point vary with the type of timing path. For a data path, the start point is a place in the design where data is launched by a clock edge; the data propagates through the combinational logic in the path and is then captured at the endpoint by another clock edge.

Startpoints and endpoints differ for each type of path. It is very important to understand this clearly in order to analyze timing analysis reports and fix timing violations.

- Data Path
- Start Point
Input Port of the Design ( because the input data can be launched from some external source).
Clock Pin of the flip-flop/latch/memory ( sequential cell)
- End Point
Data Input Pin of the flip-flop/latch/sequential cell
Output Port of the design (because the output data can be captured by some external sink).

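- Clock Path
- Start Point
Clock Input Port of the design
- End Point
Clock Pin of the flip-flop/latch/sequential cell

- Asynchronous Path
- Start Point
Input Port of the design
- End Point
Set/Reset/Clear Pin of the flip-flop/latch/sequential cell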

Data Paths:

If we take all combinations of the 2 types of start point and the 2 types of end point, we get 4 types of data paths:
- Input Pin/Port to Register (flip-flop).
- Input Pin/port to Output Pin/Port
- Register (flip-flop) to Register (flip-flop)
- Register (flip-flop) to output pin/port.

Please see the following fig:

PATH1 - starts at an input port and ends at the data input of a sequential element.(input port to register)
PATH2 - starts at the clock pin of a sequential element and ends at the data input of a sequential element. ( register to register )
PATH3 - starts at the clock pin of a sequential element and ends at an output port. (Register to output port)
PATH4 - starts at an input port and ends at an output port. ( Input port to output port).
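The four combinations can be expressed as a small lookup table. This Python sketch uses hypothetical labels (`input_port`, `clock_pin`, `data_pin`, `output_port`) for the two kinds of start point and end point; it only restates the classification above.

```python
# The two kinds of data-path start point and end point described above.
START_KINDS = ("input_port", "clock_pin")
END_KINDS = ("data_pin", "output_port")

PATH_TYPE = {
    ("input_port", "data_pin"): "input port to register (PATH1)",
    ("clock_pin", "data_pin"): "register to register (PATH2)",
    ("clock_pin", "output_port"): "register to output port (PATH3)",
    ("input_port", "output_port"): "input port to output port (PATH4)",
}

def classify_data_path(start_kind, end_kind):
    """Return the data-path type for a (start point, end point) pair."""
    if start_kind not in START_KINDS or end_kind not in END_KINDS:
        raise ValueError(f"not a data path: {start_kind} -> {end_kind}")
    return PATH_TYPE[(start_kind, end_kind)]

# Enumerate all 2 x 2 combinations.
for s in START_KINDS:
    for e in END_KINDS:
        print(f"{s:10} -> {e:11}: {classify_data_path(s, e)}")
```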

Clock Path:
Please check the following fig:

In the above fig it is clear that the clock path starts at the input port/pin of the design dedicated to the clock input and ends at the clock pin of a sequential element. Between the start point and the end point there may be many buffers/inverters/clock dividers.

Saturday, July 2, 2016

setup and hold time violations

A few important things to note here:
-- Data is launched from FF1/D to FF1/Q at the positive clock edge at FF1/C.
-- At FF2/D, the input data comes from FF1/Q through combinational logic.
-- Data is captured at FF2/D at the positive clock edge at FF2/C.
-- So the launching flip-flop is FF1 and the capturing flip-flop is FF2.
-- So the data path is FF1/C --> FF1/Q --> FF2/D
-- For a single-cycle circuit, the signal has to propagate through the data path in one clock cycle. With a 10ns clock period, if data is launched at time = 0ns from FF1, it should be captured at time = 10ns by FF2.
So for setup analysis at FF2, the data should be stable "Ts" before the positive edge at FF2/C, where "Ts" is the setup time of FF2.
-- If Ts = 0ns, the data launched from FF1 at time = 0ns should arrive at D of FF2 before or at time = 10ns. If the data takes too long (more than 10ns) to arrive (i.e. it is not stable before the clock edge at FF2), a setup violation is reported.
-- If Ts = 1ns, the data launched from FF1 at time = 0ns should arrive at D of FF2 before or at time = (10ns - 1ns) = 9ns. If the data takes too long (more than 9ns) to arrive (i.e. it is not stable 1ns before the clock edge at FF2), a setup violation is reported.

For hold analysis at FF2, the data should be stable "Th" after the positive edge at FF2/C, where "Th" is the hold time of FF2. That means the input data at FF2/D must not change between the positive clock edge at FF2 at time = 10ns and time = 10ns + Th.
-- To satisfy the hold condition at FF2 for the data launched by FF1 at 0ns, the data launched by FF1 at 10ns should not reach FF2/D before 10ns + Th.
-- If Th = 0.5ns, the data launched from FF1 at time 10ns must not propagate so fast that it reaches FF2 before time (10 + 0.5) = 10.5ns (in other words, the delay from FF1 to FF2 must be at least 0.5ns). If the data arrives sooner (i.e. it takes less than 0.5ns from FF1 to FF2, so the data at FF2 cannot stay stable for 0.5ns after the clock edge), a hold violation is reported.
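The setup and hold conditions for this FF1 -> FF2 example can be written as two comparisons. This is a sketch under simplifying assumptions: ideal clocks with no skew, a 10ns period, and a single `data_delay` (CLK->Q plus combinational delay) for both checks, whereas a real analysis uses the max delay for setup and the min delay for hold. The function names are hypothetical.

```python
PERIOD = 10.0  # ns, clock period used in the example above

def setup_ok(data_delay, ts, period=PERIOD):
    """Data launched at 0 ns must reach FF2/D no later than period - Ts."""
    return data_delay <= period - ts

def hold_ok(data_delay, th):
    """Data launched at the 10 ns edge must not reach FF2/D before 10 ns + Th,
    i.e. the FF1-to-FF2 delay must be at least Th."""
    return data_delay >= th

# data_delay = FF1 CLK->Q delay + combinational delay (ideal clocks, no skew).
print(setup_ok(9.5, ts=0.0))  # True: arrives before 10 ns
print(setup_ok(9.5, ts=1.0))  # False: later than 9 ns
print(hold_ok(0.3, th=0.5))   # False: reaches FF2 before 10.5 ns
print(hold_ok(0.8, th=0.5))   # True
```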

From the above explanation, two important things follow:
1. Setup is checked at the next clock edge.
2. Hold is checked at the same clock edge.

The setup check timing can be made clearer for the above flip-flop combination with the following explanation.

In the above fig you can see that the data launched at FF1 (at the launch edge) reaches FF2/D after a specific delay (CLK-to-Q delay + combinational logic delay), well before the setup time requirement of flip-flop FF2, so there is no setup violation.
From the fig it is clear that if Slack = Required Time - Arrival Time < 0 (negative), there is a setup violation at FF2.

The hold check timing can be made clearer with the help of the following circuit and explanation.

In the above fig you can see that CLKB is delayed relative to CLK because of the delay introduced by the series of buffers in the clock path. Flip-flop FF2 has a hold requirement, and accordingly its data must stay constant for the hold time after the capture edge of CLKB at FF2.
You can see that the desired data, which is supposed to be captured by CLKB at FF2/D, should be at logic 0 and stay constant long enough after the CLKB capture edge to meet the hold requirement. But because the logic delay between FF1/Q and FF2/D is very short, the change at FF1/Q propagates too soon. As a result, a hold violation occurs.
This type of violation (hold violation) can be fixed by shortening the delay in the clock line or by increasing the delay in the data path.
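The option of adding data-path delay can be estimated with a small calculation. In this Python sketch the latencies, the hold time and the 0.25ns delay per buffer are hypothetical values chosen only to mirror the situation in the figure (CLKB arriving late at FF2); the function names are also hypothetical.

```python
import math

def hold_slack(launch_latency, clk_to_q, data_delay, capture_latency, th):
    """Hold slack at the same clock edge: data arrival minus (capture clock arrival + Th)."""
    arrival = launch_latency + clk_to_q + data_delay
    required = capture_latency + th
    return arrival - required

def buffers_to_fix_hold(slack, buf_delay):
    """Number of delay buffers (each buf_delay ns, hypothetical) to add in the data path."""
    if slack >= 0:
        return 0
    return math.ceil(-slack / buf_delay)

# CLKB reaches FF2 late because of the buffer chain (illustrative numbers, ns).
s = hold_slack(launch_latency=0.2, clk_to_q=0.3, data_delay=0.1,
               capture_latency=1.0, th=0.2)
print(f"hold slack = {s:.2f} ns, buffers needed = {buffers_to_fix_hold(s, 0.25)}")
# hold slack = -0.60 ns, buffers needed = 3
```

Any delay added this way also lengthens the max-delay (setup) path, so the setup check has to be re-verified after the fix.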

Setup and hold violation calculation for a single-cycle path is easy to understand, but the complexity increases for multi-cycle paths, gated clocks, flip-flops using different clocks, and latches used in place of flip-flops.