Saturday, September 17, 2016

Clock Tree Synthesis (CTS)


What is CTS?
  • The process of balancing clock skew and minimizing insertion delay in order to meet timing, power, and other requirements.
  • The process of distributing the clock signal to the clock pins based on physical information.
  • A clock buffer tree is built to achieve the CTS goals.
CTS Goals:
  • Meet the CTS design rule constraints, such as maximum transition (slew), maximum load capacitance, maximum fanout, and maximum buffer levels.
  • Meet the clock tree targets, such as maximum skew and min/max insertion delay (a small sketch of these two metrics follows below).
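To make these two targets concrete, here is a minimal Python sketch (the clock-pin names and arrival times are invented for illustration) showing how insertion delay and skew are derived from the clock arrival times at the sinks:

# Hypothetical clock arrival times (ns) at each flop's clock pin, measured from the clock root after CTS.
arrival = {"ff1/CK": 0.82, "ff2/CK": 0.85, "ff3/CK": 0.79, "ff4/CK": 0.86}

max_insertion_delay = max(arrival.values())        # longest root-to-sink latency
min_insertion_delay = min(arrival.values())        # shortest root-to-sink latency
skew = max_insertion_delay - min_insertion_delay   # global clock skew

print("max insertion delay:", max_insertion_delay)   # 0.86
print("min insertion delay:", min_insertion_delay)   # 0.79
print("skew:", round(skew, 3))                       # 0.07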
Checklist before Clock Tree Synthesis:
  • The design is placed and optimized
  • Power and ground nets are pre-routed
  • Congestion is acceptable
  • Timing constraints are met (~0 ns slack)
  • No DRC violations such as max transition or max capacitance
  • High-fanout nets (reset, enable, etc.) are already synthesized with buffers
How do you check whether the design is ready for CTS?

check_physical_design -for_cts:
1. checks whether the design is placed
2. checks whether all clocks are defined
3. checks that clock roots are not hierarchical pins

check_clock_tree:
checks and warns if:

1. a clock source pin is a hierarchical pin
2. there are multiple clocks per register
3. a clock tree has no synchronous pin
4. a generated clock has an improperly specified master clock






Friday, September 16, 2016

Congestion

Congestion needs to be analyzed after placement, and the routing results depend on how congested your design is. Routing congestion may be localized. Some of the things you can do to make routing hassle-free are:

Placement blockages: 

The utilization constraint is not a hard rule, so if you want to specifically avoid placement in certain areas, use placement blockages:
Soft blockages (only buffers may be placed)
Hard blockages (no standard cells or buffers may be placed)
Partial blockages (similar to density screens)
Halo (similar to a soft blockage, but the blockage moves with the macro)


Macro-padding:

Macro padding, or placement halos around the macros, are placement blockages around the edges of the macros. This makes sure that no standard cells are placed near the pin-outs of the macros, thereby giving extra breathing space for the macro pin connections to standard cells.

Cell padding:


Cell padding refers to placement clearance applied to standard cells in PnR tools. This is typically done to ease placement congestion or to reserve space for later use in the flow.
For example, people typically apply cell padding to the buffers/inverters used to build the clock tree, so that space is reserved to insert decap cells next to them after CTS.

Cell padding adds hard constraints to placement. The constraints are honored by cell legalization, CTS, and timing optimization, unless the padding is reset after placement so those operations can use the reserved space. You can also use cell padding to reserve space for routing.

The command "specifyCellPad" is used to specify the cell padding in SOC-Encounter.

This command adds padding on the right side of library cells during placement.

The padding is specified in terms of a factor that is applied to the metal2 pitch. For example, if you specify a factor of 2, the software ensures that there is additional clearance of two times the metal2 pitch on the right side of the specified cells.


Maximum Utilization constraint (density screens): 

Some tools let you specify maximum core utilization numbers for specific regions. If any region has routing congestion, the utilization there can be reduced, thus freeing up more area for routing.
Each tool has its own version of this setting; check with your EDA vendor for the details.

set_congestion_options -max_util .6 -coordinate {10 20 40 40}
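As a rough illustration of what a utilization number means (this is not any tool's actual algorithm; all numbers are made up), the utilization of a region is simply the placed standard-cell area divided by the area available for placement:

# Hypothetical congested region: 30um x 20um containing 380 um^2 of standard cells.
region_width, region_height = 30.0, 20.0     # um
placed_cell_area = 380.0                     # um^2
region_area = region_width * region_height   # 600 um^2

utilization = placed_cell_area / region_area
print("utilization = %.2f" % utilization)            # 0.63
# If this exceeds the 0.6 limit set above, the placer must spread cells
# out of this region, freeing tracks for routing.
print("violates max_util 0.6:", utilization > 0.6)   # True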


Sunday, August 28, 2016

Antenna effect

The antenna effect (plasma-induced gate-oxide damage) is an effect that can potentially cause yield and reliability problems during the manufacture of MOS integrated circuits. IC fabs normally supply antenna rules that must be obeyed to avoid this problem, and a violation of such a rule is called an antenna violation. The real problem here is the collection of charge.
A net in an IC will have at least one driver (which must contain a source or drain diffusion, or in newer technologies an implantation) and at least one receiver (a gate electrode over a thin gate dielectric). Since the gate dielectric is very thin, the layer will break down if the net somehow acquires a voltage somewhat higher than the normal operating voltage of the chip. Once the chip is fabricated this cannot happen, since every net has at least some source/drain implant connected to it. The source/drain implant forms a diode, which breaks down at a lower voltage than the oxide (either by forward diode conduction or by reverse breakdown), and does so non-destructively; this protects the gate oxide. But during the construction phase, if the voltage builds up to the breakdown level while the net is not yet protected by this diode, the gate oxide will break down.
Antenna rules are normally expressed as an allowable ratio of metal area to gate area, with one such ratio for each interconnect layer. Each gate-oxide thickness has its own rule.
Antenna violations must be fixed by the router. Connecting the gate to the highest metal layer, adding vias near the gate so the connection jumps up to the highest layers used, and adding a diode to the net near the gate are some fixes that can be applied. Adding a diode raises the capacitance, making the circuit slower and consuming more power.
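A hedged sketch of the ratio check described above (the layer areas and allowed ratios are invented for illustration; the real limits come from the foundry's antenna rule deck):

# Per-layer antenna check: metal area connected to a gate during that layer's
# processing, divided by the gate area, must not exceed the foundry limit.
gate_area  = 0.04                                # um^2, total gate area on the net
metal_area = {"metal1": 2.0, "metal2": 9.5}      # um^2 of metal tied to the gate (made up)
max_ratio  = {"metal1": 50.0, "metal2": 200.0}   # allowed area ratio per layer (made up)

for layer, area in metal_area.items():
    ratio = area / gate_area
    ok = ratio <= max_ratio[layer]
    print(layer, "ratio = %.1f" % ratio, "OK" if ok else "ANTENNA VIOLATION")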




Aspect Ratio of Core/Block/Design

The aspect ratio of a core/block/design is commonly defined as the height of the core divided by its width:

Aspect Ratio = Core Height / Core Width


Different core shapes give different aspect ratios: a square core has an aspect ratio of 1, while rectangular cores have aspect ratios greater or less than 1.

The Role of Aspect Ratio in the Design:


  1. The aspect ratio affects the routing resources available in the design.
  2. The aspect ratio affects congestion.
  3. Floorplanning has to be done according to the aspect ratio.
  4. The placement of the standard cells is also affected by the aspect ratio.
  5. The timing, and thereby the frequency of the chip, is also affected by the aspect ratio.
  6. The clock tree built on the chip is also affected by the aspect ratio.
  7. The placement of the IO pads in the IO area is also affected by the aspect ratio.
  8. Packaging is also affected by the aspect ratio.
  9. The placement of the chip on the board is also affected.
  10. Ultimately, everything depends on the aspect ratio of the core/block/design.

NON Default Rule: NDR Rules

NONDEFAULT rule. This is a routing rule that is, well, not the default! It usually consists of double-wide or triple-wide metal, and at least double-wide spacing, but it can be whatever you like as long as it follows DRC rules (no violating the min or max metal widths, for example). NONDEFAULT rules are typically used to route clock nets or other sensitive nets. If you are very lucky, your tech LEF came with some NONDEFAULT rules already defined. But this is not usually the case. Those of us who have been around a while always dreaded the creation of NONDEFAULT rules -- it's not difficult, but it is tedious to write out a large tech LEF section by hand.

Well, for some time now, EDI has had the ability to create NONDEFAULT rules for us! It's easy and fast. Here's how to do it:


Name your NONDEFAULT rule something descriptive, and then choose an existing rule to start from. In most cases, all you'll have so far is the Default rule. That's a great starting point. With the default rule width and spacing numbers right in front of you, it's easy to enter values that are twice or three times as large for a double- or triple-wide rule.

Now, the vias: the vias from the default rule (or whatever rule you chose as your starting point) will be listed. In most cases, it's fine to just use the default rule vias.

Finally, decide if you want the NONDEFAULT rule to follow Hard Spacing. This means that violations of the NONDEFAULT rule spacing are considered and flagged as true violations. Without Hard Spacing turned on, the NONDEFAULT spacing is followed as much as possible, but if it needs to be broken to complete the route or follow other routing/spacing rules, then it's not considered a violation.

When you click OK or Apply, the rule is created and exists in your design database. But here is the crucial part: we want to add this NONDEFAULT rule to our tech LEF. Let's say we named our NONDEFAULT rule "DblWide" and we'll output it to a temporary file called tmp.lef. At the EDI prompt, type:





Non-default rules are mostly used for routing, since they determine the width and spacing of the wires. They are particularly useful for clock routing: during clock tree structuring and then clock tuning, you may want to increase or decrease the width of the wires because of insertion delay/skew requirements.

A wider wire has lower resistance, which means a lower RC delay on the net.

Sometimes you might also want to build the clock tree with tapered wire widths, wider near the root and narrower toward the leaves, and then a non-default rule must be used.

As far as we know, non-default rules are rarely used for general signal routing.
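As a small sketch of what a double-wide/double-space rule amounts to (the layer name and numbers are invented; a real rule lives in the tech LEF and must still satisfy DRC):

# Default routing rule for a hypothetical layer, and an NDR derived from it.
default_rule = {"metal3": {"width": 0.10, "spacing": 0.10}}   # um, made-up values

ndr_dblwide = {layer: {"width": 2 * r["width"], "spacing": 2 * r["spacing"]}
               for layer, r in default_rule.items()}
print(ndr_dblwide)   # {'metal3': {'width': 0.2, 'spacing': 0.2}}
# Such a rule would then be attached to the clock nets before routing.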

Wednesday, August 24, 2016

difference between crosstalk noise and crosstalk delay

Noise:
The term "noise" in electronic design generally means any undesirable deviation in the voltage of a net that ought to be at a constant voltage, such as a power supply or ground line. In CMOS circuits this includes data signals being held constant at logic 1 or logic 0.

For noise analysis, the tool considers the cross-coupling between aggressor nets and victim nets.
It determines the worst-case noise bump, or glitch, on a steady-state victim net.
Steady-state means that the net is constant at logic 1 or logic 0.
The main commands for noise analysis are check_noise, update_noise, and report_noise, which operate in a manner similar to check_timing, update_timing, and report_timing.
PrimeTime reports noise in four categories:
1. Above high
2. Above low
3. Below low
4. Below high
There are many different causes of noise, such as charge-storage effects at p-n junctions, power supply noise, and substrate noise. However, the dominant noise effect in deep-submicron CMOS circuits is crosstalk noise.

Crosstalk delay: Crosstalk delay arises from the same coupling mechanism as crosstalk noise, but in this case the victim net is not in a steady state:
a transition is happening on both the aggressor and the victim nets.
The crosstalk delta delay depends on the switching directions of the aggressor and victim nets, which make the victim transition slower or faster.

Note: for setup analysis the tool adds the crosstalk delta delay to the timing path, and for hold analysis it subtracts the delta delay from the path delay.
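A minimal sketch of that note with invented numbers: the same delta delay makes the path look slower for the setup check and faster for the hold check:

stage_delay = 0.50   # ns, nominal cell + net delay of one victim stage (made up)
delta_delay = 0.08   # ns, worst-case crosstalk delta from coupling analysis (made up)

setup_arrival = stage_delay + delta_delay   # aggressor switching opposite -> victim slows down
hold_arrival  = stage_delay - delta_delay   # aggressor switching the same way -> victim speeds up

print("delay used for the setup check:", round(setup_arrival, 2))   # 0.58 ns
print("delay used for the hold check :", round(hold_arrival, 2))    # 0.42 ns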


FinFET Technology

A multigate device or multiple gate field-effect transistor (MuGFET) refers to a MOSFET (metal-oxide-semiconductor field effect transistor) which incorporates more than one gate into a single device. The multiple gates may be controlled by a single gate electrode, wherein the multiple gate surfaces act electrically as a single gate , or by independent gate electrodes. A multigate device employing independent gate electrodes is sometimes called a Multiple Independent Gate Field Effect Transistor (MIGFET).

A conventional MOSFET at the submicron level suffers from several issues such as short-channel effects and threshold-voltage variation. The FinFET is supposed to overcome the short-channel effects. The structure of a FinFET is shown below.

A silicon-on-insulator (SOI) process is used to fabricate the FinFET. This process ensures the ultra-thin device regions. In a FinFET, the electrical potential throughout the channel is controlled by the gate voltage; this is possible due to the proximity of the gate electrode to the current-conduction path between source and drain. These characteristics of the FinFET minimize short-channel effects. Advantages of the FinFET over its bulk-Si counterpart are as follows:
Conventional MOSFET manufacturing processes can also be used to fabricate FinFETs.
The FinFET provides better area efficiency compared to the MOSFET.
Carrier mobility can be improved by using the FinFET process in conjunction with a strained-silicon process.
FinFET device structure: a silicon-on-insulator (SOI) process is used to manufacture the FinFET. A single polysilicon layer is deposited over a fin; the polysilicon straddles the fin structure to form perfectly aligned gates. Here the fin itself acts as the channel, and it terminates on both sides in the source and drain. In a conventional MOSFET device, a polysilicon gate is formed over the Si substrate and controls the channel. Straddling the polysilicon gate over the Si fin gives more efficient gate control compared to a MOSFET. Since the gate straddles the fin, the length of the channel is the same as the width of the fin. As there are effectively two gates around the fin, the width of the channel is equivalent to twice the height of the fin, i.e. w = 2*h. The term fin pitch is used for the spacing between two fins, and the height of the fin plays the role of the width of the MOSFET. If w is the fin pitch, then to attain the same area efficiency the required fin height is w/2. Practical experiments have shown that the fin height can be greater than w/2 for a fin pitch of w; thus the FinFET achieves more area efficiency than the MOSFET.
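A quick sketch of the width relation stated above, w = 2*h per fin (fin height and fin count are made-up numbers):

# Effective channel width of a FinFET: the gate wraps the fin on both sides,
# so each fin contributes roughly twice its height.
fin_height = 0.03    # um, hypothetical
num_fins   = 4       # width is quantized: add fins to get more drive strength

w_effective = num_fins * 2 * fin_height
print("effective width = %.2f um" % w_effective)   # 0.24 um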

The basic electrical layout and the mode of operation of a FinFET do not differ from those of a traditional field-effect transistor: there is one source and one drain contact, as well as a gate to control the current flow.

In contrast to planar MOSFETs, the channel between source and drain is built as a three-dimensional bar on top of the silicon substrate, called a fin. The gate electrode is then wrapped around the channel, so that a gate is formed on each side of the fin, which leads to reduced leakage and an enhanced drive current.


Tuesday, August 23, 2016

low power techniques

We can use the following techniques for a low power design:
1. Power gating
2. Multiple supply voltages (multi-VDD)
3. Voltage scaling
4. Multi-threshold CMOS (multi-VT)
5. Adaptive body biasing
6. Clock gating
Power Gating: UPF (Unified Power Format)
Power gating is a technique used in integrated circuit design to reduce power consumption by shutting off the power to blocks of the circuit that are not in use. In addition to reducing stand-by (leakage) power, power gating has the benefit of enabling Iddq testing.
The basic purpose of power gating is to temporarily shut down blocks in a design when they are not in use; this reduces the leakage power of the chip. Power gating means switching off an area of the design when its functionality is not required, and then restoring power when it is required. This temporary shutdown time can also be called "low power mode" or "inactive mode"; when we need that particular part of the design in operation again, we turn the power back on, and that state is called "active mode".

Switching ON and OFF can be done either by software or by hardware control. The power supply of the gated part of the design is cut off when the circuit is not in use. Such designs do not require data to be retained in the registers or latches used in that part of the design. Functional verification of the design is still required, to make sure that the portions of the design that remain awake function properly and to ensure that the system will work when power is restored to the sleeping part of the device.

When the power is shut off, each power domain must be isolated from the rest of the design so that it does not corrupt the downstream logic. Power shutdown results in slowly drifting outputs from the power-gated blocks; these outputs spend significant time near the threshold voltage, causing large crowbar currents in the always-on blocks. For this reason we need isolation cells.

Isolation cells are used to prevent these crowbar currents. The isolation cells are placed between the outputs of the power-gated blocks and the inputs of the always-on blocks.
Consider two power domains, D1 and D2, where D1 is the power shut-down domain and D2 is always-on, and there are a few signals going from D1 to D2. If at any time D1 goes into inactive mode (switched OFF) and a signal traversing from D1 to D2 picks up noise or an unwanted level from some source, it can trigger the logic in the D2 domain, causing unwanted behavior of the circuit. To prevent this, isolation cells are used between the two domains.

Now that we have isolated the shut-down domain from the other domain, we may still need to retain the last values stored in the registers of the shut-down domain; for this we use retention registers.
Retention Registers:
Retention cells are used in the power-gated domain to retain register values when the domain power goes into the OFF state. These retention registers are special low-leakage flip-flops used to hold the data of the main registers of the power-gated block. Thus the internal state of the block during power-down mode can be retained and loaded back when the block is reactivated; the retention registers themselves are always powered. The retention strategy is design dependent: during power gating the data is saved, and it is transferred back to the block when power gating is withdrawn. A power-gating controller controls the retention mechanism, i.e. when to save the current contents of the power-gated block and when to restore them.

Level Shifters:
Level shifters are used in designs where multiple supply voltages are used. Consider the two voltage domains D2 and D1 above: if a few signals from D2 (1.0 V) travel to the D1 (0.85 V) domain, the supply voltages are different, so we need to insert level shifters on those signals.

The main function of a level shifter is to translate a signal from the voltage level of the domain it comes from to the voltage level of the domain it drives.


Power Switches:
Power switches are used to switch off the power to the shut-down domain.
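To tie these pieces together, here is a toy Python sketch (not UPF and not any tool's flow; the names and the clamp-to-0 choice are only for illustration) of how a power-gated block behaves across a sleep/wake cycle, with an isolation clamp on its output and a retention register holding its state:

class PowerGatedDomain:
    def __init__(self):
        self.powered = True
        self.reg = 0          # main register value inside the gated block
        self.shadow = None    # retention (always-on shadow) storage

    def sleep(self):
        self.shadow = self.reg     # save state before the power switch opens
        self.powered = False

    def wake(self):
        self.powered = True
        self.reg = self.shadow     # restore state after power is back

    def output(self):
        # Isolation cell: clamp to a known value (here 0) while powered down
        return self.reg if self.powered else 0

d = PowerGatedDomain()
d.reg = 1
d.sleep()
print(d.output())   # 0 -> the always-on logic sees a clean, clamped value
d.wake()
print(d.output())   # 1 -> the retained state is restored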



PVT

PVT is an acronym for Process, Voltage, Temperature.


PVT corners model variations in process, voltage, and temperature. There is another term, OCV, which refers to On-Chip Variation. PVT corners model inter-chip variations, while OCV models intra-chip variations.

Let's talk about PVTs in detail:

1) Process:


You must have heard people talking in terms of process values like 90nm, 65nm, 45nm and other technology nodes. These values are characteristic of a technology and represent the channel length between the source and drain of a MOS transistor. While manufacturing dies, it has been observed that the dies at the center of the wafer are pretty accurate in their process values, but the ones lying on the periphery tend to deviate from this value. The deviation is not big, but it can have a significant impact on timing.


The drain current of a MOS transistor (in saturation) is approximately:

Id = (1/2) * µn * Cox * (W/L) * (Vgs - Vt)^2

Here L represents the process value (channel length). For the same temperature and voltage, the current in a 45nm process is higher than in a 65nm process.
The higher the current, the faster the charging/discharging of the load capacitances, which means the delays are smaller.


2) Voltage:

The voltage that a semiconductor chip works at is supplied from outside. Recall that while working on breadboards in your labs, you used to connect a 5 V supply to the Vcc pin of your IC. Modern chips work at much lower voltages, typically around 1 V to 1.2 V.


This voltage is the output of either a DC source or a voltage regulator, and the output voltage of a regulator may not stay constant over time. Let's say you expected your voltage regulator to give 1.2 V, but after 4 years its output drops to 1.08 V or rises to 1.32 V. So you have to make sure your chip works correctly anywhere between 1.08 V and 1.32 V!


This is where the need to model voltage variations comes into the picture.
From the same equation as above, it can be seen that the higher the voltage, the higher the current, and hence the smaller the delays.

3) Temperature:

The ambient temperature also impacts timing. Let's say you are working on a gadget on the Siachen glacier, where the temperature can drop to -40 degrees Celsius in winter, and you expect your device to work fine. Or maybe you are in the Sahara desert, where the ambient temperature is +50 degrees and your car engine temperature is +150 degrees, and again you expect your chip to work fine. While designing, therefore, STA engineers need to make sure that the chip will function correctly at temperatures from -40 to +150 degrees.



The higher the temperature, the higher the collision rate of electrons within the device. This increased collision rate impedes the movement of other carriers. Since carrier movement is responsible for the current flowing in the device, the current decreases as the temperature increases. Therefore, delays are normally larger at higher temperatures.


For technology nodes below 65nm, there is a phenomenon called TEMPERATURE INVERSION, where delays tend to increase with decreasing temperature. We shall talk about that later; don't get confused by it here.


WORST PVT:

Process worst - Voltage min - Temperature max


BEST PVT:

Process best - Voltage max - Temperature min

WORST COLD PVT:

Process worst - Voltage min - Temperature min

BEST HOT PVT:

Process best - Voltage max - Temperature max

MCMM: Multi-Corner Multi-Mode

What is MCMM?

MCMM stands for Multi-Corner Multi-Mode static timing analysis, used in the design of digital ICs.


What's a Mode


A mode is defined by a set of clocks, supply voltages, timing constraints, and libraries. It can also have annotation data, such as SDF or parasitics files.


Many chips have multiple modes, such as functional modes, test mode, sleep mode, etc.


What's a Corner

A corner is defined as a set of libraries characterized for process, voltage, and temperature variations.

Corners are not dependent on functional settings; they are meant to capture variations in the manufacturing process, along with expected variations in the voltage and temperature of the environment in which the chip will operate.



Example:

Multi-mode multi-corner (MMMC) analysis refers to performing STA across multiple operating modes, PVT corners, and parasitic interconnect corners at the same time. For example, consider a design under analysis (DUA) that has four operating modes (Normal, Sleep, Scan shift, Jtag) and is being analyzed at three PVT corners (WCS, BCF, WCL) and three parasitic interconnect corners (Typical, Min C, Min RC).
There are a total of thirty-six possible scenarios in which all timing checks, such as setup, hold, slew, and clock-gating checks, can be performed. Running STA on all thirty-six scenarios at the same time can be prohibitive in terms of runtime, depending on the size of the design. Some scenarios may not be necessary, either because they are covered by another scenario or because they are simply not required. For example, the designer may determine
that scenarios 4, 6, 7 and 9 are not relevant and thus are not required. Also, it may not be necessary to run all modes at every corner; for example, the Scan shift or Jtag modes may not be needed in scenario 5. STA can be run on a single scenario, or on multiple scenarios concurrently if multi-corner multi-mode capability is available.
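A tiny sketch of the scenario count in this example (4 modes x 3 PVT corners x 3 parasitic corners = 36 scenarios):

from itertools import product

modes      = ["Normal", "Sleep", "Scan shift", "Jtag"]
pvt        = ["WCS", "BCF", "WCL"]
parasitics = ["Typical", "Min C", "Min RC"]

scenarios = list(product(modes, pvt, parasitics))
print(len(scenarios))   # 36
print(scenarios[0])     # ('Normal', 'WCS', 'Typical')
# In practice the designer prunes this list, keeping only the scenarios
# that are not already covered by another scenario.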

Saturday, August 20, 2016

setup time and hold time


  • Setup time is the minimum amount of time the data input (D) must be stable before the clock edge.
  • Hold time is the minimum amount of time the data input (D) must be stable after the clock edge.




Both the setup and hold times of a flip-flop are specified in the library.

1.1 setup time

  • Data should be stable before the clock edge.
  • Setup time is the amount of time by which the synchronous input (D) must arrive, and be stable, before the capturing edge of the clock.
  • This is so that the data can be stored successfully in the storage device.
  • A setup violation can be fixed either by slowing down the clock (increasing the period) or by decreasing the delay of the data path logic.


Setup information in the .lib:

timing () {
    related_pin  : "CK";
    timing_type  : setup_rising;
    fall_constraint(Setup_3_3) {
        /* index_1/index_2: lookup-table axes (input transition times);
           values: the 2-D setup constraint table, in library time units */
        index_1 ("0.000932129,0.0331496,0.146240");
        index_2 ("0.000932129,0.0331496,0.146240");
        values ("0.035190,0.035919,0.049386", \
                "0.047993,0.048403,0.061538", \
                "0.082503,0.082207,0.094815");
    }
}

1.2 Hold Time

  • Data should be stable after the clock edge.
  • Hold time is the amount of time the synchronous input (D) must remain stable after the capturing edge of the clock so that the data can be stored successfully in the storage device.
  • A hold violation can be fixed by increasing the delay of the data path or by decreasing the clock uncertainty (skew), if specified in the design.

Hold information in the .lib:

timing () {
    related_pin  : "CK";
    timing_type  : hold_rising;
    fall_constraint(Hold_3_3) {
        /* same table format as the setup constraint above; negative values are allowed */
        index_1 ("0.000932129,0.0331496,0.146240");
        index_2 ("0.000932129,0.0331496,0.146240");
        values ("-0.013960,-0.014316,-0.023648", \
                "-0.016951,-0.015219,-0.034272", \
                "0.108006,0.110026,0.090834");
    }
}
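In both tables, index_1 and index_2 are the lookup-table axes (typically the transition times of the constrained pin and the related clock pin) and values is the 2-D constraint table, all in library units. As a rough sketch of what a timing tool does with such a table, here is a bilinear interpolation in Python using the setup table above (the exact axis meanings and extrapolation rules are library and tool dependent):

# Axes and table copied from the setup fall_constraint above.
index_1 = [0.000932129, 0.0331496, 0.146240]
index_2 = [0.000932129, 0.0331496, 0.146240]
values  = [[0.035190, 0.035919, 0.049386],
           [0.047993, 0.048403, 0.061538],
           [0.082503, 0.082207, 0.094815]]

def lookup(x, y):
    # Bilinear interpolation: pick the surrounding table segment on each axis
    # and blend the four corner values (points outside the axes extrapolate linearly).
    def seg(axis, v):
        i = max(0, min(len(axis) - 2, sum(a <= v for a in axis) - 1))
        t = (v - axis[i]) / (axis[i + 1] - axis[i])
        return i, t
    i, tx = seg(index_1, x)
    j, ty = seg(index_2, y)
    return (values[i][j] * (1 - tx) * (1 - ty) + values[i + 1][j] * tx * (1 - ty)
            + values[i][j + 1] * (1 - tx) * ty + values[i + 1][j + 1] * tx * ty)

print(round(lookup(0.02, 0.02), 4))   # setup constraint for 0.02/0.02 transitions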

Friday, August 19, 2016

Maximum Clock Frequency

As we know, nowadays all chips contain both combinational and sequential circuits. So before we move forward, we should know the definition of "propagation delay" in both types of circuit.

Propagation delay in combinational circuits:
Let's consider a "NOT" gate and its input/output waveform as shown in the figure.

From the above figure:
- Rise time (tr): the time required for a signal to transition from 10% of its maximum value to 90% of its maximum value.
- Fall time (tf): the time required for a signal to transition from 90% of its maximum value to 10% of its maximum value.
- Propagation delay (tpLH, tpHL): the delay measured from the time the input is at 50% of its full swing to the time the output reaches 50% of its full swing.

We can rephrase the above definition as:
- This value indicates the amount of time needed for a change at the input to produce a permanent change at the output.
- Combinational logic is guaranteed not to show any further output changes in response to an input change after tpLH or tpHL time units have passed.
So, when an input X changes, the output Y does not change instantaneously. The inverter output maintains its initial value for some time and then changes.
After the propagation delay (tpLH or tpHL, depending on whether the output goes low-to-high or high-to-low), the inverter output is stable and is guaranteed not to change again until another input change occurs (we are not considering any SI/noise effects here).

Propagation delay in sequential circuits:
In sequential circuits, timing characteristics are defined with respect to the clock input. You can think of it this way: in a combinational circuit every timing parameter is defined with respect to a change on the data input, while in a sequential circuit a change on the data input matters but a change on the clock has higher precedence. E.g., in a positive-edge-triggered flip-flop, the output value changes only on a positive clock edge, even if the input data changed long before.
Since flip-flops only change value in response to a change on the clock, timing parameters are specified in relation to the rising (for positive-edge-triggered) or falling (for negative-edge-triggered) clock edge.

Let's consider the positive-edge-triggered flip-flop shown in the figure.
Propagation delay, tpHL and tpLH, has the same meaning as in a combinational circuit; beware that propagation delays usually will not be equal for all input-to-output pairs.

Setup time (tsu): this value indicates the amount of time before the clock edge that the data input D must be stable.
Hold time (th): this value indicates the amount of time after the clock edge that the data input D must be held stable.
The circuit must be designed so that the D flip-flop input signal arrives at least tsu time units before the clock edge and does not change until at least th time units after the clock edge. If either of these restrictions is violated for any of the flip-flops in the circuit, the circuit will not operate correctly. These restrictions limit the maximum clock frequency at which the circuit can operate.

The maximum clock frequency for a circuit:
Now let's understand the flow of data across these flip-flops.
- Let's assume data is already present at D of flip-flop A and is stable.
- Now the clock pin of FF (flip-flop) A, i.e. Clk, receives a positive clock edge (low to high) at time 0ns.
- As per the propagation delay of the sequential element (tclk->Q), it takes at least 10ns for valid output data to appear at pin X.
       -- Remember: if you capture the output before 10ns, no one can guarantee an accurate/valid value at pin X.
- This data then propagates through the inverter F. Since the propagation delay of F is 5ns, you will see valid output at pin Y only after 10ns + 5ns = 15ns (with reference to the positive clock edge: 10ns of FF A and 5ns of the inverter).
       -- Practically, this is where a more complex combinational circuit sits between the two FFs. In a more complex design, if a single path is present between X and Y, the total time taken by the data to travel from X to Y is the sum of the propagation delays of all the combinational cells on that path.
- Once valid data reaches pin Y, it is supposed to be captured by FF B at the next positive clock edge (in a single-cycle circuit).
      -- We generally try to design the circuit in such a way that it operates in a single clock cycle.
- For FF B to capture the data properly, the data should be present and stable 2ns (the setup time) before the next clock edge, as per the setup definition.

So between two consecutive positive clock edges there should be a minimum time difference of 10ns + 5ns + 2ns = 17ns, and we can say that for this circuit the minimum clock period should be 17ns (if we want to operate the circuit in a single clock cycle, reliably).
Now we can generalize this:
minimum clock period = tclk-Q(A) + tpd(F) + tsu(B)
and Maximum Clock Frequency = 1 / (minimum clock period)








Thursday, August 11, 2016

What is the difference between DRC and DFM?

DRC - Design Rule Check

These rules are specified by the technology; foundries all over the world publish them for each technology node. These rules have to be satisfied by any design (physically).

DFM - Design For Manufacturing

DFM has been around since the 90nm node. As yield and reliability became important factors, foundries introduced DFM, which is about making designs reliable and keeping yield high.
These are a set of rules, or rather guidelines, layered on top of the DRC rules. They are optional, but as we go down into deep-submicron technologies it becomes highly important to follow these DFM rules in order to make sure the product is reliable and the yield stays high.
A design that only meets the DRC rules has a high probability of rather low yield.
Foundries specifically recommend that all design companies follow DFM rules wherever possible.

Following DFM rules is very important for:
Yield
Performance
and
the lifetime of the chip.

DFM is a very big issue in VLSI.


Tuesday, August 2, 2016

Interconnect Delay --> Net delay + Cell Delay

In a digital design, a wire connecting the pins of standard cells and blocks is referred to as a NET. A net

- has only one driver
- has a number of fanout cells or blocks
- can travel on multiple metal layers of the chip.

"Net Delay" refers to the total time needed to charge to discharge all of the parasitic ( Capacitance / Resistance / Inductance)  of a given net. So we can say that net delay is a function of
- Net Resistance
- Net Capacitance
- Net Topology

Now, to calculate the net delay, the wires are modeled in different ways and there are different ways to do the calculation. Practically, when you apply a particular delay model to a design, you have to apply it to all cells in a particular library; you cannot mix delay models within a single library. There are a few recommendations from experienced designers regarding which delay model to apply in a design, and the choice depends on:
- the technology of the design
- the stage of the flow at which you want to apply the delay model
- how accurately you want to calculate the delay

Note:
Ideally, until the physical wire is present in your design, you cannot calculate the net delay: if the wire has not been routed, you have no idea about its width/length, so you cannot compute accurate parasitic values, and hence an accurate delay value, for the wire. The key word is accurate; approximate delay values can still be estimated before the wire is physically laid out.

There are several delay models. Those which provide more accurate results take more run time, and those which are fast provide less accurate delay values. Here are a few of the most popular delay models:

- Lumped Capacitor Model
- Lumped RC Model
- Distributed RC Model
     -pi RC Network
     -T RC Network
- RLC Model
- Wire Load Model
- Elmore Delay Model
- Transmission Line Model
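Of these, the Elmore model is the easiest to show in a few lines. Here is a sketch for a simple RC ladder (driver -> R1 -> C1 -> R2 -> C2) with invented values; real nets have tree topologies and the extracted R/C come from the layout:

# Elmore delay of a ladder net: each capacitor sees the total resistance on the
# path from the driver to its node, and the delay is sum(R_path_i * C_i).
segments = [(100.0, 2e-15), (150.0, 3e-15)]   # (ohm, farad) per segment, made up

elmore = 0.0
r_path = 0.0
for r, c in segments:
    r_path += r              # resistance from the driver up to this node
    elmore += r_path * c     # contribution of this node's capacitance

print("Elmore delay = %.3f ps" % (elmore * 1e12))   # 0.950 ps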






STA: Delay -- Timing Path Delay

Question:
I have a doubt regarding how delay is calculated along a path. I think there are two ways:
1) To calculate the max delay and min delay, we keep adding the max delays and min delays of all cells (buffer/inverter/etc.) from the start point to the end point, respectively.
2) Alternatively, we calculate the path delay for the rising edge and the falling edge separately. We apply a rising edge at the start point and keep adding cell delays; a cell delay depends on its input transition and output fanout. We then have two path delay values, one for the rising edge and one for the falling edge; the greater one is taken as the max delay and the smaller one as the min delay.
Which one is correct?

Short answer: both are correct and you have to use both. Here are a few details.

As we have mentioned, for setup and hold calculation you have to calculate the delay of the timing path (capture path or launch path). In a circuit there are two major types of delay:
     1. CELL DELAY
         -- the timing delay between an input pin and an output pin of a cell.
         -- cell delay information is contained in the timing library of the cell, i.e. the .lib (Liberty) file.
     2. NET DELAY
          -- the interconnect delay between a driver pin and a load pin.
          -- to calculate the net delay you generally need three pieces of information:
                     - the characteristics of the driver cell (which drives the particular net)
                     - the load characteristics of the receiver cell (which is driven by the net)
                     - the RC (resistance/capacitance) values of the net (these depend on several factors, which we will discuss later)

Both delays can be calculated in multiple ways. It depends on the stage at which you need this information in the design flow, e.g. during pre-layout, post-layout, or signoff timing. Depending on the stage, you can use different ways to calculate these delays; sometimes you need accurate numbers and sometimes approximate numbers are sufficient.

Now let's discuss this with the previous background, and then we will discuss a few new concepts:


In the figure above, the delay of the circuit works out to
Delay = 0.5+0.04+0.62+0.21+0.83+0.15+1.01+0.12+0.57 = 4.05ns (if all the delays are in ns)

Now let's add a few more values. Since every gate and net has a max and a min delay value, we can find the max delay and min delay of the path (on what basis these max and min delays are calculated, we will discuss below).

In the example, the first value of each pair is the max value and the second is the min value, so:
Delay(max) = 0.5+0.04+0.62+0.21+0.83+0.15+1.01+0.12+0.57 = 4.05ns
Delay(min) = 0.4+0.03+0.6+0.18+0.8+0.1+0.8+0.1+0.5 = 3.51ns
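The same bookkeeping in a few lines of Python, with the (max, min) delay pairs copied from the example above:

# (max, min) delay in ns of each cell/net along the path, as in the figure.
stages = [(0.5, 0.4), (0.04, 0.03), (0.62, 0.6), (0.21, 0.18), (0.83, 0.8),
          (0.15, 0.1), (1.01, 0.8), (0.12, 0.1), (0.57, 0.5)]

path_max = sum(mx for mx, mn in stages)
path_min = sum(mn for mx, mn in stages)
print("max path delay = %.2f ns" % path_max)   # 4.05 ns
print("min path delay = %.2f ns" % path_min)   # 3.51 ns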

Now let's see what is the meaning of min and max delay.

The delay of a cell or net depends on various parameters. A few of them are listed below:

  • library setup time
  • library delay model
  • external delay
  • cell load characteristic
  • cell drive characteristic
  • operating condition (PVT)
  • wire load model
  • effective cell output load
  • input skew
  • back annotated delay
If any of these parameters vary, the delay varies accordingly. Some of them are mutually exclusive, in which case we have to consider the effect of only one parameter at a time. If that's the case, then for STA we calculate the delay under both conditions and categorize the results into the worst condition (max delay) or the best condition (min delay). E.g., if a cell has different delays for the rise edge and the fall edge, we know that in a given delay calculation we have to use only one value; so, as per their values, we categorize the rise and fall delays of all the cells into max and min buckets, and finally we come up with a max delay and a min delay.

The way the delay is calculated also depends on which tool you are using for STA or delay calculation. Cadence may have a different algorithm from Synopsys, and the same is true of other vendor tools such as Mentor. But in general the basic concepts remain the same.
Here is an example circuit for which we want to calculate the delay.

In the above diagram there are two paths between UFF1 and UFF3. Whenever we do setup and hold analysis, these paths will be part of the launch path (arrival time). So let's assume we want to calculate the max and min values of the delay between UFF1 and UFF3.


Monday, August 1, 2016

STA: Examples of Setup and Hold Violations

Now it's time to discuss the practical application of setup and hold time in a circuit.
- How will you calculate the setup and hold values?
- How will you analyze setup and hold violations in a circuit?
- If you have to improve the timing of a circuit, what can you do?

There are a few formulas for calculating the different parameters. First we will solve a few examples that give you a basic idea of these formulas; at the end we will summarize them all in one place.

We have seen a lot of confusion with respect to setup and hold timing calculations. There are actually two different problems.
- Timing specification of a block/circuit/library:
   - You have a block with input A and output Y, with some combinational logic between A and Y.
      Now you have to calculate the following parameters for that block:
        - setup time at input A
        - hold time at input A
        - maximum operating clock frequency (or minimum time period) for the block
        - clock-to-Y delay
        - input A to output Y delay
- Timing violation of a circuit:
   - You have to operate a circuit at a particular clock frequency, and now you have to find out whether this circuit has any setup or hold violation.

So in the second case all the parameters are given and you have to find out whether the circuit has any violation or not, while in the first case you have to find out all the parameters keeping in mind that there should not be any violation.
Let's discuss them in reverse order.
=====================================
=====================================
Problem 1: In the following circuit, find out whether there is any setup or hold violation.

Solution:
Hold Analysis:
When a hold check is performed, we have to consider two things:
- the minimum delay along the data path
- the maximum delay along the clock path
If the difference between the data path and the clock path is negative, then a timing violation has occurred. (Note: there are a few exceptions to this; we will discuss them some other time.)

Data path: CLK --> FF1/CLK --> FF1/Q --> inverter --> FF2/D

Data path delay
= min(wire delay to the clock input of FF1) + min(CLK-to-Q delay of FF1) + min(cell delay of the inverter) + min(the two wire delays: "Q of FF1 to inverter" and "inverter to D of FF2")
Td = 1 + 9 + 6 + (1+1) = 18ns

Clock path: CLK --> buffer --> FF2/CLK

Clock path delay
= max(wire delay from CLK to the buffer input) + max(cell delay of the buffer) + max(wire delay from the buffer output to the FF2/CLK pin) + (hold time of FF2)
Tclk = 3 + 9 + 3 + 2 = 17ns

Hold slack = Td - Tclk = 18ns - 17ns = 1ns
Since the hold slack is positive --> no hold violation.

Note: If the hold time had been 4ns instead of 2ns, then there would have been a hold violation:
Td = 18ns and Tclk = 3 + 9 + 3 + 4 = 19ns,
so hold slack = Td - Tclk = 18ns - 19ns = -1ns

Setup Analysis:
When a setup check is performed, we have to consider two things:
- the maximum delay along the data path
- the minimum delay along the clock path
If the difference between the clock path and the data path is negative, then a timing violation has occurred. (Note: there are a few exceptions to this; we will discuss them some other time.)

Data path: CLK --> FF1/CLK --> FF1/Q --> inverter --> FF2/D

Data path delay
= max(wire delay to the clock input of FF1) + max(CLK-to-Q delay of FF1) + max(cell delay of the inverter) + max(the two wire delays: "Q of FF1 to inverter" and "inverter to D of FF2")
Td = 2 + 11 + 9 + (2+2) = 26ns

Note: The first part of the clock path delay (during setup calculation) is the clock period, which has been set to 15ns. Recall from the last post that setup is checked at the next clock edge; that is why we have to include the clock period in the clock path delay.

Clock path: CLK --> buffer --> FF2/CLK

Clock path delay
= (clock period) + min(wire delay from CLK to the buffer input) + min(cell delay of the buffer) + min(wire delay from the buffer output to the FF2/CLK pin) - (setup time of FF2)
Tclk = 15 + 2 + 5 + 2 - 4 = 20ns

Setup slack = Tclk - Td = 20ns - 26ns = -6ns
Since the setup slack is negative --> setup violation.

Note: A longer clock period or a smaller maximum delay of the inverter would fix this setup violation.
E.g.
If the clock period is 22ns, then
Tclk = 22 + 2 + 5 + 2 - 4 = 27ns and Td = 26ns
Setup slack = Tclk - Td = 27 - 26 = 1ns (no violation)
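Problem 1 in a few lines of Python, using the same numbers as above:

clock_period = 15.0   # ns

# Hold check: min data path vs. max clock path (+ hold requirement of FF2)
data_min   = 1 + 9 + 6 + (1 + 1)           # 18 ns
clock_max  = 3 + 9 + 3                     # 15 ns
hold_slack = data_min - (clock_max + 2)    # 2 ns hold time -> slack = +1 ns
print("hold slack  =", hold_slack, "ns")

# Setup check: max data path vs. next clock edge (min clock path - setup of FF2)
data_max    = 2 + 11 + 9 + (2 + 2)             # 26 ns
clock_min   = clock_period + 2 + 5 + 2 - 4     # 20 ns
setup_slack = clock_min - data_max             # -6 ns -> violation
print("setup slack =", setup_slack, "ns")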

============================================
Problem 2: For the following circuit to work correctly, what should the setup and hold times at input A be? Also find the maximum operating frequency for this circuit. (Note: ignore wire delays. Tsu = setup time; Thd = hold time; Tc2q = clock-to-Q delay.)

Solution:
Step 1: Find the maximum register-to-register delay.

Max register-to-register delay
= (clk-to-Q delay of U2) + (cell delay of U3) + (all wire delays) + (setup time of U1)
= 5 + 8 + 3 = 16 ns

Note:
There are two register-to-register paths:
   - U2 --> U3 --> U1 (delay = 5 + 8 + 3 = 16ns)
   - U1 --> U4 --> U2 (delay = 5 + 7 + 3 = 15ns)
   - we have to pick the maximum one.

Step 2: Find the setup time at A:

A setup time = setup time of flip-flop + max(data path delay) - min(clock path delay)
                       = (setup time of flip-flop + A-to-D max delay) - (clock path min delay)
                       = Tsu + (Tpd U7 + Tpd U3 + wire delay) - Tpd U8
                       = 3 + (1+8) - 2 = 10ns

Note:
- Here we are not using the clock period, because we are not calculating a setup violation; we are calculating the setup time.
- All wire delays are neglected here. If wire delays were present, we would have to include them.
- There are two data paths:
   -    A --> U7 --> U4 --> D of U2 (data path delay = 1 + 7 = 8ns)
   -    A --> U7 --> U3 --> D of U1 (data path delay = 1 + 8 = 9ns)
   - Since for the setup calculation we need the maximum data path delay, we choose the second one.

Step 3: Find the hold time at A:
A hold time = hold time of flip-flop + max(clock path delay) - min(data path delay)
                      = (hold time of flip-flop + clock path max delay) - (A-to-D min delay)
                      = Thd + Tpd U8 - (Tpd U7 + Tpd U4 + wire delay)
                      = 4 + 2 - (1 + 7) = -2 ns

Note: Same explanation as for the setup time. For the hold time we need the minimum data path delay, so we picked the first data path.

Step 4: Find the clock-to-out time.
Clock-to-out
= cell delay of U8 + clk-to-Q delay of the flip-flop + cell delay of U5 + cell delay of U6 + (all wire delays)
= Tpd U8 + U2 Tc2q + U5 Tpd + U6 Tpd
= 2 + 5 + 9 + 6 = 22 ns

Note:
- There are two clock-to-out paths: one from flip-flop U1 and one from U2.
- Since in this case the CLK-to-Q delay is the same for both flip-flops, we can consider either path. In a circuit where the delays of the two paths differ, we should consider the max path delay.

Step 5: Find the pin-to-pin combinational delay (A to Y delay).
Pin-to-pin combinational delay (A to Y)
= U7 Tpd + U5 Tpd + U6 Tpd
= 1 + 9 + 6 = 16ns

Step 6: Find the maximum clock frequency:

Max clock frequency = 1 / max(reg2reg, clk2out, pin2pin)
                            = 1 / max(16, 22, 16) ns
                            = 45.5 MHz
So the summary is:

Parameter    Description        Min     Max     Units
Tclk         clock period       22              ns
Fclk         clock frequency            45.5    MHz
Atsu         A setup time       10              ns
Athd         A hold time        -2              ns
A2Y          A to Y Tpd                 16      ns
Ck2Y         clock to Y Tpd             22      ns

Note: Negative hold times are typically specified as 0 ns.
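And the Problem 2 numbers as a sketch (wire delays ignored, as in the problem):

t_su, t_hd, t_c2q = 3, 4, 5        # flip-flop setup, hold, and clk-to-Q (ns)
tpd = {"U3": 8, "U4": 7, "U5": 9, "U6": 6, "U7": 1, "U8": 2}   # cell delays (ns)

reg2reg  = t_c2q + max(tpd["U3"], tpd["U4"]) + t_su                      # 16 ns
a_setup  = t_su + (tpd["U7"] + max(tpd["U3"], tpd["U4"])) - tpd["U8"]    # 10 ns
a_hold   = t_hd + tpd["U8"] - (tpd["U7"] + min(tpd["U3"], tpd["U4"]))    # -2 ns
clk2out  = tpd["U8"] + t_c2q + tpd["U5"] + tpd["U6"]                     # 22 ns
pin2pin  = tpd["U7"] + tpd["U5"] + tpd["U6"]                             # 16 ns

min_period = max(reg2reg, clk2out, pin2pin)                              # 22 ns
print("A setup / hold: %d ns / %d ns" % (a_setup, a_hold))
print("max clock frequency: %.1f MHz" % (1e3 / min_period))              # ~45.5 MHz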

=========================================
Problem 3: In the above circuit, try to improve the timing by adding buffers or registers.

Solution:
The best way of doing this is to register all inputs and outputs. We add DFFs with the same specification as U2 and U1.