Wednesday, September 30, 2015

Timing Exceptions

When certain paths are not intended to operate according to the default setup/hold behavior assumed by PrimeTime (PT), you should specify those paths as timing exceptions. Otherwise, PT might incorrectly report those paths as having timing violations.

PT lets you specify the following types of timing exceptions:

** False path -- A path that is never sensitized due to the logic configuration, expected data sequence, or operating mode.

** Multicycle path -- A path designed to take more than one clock cycle from launch to capture.

** Minimum/maximum delay path -- A path that must meet a delay constraint that you specify explicitly as a time value.


setup and hold checks

Notes,
Assume that the flip-flops are defined in the technology library to have a minimum setup time of 1.0 time unit and a minimum hold time of 0.0 time units. The clock period is defined in PT to be 10 time units. (The time unit size, such as ns or ps, is specified in the technology library.)

By default, PT assumes that signals are to be propagated through each data path in one clock cycle. Therefore, when PT performs a setup check, it verifies that the data path delay is small enough so that data launched from FF1 reaches FF2 within one clock cycle, and arrives at least 1.0 time unit before the data gets captured by the next clock edge at FF2. If the data path delay is too long, it is reported as a timing violation. For this setup check, PT considers the longest possible delay along the data path and the shortest possible delay along the clock path between FF1 and FF2.

When PT performs a hold check, it verifies that the data launched from FF1 reaches FF2 no sooner than the capture clock edge for the previous clock cycle. This check ensures that the data already existing at the input of FF2 remains stable long enough after the clock edge that captures data for the previous cycle. For this hold check, PT considers the shortest possible delay along the data path and the longest possible delay along the clock path between FF1 and FF2. A hold violation can occur if the clock path has a long delay.
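
A minimal sketch of the two checks, using the numbers from the notes above (clock period 10, setup 1.0, hold 0.0). The function names and the example path delays are assumptions made for illustration; PT's real calculation accounts for more effects than this.

```python
def setup_slack(period, setup_time, data_delay_max,
                launch_clk_max=0.0, capture_clk_min=0.0):
    """Setup slack = required time - arrival time (negative means a violation).

    Uses the longest data delay, the latest launch clock and the earliest
    capture clock, which is the pessimistic combination for a setup check.
    """
    arrival = launch_clk_max + data_delay_max
    required = period + capture_clk_min - setup_time
    return required - arrival


def hold_slack(hold_time, data_delay_min,
               launch_clk_min=0.0, capture_clk_max=0.0):
    """Hold slack = arrival time - required time (negative means a violation).

    Uses the shortest data delay and the latest capture clock.
    """
    arrival = launch_clk_min + data_delay_min
    required = capture_clk_max + hold_time
    return arrival - required


PERIOD, SETUP, HOLD = 10.0, 1.0, 0.0  # values from the notes above

# Hypothetical path delays, in the same time units.
print(setup_slack(PERIOD, SETUP, data_delay_max=8.5))   # positive: meets setup
print(setup_slack(PERIOD, SETUP, data_delay_max=9.4))   # negative: setup violation
print(hold_slack(HOLD, data_delay_min=0.3, capture_clk_max=0.5))  # negative: hold violation
```

The last call illustrates the point made above: a long capture clock path can turn a short data path into a hold violation.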























Tuesday, September 29, 2015

Cell Delay and Net Delay

Cell Delay

Cell Delay is the amount of delay from input to output of a logic gate in a path. PT calculates the cell delay from delay tables provided in the technology library for the cell.


Net Delay
Net delay is the amount of delay from the output of a cell to the input of the next cell in a timing path.
This delay is caused by the parasitic capacitance of the interconnection between the two cells, combined with net resistance and the limited drive strength of the cell driving the net.


Delay Calculation

The total delay of a path is the sum of all cell and net delays in the path.

The method of delay calculation depends on whether chip layout has been completed.
Before layout, the chip topography is unknown, so PT must estimate the net delays using wire load models.

After layout, an external tool can accurately determine the delays and write them to a Standard Delay Format (SDF) file. PT can read the SDF file and back-annotate the design with the delay information for layout-accurate timing analysis. PT can also accept a detailed description of parasitic capacitors and resistors in the interconnection network, and then accurately calculate net delays based on that information.

Path Types

Path Types:

-- Clock path (a path from a clock input port or cell pin, through one or more buffers or inverters, to the clock pin of a sequential element) for data setup and hold checks.

-- Clock-gating path (a path from an input port to a clock-gating element) for clock-gating setup and hold checks.

-- Asynchronous path (a path from an input port to an asynchronous set or clear pin of a sequential element) for recovery and removal checks.

timing paths



In this figure, each path starts at a data launch point, passes through some combinational logic, and ends at a data capture point:

* path 1 starts at an input port and ends at the data input of a sequential element.
* path 2 starts at the clock pin of a sequential element and ends at the data input of a sequential element.
* path 3 starts at the clock pin of a sequential element and ends at an output port.
* path 4 starts at an input port and ends at an output port.

Notes:
Each path has a startpoint and an endpoint. The startpoint is a place in the design where data is launched by a clock edge. The data is propagated through combinational logic in the path and then captured at the endpoint by another clock edge.
The startpoint of a path is a clock pin of a sequential element, or possibly an input port of the design (because the input data can be launched from some external source). The endpoint of a path is a data input pin of a sequential element, or possibly an output port of the design (because the output data can be captured by some external sink).
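
The four path kinds listed above follow directly from the two possible startpoints and the two possible endpoints. A small sketch of that mapping (the string labels and the function name are made up for illustration, not anything PT uses):

```python
# Startpoint and endpoint kinds as described in the notes above.
PATH_KINDS = {
    ("input_port", "data_pin"): "path 1: input port to register",
    ("clock_pin", "data_pin"): "path 2: register to register",
    ("clock_pin", "output_port"): "path 3: register to output port",
    ("input_port", "output_port"): "path 4: input port to output port",
}


def classify_path(startpoint, endpoint):
    """Return the path kind for a (startpoint, endpoint) pair.

    Raises ValueError for combinations that are not valid data paths,
    for example a path that "starts" at an output port.
    """
    try:
        return PATH_KINDS[(startpoint, endpoint)]
    except KeyError:
        raise ValueError(f"not a valid data path: {startpoint} -> {endpoint}") from None


print(classify_path("clock_pin", "data_pin"))
```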

STA


It is called static because the timing information is obtained through calculation, not by simulation.

STA analyzes the delay of all paths (register to register, input to register, and register to output) and checks whether there is a violation.

If the delay of a path is too large, there is a setup violation: the target register samples the old value.

If the delay of a path is too small, there is a hold time violation: the target register samples the next value.

Either situation causes a functional error.


IR Drop

IR drop, as described in the earlier IR Drop post, is the voltage drop from the PAD circuitry to the standard cells.

>The implication is that the supply voltage VDD seen by the cells differs from place to place on the chip, which causes on-chip variation. The reduced VDD ==> (VDD - I*R) also has a negative impact on timing.

>To keep the IR drop (voltage drop) within a particular range, we generally do power planning by deciding:

-- The number of core power pads
-- The core ring width
-- The core straps (Mesh) width & spacing & Number

##REF: Power Network design for an ASIC with Peripheral IO Power PADS ( Solvnet) for detailed calculations.

Note: This power planning is effectively nothing but Kirchhoff's current law.


Monday, September 28, 2015

IR Drop

What is IR Drop?

V = I x R is Ohm's law.
Basically, everything that has current flowing through it also has an associated voltage drop.

IR drop affects timing because the cells do not get sufficient voltage and thus cannot rise to the desired level within the rise time.

Essentially, your power straps do not have zero resistance. So as your cells switch and draw current, there is a drop in the VDD seen by the cells. This changes cell characteristics such as drive resistance and slows down the cells, causing timing violations.
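
A quick worked example of V = I x R applied to a power strap. The supply voltage, current, and resistance below are made-up numbers chosen only to show the arithmetic.

```python
def effective_vdd(vdd_nominal, current_a, resistance_ohm):
    """Supply voltage seen by a cell after the IR drop across the power network."""
    return vdd_nominal - current_a * resistance_ohm


def ir_drop_percent(vdd_nominal, current_a, resistance_ohm):
    """IR drop expressed as a percentage of the nominal supply."""
    return 100.0 * current_a * resistance_ohm / vdd_nominal


# Hypothetical values: 1.0 V supply, 50 mA drawn through 0.8 ohm of strap resistance.
print(effective_vdd(1.0, 0.05, 0.8))    # about 0.96 V reaches the cell
print(ir_drop_percent(1.0, 0.05, 0.8))  # about a 4% drop
```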

parasitic extractions cworst

rcworst gives worst [max] delay? rcbest gives best [min] delay? So this gives max and min timing ... why run cworst and cbest, and what does this tell us?

There is no clear relation between RC corners and maximum delay.
Some nets are resistance-dominated, some are capacitance-dominated, and some are a mix.

Add to that the fact that increasing delay is bad for setup timing but good for hold timing, and vice versa for decreasing delay.

So all you can say is that by picking different R and C combinations you get a different delay picture for your circuit. Not necessarily better or worse, just different.

Foundries try to restrict the number of corners people need to verify, so they pick two corners that are typically best and worst. Just keep in mind that this is a simplification of what really happens.
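
To see why neither corner is always the slowest, here is a rough sketch using a lumped Elmore delay. The corner scale factors and the two nets are invented for illustration (they are not foundry data); the only assumption carried over from the discussion above is that one corner pushes capacitance harder and the other pushes the RC product harder.

```python
# Hypothetical corner scale factors: (wire resistance multiplier, wire capacitance multiplier).
CORNERS = {
    "cworst": (0.95, 1.15),
    "rcworst": (1.15, 1.08),
}


def elmore_delay_fs(r_drv, r_wire, c_wire, c_load, r_scale=1.0, c_scale=1.0):
    """Lumped Elmore delay in femtoseconds (ohms times femtofarads).

    The driver resistance sees all the capacitance; the wire resistance
    sees half of the wire capacitance plus the load.
    """
    rw = r_wire * r_scale
    cw = c_wire * c_scale
    return r_drv * (cw + c_load) + rw * (cw / 2.0 + c_load)


def slowest_corner(net):
    """Return the name of the corner that gives the largest delay for this net."""
    delays = {
        name: elmore_delay_fs(*net, r_scale=rs, c_scale=cs)
        for name, (rs, cs) in CORNERS.items()
    }
    return max(delays, key=delays.get)


# (r_drv ohm, r_wire ohm, c_wire fF, c_load fF), both nets hypothetical.
short_net_weak_driver = (5000.0, 50.0, 10.0, 5.0)     # capacitance-dominated
long_net_strong_driver = (100.0, 2000.0, 200.0, 5.0)  # resistance-dominated

print(slowest_corner(short_net_weak_driver))
print(slowest_corner(long_net_strong_driver))
```

With these numbers the short net is slowest at cworst and the long net is slowest at rcworst, which is why both corners get run.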

*** rcworst or cworst ***
They are different kinds of SPEF (Standard Parasitic Exchange Format) files, extracted by a tool such as QRC for post-route analysis of setup and hold violations.
We back-annotate these extracted files for timing closure.



clock generator and clock synthesizer

Clock synthesizer: Usually synthesizes high-frequency clocks from a low-frequency reference clock (oscillator or crystal).

Clock generator: Usually contains a clock synthesizer and divides the high-frequency clock from the synthesizer down to lower frequencies, so that all the low-frequency clocks have "synchronous" properties.

For example:
reference clock: 10 MHz
output of synthesizer: 100 MHz
outputs of clock generator: 50 MHz, 25 MHz, 33 MHz, 20 MHz
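
A small sketch of the divide step. It assumes a 100 MHz synthesizer output, which is the value that makes the listed generator outputs come out as integer divisions (by 2, 4, 3, and 5); the function name is made up.

```python
def divided_clocks_mhz(synth_mhz, dividers):
    """Frequencies obtained by dividing one synthesizer output by integer ratios."""
    return [synth_mhz / n for n in dividers]


# Assumed synthesizer output of 100 MHz, divided by 2, 4, 3 and 5.
for n, f in zip([2, 4, 3, 5], divided_clocks_mhz(100, [2, 4, 3, 5])):
    print(f"divide by {n}: {f:.2f} MHz")
```

Because every output is derived from the same source by an integer ratio, the edges of the slower clocks line up with edges of the fast clock, which is where the "synchronous" property comes from.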

The term clock generator describes the circuit in general. It can be fixed or adjustable.

Clock synthesizer is more a description of how the clock signal is generated (by synthesis).

A synthesizer must pay more attention to noise performance; a generator mostly does not.





What is clock uncertainty?

In ideal mode the clock signal can arrive at all clock pins simultaneously. But in fact, that perfection is not achievable. So, to anticipate the fact that the clock will arrive at different times at different clock pins, the "ideal mode" clock assumes a clock uncertainty.
For example, a 1 ns clock with a 100 ps clock uncertainty means that the next clock tick will arrive in 1 ns plus or minus 50 ps.

A deeper question gets into "why" the clock does not always arrive exactly one clock period later.
There are several possible reasons but I will list 3 major ones:

a) The insertion delay to the launching flip-flop's clock pin is different from the insertion delay to the capturing flip-flop's clock pin (one path through the clock tree can be longer than another path).
This is called clock skew.

b) The clock period is not constant. Some clock cycles are longer or shorter than others in a random fashion. This is called clock jitter.

c) Even if the launching clock path and the capturing clock path are absolutely identical,
their path delays can still be different because of on-chip variation. This is where the chip's delay properties vary across the die due to process variations, temperature variations, or other reasons.
This essentially increases the clock skew.
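
The 1 ns / 100 ps example above can be written out as a small calculation. This follows the plus-or-minus reading used in these notes; the function name is an assumption, and a real tool may apply the uncertainty differently for setup and hold.

```python
def next_edge_window_ps(period_ps, uncertainty_ps):
    """Earliest and latest arrival of the next clock edge.

    The total uncertainty is split evenly around the ideal edge,
    as in the example above.
    """
    half = uncertainty_ps / 2.0
    return period_ps - half, period_ps + half


earliest, latest = next_edge_window_ps(1000, 100)
print(earliest, latest)  # the setup check has to assume the earliest edge
```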


what is clock latency?

The first important point is that there are two phases in the design of a clock signal.

At first the clock is in "ideal mode" (e.g., during RTL design, during synthesis, and during placement). An "ideal" clock has no physical distribution tree; it just shows up magically on time at all the clock pins.

The second phase comes when clock tree synthesis (CTS) inserts an actual tree of buffers to carry the clock signal from the clock source pin to the (thousands of) flip-flops that need to get it.
CTS is done after placement and before routing. After CTS is finished, the clock is said to be in "propagated mode."

Now, what is clock latency? Clock latency is an ideal mode term. It refers to the delay that is specified to exist between the source of the clock signal and the flip-flop clock pin.
It is a delay specified by the user, not a real measured thing. (In fact, there is 'clock source latency' and 'clock network latency'; the difference is not important for this discussion.) When the clock tree is actually created, the same delay is referred to as the "insertion delay." Insertion delay (ID) is a real, measurable delay through a tree of buffers. Sometimes the clock latency is interpreted as the desired target value for the insertion delay.

what is clock latency?

what is clock latency and clock uncertainty?

___ Clock latency is defined as the amount of time from the clock origin point to the sync pin of the flop, and uncertainty is the jitter generated by the oscillator, that is, the PLL.

___ Clock latency is the delay between the clock source and the clock pin. It depends on the hardware: PCB, traces, etc.

___ Clock uncertainty is the difference between two clock signals. It could be the same clock signal arriving at two different points on a PCB (skew).

___ Clock latency is the delay in the clock signal from the clock source port to any clock pin in the circuit. Clock uncertainty is jitter. But jitter and skew are two different terms. Jitter is the variation in the clock period (that is, the clock edge might not be at the required time).

Jitter can be caused by various on-chip variations. Jitter need not be expressed with respect to two nodes.

Clock skew is the difference between the clock arrival times at two different nodes.



insertion delay and skew

On-chip variation is taken care of by applying derates to the clock path and data path. Normally in our design we apply an 8-10% derate on the clock path. With a 10% derate, the reported cell delay is the actual cell delay plus 10% of the actual cell delay. If your insertion delay is high, the extra delay added by the derate is also high.
This directly affects your timing.
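
A small sketch of the derate arithmetic described above. The insertion delays are made-up numbers; the point is only that the same percentage costs more time on a longer clock path.

```python
def derated_delay(actual_delay, derate_pct):
    """Reported delay = actual delay plus derate_pct percent of it."""
    return actual_delay * (1.0 + derate_pct / 100.0)


def derate_penalty(actual_delay, derate_pct):
    """Extra delay added by the derate alone."""
    return derated_delay(actual_delay, derate_pct) - actual_delay


# Hypothetical insertion delays in ns, both with a 10% clock derate.
for insertion_delay in (1.0, 2.0):
    print(insertion_delay, derate_penalty(insertion_delay, 10))
```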

clock skew and insertion delay (2)

Why is a clock tree needed in a synchronous ASIC design?
Two reasons:
1. to maintain a reasonable rise time of the clock signal
2. to help reduce clock skew

What is clock skew?
Clock skew is the difference in clock arrival time at the DFFs' clock pins.

How to improve clock skew?
Use a clock tree.

What is max insertion delay?
Max insertion delay is the longest delay from the clock source point to a DFF clock pin in a clock network.

How to improve max insertion delay?

Max insertion delay depends on several factors:
1. the number of DFFs the clock is driving
2. the die area over which the DFFs are scattered
To reduce max insertion delay, you need to reduce the area and minimize the number of DFFs driven by a single clock; this may lead to changing your clocking strategy.







clock skew and insertion delay

Why is a clock tree needed in a synchronous ASIC design?
What is clock skew?
How to improve clock skew?
What is max insertion delay?
How to improve max insertion delay?


A clock tree is needed to ensure that the clock signal arrives at every logic cell it drives at the same time, that is, with the same delay from the clock source.


Clock skew is the variation in the arrival time of the clock signal at destination logic cells driven by the same clock source. Clock skew is due to (1) variation in the RC of the clock interconnect caused by the layout geometry (length and width), and (2) process variation in permittivity and thickness introduced when the interconnect is fabricated.

To improve clock skew,
1.  Use a clock tree with branches as short as possible to reduce R and C, and use wider wires for
     the higher branches closer to the clock source.

2.  Use a DLL (delay-locked loop).

3.  Avoid routing the clock interconnect over many layers. Try to keep the clock interconnect on the same metal layer, or within 2 layers, to reduce the vertical resistance due to vias, which are highly resistive.

Maximum insertion delay = setup time + hold time + maximum propagation delay of the logic cell + maximum time of flight (propagation delay of the interconnect)

To improve max insertion delay,
1. Reduce the maximum time of flight.
2. Reduce the propagation delay of the logic cell.
3. Reduce the critical path in the logic cell.
4. Alternatively, expand the maximum insertion delay by re-timing.



Example Top_Level Implementation Flow with ILMs

1.  Prepare the top-level Verilog file, if needed.

2.  set_analysis_views -setup {module_slowCorner} -hold {module_fastCorner}

3.  Load the config file, including the top-level netlist,  ILM directory name, ilm_blocks.lib

loadConfig fileName
specifyilm -cell block_A -dir ../block_A/block_A ILM
specifyilm -cell block_B -dir ../block_B ILM

4. load the floorplan

loadFPlan top_floorplan

5. Place the design
placeDesign

6. Run pre-CTS timing optimization.
optDesign -preCTS

7. Build the clock tree
clockDesign

8. Run post-CTS timing optimization
optDesign -postCTS
Or
optDesign -postCTS -hold  ; #optional

9. Route the design
routeDesign

10. Run post-route optimization for setup
optDesign -postRoute

11. Run post-route optimization for setup and hold.
optDesign -postRoute -hold

12. Run post-route optimization for SI
optDesign -postRoute -si




SI and Timing Analysis

1.  Restore the Design
restoreDesign routedSession.dat designname

2.  Import the coupled SPEF file from RCX
spefIn rcx coupled_spef

3. Perform timing analysis in EDI System
timeDesign -postRoute -reportOnly

4. Analyze signal integrity by performing SI analysis in EDI System.
timeDesign -postRoute -si

5. The following listing is a sample script for signal integrity and timing analysis in EDI system.
timeDesign -postRoute -reportOnly -si


local skew and global skew

*  During CTS, when you choose global skew, skew = longest path - shortest path;
    for local skew, skew = endpoint - startpoint.

*  Local skew: the difference between the insertion delays of the source and destination flops.
    Global skew: the max insertion delay minus the min insertion delay.

*  EDA tools also speak of useful/local/global skew:
    useful skew is what you called the local skew above,
    local skew is the skew within a sub-group,
    global skew is the skew that includes all clock pins.
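
The two definitions can be sketched with a few made-up insertion delays. The flop names and values are hypothetical; local skew here follows the "endpoint minus startpoint" definition given above.

```python
# Hypothetical insertion delays (ns) from the clock source to each flop's clock pin.
insertion_delay_ns = {"ff_a": 1.20, "ff_b": 1.35, "ff_c": 1.05}


def global_skew(delays):
    """Max insertion delay minus min insertion delay over all clock pins."""
    return max(delays.values()) - min(delays.values())


def local_skew(delays, launch, capture):
    """Capture (endpoint) insertion delay minus launch (startpoint) insertion delay."""
    return delays[capture] - delays[launch]


print(global_skew(insertion_delay_ns))                 # spread across every clock pin
print(local_skew(insertion_delay_ns, "ff_a", "ff_b"))  # only the pair that shares a data path
```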



Friday, September 25, 2015

How to fix max transition time violations?

The max transition time is one of the three design rules (max fanout, max transition, and max capacitance).
It is much more important than setup/hold timing.

As we all know, in STA the delay of each standard cell is calculated by looking up the NLDM (non-linear delay model) tables defined in the library. These tables are indexed by two factors: input transition time and output load. The table result is the delay of the cell for a given input transition and output load.

If the input transition or output load is within the table range but does not match one of the index values in the NLDM table, interpolation is used to calculate the delay.

If the input transition or output load is outside the range of the NLDM table, extrapolation is used.
Naturally, the result is then rather inaccurate.

So the STA results are rather inaccurate, and the timing analysis cannot be trusted.

Now you can understand how important max transition is.
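
A rough sketch of the table lookup described above. Bilinear interpolation is an assumption here (a given tool or library may use a different scheme), and the table values are invented. The lookup refuses to extrapolate, to make the out-of-range case explicit.

```python
from bisect import bisect_right


def _bracket(axis, x):
    """Indices (i, i + 1) of the axis segment containing x; error if x is off the table."""
    if not axis[0] <= x <= axis[-1]:
        raise ValueError("outside the characterized range; extrapolation would be inaccurate")
    i = min(bisect_right(axis, x) - 1, len(axis) - 2)
    return i, i + 1


def nldm_delay(slews, loads, table, slew, load):
    """Bilinear interpolation in a delay table indexed by input slew and output load."""
    i0, i1 = _bracket(slews, slew)
    j0, j1 = _bracket(loads, load)
    ts = (slew - slews[i0]) / (slews[i1] - slews[i0])
    tl = (load - loads[j0]) / (loads[j1] - loads[j0])
    d0 = table[i0][j0] * (1 - tl) + table[i0][j1] * tl  # along load at the lower slew
    d1 = table[i1][j0] * (1 - tl) + table[i1][j1] * tl  # along load at the upper slew
    return d0 * (1 - ts) + d1 * ts


# Hypothetical 3x3 table: rows are input slew (ns), columns are output load (pF).
SLEWS = [0.1, 0.5, 1.0]
LOADS = [0.01, 0.05, 0.10]
DELAYS = [
    [0.10, 0.20, 0.30],
    [0.15, 0.25, 0.35],
    [0.25, 0.35, 0.45],
]

print(nldm_delay(SLEWS, LOADS, DELAYS, slew=0.3, load=0.03))  # between table points
```

A slew of 2.0 ns would raise an error in this sketch; that is the situation a max transition violation puts the timer in.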

-- One more reason to fix max transition violations is that a larger transition time results in higher DC power consumption.

-- A margin of 30% of the max transition is allowed.

For example, if the max transition constraint is 1 ns, then 1.3 ns is allowed.





What is the meaning of constraints?

Constraints are a type of restriction.

Constraints are the instructions that the designer applies during the various steps of VLSI chip implementation, such as logic synthesis, clock tree synthesis, place and route, and static timing analysis.

There are basically two types of design constraints:
Design Rule Constraints:
*Design rule constraints are defined by the ASIC vendor in the technology library (*.lib file); they are implicit constraints.
*You cannot discard or override these rules.
*You can apply more restrictive design rules, but you cannot apply less restrictive ones. You can do this with the help of optimization constraints.
*Design rules constrain the nets of a design but are associated with the pins of cells from a technology library.
*These constraints can be library specific (common to all the cells defined in the library file) or specific to an individual cell.

Optimization Constraints:
>Optimization constraints are explicit constraints (set by the designer).
>They describe the design goals (area, timing, and so on) the designer has set for the design.
>They must be realistic.

Design Rule Constraints:
> Maximum transition time
*The transition time of a net is the longest time required for its driving pin to change logic values. Transition time is determined on the basis of rise time and fall time.
*This constraint (max_transition) is based on the library data.

DRV (design rule violation) means max transition, max capacitance, and max fanout violations.



Design Constraints: Maximum Transition Time

Design constraints are divided into several parts because it is really a wide and important topic.
Part 1a -> Basics of design constraints and details of "Maximum transition time" (max_transition)
Part 1b -> Maximum fanout constraint (max_fanout)
Part 1c -> Maximum (and minimum) capacitance (max_capacitance and min_capacitance)
Part 1d -> Cell degradation (cell_degradation)

.Basics of Design Constraints
.Classification of types of "Design Constraints"
  *Design Rule Constraints
  *Optimization Constraints
.Different types of "Design Rule Constraints"
  *Maximum transition time
  *Maximum fanout
  *Maximum (and minimum) capacitance
  *Cell degradation

.Details of Maximum Transition Time -- Design Rule Constraints


PVT corners

Generally, in most user environments, the process, voltage, and temperature (PVT) point is specified by referring to a predefined operating condition in a specific timing library. The library operating condition provides the system with values for P, V, and T, which are then used to calculate derating parameters and other aspects of the analysis.

However, there are situations when there are no predefined operating conditions in the user timing libraries, or the pre-existing operating conditions are not consistent with the user's operating environment.



Defining Operating Condition in PD using EDI

I want to introduce operating conditions into the flow.
Here are my questions:
1). How are these operating conditions defined?
2). Do they depend on synthesis and RTL?
3). On what basis are these operating conditions decided?
4). What is the difference between the prePD and postPD netlists?

Answers:
1). Using the setOpCond command.
2). No.
3). Normally the best-case and worst-case operating conditions are used by EDI to do timing optimization for setup and hold.
4). The postPD netlist has the complete clock tree structure, and so gives more accurate timing analysis with wire capacitance.

Tuesday, September 22, 2015

DRC fix

Problem

In EDI System, verifyGeometry (verify_drc for 20nm and below) is not reporting DRC violations reported by my third party physical verification tool. After some debug I found that these DRCs are due to mismatches between the LEF and GDS libraries, so EDI does not see the violations, and therefore is not able to fix them. I can fix the violations by running the same set of wire editing commands at each location. How can I identify the location of each violation and execute the given command(s)?

Solution

Below is an example script that can be used to read in the violation markers from the third party tool, identify the location of the specific violations to fix, and execute the commands to fix them. In this example the violations are fixed by trimming back the metal.
## sample script
########################################################################
# This script will clean up DRC violations from Calibre markers
########################################################################
#
# Clear DRCs and load in the violation file from Calibre:
#
clearDrc
set drc_marker_file calibre_drc_markers.err
loadViolationReport -type Calibre -filename $drc_marker_file
#

# Set script variables:
#
set datex [exec /bin/date +_%d%b%y_%H%M]
set file [open "verify_connectivity_${datex}.tcl" "w"]
set cnt 0
#
# Foreach marker retrieve its type and location:
#
foreach marker_id [dbGet -p -e top.markers.userOriginator Calibre] {
  set db_userType [dbget $marker_id.userType]
  set db_box [dbget $marker_id.box]

  #
  # Check if marker's type matches one we want to fix:
  #
  if {[regexp "M4.L.3" $db_userType] \
    || [regexp "M4.L.3.3" $db_userType] \
    || [regexp "M4.L.3.1" $db_userType] \
    || [regexp "M4.L.5" $db_userType] \
    || [regexp "M4.L.3.2" $db_userType]} {
    foreach object [dbQuery -area $db_box -objType wire] {
      set object_layer [dbget $object.layer.name]
      #
      # If violation is on specified layer (M4) and of a specific size (0.09) then fix it:
      #
      if {[regexp M4 $object_layer] && [dbget $object.box_sizey] == 0.09 } {

        #
        # Execute commands to fix violation:
        #

        incr cnt
        deselectAll
        # Look up the net that owns this wire so it can be re-verified and updated
        set net [dbget $object.net.name]
        puts "Working on violation $cnt $db_userType $db_box has Y value: [dbget $object.box_sizey]. $object"
        puts $file "verifyConnectivity -net $net"
        editSelect -area [dbget $object.box] -layer M4
        dbSet [dbGet -p top.nets.name $net].wires.status routed
        editTrim -selected
      }
    }
  }
}
close $file
## end

Monday, September 21, 2015

blockages

Here we have some narrow channels that we want to fill with placement blockages so that cells are not placed in these areas.
Here are two ways to do this:

1) Using finishFloorplan

The finishFloorplan command can be used to fill channels of a specified width with the specified blockage type.
For example, the following fills any channel with a hard blockage if the channel is less than 10 um wide and lies between the core edge, hard macros, and hard blockages:

setFinishFPlanMode -activeObj {core macro hardBlkg} -direction xy -override false
finishFloorplan -fillPlaceBlockage hard 10

2) Use checkFPlan -narrow_channel to find narrow channels and fill them

This method uses the violation markers from checkFPlan -narrow_channel to identify where to create blockages.
1. clearDrc

2. Source the file containing the code below. Change the width of the channel to your desired value.
## to check narrow channel of 10 um
checkFPlan -narrow_channel 10

proc mk_create_narr_ch_blk { } {
  set f [open narrowChannels_blk.tcl "w"]
  foreach marker_id [dbGet top.markers] {
    set db_userType [dbget $marker_id.userType]
    set db_box [dbget $marker_id.box]
    if {[regexp "narrowChannels" $db_userType] } {
      #Puts " $db_userType"
      puts $f "createPlaceBlockage -type hard -name NARROWCHANNEL -box $db_box"
    }
  }
  close $f
}
## End
3. Run the procedure above:
mk_create_narr_ch_blk
4. Source the resulting file to add the blockages:
source narrowChannels_blk.tcl


Friday, September 18, 2015

floorplan

Floorplanning is the process of defining the chip size and placing the macro cells and I/O pads.
At this stage you can estimate routing requirements, etc.

Floorplanning is the process of creating the core and IO areas, with rows defined in the core area. Each row is divided into minimum units of area called SITEs. Any placed cell occupies one or more of these sites. The floorplan can be created as soon as the netlist and the LEF are read into the backend tool. After floorplanning you can get parameters such as die area, utilization, etc.
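
A small sketch of the row/site arithmetic and of utilization. All dimensions are hypothetical and given in integer database units to avoid floating-point rounding; the function names are made up.

```python
def sites_per_row(row_width_dbu, site_width_dbu):
    """Number of whole sites that fit in one row."""
    return row_width_dbu // site_width_dbu


def sites_for_cell(cell_width_dbu, site_width_dbu):
    """Sites occupied by a cell: its width rounded up to a whole number of sites."""
    return -(-cell_width_dbu // site_width_dbu)  # ceiling division on integers


def core_utilization(total_cell_area, core_area):
    """Fraction of the core area covered by placed cells."""
    return total_cell_area / core_area


# Hypothetical numbers: 100000-unit rows, 200-unit sites, a 900-unit-wide cell.
print(sites_per_row(100000, 200))
print(sites_for_cell(900, 200))
print(core_utilization(70000, 100000))
```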

The output format of the floorplan depends on the tool used, but DEF (Design Exchange Format, .def) is accepted by all the tools, and every tool can dump out a .def file. It contains the physical locations of the pins and macros used in the design.