VLSI Physical Design: June 2016

Saturday, June 25, 2016

Power Gating

Power gating is a technique used in integrated circuit design to reduce power consumption by shutting off the current to blocks of the circuit that are not in use.

Power gating is used to save leakage power when part of the system is not in operation. This is accomplished by adding a switch to either the VDD or the VSS supply. When a design block is power gated, it literally means the block is powered OFF. Powering OFF a design block is the most beneficial of all the low-power techniques because the block dissipates near-zero power. It is only near zero because the switching circuit used to implement power gating still dissipates leakage power, even in power-gated mode. The control signal for the power gating switches is generated by the power gating control block.

When VDD is gated, the power switch is called a “header” switch. Similarly, when VSS is gated, it is called a “footer” switch. To gate VDD we use a PMOS transistor, whereas to gate VSS we use an NMOS transistor. In practice, a single switch is not sufficient to power the whole design. Hence many parallel switches are implemented to achieve a short voltage ramp-up time and to avoid IR drop issues.

Effect of Power Gating:

Consider a situation where the header switch is turned OFF. The VDD supply to the power-gated block is cut off, but VSS is still connected. VSS tries to pull the circuit nodes down to the VSS voltage, but after a certain period of time the nodes reach equilibrium and settle at an intermediate voltage above the VSS level. Similarly, when the footer switch is turned OFF, the nodes of the power-gated block settle at a constant intermediate voltage below the VDD level. Typically these intermediate voltages are close to the threshold voltages of the CMOS transistors. When such signals are fed as inputs to a powered-up domain, the receiving gates spend a long time near their threshold voltage, with both the PMOS and NMOS devices partially conducting, which causes crowbar currents. This is the reason isolation cells are used on the output signals of the powered-down block.

Types of power gating:

Based on the scale at which power gating is applied, it can be of two types:

Fine grain power gating
Coarse grain power gating

Fine grain power gating uses a power switch in each standard cell to gate the power to that cell. The area penalty of such an implementation is very large, so fine grain power gating is the least preferred option. In addition, when a particular cell is power gated, its output needs to be isolated from the other, non-power-gated cells. For these reasons designers prefer coarse grain power gating over fine grain.

In coarse grain power gating, the power is gated over a region of the design (usually defined as a power domain). Because of this, the area penalty is much smaller than with fine grain power gating. The outputs of the power domain can also be isolated more easily than isolating every cell, as fine grain power gating requires. The main challenges of coarse grain power gating are minimizing the power-up ramp time and managing the IR drop, because the parallel switches must be designed carefully to drive the entire power grid of the power-gated block.

Factors Influencing Power Switch Network Design:

The factors to be considered while designing a power switch network are:

Rush current
Leakage current
IR drop
Ramp up time

Rush Current: Rush current is the current drawn by the circuit during initial power-up. When an electrical load is powered up, it initially draws a large current to charge its internal capacitors. This current is many times the average current the load consumes during normal operation. For any electrical component there is a limit to the amount of current it can withstand. In silicon, a power domain is an electrical load on the power switch network. When a power domain is powered up from shutdown, all the capacitors in the power domain start to charge. Since they all start charging simultaneously, the amount of charge drawn is large, which causes a sudden rush of current. This rush current can damage the power switch network, so the network must be designed carefully to mitigate it. Usually many parallel power switches are implemented by dividing the supply grid of the power domain into many small blocks, each powered by one or more power switches. This considerably reduces the load on each power switch and helps limit the rush current.
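As a rough illustration of the argument above, the following Python sketch estimates the average charging current of a power domain from I = C * V / t and shows how the share carried by each switch shrinks as more parallel switches are used. The function names and all numeric values (domain capacitance, supply voltage, ramp time) are hypothetical, and the model ignores on-resistance and grid parasitics.

```python
def average_rush_current(c_domain_farads, vdd_volts, ramp_seconds):
    """Average current needed to charge the domain capacitance
    from 0 V to VDD within ramp_seconds (I = C * V / t)."""
    return c_domain_farads * vdd_volts / ramp_seconds


def current_per_switch(total_current_amps, num_switches):
    """Share of the total current carried by each of num_switches
    identical parallel switches (ideal, equal sharing assumed)."""
    return total_current_amps / num_switches


if __name__ == "__main__":
    # Hypothetical domain: 2 nF of capacitance, 1.0 V supply, 100 ns ramp.
    total = average_rush_current(2e-9, 1.0, 100e-9)
    for n in (1, 10, 100):
        print(f"{n:4d} switches: {current_per_switch(total, n) * 1e3:.3f} mA per switch")
```

The point of the sketch is only the scaling: for a fixed amount of charge delivered in a fixed time, the current each switch must carry falls in proportion to the number of switches sharing the load.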

Leakage Current: Like any other CMOS transistor, a power switch has some leakage current. The number of power switches in the power switch network should therefore be optimal: using more switches than required adds to the leakage current of the network. Power switches are usually implemented with high-VT cells (MTCMOS).

IR Drop: To handle rush currents, the power switches (sleep transistors) are designed with high channel resistance. But this leads to an IR drop across the power switch, which lowers the supply seen by the logic cells and degrades their performance. So the power switch network should be designed to minimize the IR drop across the power switches. To solve this, designers use two types of power switches: one that is used during power-up to handle the rush current, and a second one for normal operation. These are called mother and daughter switches. The control inputs of the two switch types are driven such that only one type is switched on at a time.
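The IR drop trade-off can be put into numbers with Ohm's law. The sketch below is a simplified model, assuming identical switches that share the load current equally; the function names and the example values (load current, on-resistance, drop budget) are hypothetical.

```python
import math


def ir_drop(load_current_amps, r_on_ohms, num_switches):
    """Voltage drop across num_switches identical parallel switches.
    The effective resistance of the parallel combination is r_on / n."""
    return load_current_amps * r_on_ohms / num_switches


def min_switches_for_budget(load_current_amps, r_on_ohms, max_drop_volts):
    """Smallest number of parallel switches that keeps the drop
    at or below max_drop_volts."""
    return math.ceil(load_current_amps * r_on_ohms / max_drop_volts)


if __name__ == "__main__":
    # Hypothetical block: 0.5 A load, 20 ohm per switch, 30 mV budget.
    n = min_switches_for_budget(0.5, 20.0, 0.03)
    print(n, "switches give a drop of", ir_drop(0.5, 20.0, n), "V")
```

A lower on-resistance or more parallel switches reduces the drop, which pulls in the opposite direction from the high-resistance switches wanted for limiting rush current; that tension is what the two-switch-type scheme addresses.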

Ramp Up Time: The time required to power up a shut-down power domain is called the ramp-up time. The ramp-up time should be as short as possible. The power network should be designed so that the ramp-up time is small, which is usually achieved by increasing the number of power switches or sleep transistors.
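To see why adding switches shortens the ramp-up time but raises the rush current, the domain can be approximated as a single capacitor charged through the parallel switch resistance. This is a first-order RC sketch only; the function names and values are hypothetical, and real ramp-up behavior also depends on the switch turn-on sequence and the power grid.

```python
import math


def ramp_up_time(c_domain_farads, r_on_ohms, num_switches, fraction=0.95):
    """Time for an RC-charged domain to reach `fraction` of VDD.
    V(t) = VDD * (1 - exp(-t / (R_eff * C))), with R_eff = r_on / n."""
    r_eff = r_on_ohms / num_switches
    return -math.log(1.0 - fraction) * r_eff * c_domain_farads


def peak_current(vdd_volts, r_on_ohms, num_switches):
    """Initial (peak) charging current when all switches turn on at once
    and the domain is fully discharged: I0 = VDD / R_eff."""
    return vdd_volts * num_switches / r_on_ohms


if __name__ == "__main__":
    # Hypothetical domain: 2 nF, 1.0 V supply, 20 ohm per switch.
    for n in (10, 50, 100):
        t = ramp_up_time(2e-9, 20.0, n)
        i0 = peak_current(1.0, 20.0, n)
        print(f"{n:4d} switches: ramp {t * 1e9:.2f} ns, peak current {i0:.2f} A")
```

In this model the ramp time falls as 1/n while the peak current grows as n, so the switch count has to balance ramp-up time against rush current and leakage.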

Monday, June 20, 2016

Specifying the Maximum Transition Constraint

Maximum transition constraints can come from user input, the library, or a library pin. User-specified maximum transition constraints are expressed with the main library derate and slew threshold of PrimeTime. The set_max_transition command sets a maximum limit on the transition time for all specified pins, ports, designs, or clocks. When specified on clocks, pins in the clock domain are constrained. Within a clock domain, you can optionally restrict the constraint further to only clock paths or data paths, and to only rising or falling transitions. During constraint checking on a pin or port, the most restrictive constraint specified on a design, pin, port, clock (if the pin or port is in that clock domain), or library is considered. This is also true where multiple clocks launch the same path.
The set_max_transition command places the max_transition attribute, which is a design rule constraint, on the specified objects. In PrimeTime, the slews and maximum transition constraint attributes are reported in the local threshold and derate of each pin or library.

To view the maximum transition constraint evaluations, use the report_constraint -max_transition command. PrimeTime reports all constraints and slews in the threshold and derate of the pin of the cell instance, and the violations are sorted on the absolute values (that is, the values expressed in the design threshold and derate). You can also use the report_constraint command to report constraint calculations only for maximum capacitance and maximum transition for a specified port or pin list. Use the object_list argument to specify a list of pins or ports in the current design for which you want to display constraint-related information.

To see the port maximum transition limit, use the report_port -design_rule command. To see the default maximum transition setting for the current design, use the report_design command. To undo maximum transition limits previously set on ports, pins, designs, or clocks, use remove_max_transition.

Setting a maximum transition limit
To set a maximum transition limit of 2.0 units on the OUT* ports, enter
pt_shell> set_max_transition 2.0 [get_ports "OUT*"]

To set the default maximum transition limit of 5.0 units on the current design, enter
pt_shell> set_max_transition 5.0 [current_design]

To set a maximum transition limit of 4.0 units on all pins in the CLK1 clock domain, for rising transitions in data paths only, enter
pt_shell> set_max_transition 4.0 [get_clocks CLK1] -data_path -rise
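The "most restrictive constraint wins" rule described in this post can be sketched as a simple minimum over every limit that applies to a pin. This is an illustrative Python model only, not how PrimeTime implements the check: the function name and the dictionary layout are hypothetical, and the scaling of limits between different slew thresholds and derates is deliberately ignored.

```python
def effective_max_transition(limits):
    """Return the most restrictive (smallest) applicable limit.

    `limits` maps a constraint source (design, port, pin, clock, library)
    to its max transition limit, or to None when that source sets none.
    All limits are assumed to be expressed in the same threshold/derate.
    """
    applicable = [value for value in limits.values() if value is not None]
    return min(applicable) if applicable else None


if __name__ == "__main__":
    # Hypothetical port OUT1 in the CLK1 domain, using the limits set above.
    out1_limits = {"design": 5.0, "port": 2.0, "clock": 4.0, "library": None}
    print(effective_max_transition(out1_limits))
```

With the three example commands above applied together, a rising data-path pin in the CLK1 domain that is also an OUT* port would be checked against the smallest of the limits that apply to it.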

Friday, June 17, 2016

Specifying the Timing Derating Factors

Timing derating factors model the effects of varying operating conditions by adjusting the delay values calculated for the individual timing arcs of a block. By default, the timing derating factors are 1.0 and the tool does not adjust the calculated delay values.
To set derating factors, use the set_timing_derate command and specify the following information:
-- The derating factor.
-- Whether the derating factor is for early or late delays, by using the -early or -late option.
Optionally, you can apply the derating factor to
-- A specific leaf-level instance, hierarchical instance, or library cell, by specifying the object.
   By default, it applies to the current block.
-- Rise or fall delays only, by using the -rise or -fall option.
   By default, it applies to both rise and fall delays.
-- Clock or data paths only, by using the -clock or -data option.
   By default, it applies to both clock and data paths.
-- Net delays, cell delays, or cell timing checks, by using the -net_delay, -cell_delay, or -cell_check option.
   By default, it applies to all three.
-- A specific corner, by using the -corners option.
   By default, it applies to the current corner.
The following example reduces all minimum delays by 10 percent and increases all maximum delays by 20 percent for the current corner:
icc_shell> set_timing_derate -early 0.9 -late 1.2
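The arithmetic behind the example above is a plain multiplication of each calculated delay by the early or late factor. The following Python sketch shows this under the assumption that every arc uses the same pair of factors; the function name and the delay values are hypothetical.

```python
def derate_delays(delays_ns, early=1.0, late=1.0):
    """Return (early_delay, late_delay) pairs for each calculated delay.
    The defaults of 1.0 leave the delays unchanged, matching the tool default."""
    return [(delay * early, delay * late) for delay in delays_ns]


if __name__ == "__main__":
    # Hypothetical timing-arc delays in nanoseconds.
    arcs = [0.10, 0.25, 0.40]
    for nominal, (min_delay, max_delay) in zip(arcs, derate_delays(arcs, early=0.9, late=1.2)):
        print(f"nominal {nominal:.2f} ns -> min {min_delay:.3f} ns, max {max_delay:.3f} ns")
```

With -early 0.9 and -late 1.2, a 1.0 ns arc is treated as 0.9 ns in minimum-delay calculations and 1.2 ns in maximum-delay calculations.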

To report the derating factors, use the report_timing_derate command. By default, the command reports the derating factors for all corners. To report the derating factors for specific corners, use the -corners option.

To reset the derating factors to 1.0, use the reset_timing_derate command. By default, the command resets the derating factors for the current corner for the current block and all its instances. To reset the derating factors for specific corners, use the -corners option. To reset the derating factors for specific objects, specify the objects.

Wednesday, June 15, 2016

How can I selectively reset the global timing derate values?

Question:
I have set the following global timing derate values:
pt_shell> set_timing_derate -late 1.05
pt_shell> set_timing_derate -early 0.95
Now, I want to reset the timing derate values on a specific instance, for example, U1:
pt_shell> reset_timing_derate [get_cells U1]
Why is this command not resetting the timing derate values on the instance?

Answer:
The following commands set the timing derate values globally on the entire current design:
pt_shell> set_timing_derate -late 1.05
pt_shell> set_timing_derate -early 0.95
pt_shell> report_timing_derate

                        --------- Clock ---------     --------- Data ----------
                            Rise          Fall            Rise          Fall
                        Early  Late   Early  Late     Early  Late   Early  Late
-------------------------------------------------------------------------------
Design: test
Net delay static        0.95   1.05   0.95   1.05     0.95   1.05   0.95   1.05
Net delay dynamic       0.95   1.05   0.95   1.05     0.95   1.05   0.95   1.05
Cell delay              0.95   1.05   0.95   1.05     0.95   1.05   0.95   1.05
Cell check              ---    ---    ---    ---      ---    ---    ---    ---

Resetting the derate values on instance U1 only resets the timing derate values that were set specifically on instance U1 earlier; it does not override the global timing derate settings:

1. Set timing derate values specifically on instance U1:

pt_shell> set_timing_derate 1.55 -cell_delay -late [get_cells U1]
pt_shell> set_timing_derate 0.55 -cell_delay -early [get_cells U1]
pt_shell> report_timing_derate [get_cells U1]

                        --------- Clock ---------     --------- Data ----------
                            Rise          Fall            Rise          Fall
                        Early  Late   Early  Late     Early  Late   Early  Late
-------------------------------------------------------------------------------
Cell (leaf): U1
Cell delay              0.55   1.55   0.55   1.55     0.55   1.55   0.55   1.55

2. Reset all timing derate values set globally on the complete current design:

pt_shell> reset_timing_derate
pt_shell> report_timing_derate [get_cells U1]

                        --------- Clock ---------     --------- Data ----------
                            Rise          Fall            Rise          Fall
                        Early  Late   Early  Late     Early  Late   Early  Late
-------------------------------------------------------------------------------
Cell (leaf): U1
Cell delay              ---    ---    ---    ---      ---    ---    ---    ---

In conclusion, the reset_timing_derate command only resets the timing derate values at or below the scope where they were set. The command does not override the timing derate values set at a higher scope.
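The scoping behavior described in this answer can be modeled with two lookup levels: instance-specific values that take precedence, and global values used as the fallback. The Python class below is a hypothetical, simplified model written for illustration; it is not the tool's data structure, and it tracks only early and late factors.

```python
class DerateTable:
    """Toy model of global versus instance-specific timing derates."""

    def __init__(self):
        self.global_derate = {}    # e.g. {"early": 0.95, "late": 1.05}
        self.instance_derate = {}  # e.g. {"U1": {"late": 1.55}}

    def set_derate(self, kind, value, instance=None):
        """Set an "early" or "late" factor globally or on one instance."""
        if instance is None:
            self.global_derate[kind] = value
        else:
            self.instance_derate.setdefault(instance, {})[kind] = value

    def reset(self, instance=None):
        """Instance reset removes only that instance's own values.
        A global reset clears the design scope and everything below it."""
        if instance is None:
            self.global_derate.clear()
            self.instance_derate.clear()
        else:
            self.instance_derate.pop(instance, None)

    def effective(self, instance, kind):
        """Instance value if present, else the global value, else 1.0."""
        specific = self.instance_derate.get(instance, {})
        if kind in specific:
            return specific[kind]
        return self.global_derate.get(kind, 1.0)


if __name__ == "__main__":
    table = DerateTable()
    table.set_derate("late", 1.05)
    table.set_derate("early", 0.95)
    table.reset(instance="U1")             # nothing instance-specific to remove
    print(table.effective("U1", "late"))   # global value still applies
```

In this model, resetting U1 can only remove entries stored under U1, so the globally set 1.05 and 0.95 remain in effect for that instance, which mirrors the behavior the question ran into.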

How to set timing derate values on library cells for min analysis?

Question:

I would like to apply specific max and min timing derates to library cells. My worst-case library is Worst.db and my best-case library is Best.db. Here are the commands that I used:

set_link_library "*Worst.db"
set_min_library Worst -min_version Best
set_timing_derate -max -cell_delay -data -late 4.00 [get_lib_cells Worst/BUF1]
set_timing_derate -min -cell_delay -data -early 1.5 [get_lib_cells Best/BUF1]
report_timing_derate

I found that the tool failed to apply the specified derate factors, as seen in the reports generated by report_timing_derate and report_timing -derate. How can I fix this?

Answer:

The set_min_library command creates an association between the max and min libraries. When computing min delays, the tool finds the corresponding cell in the associated min library. The tool expects you to specify the main library to which the cells have been linked in the current design. In this case, the max library must be specified in the set_timing_derate -min command, as follows:

set_timing_derate -min -cell_delay -data -early 1.5 [get_lib_cells Worst/BUF1]

The cells link to the first library that they can be linked to. In the above setting, it is the Worst.db library. However, the min derating is applied to the cells in the associated min library, which gives the desired result.

Tuesday, June 14, 2016

Constraints for PVT Corners

Normally, best/worst PVT means the fastest/slowest operating condition of the circuit: the typical best (fastest) PVT is {process: fast, voltage: high, temperature: low}, while the worst (slowest) PVT is {process: slow, voltage: low, temperature: high}.

However, recent process technologies exhibit temperature inversion, where cells may run faster at a high temperature than at a low temperature. It is therefore advisable to consult the cell library provider on which PVT is fastest and which is slowest.

Notes:
Process "fast" means that all transistors on the chip have the combination of parameters that gives the minimum cell delay.

We should also know about RC corners. The width and thickness of the metal wires may also vary from chip to chip (and even across one chip). So, for the setup check, we should use the RCWorst corner for extraction together with the PVT corner that gives the maximum cell delay.

Monday, June 13, 2016

Explanation of Derating and OCV

1) Derating is simply another way of adding margin to the design. This allows you to scale all delays by a certain percentage to increase margin. This is not an SDC constraint, but a variable to set within the tool run script.

2) OCV is a timing analysis mode (like single or BC/WC). It lets you use the variation data from the libraries when performing timing checks, to ensure worst-case scenarios are covered.

Sunday, June 12, 2016

How to fix max transition time violations?

The max transition time is one of the three design rules (max fanout, max transition, max capacitance).

It is arguably even more important than setup/hold timing.
As we all know, in STA the delay of each standard cell is calculated by looking up the NLDM (non-linear delay model) tables defined in the library. These tables are indexed by two factors: input transition time and output load. The table lookup returns the delay of the cell for a given input transition and output load.

If the input transition or output load is within the table range but does not match one of the index values in the NLDM table, interpolation is used to calculate the delay.
If the input transition or output load is outside the range of the NLDM table, extrapolation is used instead. Naturally, the result is then rather inaccurate.
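The lookup described above can be sketched as a bilinear interpolation over a two-dimensional delay table. The Python code below is a minimal illustration, assuming a simple bilinear scheme that reuses the end segments of each axis for out-of-range points; actual timing tools may use different interpolation and extrapolation formulas, and the function names and table values are hypothetical.

```python
from bisect import bisect_right


def _bracket(axis, x):
    """Indices (i, i + 1) of the axis segment used for x.
    Points outside the axis reuse the first or last segment (extrapolation)."""
    i = bisect_right(axis, x) - 1
    i = max(0, min(i, len(axis) - 2))
    return i, i + 1


def in_table_range(axis, x):
    """True when x lies inside the characterized range of the axis."""
    return axis[0] <= x <= axis[-1]


def nldm_delay(slews, loads, table, slew, load):
    """Bilinear lookup: table[i][j] is the delay at slews[i], loads[j]."""
    i0, i1 = _bracket(slews, slew)
    j0, j1 = _bracket(loads, load)
    ts = (slew - slews[i0]) / (slews[i1] - slews[i0])
    tl = (load - loads[j0]) / (loads[j1] - loads[j0])
    d0 = table[i0][j0] + tl * (table[i0][j1] - table[i0][j0])
    d1 = table[i1][j0] + tl * (table[i1][j1] - table[i1][j0])
    return d0 + ts * (d1 - d0)


if __name__ == "__main__":
    # Hypothetical table: slews in ns, loads in pF, delays in ns.
    slews = [0.1, 0.5, 1.0]
    loads = [0.01, 0.05, 0.10]
    table = [[0.05, 0.09, 0.14],
             [0.08, 0.13, 0.19],
             [0.12, 0.18, 0.25]]
    for slew in (0.3, 1.3):
        status = "interpolated" if in_table_range(slews, slew) else "extrapolated"
        print(slew, status, round(nldm_delay(slews, loads, table, slew, 0.03), 4))
```

An extrapolated value is produced by extending the slope of the last table segment, so it carries no characterization data behind it; a max transition violation means the delay came from this less trustworthy region.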

As a result, the STA will be rather inaccurate and the timing analysis cannot be trusted.
Now you can understand how important max transition is.
-- One more reason for fixing max transition violations is that a larger transition time results in higher short-circuit (DC) power consumption.
-- A margin of 30% of the max transition limit is allowed.
   For example, if the max transition constraint is 1 ns, then 1.3 ns is allowed.

Saturday, June 11, 2016

PVT Corners

Generally in most user environments, the process, voltage, and the temperature (PVT) point is specified by referring to a predefined operating condition in a specific timing library. The library operating condition provides the system with values for P, V, and T, and these then are used to calculate derating parameters and other aspects of the analysis.

However, there are situations in which the user timing libraries contain no predefined operating conditions, or the existing operating conditions are not consistent with the user's operating environment.

Defining Operating Conditions in PD using SoC Encounter

I am using SoC Encounter for a project. I want to introduce operating conditions into the flow.

1. How are these operating conditions defined?
2. Do they depend on synthesis and RTL?
3. On what basis are these operating conditions decided?
4. What is the difference between the prePD and postPD netlists?

Answers:
1. Using the setOpCond command.
2. No.
3. Normally, the best-case and worst-case operating conditions are used by SoC Encounter to do timing optimization for setup and hold.
4. The postPD netlist has the complete clock tree structure, so it allows more accurate timing analysis with wire capacitance.

On-Chip Variation Delay Analysis

During timing analysis, the tool uses the on-chip variation (OCV) mode, which models the effects of variation in operating conditions across the chip. This mode performs a conservative timing analysis by simultaneously applying minimum and maximum delays to different paths.

For a setup check, the tool uses maximum delays for the launch clock path and data path and minimum delays for the capture clock path, as shown in the following figure.
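The setup check described above can be written out as a small slack calculation. The Python sketch below is an assumption-laden illustration: it models the maximum and minimum delays by scaling nominal delays with late and early derate factors, it ignores clock reconvergence pessimism removal and clock uncertainty, and the function name and all numbers are hypothetical.

```python
def ocv_setup_slack(launch_clk_ns, data_ns, capture_clk_ns,
                    period_ns, setup_ns, late=1.0, early=1.0):
    """Setup slack under on-chip variation.

    Launch clock path and data path use maximum (late) delays;
    the capture clock path uses minimum (early) delays.
    """
    arrival = (launch_clk_ns + data_ns) * late
    required = period_ns + capture_clk_ns * early - setup_ns
    return required - arrival


if __name__ == "__main__":
    # Hypothetical path: 1.0 ns launch clock, 3.0 ns data,
    # 1.0 ns capture clock, 5.0 ns period, 0.2 ns setup time.
    nominal = ocv_setup_slack(1.0, 3.0, 1.0, 5.0, 0.2)
    with_ocv = ocv_setup_slack(1.0, 3.0, 1.0, 5.0, 0.2, late=1.1, early=0.9)
    print(f"nominal slack {nominal:.2f} ns, OCV slack {with_ocv:.2f} ns")
```

Because the launch and data delays grow while the capture clock delay shrinks, the OCV slack is never larger than the nominal slack for the same path, which is what makes the analysis conservative.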