Monday, November 30, 2015

Using the Useful Skew Technique

The useful skew technique improves timing QoR by adjusting the clock arrival times to take advantage of positive slack in the network. Use the skew_opt command to automatically perform useful skew analysis and generate the useful skew constraints.

Skew_opt Details
By default, the skew_opt command performs the following tasks:
- Analyzes the design to determine which paths can be used for useful skew.
ICC looks for paths with positive slack where the slack could be redistributed (by adjusting the clock latencies) along the input-to-output paths to improve the overall design timing.

By default, ICC evaluates only the setup constraints. To consider both setup and hold constraints, specify the -hold option. You can also tighten the setup and hold constraints by specifying the -setup_margin and -hold_margin options, respectively.

ICC does not adjust the latency on the following types of paths:
1. Paths that loop from a register to itself
2. Paths that contain level-sensitive latches
3. Combinational paths from an input port to an output port
4. Paths within interface logic models
Note:
ICC does not adjust the latency on a nonstop pin. For paths that start or end at nonstop pins (either implicit or explicit), ICC adjusts the latency only at the other endpoints or startpoints.

By default, there is no limit to the amount of latency adjustment that can be made. To limit the amount of latency adjustment, specify the -adjustment_limit option. To further limit the amount the latency can be decreased, specify the -decrease_factor option.

To disable this analysis and use the existing clock latencies, specify the -no_optimization option.
- Determines the interclock relationships in the design.
ICC analyzes the interclock relationships in the design and uses this analysis to define the interclock delay balancing groups.
- Generates a script file that is used to adjust the clock latencies and set the interclock delay balancing constraints.
The script contains set_clock_latency and set_clock_tree_exceptions -float cmds to adjust the clock latencies and set_inter_clock_delay_options cmds to define the interclock delay balancing groups.
By default, a script is generated if any latency adjustments are identified. If you want to generate a script only if the worst negative slack (WNS) improves by a certain amount, specify the -improvement_threshold option.
By default, the generated script file is named skew_opt.tcl. To use another file name, specify the -output option.

- Sources the generated script file to apply the clock latency adjustments and interclock delay balancing constraints.
You can disable sourcing of some of the commands in the generated script, or of the entire generated script.
     1) To disable sourcing of the set_clock_latency commands, set the skew_opt_skip_ideal_clocks variable to true before running skew_opt.
     2) To disable sourcing of the set_clock_tree_exceptions -float commands, set the skew_opt_skip_propagated_clocks variable to true before running skew_opt.
     3) To disable sourcing of the set_inter_clock_delay_options commands, set the skew_opt_skip_clock_balancing variable to true before running skew_opt.
     4) To disable sourcing of the entire script, specify the -no_auto_source option when you run skew_opt.
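
To make the slack redistribution idea concrete, here is a minimal Python sketch. It is not ICC's algorithm: it assumes a single register sitting between one incoming and one outgoing path, considers setup slack only, and uses hypothetical names (balance_latency, limit). The limit argument only loosely mirrors the idea of capping the latency adjustment.

```python
def balance_latency(slack_in, slack_out, limit=None):
    """Return (shift, new_slack_in, new_slack_out) for one register.

    slack_in  : setup slack of the path that ends at the register
    slack_out : setup slack of the path that starts at the register
    limit     : optional cap on the size of the shift (hypothetical)

    A positive shift delays the register clock, so the incoming path
    gains that much setup slack and the outgoing path loses it.
    """
    shift = (slack_out - slack_in) / 2.0
    if limit is not None:
        shift = max(-limit, min(limit, shift))
    return shift, slack_in + shift, slack_out - shift


# Incoming path violates by 0.2, outgoing path has 0.4 of spare slack.
shift, new_in, new_out = balance_latency(-0.2, 0.4)
print(round(shift, 3), round(new_in, 3), round(new_out, 3))
```

In this made-up case, delaying the register clock by about 0.3 leaves both paths with roughly 0.1 of positive slack. With limit=0.1 the shift is capped, and the incoming path would still violate.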





Wednesday, November 25, 2015

Analyzing the Clock Tree Results

After synthesizing the clock trees, analyze the results to verify that they meet your requirements. Typically, the analysis process consists of the following tasks:

1. Analyzing the clock tree reports
2. Analyzing the clock tree timing
3. Verifying the placement of the clock instances, using the ICC GUI.

If the clock trees meet your requirements, you are ready to analyze the entire design for quality of results.

If the synthesis results do not meet your requirements, ICC can help you debug the results by outputting additional information during clock tree synthesis. Set the cts_use_debug_mode variable to true before running clock tree synthesis to output the following additional information:
1) User-defined design rule constraints (maximum transition time, maximum capacitance, and maximum fanout)
2) User-defined clock tree timing constraints
3) User-defined level restrictions (maximum level count)
4) Clustering targets set by ICC

In addition, you can output detailed characterization data for the clock tree references by setting the cts_do_characterization variable to true.

Using this information, you can change the clock tree definitions to improve your results.





Performing Clock Routing After Multicorner Optimization

After finishing multicorner clock tree optimization, you can perform clock routing using either the default router, Zroute, or the classic router.

Both Zroute and the classic router support the integrated clock global router and balanced-mode routing. To achieve the best correlation results, ICC uses the integrated clock global router and saves clock global routing information.

When using Zroute (the default router), use the route_zrt_group -all_clock_nets command to perform clock routing. You must also use the -reuse_existing_global_route option so that Zroute detects the clock global routing information in the Milkyway database and performs incremental global routing.
icc_shell> route_zrt_group -all_clock_nets \
       -reuse_existing_global_route true

To enable the classic router rather than Zroute, run the following command:
icc_shell> set_route_mode_options -zroute false

When using the classic router, you use the route_group -all_clock_nets cmd to perform clock routing. The route_group cmd can detect the clock global routing information in the Milkyway database and can set the global route incremental model automatically.

A Sample Script
The following sample script shows an example of running multicorner clock tree optimization:

create_scenario scn1
set_operating_conditions \
           -max $opcond_slow -max_library $lib_slow1 \
           -min  $opcond_fast   -min_library  $lib_fast1
set_tlu_plus_files  \
            -max_tluplus  $tlu_high_r \
            -min_tluplus   $tlu_slow_r \
            -tech2itf_map $tlu_map

create_scenario scn2
set_operating_conditions  \
            -max $opcond_slow  -max_library  $lib_slow  \
            -min  $opcond_fast   -min_library   $lib_fast2
set_tlu_plus_files  \
            -max_tluplus  $tlu_high_r  \
            -min_tluplus   $tlu_low_r  \
            -tech2itf_map   $tlu_map
set_clock_tree_options  \
             -corner_target_skew   "scn1:max=0.100 scn_cts:max=0.050  scn2:min=0.070"
set_clock_tree_optimization_options \
             -enable_multicorner  "scn1:max scn1:min scn2:max scn2:min "

set_scenario_options -scenarios [all_scenarios] -cts_mode true \
            -cts_corner min_max  ;# choose one of: min, max, min_max, none
compile_clock_tree
optimize_clock_tree

current_scenario scn_cts
report_clock_tree
report_clock_tree -op min

current_scenario scn1
report_clock_tree
report_clock_tree -op min

current_scenario scn2
report_clock_tree
report_clock_tree -op min

route_zrt_group -all_clock_nets -reuse_existing_global_route true







Performing Multicorner Clock Tree Optimization

ICC can perform multicorner CTS to achieve the best QoR for your designs by processing various scenario corners simultaneously. Running multicorner clock tree optimization automatically enables the integrated clock global router and the Arnoldi delay analysis (-clock_arnoldi)  to obtain the best clock network correlation and to ensure accurate insertion delay and skew.

To run multicorner clock tree optimization, use the following commands:
icc_shell> set_scenario_options -scenarios [list_of_all_scenarios] \
           -cts_mode true -cts_corner [min | max | min_max | none]

icc_shell> compile_clock_tree
icc_shell> optimize_clock_tree


Specifying Corners Using Multicorner-Multimode Scenarios

You can use multicorner-multimode scenarios to specify clock tree optimization corners. Each multicorner-multimode scenario has two corners: one maximum corner and one minimum corner.
For each corner, you define an operating condition and parasitic information files by using the set_operating_conditions command and the set_tlu_plus_files command, respectively. For each scenario, you associate the cell libraries of the maximum and minimum corners by using the set_scaling_lib_group command, not the set_min_library command. The set_min_library command associates a maximum library with a minimum library across all scenarios, so it is not scenario-specific. The following shows an example of one scenario declaration:
icc_shell> define_scaling_lib_group -name slow $lib_slow
icc_shell> define_scaling_lib_group -name fast $lib_fast
icc_shell> create_scenario scn1
icc_shell> set_operating_conditions -max $opcond_slow -min $opcond_fast
icc_shell> set_tlu_plus_files -max_tluplus $tlu_high_r \
       -min_tluplus $tlu_low_r -tech2itf_map $tlu_map
icc_shell> set_scaling_lib_group -max slow -min fast

Specifying Constraints for Multicorner Optimization
During multicorner CTS, ICC applies the same design rule constraints that CTS uses to all corners that you specify, except the following:
1. Float pin constraints
2. Skew target goals

Setting Float Pin Constraints
When you specify a float pin value on the leaf node of a clock tree, you must scale the float pin value to apply to each scenario corner. However, achieving a single float pin phase delay across all corners is impossible because cell delays differ in speed. The float pin values that you specify using the set_clock_tree_exceptions cmd apply only to the corners of the default CTS scenario in which the CTS constraints are specified.

For other corners, specify float pin values in one of the following two ways:
1) Let ICC automatically scale the float pin values based on the intrinsic speed of cells in different corners. For example, if the cells of a corner are twice as fast as those of the default CTS scenario, ICC scales the float pin values that you specify for the default CTS scenario down by half for that corner.
2) Use the set_clock_tree_exceptions command with the -max_float_pin_scale_factor and -min_float_pin_scale_factor options to specify float pin value scaling factors. You can set the scaling factors on a per-scenario basis, as shown in the following example:
icc_shell> current_scenario scn1
icc_shell> set_clock_tree_exceptions \
     -max_float_pin_scale_factor 1.0 -min_float_pin_scale_factor 0.6
icc_shell> current_scenario scn2
icc_shell> set_clock_tree_exceptions \
     -max_float_pin_scale_factor 0.9 -min_float_pin_scale_factor 0.5
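
As a rough illustration of what per-corner scaling means numerically, the Python sketch below multiplies one float pin delay by the scale factors from the example above. The base value of 0.40 and the function name are hypothetical, and plain multiplication is an assumption about how the factors are applied, not a statement of ICC's internal calculation.

```python
# Hypothetical float pin delay specified in the default CTS scenario.
BASE_FLOAT_PIN_DELAY = 0.40

# Scale factors copied from the set_clock_tree_exceptions example.
SCALE_FACTORS = {
    "scn1": {"max": 1.0, "min": 0.6},
    "scn2": {"max": 0.9, "min": 0.5},
}


def scaled_float_pin_delays(base, factors):
    """Multiply the base float pin delay by each corner's scale factor."""
    return {
        scenario: {corner: round(base * f, 4) for corner, f in corners.items()}
        for scenario, corners in factors.items()
    }


result = scaled_float_pin_delays(BASE_FLOAT_PIN_DELAY, SCALE_FACTORS)
for scenario, corners in result.items():
    print(scenario, corners)
```

Under these assumptions, the minimum corner of scn2 sees half of the base delay (0.2), while the maximum corner of scn1 keeps the full value (0.4).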

Setting Skew Target Goals
By default, you specify the same skew target goal for all corners using the following cmd:
icc_shell> set_clock_tree_options -target_skew 0.050

You can specify various skew target goals for different corners in a multicorner-multimode design as follows:
icc_shell> set_clock_tree_optimization_options -corner_target_skew \
     "scn1:max=0.100 scn_cts:max=0.050 scn2:min=0.070"

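The per-corner target string has a simple scenario:corner=value layout. As a sketch only, the Python helper below parses a string in that format and flags corners whose skew exceeds the target. The function names and the measured skew numbers are made up for illustration; in practice the measured values would come from report_clock_tree output.

```python
def parse_corner_targets(spec):
    """Parse 'scenario:corner=value' tokens into {(scenario, corner): float}."""
    targets = {}
    for token in spec.split():
        key, value = token.split("=")
        scenario, corner = key.split(":")
        targets[(scenario, corner)] = float(value)
    return targets


def skew_violations(targets, measured):
    """Return the (scenario, corner) pairs whose measured skew exceeds the target."""
    return sorted(k for k, t in targets.items() if measured.get(k, 0.0) > t)


targets = parse_corner_targets("scn1:max=0.100 scn_cts:max=0.050 scn2:min=0.070")
# Made-up skew values, one per corner.
measured = {("scn1", "max"): 0.080, ("scn_cts", "max"): 0.065, ("scn2", "min"): 0.070}
print(skew_violations(targets, measured))
```

With these made-up numbers, only the maximum corner of scn_cts misses its target.
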
Generating Multicorner Optimization Report
To generate a clock tree report for a particular scenario corner, you use the current_scenario cmd to set your current scenario before running the report_clock_tree cmd. For example, to report clock trees of scn1, scn2, and scn3 scenarios, enter the following:
icc_shell> set_scenario_options -scenarios scn1 \
    -cts_mode true -cts_corner min
icc_shell> current_scenario scn1
icc_shell> report_clock_tree
icc_shell> report_clock_tree -op min
icc_shell> set_scenario_options -scenarios scn2 \
       -cts_mode true -cts_corner min
icc_shell> current_scenario scn2
icc_shell> report_clock_tree
icc_shell> report_clock_tree -op min
icc_shell> set_scenario_options -scenarios scn3 \
    -cts_mode true -cts_corner min
icc_shell> current_scenario scn3
icc_shell> report_clock_tree
icc_shell> report_clock_tree -op min

Note:
When running the optimize_clock_tree command during multicorner CTS, ICC automatically sets the -clock_arnoldi option.

Because the report_clock_tree cmd requires that you define clocks in the scenario that you want to report, you should define clocks in every scenario when using multicorner clock tree optimization.
icc_shell> create_scenario scn1
icc_shell> set_operating_conditions -max $opcond1
icc_shell> set_tlu_plus_files -max_tluplus $tlu1 -tech2itf_map $tlu_map
icc_shell> create_clock -name clock -period 4 clock
icc_shell> create_scenario scn2
icc_shell> set_operating_conditions -max $opcond2
icc_shell> set_tlu_plus_files -max_tluplus $tlu2 -tech2itf_map $tlu_map
icc_shell> create_clock -name clock -period 4 clock











       



Performing Multimode Clock Tree Synthesis

System-on-chip designs integrate multiple modes, such as scan mode, memory BIST mode, and sleep mode, in addition to the normal mode, in a single chip. Without multimode support, you either create a separate clock tree synthesis scenario for each mode and run clock tree synthesis sequentially, or combine all scenario constraints in one SDC file and run single-mode clock tree synthesis. Both methods are time-consuming and error-prone, and they fail to produce optimal results. ICC can perform multimode clock tree synthesis by simultaneously processing multiple scenarios to build balanced clock trees under different modes.

Running multimode clock tree synthesis provides the following benefits:
1) Reduces the runtime relative to running sequential clock tree synthesis or single-mode CTS.
2) Produces better QoR because clock trees are built and optimized across all scenarios.

To enable multimode CTS, use the set_scenario_options command. After creating all the scenarios, set the -cts_mode option of the set_scenario_options command to true, changing it from its default of false, for each scenario for which you want to perform CTS.

For example, the following cmd instructs the tool to use scenario s1 for CTS.
icc_shell> set_scenario_options -cts_mode true -scenarios s1

ICC automatically invokes multimode CTS when the compile_clock_tree cmd detects that more than one scenario has the -cts_mode option set to true. To run single-mode CTS, set the -cts_mode option to true for only one scenario.
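
The selection rule in the paragraph above can be written down as a small Python sketch. This only restates the rule; the function name and the scenario names are hypothetical, and it does not model anything else the tool checks.

```python
def cts_run_mode(cts_mode_by_scenario):
    """Classify a CTS run from the scenarios that have cts_mode set to true.

    Returns a (mode, enabled_scenarios) tuple, where mode is "multimode",
    "single-mode", or "none".
    """
    enabled = [s for s, on in cts_mode_by_scenario.items() if on]
    if not enabled:
        return "none", enabled
    return ("multimode" if len(enabled) > 1 else "single-mode"), enabled


print(cts_run_mode({"s1": True, "s2": True, "s3": False}))
print(cts_run_mode({"s1": True, "s2": False}))
```

The first call reports a multimode run over s1 and s2; the second reports a single-mode run over s1.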

Use the following steps to perform multimode CTS:
1. Create all scenarios by using the create_scenario command.
2. Perform placement and optimization by using the place_opt command.
3. Enable the analysis of multiple clocks that reach a register clock pin by setting the timing_enable_multiple_clocks_per_reg variable to true.
4. Activate all user-selected scenarios by using the following script:

set user_selected_cts_scenarios "scn1 scn2 scn3 scn4"
set_scenario_options -scenarios $user_selected_cts_scenarios  \
                                               -cts_mode true
set_scenario_options -scenarios $user_selected_cts_scenarios  \
                                               -cts_corner min_max

foreach s $user_selected_cts_scenarios {
   # read in scenario-specific constraints, exceptions, and so forth
}

Alternatively, if all scenarios are to be considered, you can activate them simultaneously by using the following script:
set_scenario_options -scenarios [list_of_all_scenarios] -cts_mode true

5. Run multimode clock tree synthesis by using the compile_clock_tree command.
6. Perform optimization by using the  optimize_clock_tree command.
7. Proceed with routing by using the route_zrt_clock_tree and route_opt commands.











Tuesday, November 24, 2015

Implementing Clock Meshes

Clock meshes are homogeneous shorted grids of metal that are driven by many clock drivers. The purpose of a clock mesh is to reduce clock skew, both in nominal designs and across variations such as on-chip variation (OCV), chip-to-chip variation, and local power fluctuations. A clock mesh reduces skew variation mainly by shorting the outputs of many clock drivers.

[Figure: Clock Mesh Structure]

The figure shows the structure of a clock mesh. The network of drivers from the clock port to the mesh driver inputs is called the premesh tree. The network of shorted clock driver outputs is called the mesh.

Using clock meshes provides the following benefits:
1) Small skew variation, especially for high-performance designs
2) Consistent design performance across variations
3) Predictable results throughout both the design stage and the later ECO stage
4) Stability resulting from mesh grids being close to receivers

Using clock meshes has the following disadvantages:
1. More routing resources are required to create clock meshes.
2. Power consumption is higher during transition on parallel drivers driving the mesh.

Prerequisites for Creating Clock Meshes
Before you run clock mesh cmds, your design should meet the following requirements:
1) The design should be mesh-conducive.
A basic mesh-conducive design contains at least one high-fanout clock net that has no more than two levels below the proposed mesh. If necessary, you can use the remove_clock_gating command in Power Compiler or the flatten_clock_gating command in ICC to flatten the circuitry under the proposed mesh.
2) The design should have enough room to place mesh drivers near the mesh loads for driving the mesh optimally.
3) To analyze clock mesh circuits, you must have the NanoSim or HSIM transistor models for all the clock mesh gates. A circuit simulator is needed because static timing tools cannot handle clock meshes.
4) You should be able to run NanoSim from the shell where you invoke ICC.

Creating H-, T-, or I-Shape Routes for Premesh Trees
The add_clock_drivers command can create a very regular premesh tree if you set the appropriate options, but the router cannot guarantee regular routes and low skew for the premesh tree.
You get regular routes and low skew if you choose an H-, I-, or T-shape route by using the route_htree command or by choosing Clock > Clock Mesh > Route HTree in the GUI.

For a net that has one driver and four loads, you can use the route_htree command to create an H-shape route. By default, the command enables Zroute to create H-, I-, or T-shape routes.

For example, the following cmd uses Zroute to create an H-shape route by using the M7 metal layer for vertical wires and the M6 layer for horizontal wires. To ensure DRC convergence, Zroute might choose layers other than M6 and M7.

icc_shell> route_htree -nets [get_nets { ccc*L3_net* ccc*L2_net*}] \
           -layer {M6 M7} -orientation {H}

For a net that has one driver and two loads, you can create an I-shape route. The following example shows how to create a rotated I-shape route:
icc_shell> route_htree -nets [get_nets { ccc*L1_net*}] \
          -layers {M6 M7} -orientation {I_90}

Both H- and I-shape routes produce low skew. Using a series of I-shape routes instead of H-shape routes requires more drivers but minimizes skew.

For a net that has one driver and three loads, you can use the route_htree command to create a T-shape route.
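
The load counts above map directly to route shapes, and the symmetry of the shape is what keeps skew low. The Python sketch below illustrates both points with hypothetical coordinates and function names. It assumes the trunk passes through the driver and each branch runs straight to its load, so wire length equals Manhattan distance; it ignores layers, vias, and DRC detours that the router deals with.

```python
def suggest_route_shape(num_loads):
    """Map a load count to the route shape named in the text (None if no match)."""
    return {2: "I", 3: "T", 4: "H"}.get(num_loads)


def wire_lengths(driver, loads):
    """Manhattan wire length from the driver to each load."""
    dx, dy = driver
    return [abs(x - dx) + abs(y - dy) for x, y in loads]


driver = (10, 5)                              # hypothetical driver location
loads = [(0, 0), (0, 10), (20, 0), (20, 10)]  # four symmetrically placed loads
lengths = wire_lengths(driver, loads)
print(suggest_route_shape(len(loads)), lengths, max(lengths) - min(lengths))
```

For this symmetric placement the sketch prints H [15, 15, 15, 15] 0: every load sees the same wire length, so the wire contribution to skew is zero under these assumptions.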

[Figure: H-tree Structure]



















Routing Clock Nets
After you complete the routing of mesh drivers and premesh trees, perform detail routing of the clock nets by using the route_zrt_group -all_clock_nets -reuse_existing_global_route true command or by choosing Route > Net Group Routing in the GUI.

Performing Clock Tree Optimization

Clock tree optimization improves the clock skew and clock insertion delay by applying additional optimization iterations. Clock tree optimization is performed during the clock_opt process and can also be run as a standalone process before clock routing, after clock tree routing, or after detail routing. Typically, you would perform standalone clock tree optimization when timing optimization or incremental placement disturbs the clock skew or clock insertion delay.

To perform standalone clock tree optimization, use the optimize_clock_tree command or choose Clock > Optimize Clock Tree in the GUI. You can specify a list of clock trees, ports, or pins, but not hierarchical pins, as starting points of the clock network by using the -clock_trees option.

ICC provides the following incremental optimization capabilities:
1) Buffer relocation by using the -buffer_relocation option.
2) Buffer sizing by using the -buffer_sizing option.
3) Delay insertion by using the -delay_insertion option.
4) Gate relocation by using the -gate_relocation option.
5) Gate sizing by using the -gate_sizing option.

Note:
During clock tree optimization, ICC ignores the dont_touch attribute on cells and nets. To prevent sizing of cells during clock tree optimization, use the set_clock_tree_exceptions -dont_size_cells command.

By default, the optimize_clock_tree command assumes that the clock trees in your design are not routed. For unrouted clock trees, the optimize_clock_tree command can perform any of the incremental optimization capabilities. The default behavior is to perform all incremental optimizations.

If your clock trees are routed, you must explicitly specify the routing stage of the clock trees by using the -routed_clock_stage option of the optimize_clock_tree command.

For routed clock trees, optimize_clock_tree can perform only the sizing optimizations (the default behavior is to perform both buffer sizing and gate sizing).

To run a subset of the available optimizations, you must explicitly specify the optimizations that you want. If you specify options that are not compatible with the routing status of your design, ICC generates an error message.
For example, to perform only gate sizing on a routed design, enter the following cmd:
icc_shell> optimize_clock_tree -gate_sizing
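
The rules about which optimizations are available for routed and unrouted clock trees can be summarized in a short Python sketch. It only encodes the rules described above, with hypothetical names (select_optimizations, routed); it is not how ICC validates its options, and a ValueError stands in for the tool's error message.

```python
ALL_OPTIMIZATIONS = {
    "buffer_relocation", "buffer_sizing", "delay_insertion",
    "gate_relocation", "gate_sizing",
}
SIZING_ONLY = {"buffer_sizing", "gate_sizing"}


def select_optimizations(routed, requested=None):
    """Return the optimizations that would run, sorted by name.

    routed    : True if the clock trees are already routed
    requested : optional subset chosen explicitly; None means the default
    """
    allowed = SIZING_ONLY if routed else ALL_OPTIMIZATIONS
    if not requested:
        return sorted(allowed)  # default: everything allowed at this stage
    incompatible = set(requested) - allowed
    if incompatible:
        raise ValueError(f"not available at this stage: {sorted(incompatible)}")
    return sorted(requested)


print(select_optimizations(routed=False))
print(select_optimizations(routed=True))
print(select_optimizations(routed=True, requested=["gate_sizing"]))
```

Asking for delay_insertion with routed=True raises the error in this sketch, mirroring the incompatibility described in the text.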

After optimizing postroute clock trees, the optimize_clock_tree cmd performs ECO routing and extraction. The type of ECO routing performed depends on the routing stage of the clock trees.
1) For global routed clock trees, ICC performs incremental global routing.
2) For track assigned clock trees, ICC performs detail routing (utilizing dangling wires).
3) For detail routed clock trees, ICC performs detail routing (utilizing dangling wires) and performs two search-and-repair loops. To change the number of search-and-repair loops, use the -search_repair_loop option of the optimize_clock_tree command.

To disable ECO routing during optimize_clock_tree, specify the -no_clock_eco_route option.

By default, clock tree optimization uses the integrated clock global router to estimate the wire delay and capacitance for better correlation with postroute timing. For best QoR, you should use the integrated clock global router and specify the -clock_arnoldi option whenever possible.

To run multicorner clock tree optimization, use the -enable_multicorner option to specify at least one corner before running either the clock_opt or the optimize_clock_tree command. ICC automatically sets the -clock_arnoldi option.
icc_shell> set_clock_tree_optimization_options -enable_multicorner all
icc_shell> clock_opt

or

icc_shell> set_clock_tree_optimization_options -enable_multicorner all
icc_shell> optimize_clock_tree

Fixing DRC Violations
The optimize_clock_tree command can automatically fix DRC violations in the clock network that are not fixed by the compile_clock_tree command because of heuristic limitations. You enable this capability by setting the cto_enable_drc_fixing variable to true, changing it from its default of false.

Fixing DRC violations during clock tree optimization improves the correlation between the following two stages:
1) Preroute, which uses virtual routing and the integrated clock global router.
The compile_clock_tree cmd uses virtual routing to estimate the wire delay and capacitance, whereas the optimize_clock_tree cmd invokes the integrated clock global router to perform global routing of clock nets.
2) Postroute, which uses the integrated clock global router and detail router.
To fix DRC violations in multicorner designs, you must also specify the maximum corner.

Running Interclock Delay Balancing
Interclock delay balancing balances the skew between a group of clock trees, either as part of the clock_opt process or as a standalone process.
By default, interclock delay balancing uses the integrated clock global router to estimate the wire delay and capacitance for better correlation with postroute timing.
To run standalone interclock delay balancing, use the balance_inter_clock_delay cmd or choose Clock > Balance Interclock Delay in the GUI.

Regardless of which delay calculation model you set by using the set_delay_calculation command, the balance_inter_clock_delay command uses the Elmore delay model by default. If you set the use_improved_icdb variable to true, changing it from its default of false, the balance_inter_clock_delay command honors the Elmore or Arnoldi delay model that is set by the set_delay_calculation command.

Adjusting the I/O Timing
After implementing the clock trees, ICC can update the input and output delays to reflect the actual clock arrival times. When you adjust the I/O timing, ICC calculates the median insertion delay for each clock tree and applies these values as the clock latency. The Milkyway database and SDC constraints are automatically updated, so you can easily export this data to PrimeTime for detailed timing analysis.

To adjust the I/O timing, do one of the following:
1) Run the update_clock_latency command.
2) Specify the -update_clock_latency option when you run the clock_opt command.

ICC adjusts the I/O timing to achieve the accuracy of the clock latency and to prevent false timing violations on I/O paths after CTS in the following ways:
1) For synthesized generated clocks, the network latency of a clock object is updated, but the source latency of the clock object is updated when its master clock is synthesized.
2) For synthesized clocks, network latency is computed by using the median value of the clock propagation delay, that is, the arrival time relative to the clock root at all boundary registers.
3) For virtual clocks defined with the same create_clock command, network latency is calculated using the clock propagation delay of the boundary registers clocked by the individual virtual clocks.
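
The median calculation for synthesized clocks is easy to picture with numbers. The Python sketch below uses made-up arrival times at five boundary registers and a hypothetical function name; it shows only the median step, not how ICC identifies boundary registers or writes back the constraints.

```python
from statistics import median


def network_latency(arrival_times):
    """Return the median clock arrival time over the boundary registers."""
    return median(arrival_times)


# Made-up arrival times (relative to the clock root) at boundary registers.
boundary_arrivals = [0.82, 0.79, 0.91, 0.85, 0.80]
print(network_latency(boundary_arrivals))
```

Here the median is 0.82. Unlike a mean, the median does not move much if one boundary register has an unusually long or short insertion delay.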

To adjust the I/O timing for virtual clocks, you must define the relationships between the virtual clocks and the real clocks before you adjust the I/O timing as follows:
icc_shell> set_latency_adjustment_options \
      -to_clock my_virtual_clock -from_clock my_real_clock
icc_shell> update_clock_latency

When you save your design in Milkyway format, the relationships defined by the set_latency_adjustment_options cmd are stored in the Milkyway design library.

When adjusting the I/O timing based on virtual clocks, the update_clock_latency cmd defines the clock latency for both the real clock and its associated virtual clocks as the median insertion delay of the real clock.

You can report the virtual clock definitions by using the report_latency_adjustment_options command. You can remove the virtual clock definitions by using the reset_latency_adjustment_options command.
















High-Fanout Net Synthesis

You can use the compile_clock_tree command to perform high-fanout net synthesis by using the -high_fanout_net nets_or_driving_pins option (or by choosing Clock > Compile Clock Tree in the GUI and specifying the high-fanout nets in the "High fanout nets" field).

Note:
In a single compile_clock_tree run, you can perform either high-fanout net synthesis (by specifying the -high_fanout_net option) or clock tree synthesis (by specifying no clock names or by specifying the clock trees with the -clock_trees option); you cannot perform both tasks in a single run.

When you use compile_clock_tree -high_fanout_net to perform high-fanout net synthesis, the result is a balanced buffer tree (called a high-fanout tree), which is similar to a clock tree. When you use the create_buffer_tree command to perform high-fanout net synthesis, the resulting buffer tree might not be balanced.

The compile_clock_tree command performs high-fanout net synthesis by balancing the arrival times from the drivers of the nets specified in -high_fanout_net to the fanouts of those nets. High-fanout net synthesis does not traverse through preexisting gates on the high-fanout net, nor does it support the use of clock tree exceptions.

Important:
If a clock tree exception exists on any fanout pin of a high-fanout net, high-fanout net synthesis generates an error message and fails. You must remove the clock tree exceptions and rerun high-fanout net synthesis.

By default, high-fanout clock tree synthesis can use any of the buffers or inverters in the library. To restrict the set of buffers or inverters used by high-fanout clock tree synthesis, use the set_clock_tree_reference command.

High-fanout clock tree synthesis determines the clock tree design rules in the same way as standard CTS. To define the clock tree design rules, use the set_clock_tree_options cmd.

By default, high-fanout clock tree synthesis uses the rising edge to determine the skew and arrival times. To use the falling edge instead, use the -sync_phase fall option when you run compile_clock_tree. To use both edges, use the -sync_phase both option when you run compile_clock_tree.


When you perform high-fanout clock tree synthesis, neither the endpoints nor the inserted buffers are fixed after high-fanout net synthesis. This allows psynopt to optimize the timing of the high-fanout trees.

To report the skew and path delay of the synthesized high-fanout net, use the report_clock_tree -high_fanout_net pins_or_nets command. You can use the following report_clock_tree options together with the -high_fanout_net option: -summary, -structure, -level_info, -drc_violators, -operating_condition, and -nosplit. No other report_clock_tree options are supported with -high_fanout_net.

High-fanout clock tree synthesis has the following limitations:
1) High-fanout clock tree synthesis does not support multicorner or multimode designs.
2) High-fanout clock tree synthesis does not insert level shifters or isolation buffers in multivoltage designs. You must insert the level shifters and isolation buffers before running high-fanout clock tree synthesis.
3) You cannot use optimize_clock_tree to optimize the high-fanout trees. Use the psynopt command instead.



Monday, November 23, 2015

Performing Clock Tree Synthesis

CTS is the process of implementing the clock trees based on your requirements. CTS is performed during the clock_opt process and can also be run as a standalone process.

ICC CTS is blockage-aware by default. The blockage-aware capability avoids routing and placement blockages to reduce DRC violations in designs with complex floorplans. Furthermore, it implements CTS with minimum clock insertion delays, small clock skew, low buffer count, and small clock cell area to produce the best QoR.

During CTS, ICC
1) Upsizes and moves the existing clock gates, which can improve the QoR and reduce the number of clock tree levels.
Note:
To prevent upsizing of specific cells during this process, use the set_clock_tree_exceptions -dont_size_cells command.
2) Inserts buffers and inverters to build clock trees that meet the clock tree design rule constraints, while balancing the loads and minimizing the clock skew.
3) Fixes DRC violations beyond clock exceptions if the cts_fix_beyond_clock_exceptions variable is set to true (the default).
4) Builds a blockage map infrastructure per voltage area to identify whether a location is blocked for routing or placement, so the legalizer can move buffers to the nearest unblocked locations toward clock sources.
5) Locates the shortest blockage-avoiding route path from a start point to an end point with minimum delay to prevent DRC violations.

If your design has logical hierarchy, ICC uses the lowest common parent of a buffer's fanout pins to determine where to insert the buffers.
1. If the lowest common parent is not the top level of the design, the buffer is inserted in the lowest common parent.
2. If the lowest common parent is the top level of the design, the buffer is inserted in the block that contains the driving pin of the buffer.
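
The lowest-common-parent rule can be sketched in a few lines of Python. The sketch assumes slash-separated pin names of the form block/.../instance/pin and represents the top level as an empty string; the function names and pin names are hypothetical, and this is an illustration of the two rules above rather than ICC's implementation.

```python
def enclosing_block(pin_path):
    """Return the hierarchy that contains the pin's cell ('' means top level)."""
    return pin_path.split("/")[:-2]  # drop the instance name and the pin name


def lowest_common_parent(pin_paths):
    """Return the deepest block that contains every pin ('' means top level)."""
    common = []
    for parts in zip(*(enclosing_block(p) for p in pin_paths)):
        if len(set(parts)) != 1:
            break
        common.append(parts[0])
    return "/".join(common)


def buffer_insertion_block(fanout_pins, driver_pin):
    """Apply the two rules: use the common parent unless it is the top level."""
    parent = lowest_common_parent(fanout_pins)
    if parent:
        return parent
    return "/".join(enclosing_block(driver_pin))


print(buffer_insertion_block(["u_core/u_alu/r1/CK", "u_core/u_fpu/r2/CK"], "u_core/buf0/Z"))
print(buffer_insertion_block(["u_a/r1/CK", "u_b/r2/CK"], "u_a/buf0/Z"))
```

The first call returns u_core, the common parent of both fanout pins. In the second call the common parent is the top level, so the sketch falls back to u_a, the block that contains the driving pin.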

ICC adds new ports to the subdesigns only where needed, keeping the number of new ports to a minimum.
To perform standalone CTS, use the compile_clock_tree command (or choose Clock > Compile Clock Tree in the GUI and specify the clock trees in the "Clock tree names" field).

Note:
If you compile one clock at a time, be aware that the order in which you compile the clocks can affect the clock tree QoR. For best results, compile the most critical clock first.




Standalone Clock Tree Synthesis Capabilities

Using the clock_opt command is the recommended method for performing clock tree synthesis and optimization with ICC. However, in cases where finer control is required, ICC also provides the following standalone clock tree synthesis capabilities:

- Clock tree power optimization
- Clock tree synthesis
- High-fanout net synthesis
- Clock tree optimization
- Interclock delay balancing
- I/O timing adjustment

The script below provides an example of performing clock tree synthesis and optimization by using the standalone capabilities. The following sections provide details about these capabilities.

optimize_pre_cts_power
compile_clock_tree
optimize_clock_tree
balance_inter_clock_delay
route_zrt_group -all_clock_nets -reuse_existing_global_route true
update_clock_latency
set_fix_hold [all_clocks]
psynopt -area_recovery -power
(Clock Tree Synthesis and Optimization Using Standalone Capabilities)


Implementing the Clock Trees

The recommended process for implementing the clock trees in the design is to use the clock_opt command, which performs clock tree synthesis and incremental physical optimization. This process results in a timing-optimized design with fully implemented clock trees.

Note:
Before implementing the clock trees, save the design. This allows you to refine the clock  tree synthesis goals and rerun clock tree synthesis with the same starting point, if necessary.

By default, ICC uses the following naming convention for buffers and inverters inserted during clock tree synthesis:

reference_GxByIz

Here, reference is the library reference cell of the buffer or inverter, x is the gate level, y is the buffer level, and z is the instance count. To more easily locate the inserted buffers and inverters in your netlist, you can add a prefix to the instance names by setting the cts_instance_name_prefix variable. Similarly, you can add a prefix to any nets inserted during clock tree synthesis by setting the cts_net_name_prefix variable.
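As a minimal sketch of how the reference_GxByIz convention can be decoded when inspecting a netlist, the plain-Tcl procedure below splits an instance name into its fields. The parse_cts_name procedure and the BUFX4 cell name are hypothetical, and the sketch assumes that no instance name prefix has been set; with a prefix, the extra text would be captured as part of the reference field.

```tcl
# Hypothetical helper (not an ICC command): decodes a CTS-inserted instance
# name of the form reference_GxByIz into its fields.
# Returns an empty list when the name does not follow the convention.
proc parse_cts_name {inst} {
    # Use only the leaf instance name, ignoring any hierarchy prefix.
    set leaf [lindex [split $inst "/"] end]
    if {[regexp {^(.+)_G(\d+)B(\d+)I(\d+)$} $leaf -> ref g b i]} {
        return [list reference $ref gate_level $g buffer_level $b instance $i]
    }
    return {}
}

# BUFX4 is a placeholder library cell name.
puts [parse_cts_name u_core/BUFX4_G1B2I3]
```

For the example name, the procedure reports reference BUFX4, gate level 1, buffer level 2, and instance count 3.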

To perform clock tree synthesis, clock tree optimization, and incremental physical optimization, use the clock_opt command or choose Clock > Core CTS and Optimization in the GUI.

By default, ICC ignores the dont_touch attribute on cells and nets during clock tree synthesis and clock tree optimization.  To prevent sizing of cells during clock tree synthesis and clock tree optimization, use the set_clock_tree_exceptions -dont_size_cells command.

By default, the clock_opt command uses virtual routing during clock tree synthesis, but the optimization process uses the integrated clock global router to estimate the wire delay and capacitance. To ensure better postroute correlation, the integrated clock global router saves clock global routing information in the Milkyway database to be used by clock routing.

The clock_opt command does the following:
1. (Optional) Performs clock tree power optimization
    To perform clock tree power optimization during the clock_opt process, enable physical optimization of the integrated clock-gating cells and power-aware placement, and then use the -power option of the clock_opt command.

2. Synthesizes the clock trees
Before implementing the clock trees, ICC upsizes, and possibly moves, the existing clock gates, which can improve the quality of results (QoR) and reduce the number of clock tree levels.
Note:
To prevent the upsizing of existing clock gates before clustering, set the cts_prects_upsize_gates variable to false.

ICC might move the existing gates, including integrated clock-gating (ICG) cells, when doing so could improve QoR. To prevent ICC from moving existing gates, including ICG cells, before clustering, set the cts_move_clock_gate variable to false.

ICC builds clock trees that meet the clock tree design rule constraints, while balancing the loads and minimizing the clock skew. In addition, ICC optimizes the clock paths beyond exclude pins, stop pins, and float pins to fix any design rule constraint violations.

Note:
Optimization is not performed on don't buffer nets or inside interface logic models (ILMs).

By default, the clock sink cells might be moved or sized during the legalization and optimization steps that occur after clock tree synthesis. To prevent any modification to the clock sink cells after clock tree synthesis, set the cts_fix_clock_tree_sinks variable to true. Note that fixing the clock sinks can impact the timing QoR.

You can also run clock tree synthesis as a standalone process, using the compile_clock_tree command.

3. Optimizes the clock trees
During clock tree optimization, ICC uses optimization techniques such as buffer relocation, buffer sizing, delay insertion, gate sizing, and gate relocation to further improve the skew.

Note:
During clock tree optimization, ICC ignores the dont_touch attribute on cells and nets. To prevent sizing of cells during clock tree optimization, use the set_clock_tree_exceptions -dont_size_cells command.

You can also run clock tree optimization as a standalone process, using the optimize_clock_tree command.

4. (Optional) Performs interclock delay balancing
To perform interclock delay balancing during the clock_opt process, define the interclock delay balancing requirements, and use the -inter_clock_balance option of the clock_opt command.

Note:
   ICC performs interclock delay balancing by performing delay insertion at the clock root. If the clock root net has a don't buffer net exception, ICC cannot perform interclock delay balancing.
   If the clock root is defined as a port of a pad cell, the delay insertion is performed on the net driven by the pad cell.

You can also run interclock delay balancing as a standalone process, using the balance_inter_clock_delay command.

5. Performs detail routing of the clock nets
You can also perform detail routing of the clock nets as a standalone process, using the route_zrt_group -all_clock_nets -reuse_existing_global_route true command.
To prevent routing of the clock nets, use the -no_clock_route option of the clock_opt command.

6. Performs RC extraction of the clock nets and computes accurate clock arrival times.

7. (Optional) Adjusts the I/O timing
To adjust the input and output delay based on the actual clock arrival times, use the
-update_clock_latency option of the clock_opt command. ICC uses the adjusted input and output delays during placement and timing optimization.

You can also update the I/O timing as a standalone process, using the update_clock_latency command.

8. (Optional) Optimizes the scan chains
To optimize the scan chains by reordering the chains to minimize the number of buffer crossings in the scan chain, use the -optimize_dft option of the clock_opt command.

9. Fixes the placement of the clock tree buffers and inverters.

10. Performs placement and timing optimization.
If you specify -update_clock_latency, ICC uses the adjusted input and output delays during placement and timing optimization. ICC uses propagated arrival times for all clock sinks.

You can customize the placement and timing optimization process by specifying the following options: -area_recovery, -in_place_size_only, and -size_only. To include power optimization, also specify the -power option.
You can also run only placement and timing optimization as a standalone process, using the psynopt command.
To prevent placement and timing optimization, use the -only_cts option of the clock_opt command.

To run only placement and timing optimization (and not clock tree synthesis, clock tree optimization, or clock tree routing), use the -only_psyn option of the clock_opt command.

11. (Optional) Performs power optimization
You can perform leakage power optimization and dynamic power optimization during the clock_opt process by enabling the selected optimizations with the set_power_options command and using the -power option of the clock_opt command. To enable leakage power optimization, use the set_power_options -leakage command. To enable dynamic power optimization, use the set_power_options -dynamic command.

12. (Optional) Fixes hold time violations
To fix hold time violations during the clock_opt process, use the -fix_hold_all_clocks option of the clock_opt command.

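The clock_opt steps described above can be combined in a few ways. The following is a minimal sketch that uses only the variables and options named in this section; setting the variables with the plain Tcl set command is an assumption about how your environment handles tool variables, and which settings you actually need depends on your design.

```tcl
# Keep the existing clock gates unchanged before clustering.
set cts_prects_upsize_gates false
set cts_move_clock_gate false
# Lock the clock sinks after clock tree synthesis (this can cost timing QoR).
set cts_fix_clock_tree_sinks true

# Single pass: CTS, optimization, clock routing, I/O latency update,
# and hold fixing.
clock_opt -update_clock_latency -fix_hold_all_clocks
```

Alternatively, the same flow can be split so that the clock trees can be reviewed before placement and timing optimization runs:

```tcl
# Build the clock trees only, without routing the clock nets.
clock_opt -only_cts -no_clock_route
# Later, run only placement and timing optimization, with area recovery.
clock_opt -only_psyn -area_recovery
```

Run one variant or the other, not both, and save the design first so that you can return to the same starting point.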
Verifying the Clock Trees

Before you synthesize the clock trees, use the check_clock_tree command to verify that the clock trees are properly defined.

icc_shell> check_clock_tree -clocks my_clk

If you do not specify the -clocks option, ICC checks all clocks in the current design.

The check_clock_tree command checks for the following issues:
- Hierarchical pin defined as a clock source
- Generated clock without a valid master clock source
  A generated clock does not have a valid master clock source in the following situations:
    1) The master clock specified in create_generated_clock does not exist.
    2) The master clock specified in create_generated_clock does not drive the source pin of the generated clock.
    3) The source pin of the generated clock is driven by multiple clocks, and some of the master clocks are not specified with create_generated_clock.
  For example, suppose the reg01/Q pin is driven by both clka and clkb. If only clkb is specified as a master clock in a create_generated_clock command, gen_clkb does not have a valid master clock source.
- Clock (master or generated) with no sinks
- Looping clock
- Cascaded clock with an unsynthesized clock tree in its fanout
- Multiple-clocks-per-register propagation not enabled, but the design contains overlapping clocks
- Ignored clock tree exceptions
- Stop pin or float pin defined on an output pin
- Buffers with multiple timing arcs used in clock tree references
- Situations causing an empty buffer list

Multivoltage Designs --CTS

Clock tree synthesis and optimization are voltage-area-aware. When running clock tree synthesis on multivoltage designs,

- Sink pins are separated and clustered by voltage area so that clock subtrees are built for each voltage area.
- A guide buffer is inserted for the set of sink pins for each voltage area to ensure that any subsequent levels of clustering do not mix pins from different voltage areas.
- Buffers are not inserted between an isolation cell and the shut-down power domain boundary.
- Dual-power always-on clock cells can be inserted or removed as needed on always-on paths in the shut-down or powered-up power domain.

After the clock subtrees are built for each voltage area, clock tree synthesis can proceed in the usual manner, joining the subtrees at the root of the clock net. In addition to the synthesis of the initial clock tree, the preceding behaviors are honored by all clock tree optimization techniques, such as buffer relocation, buffer sizing, gate relocation, gate sizing, and delay insertion.

Multicorner-Multimode Designs

To perform clock tree synthesis and optimization in a multicorner-multimode design, you must specify which scenario to use for clock tree synthesis and optimization by using the set_scenario_options command. To see the current clock tree synthesis scenario, run the report_scenario_options command.


Friday, November 20, 2015

Hierarchical Designs Using Interface Logic Models

You can use interface logic models (ILMs) to increase the capacity and reduce the runtime for top-level clock tree synthesis. Before creating ILMs for use with top-level clock tree synthesis, you must perform clock tree synthesis on the blocks.

During clock tree synthesis and optimization, ICC
1) Identifies any ILMs inside a clock tree
When a clock defined at the top level goes through an ILM, ICC inserts guide buffers before the ILM clock input pin and after the ILM clock output pins. The nets between the input guide buffer and output guide buffers are marked as don't buffer nets.

2) Honors clocks or generated clocks defined on an ILM port or a pin internal to the ILM
When a clock is defined on an ILM input port or in an ILM, ICC inserts guide buffers after the ILM clock output pins. Clock nets within the ILM (up to the guide buffers) are marked as don't buffer nets.

3) Times the clock subtrees inside the ILM to calculate the phase and transition delays for the ILM
ICC uses the timing information for the clock trees within the ILM to perform skew balancing and insertion delay minimization up to the ILM clock input pins and beyond the ILM clock output pins.

If there are multiple subtrees after an ILM, ICC synthesizes each subtree independently and does not balance the insertion delay between them, which can result in large skew between them. To reduce this skew, run the optimize_clock_tree command after performing clock tree synthesis.

4) Honors explicit stop pins, exclude pins, and sink pins on an ILM port or inside an ILM.


Specifying Clock Tree Optimization Options

ICC optimizes the clock trees at several stages of the design flow.
During the optimization phases, ICC can perform several optimization tasks, which you can enable or disable by setting the appropriate options.

(Figure: Design Stages Using Clock Tree Optimization)

Note:
ICC uses the clock tree synthesis design rule constraints for all optimization phases, as well as for clock tree synthesis.

Setting Clock Tree Routing Options

ICC allows you to specify the following options to guide the clock tree routing:
1) Which routing rule (type of wire) to use
2) Which clock shielding methodology to use
3) Which routing layers to use
4) Which nondefault routing rules to use with which cell types

Specifying Routing Rules
If you do not specify which routing rule to use for clock tree synthesis, ICC uses the default routing rule (default wires) to route the clock trees. To reduce the wire delays in the clock trees, you can use wide wires instead. Wide wires are represented by nondefault routing rules.

Before you can use a nondefault routing rule, the rule must either exist in the Milkyway design library or have been previously defined by using the define_routing_rule command.
For example, to define the clk_rule nondefault routing rule, enter the following command:

icc_shell> define_routing_rule clk_rule \
      -widths {M1 0.28 M2 0.28 M3 0.28 M4 0.28 M5 0.28 M6 0.28 M7 0.28 } \
      -spacings { M1 0.28 M2 0.28 M3 0.28 M4 0.28 M5 0.28 M6 0.28 M7 0.28 }

To see the current routing rule definitions, run the report_routing_rules command.
To specify the clock tree routing rule, use the set_clock_tree_options -routing_rule command.

You can specify the clock tree routing rule for a specific clock tree by using the -clock_trees option to specify the clock or for all clocks by omitting the -clock_trees option.

For example, to use the previously defined clk_rule nondefault routing rule for routing all clock trees, enter the following command:
icc_shell> set_clock_tree_options -routing_rule clk_rule
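To restrict the rule to a single clock instead, the -clock_trees option described above can be combined with -routing_rule. The following is a minimal sketch in which CK1 is a placeholder clock name and clk_rule is assumed to be defined already:

```tcl
# Apply the wide-wire rule to one clock only (CK1 is a placeholder name).
set_clock_tree_options -clock_trees CK1 -routing_rule clk_rule

# Review the clock tree settings, including the nondefault routing rules.
report_clock_tree -settings
```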

By default, the specified routing rule is used for all nets in the clock tree. However, wide wires are often not required on the nets closest to the clock sinks. To use default wires on the nets connected to the clock sinks and the bottom n-1 levels of the clock tree, use the -use_default_routing_for_sinks option of the set_clock_tree_options command.

Note:
If you enable default routing for sinks, it applies to all clock trees. You cannot enable this capability on a per-clock basis. In addition, the default routing applies only to clock sinks connected to flip-flops. Clock sinks connected to macro cells are not affected by this option.

To see the nondefault routing rules defined for the clock trees in your design, run the report_clock_tree -settings command.

Shielding Clock Nets
ICC implements clock shielding using nondefault routing rules.  You can choose either to shield clock nets before routing signal nets or vice versa. The methodology of shielding clock nets before routing signal nets yields better shielding coverage but can cause more DRC violations during signal net routing compared to the methodology of routing signal nets before shielding clock nets.

(Figure: Clock Shielding Methodologies)

To define nondefault routing rules for clock shielding, use the define_routing_rule command.
The syntax is
define_routing_rule  rule_name
     [-snap_to_track ]
     [-shield_spacings  shield_spacing ]
     [-shield_widths  shield_widths ]

To assign nondefault routing rules to clock nets, use the set_clock_tree_options command.

The syntax is
set_clock_tree_options  [-clock_tree_name  clock_tree_name ]
       [-root  pin_name ]
       [-routing_rule  rule_name]
       [-use_default_routing_for_sinks ]

The nondefault routing rules of clock shielding apply only to nets that are assigned with nondefault routing rules. You can indicate whether to add shielding to leaf pins by using the
-use_default_routing_for_sinks option.

After assigning nondefault routing rules to clock nets, you can synthesize clock trees and route clock nets using the clock_opt command.

The ICC router and extractor honor virtual shielding rules. Virtual shielding rules require that the router leave enough routing resources for shielding to be inserted later and that the extractor consider the shielding effect before shielding metal is inserted. Virtual shielding is supported by the virtual routing, global routing, track assignment, and detail routing stages.
To route signal nets, you can use standalone commands such as route_zrt_global, route_zrt_detail, and route_zrt_auto, or the route_opt command.
After routing signal nets, you can add shielding to the routed clock nets using the create_zrt_shield command. Alternatively, you can choose to add shielding to the routed clock nets before routing signal nets.
The following sample script shows the methodology for routing signal nets before clock shielding.

# Create a new nondefault routing rule named SP
define_routing_rule SP \
  -widths {M1 0.14 M2 0.14 M3 0.14 M4 0.14 M5 0.14 M6 0.42} \
  -spacings {M1 0.14 M2 0.42 M3 0.42 M4 0.42 M5 0.42 M6 1.60} \
  -via_cuts {V12 "1x1" V23 "1x1" V34 "1x1" V45 "1x1" V56 "1x1"} \
  -shield_widths {M1 0.14 M2 0.14 M3 0.14 M4 0.14 M5 0.14 M6 0.42} \
  -shield_spacings {M1 0.14 M2 0.42 M3 0.42 M4 0.42 M5 0.42 M6 1.60}

# Synthesize and route the clock trees
clock_opt
set clock_nets [get_nets -of [all_fanout -clock_tree]]
# Route signal nets
route_opt
# Add shielding to clock nets
create_zrt_shield -nets $clock_nets
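For comparison, the alternative methodology (shielding the clock nets before routing the signal nets) can be sketched by reordering the same commands. This is an illustration built from the commands in this section, not a script from the tool documentation. It assumes the SP rule has already been defined, and it adds the rule assignment step described earlier, which is an assumption about how the rule is attached to the clock nets in your flow.

```tcl
# Assumes the SP nondefault routing rule is already defined with
# define_routing_rule. Assign it to the clock nets before synthesis.
set_clock_tree_options -routing_rule SP

# Synthesize and route the clock trees
clock_opt
set clock_nets [get_nets -of [all_fanout -clock_tree]]

# Add shielding to the clock nets first ...
create_zrt_shield -nets $clock_nets

# ... and then route the signal nets around the shielded clocks
route_opt
```

As noted above, this order tends to give better shielding coverage at the cost of more DRC violations during signal routing.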

Specifying Routing Layers
If you do not specify which routing layers to use for clock tree synthesis, ICC can use any routing layers. For more control of the clock tree routing, you can specify preferred routing layers by using the set_clock_tree_options -layer_list command.

You can specify the preferred clock tree routing layers for a specific clock tree by using the -clock_trees option to specify the clock or for all clocks by omitting the -clock_trees option.
For example,
icc_shell> set_clock_tree_options -clock_trees CK1 -layer_list {metal4 metal5}

When you specify the clock tree routing layers by using this command, the specified layers apply to all levels of the clock tree. For finer control of the clock tree routing layers, you can specify the layer constraints in a clock configuration file.

By default, ICC treats the minimum layer specification as a soft constraint and can use lower layers for clock tree routing, if necessary. To require clock tree routing on the specified layers, set the following option before running clock tree synthesis:
icc_shell> set_route_zrt_common_options -min_layer_mode hard

Note:
If you have defined layer constraints on signal nets, you must reset this option to soft  before performing detail routing on the design.

To remove the restrictions on the clock tree routing layers, use the
reset_clock_tree_options -layer_list command.
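Putting the layer settings above in sequence, a flow might look like the following sketch. The clock name CK1 and the layer names are placeholders taken from the earlier example, and using clock_opt and route_opt for the synthesis and signal routing steps is an assumption about the surrounding flow.

```tcl
# Restrict clock routing for CK1 to the upper layers.
set_clock_tree_options -clock_trees CK1 -layer_list {metal4 metal5}

# Treat the minimum layer as a hard constraint during clock tree synthesis.
set_route_zrt_common_options -min_layer_mode hard
clock_opt

# If signal nets also carry layer constraints, return to the soft mode
# before detail routing the rest of the design.
set_route_zrt_common_options -min_layer_mode soft
route_opt

# Remove the clock routing layer restriction when it is no longer needed.
reset_clock_tree_options -layer_list
```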

Association of Nondefault Routing Rules With Reference Cells
Electromigration problems result from an increase  in current densities, which often occurs when strong cells drive thin nets. Electromigration can lead to opens and shorts due to metal ion displacement caused by the flow of electrons and can lead to the functional failure of the IC device.
To prevent these problems in clock networks, you can associate reference cells with compatible nondefault routing rules by using the set_reference_cell_routing_rule command.

Use the following syntax to associate reference cells with nondefault routing rules:

set_reference_cell_routing_rule
     -routing_rule NDR_name
     -reference list_of_reference_cells

You must specify both the -routing_rule and -reference options.  If you use the
set_reference_cell_routing_rule command, specifying only the -routing_rule option but not the
-references option, the nondefault routing rule does not apply during clock tree synthesis and optimization.

For example, to use the CLOCK_RULE nondefault routing rule for nets that are driven by instances of the BUF1, BUF2, and INV1 reference clock cells, enter
icc_shell> set_reference_cell_routing_rule -routing_rule CLOCK_RULE -reference {BUF1 BUF2 INV1}

When you associate reference cells with nondefault routing rules,
1) If multiple nondefault routing rules are associated with a reference cell, clock tree synthesis and optimization uses the nondefault routing rule that provides the best result for each instance of the reference cell.
2) The compile_clock_tree command, the optimize_clock_tree command, and interclock delay balancing consider these nondefault routing rules.
3) These nondefault routing rules do not apply to nets beyond exception pins.

To report the nondefault routing rules, use the report_reference_cell_routing_rule command. If you specify the -routing_rule option, the command lists the corresponding reference cells for each specified nondefault routing rule. If you specify the -references option, the command lists the corresponding nondefault routing rules for each specified reference cell.

To reset all nondefault routing rules, use the reset_reference_cell_routing_rule command.
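The association, report, and reset commands above can be exercised together as in the following sketch. The rule and cell names are the ones from the earlier example, and the option spellings follow the text as written (-reference for the set command, -references for the report command); confirm them against the man pages for your tool version.

```tcl
# Associate the rule with the clock buffers and inverter.
set_reference_cell_routing_rule -routing_rule CLOCK_RULE -reference {BUF1 BUF2 INV1}

# List the reference cells tied to CLOCK_RULE.
report_reference_cell_routing_rule -routing_rule CLOCK_RULE

# List the rules tied to one reference cell.
report_reference_cell_routing_rule -references {BUF1}

# Remove all associations.
reset_reference_cell_routing_rule
```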

Inserting Boundary Cells
When you are working on a block-level design, you might want to preserve the boundary conditions of the block's clock ports (the boundary clock pins). A boundary cell is a fixed buffer that is inserted immediately after the boundary clock pins to preserve the boundary conditions of the clock pin.

To enable boundary cell insertion during clock tree synthesis,
1. Specify the buffers (or inverters) used for boundary cell insertion.
2. Enable boundary cell insertion by using the set_clock_tree_options -insert_boundary_cell true command.

If you enable boundary cell insertion, it applies to all clock trees. You cannot enable boundary cell insertion on a per-clock basis.

Note:
You cannot use boundary cell insertion together with a clock tree configuration file. If you specify both options, ICC disables the boundary cell insertion and generates a warning message.

When boundary cell insertion is enabled, ICC inserts a cell from the buffer insertion clock tree reference list immediately after the boundary clock pins. For multivoltage designs, ICC inserts the boundary cells in the default voltage area.

ICC does not insert a boundary cell when the net is either a don't buffer net or a bidirectional net or when there is a large blockage at the boundary clock pin, which would cause a large distance between the boundary cell and the clock pin.

The boundary cells are fixed for clock tree synthesis; after insertion, ICC does not move or size the boundary cells. In addition, no cells are inserted between a clock pin and its boundary cell.

Selecting the Clock Tree Clustering
ICC performs clustering of the clock sinks to minimize wire length. If your design is sensitive to on-chip variation (OCV), ICC can also consider on-chip variation effects during clustering.

If you are using a multicorner design flow, you can reduce skew variation by using RC constraint-based clustering. To use RC constraint-based clustering, you must use a clock configuration file to specify the clock tree structure.

Enabling On-Chip-Variation-Aware Clustering
The optional OCV-aware clustering considers the timing constraints between clock sinks to influence clustering. Sinks with timing-critical paths driven by the same gates are clustered together. To enable OCV-aware clustering, use the set_clock_tree_options -ocv_clustering true command. When you set this option, it applies to all clock trees in your design.

When you use timing derating, set with the set_timing_derate command, OCV-aware clustering can result in better timing (worst negative slack and total negative slack) with minimal impact on the clock tree skew and insertion delay. However, using OCV-aware clustering can increase the runtime and power.

Note:
You cannot use OCV-aware clustering with clock tree configuration files. If you specify the -ocv_clustering option when a clock tree configuration file is already specified, ICC ignores the -ocv_clustering option. If you specify a clock tree configuration file after setting the -ocv_clustering option, ICC generates an error message and ignores both settings.

Enabling Logic-Level Balancing

If on-chip variation is an issue for your design, use the logic-level balancing mode.

Note:
If the level count or fanout varies greatly between the branches of the initial clock tree, logic-level balancing might not be able to achieve good clock tree QoR.

By default, ICC balances the delay in each branch of the clock tree, but it does not consider the number of logic levels in each branch. ICC can take into account both delay and the number of logic levels when balancing the clock tree. This feature is called logic-level balancing.

Logic-level balancing can use buffers, inverters, or both to balance the logic levels. If you use only inverters for logic-level balancing and the initial clock network does not have balanced logic levels, the generated clock trees might be unbalanced by one level.

(Figure: Logic-Level Balancing Using Inverters)

To enable logic-level balancing, use the set_clock_tree_options -logic_level_balance true command.

If you enable logic-level balancing, it applies to all clock trees. You cannot enable logic-level balancing on a per-clock basis.

If a clock tree contains a subtree that is not modified during clock tree synthesis (either a don't touch subtree or a subtree within an interface logic model), ICC traces through the subtree to determine the number of logic levels contained in the subtree. ICC considers these logic levels when constructing the clock tree. If the subtree does not have balanced logic levels, ICC generates a warning message and uses the maximum number of levels in the subtree as the number of logic levels for the subtree.

(Figure: Logic-Level Balancing With a Don't Touch Subtree)

If your design contains hard macros, use the set_clock_tree_exceptions -float_pin_logic_level command to specify the number of logic levels within the hard macro. The number of logic levels must be a positive integer. If you do not specify the number of logic levels and ICC cannot derive the number of logic levels, ICC assumes that there are no logic levels within the hard macro.
Note:
When you remove float pin logic level information by using the
remove_clock_tree_exceptions -float_pin_logic_level command, ICC removes level information for the entire design. It does not remove level information for an individual pin.

After clock tree synthesis finishes, ICC verifies that the clock trees are balanced. If the number of logic levels varies between branches, ICC generates a warning message.

To enable OCV-aware clustering while running logic-level balancing, set both the -logic_level_balance and -ocv_clustering options to true with the set_clock_tree_options command.

Caution:
If you use logic-level balancing, do not run clock tree optimization with delay insertion. Doing so can unbalance the logic levels. If you use logic-level balancing when running the clock_opt command, delay insertion is automatically disabled during the embedded  clock tree optimization.
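As a minimal sketch of the combined setup described above, both options can be set in one set_clock_tree_options call and the trees then built with clock_opt, which (per the caution) disables delay insertion in its embedded clock tree optimization:

```tcl
# Enable logic-level balancing together with OCV-aware clustering.
# Both settings apply to all clock trees in the design.
set_clock_tree_options -logic_level_balance true -ocv_clustering true

# Build the clock trees. Avoid a separate clock tree optimization pass
# with delay insertion afterward, because it can unbalance the levels.
clock_opt
```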

Enabling Region-Aware Clock Tree Synthesis
Region-aware clock tree synthesis considers region constraints to create more balanced clock trees and to avoid DRC violations in designs with complex floorplans. For designs with region constraints, using region-aware clock tree synthesis can produce better QoR. This capability is enabled by default. To disable region-aware clock tree synthesis, set the cts_region_aware variable to false (the default is true).

Region-aware clock tree synthesis can identify the following region constraints:
1) Logic modules with move bounds
2) Logic modules with target library subset constraints
3) Disjoint voltage areas
4) Power guides in power-down regions

During region-aware clock tree synthesis, the tool performs the following steps:

1. Enables the clock_opt or compile_clock_tree command to group buffers in the target library by the same operating condition, power state, and target library subset associated with move bounds.
2. Partitions a design into regions according to constraints such as voltage areas, plan groups, and move bounds (hard and exclusive).
3. Associates each buffer group with a region and vice versa.
4. Associates each power guide with the region that contains it.
5. Performs region-aware clustering.

(Figure: Design Partitions)


Wednesday, November 18, 2015

Specifying the Clock Tree References

ICC uses four clock tree reference lists:

1) One for clock tree synthesis
2) One for boundary cell insertion
3) One for sizing
4) One for delay insertion

By default, each clock tree reference list contains all the buffers and inverters in your technology library.

To fine-tune the results, you can restrict the set of buffers and inverters used for one or more of these operations. For example, if your clock tree has too many levels, it could be that the clock tree synthesis references have low drive strength.

To define a clock tree reference list, use the set_clock_tree_references command (or choose Clock > Set Clock Tree References in the GUI). When you define a clock tree reference list, ensure that the buffers and inverters that you specify have a wide range of drive strengths, so that clock tree synthesis can select the appropriate buffer or inverter for each cluster.

Note:
The clock tree synthesis reference list must include at least one inverter, or clock tree synthesis fails.
If you are using the default clock tree reference list, you must ensure that your target library contains at least one inverter that does not have a dont_use attribute. If you define a clock tree synthesis reference list, you must ensure that it contains at least one inverter.

When you run the set_clock_tree_references command, ICC verifies that the cells you specify exist in the target libraries, and it generates a warning message if it cannot find a cell.
Note:
For multicorner-multimode designs, ICC checks only the libraries associated with the clock tree synthesis scenario. You need to ensure that the specified clock references exist in the target library specified for the clock tree synthesis scenario.

When you explicitly include a cell in a clock tree reference list, ICC can use the cell for the task associated with the reference list, even if the cell has a dont_use attribute. However, if you set the dont_use attribute on a cell after it is included in a clock tree reference list, ICC honors the dont_sue attribute.

If you issue the set_clock_tree_references command multiple times, the new references you specify are added to the existing references. References you previously listed but omitted from a later list are not deleted. To delete references, use the reset_clock_tree_references command or choose Clock > Set Clock Tree References in the GUI and click Default.

For example, to create a clock tree synthesis reference list, enter
icc_shell> set_clock_tree_references -references {clk1a6 clk1a9 clk1a15 clk1a27}

ICC uses this clock tree reference list for all clock trees.
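
Because repeated set_clock_tree_references calls accumulate, narrowing a list takes a reset first. The following is a minimal sketch of that sequence, not a definitive recipe. It reuses the cell names from the example above, and it assumes that reset_clock_tree_references can be called without arguments.

```tcl
# Sketch only: cell names come from the example above.
# Assumption: reset_clock_tree_references needs no arguments to clear the list.

# First definition: four references.
set_clock_tree_references -references {clk1a6 clk1a9 clk1a15 clk1a27}

# A second call would add to the list above rather than replace it.
# To end up with a smaller list, reset first, then define the new set.
reset_clock_tree_references
set_clock_tree_references -references {clk1a15 clk1a27}

# Reminder: the clock tree synthesis reference list must still contain
# at least one inverter, or clock tree synthesis fails.
```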

Defining Clock Cell Spacing Rules
Clock cells consume more power than cells that are not in the clock network. Clock cells that are clustered together in a small area increase the current densities in the power and ground rails, which can cause electromigration problems. One way to avoid the problem is to set spacing requirements between clock cells. You set the spacing requirements by defining clock cell spacing rules for inverters, buffers, and integrated clock-gating cells in the clock network.

To define clock cell spacing rules, use the set_clock_cell_spacing command and set the mandatory -x_spacing and -y_spacing options to nonzero values. You can optionally restrict the clock cell spacing rules to a collection of library cells by using the -lib_cells option or to a collection of clock names by using the -clocks option.

To report clock cell spacing rules, use the report_clock_cell_spacing command. The report categorizes the information into three sections:
1. Clock cell spacing rules set by the -lib_cells option only or by no option.
2. Clock cell spacing rules set by the -clocks option only.
3. Clock cell spacing rules set by both the -lib_cells and -clocks options.

To remove clock cell spacing rules, use the remove_clock_cell_spacing command. You can specify the -lib_cells and -clocks options to remove clock cell spacing constraints from the specified library cells and clocks respectively.

If you use the remove_clock_cell_spacing command with
1) The -lib_cells option
The command removes only the clock cell spacing rules defined by the set_clock_cell_spacing -lib_cells command with the specified library cells.

2) The -clocks option
The command removes only the clock cell spacing rules defined by the set_clock_cell_spacing -clocks command with the specified clock names.

3) Both the -lib_cells and -clocks options
The command removes the clock cell spacing rules defined by the set_clock_cell_spacing -lib_cells -clocks command with the specified library cells and clock names.

4) No option
The command removes the clock cell spacing rules defined by the set_clock_cell_spacing command with no option.
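
The four removal cases can be sketched as follows. This is an illustration under stated assumptions: the library cell pattern and the clock name CLK are hypothetical, and it assumes that get_lib_cells and get_clocks return the collections that the -lib_cells and -clocks options expect.

```tcl
# Sketch only: the library cell pattern and clock name are hypothetical.
# Assumption: get_lib_cells and get_clocks produce suitable collections.
set cells [get_lib_cells */clk1a6]
set clks  [get_clocks CLK]

# 1) Removes only the rules that were defined with -lib_cells for these cells.
remove_clock_cell_spacing -lib_cells $cells

# 2) Removes only the rules that were defined with -clocks for these clocks.
remove_clock_cell_spacing -clocks $clks

# 3) Removes the rules that were defined with both options together.
remove_clock_cell_spacing -lib_cells $cells -clocks $clks

# 4) Removes the rules that were defined with no option.
remove_clock_cell_spacing
```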

You use the following steps to reduce electromigration in the design:
1. After obtaining a placement CEL view, set the clock cell spacing rules by using the set_clock_cell_spacing command.

To remove and report the clock cell spacing rules, use the remove_clock_cell_spacing and report_clock_cell_spacing commands, respectively.

2. Perform clock tree synthesis by using either the compile_clock_tree or optimize_clock_tree command.

3. Check for clock cell spacing rule violations by using the check_legality -verbose command.
You should not see any violations if you set the appropriate clock cell spacing constraints.

Note that the compile_clock_tree, optimize_clock_tree, split_clock_net, and balance_inter_clock_delay commands support clock cell spacing rules.
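
The three steps above can be sketched as a short script. The spacing values are placeholders rather than recommendations, and their units depend on your setup, so check the command's man page before reusing them.

```tcl
# Sketch only: spacing values are placeholders, not recommendations.

# 1. Define the spacing rules on the placed design. Both options are
#    mandatory and must be nonzero.
set_clock_cell_spacing -x_spacing 5 -y_spacing 5
report_clock_cell_spacing

# 2. Build the clock tree. compile_clock_tree supports the spacing rules.
compile_clock_tree

# 3. Look for clock cell spacing violations in the legality report.
check_legality -verbose
```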

Specifying Clock Tree Synthesis Goals
The optimization goals used for synthesizing the design and the optimization goals used for synthesizing the clock trees might differ. Perform the following steps to ensure that you are using the proper constraints:
1. Set the clock tree design rule constraints
2. Set the clock tree timing goals

ICC prioritizes the clock tree synthesis optimization goals as follows:
1. Design Rule Constraints
    a. Meet maximum capacitance constraint
    b. Meet maximum transition time constraint
    c. Meet maximum fanout constraint
2. Clock tree timing goals
    a. Meet maximum skew target
    b. Meet minimum insertion delay target

Setting Clock Tree Design Rule Constraints

ICC supports the following design rule constraints for clock tree synthesis:
1) Maximum capacitance (set_clock_tree_options -max_capacitance)
If you do not specify this constraint, the clock tree synthesis default is 0.6 pF.

2) Maximum transition time (set_clock_tree_options -max_transition)
If you do not specify this constraint, the clock tree synthesis default is 0.5 ns.

3) Maximum fanout (set_clock_tree_options -max_fanout)
If you do not specify this constraint, the clock tree synthesis default is 2000.

You can specify the clock tree design rule constraints for a specific clock (by using the -clock_trees option to specify the clock) or for all clocks (by omitting the -clock_trees option).

Note:
ICC does not support the specification of per-clock design rule constraints for overlapping clock domains.
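
As a rough sketch of the two scopes, the script below sets the constraints for all clocks and then for a single clock. The clock name CLK and the tighter values are placeholders, and the script assumes the library units are pF and ns.

```tcl
# Sketch only: CLK and the per-clock values are placeholders.
# Assumption: library units are pF for capacitance and ns for transition.

# Applies to all clocks because -clock_trees is omitted. These values
# simply restate the defaults quoted above.
set_clock_tree_options -max_capacitance 0.6 -max_transition 0.5 -max_fanout 2000

# Tighter limits for one clock only.
set_clock_tree_options -clock_trees CLK \
    -max_capacitance 0.4 -max_transition 0.3 -max_fanout 1000
```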

Setting Clock Tree Timing Goals

During clock tree synthesis, ICC considers only the clock tree timing goals. It does not consider the latency (as specified by the set_clock_latency command) or the uncertainty (as specified by the set_clock_uncertainty command).

Note:
ICC can consider the clock latency specification during interclock delay balancing.

You can specify the following clock tree timing goals for a clock tree:
1) Maximum skew (set_clock_tree_options -target_skew)
During optimization, ICC computes the skew value by comparing the arrival times of all clock signals in a clock domain, including those that do not communicate through data paths (global skew).
2) Minimum insertion delay (set_clock_tree_options -target_early_delay)
ICC checks the minimum insertion delay after synthesizing the initial clock tree.
If the synthesized clock tree does not meet the specified minimum insertion delay, ICC inserts buffers at the clock root to match the requirement.
If you do not specify a minimum insertion delay value, ICC uses 0 as the minimum insertion delay.

You can specify the clock tree timing goals for a specific clock by using the -clock_trees option to specify the clock or for all clocks by omitting the -clock_trees option.
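
A minimal sketch of both forms follows. The clock name CLK and all values are placeholders, and the units are assumed to be ns.

```tcl
# Sketch only: CLK and the values are placeholders; units assumed to be ns.

# Skew target for every clock (no -clock_trees option).
set_clock_tree_options -target_skew 0.1

# One clock with a tighter skew target and a minimum insertion delay.
# If the synthesized tree falls short of the minimum insertion delay,
# buffers are inserted at the clock root to meet it.
set_clock_tree_options -clock_trees CLK -target_skew 0.05 -target_early_delay 1.0
```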

Setting Level Restrictions
By default, ICC allows a maximum of 20 levels in each subtree of a clock tree. If you require a different value, use the set_clock_tree_options -max_buffer_levels command to specify the maximum number of levels per subtree.

You can specify the maximum level count for a specific clock (by using the -clock_trees option to specify the clock) or for all clocks (by omitting the -clock_trees option).
Note:
During clock tree synthesis, the maximum number of levels has priority over the clock tree design rule constraints.
For example, the following command specifies that all subtrees of the clock tree CLK are to have a maximum of four levels:
icc_shell> set_clock_tree_options -clock_trees CLK -max_buffer_levels 4

Figure: Maximum Clock Buffer Levels

Specifying Clock Tree Exceptions

To define clock tree exceptions, use the set_clock_tree_exceptions command or choose Clock > Set Clock Tree Exceptions in the GUI. You can set clock tree exceptions on pins or hierarchical pins.

If you issue the set_clock_tree_exceptions command multiple times for the same pin, the pin keeps the highest-priority exception. ICC prioritizes the clock tree pin exceptions as follows:

1. Nonstop pins
2. Exclude pins
3. Float pins
4. Stop pins

Note:
The don't touch subtree exception is compatible with the nonstop, exclude, float, or stop pin exception. You can set both exceptions on a pin, and clock tree synthesis honors both exceptions.
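
To see the priority rule in action, you could set two exceptions on the same pin and then inspect the report. This is a sketch only: the pin name U2/CLK is illustrative, and the outcome in the comment is what the priority list above predicts, not a captured tool result.

```tcl
# Sketch only: U2/CLK is an illustrative pin name.
set_clock_tree_exceptions -exclude_pins  [get_pins U2/CLK]
set_clock_tree_exceptions -non_stop_pins [get_pins U2/CLK]

# Nonstop outranks exclude in the priority list, so the pin is expected
# to be kept as a nonstop pin. Confirm with the exceptions report.
report_clock_tree -exceptions
```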

To remove clock tree exceptions, use the remove_clock_tree_exceptions command or choose Clock > Remove Clock Tree Exceptions in the GUI.

To see the clock tree exceptions defined for your design, generate a clock tree exceptions report by running the report_clock_tree -exceptions command or by choosing Clock > Report Clock Tree in the GUI and selecting Exceptions.

Note:
If your design contains sequential cells with unconnected outputs, the clock pins of these cells are marked as implicit ignore pins. When you run clock tree synthesis, these unloaded sequential cells are deleted from the design. As a result, at the end of clock tree synthesis, you no longer see these implicit ignore pins. To prevent the removal of these sequential cells, set the physopt_delete_unloaded_sequential_cells variable to false before running clock tree synthesis.

If your design contains dangling nets, the clock tree exceptions report might show false implicit ignore pins on these nets.  If this happens, you can remove the false implicit ignore pins by saving your design and reopening it.
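
A minimal sketch of keeping the unloaded sequential cells through clock tree synthesis is shown below. It assumes that set_app_var is available in your icc_shell session for setting tool variables.

```tcl
# Sketch only. Assumption: set_app_var is available for tool variables.

# Keep unloaded sequential cells so that they are not deleted during
# clock tree synthesis.
set_app_var physopt_delete_unloaded_sequential_cells false

compile_clock_tree

# With the cells preserved, their implicit ignore pins should still
# appear in the exceptions report.
report_clock_tree -exceptions
```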

Specifying Nonstop Pins
Nonstop pins are pins that would normally be considered endpoints of the clock tree, but instead ICC traces through them to find the clock tree endpoints. The clock pins of sequential cells driving generated clocks are implicit nonstop pins. In addition, ICC supports user-defined (or explicit) nonstop pins.

To specify a nonstop pin, use the set_clock_tree_exceptions -non_stop_pins command.

For example, to specify pin U2/CLK as a nonstop pin, enter
icc_shell> set_clock_tree_exceptions -non_stop_pins [get_pins U2/CLK]

To remove the nonstop pin definition from a pin, use the remove_clock_tree_exceptions -non_stop_pins command.

For example, to remove the nonstop pin definition from pin U2/CLK , enter
icc_shell> remove_clock_tree_exceptions -non_stop_pins [get_pins U2/CLK]

Specifying Exclude Pins
Exclude pins are clock tree endpoints that are excluded from clock tree timing calculations and optimizations. ICC uses exclude pins only in calculations and optimizations for design rule constraints. In addition to the exclude pins inferred by ICC (the implicit exclude pins), ICC supports user-defined (or explicit) exclude pins. For example, you might define an exclude pin to exclude all branches of the clock tree that fan out from some combinational logic or to exclude an implicit stop pin.

During clock tree synthesis, ICC isolates exclude pins (both implicit and explicit) from the clock tree by inserting a guide buffer before the pin. Beyond the exclude pin, ICC never performs skew or insertion delay optimization, but it does perform design rule fixing.

To specify an exclude pin, use the set_clock_tree_exceptions -exclude_pins command.

For example, to exclude clock sink U2/CLK, enter
icc_shell> set_clock_tree_exceptions -exclude_pins [get_pins U2/CLK]
To remove the exclude pin definition from a pin, use the remove_clock_tree_exceptions -exclude_pins command.

For example, to remove the exclude pin definition from pin U2/CLK, enter
icc_shell> remove_clock_tree_exceptions -exclude_pins [get_pins U2/CLK]

For another example, assume that a clock tree also drives combinational logic.

Figure: Explicit Exclude Pin

To exclude all branches of the clock tree that fan out from this point, enter

icc_shell> set_clock_tree_exceptions -exclude_pins [get_pins U2/A]

Specifying Float Pins
Float pins are clock pins that have special insertion delay requirements. ICC adds the float pin delay (positive or negative) to the calculated insertion delay up to this pin.

To specify a float pin and its timing characteristics, use the following set_clock_tree_exceptions options:

1.  -float_pins [get_pins pin_list]
2.  -float_pin_max_delay_fall max_delay_fall_value
3.  -float_pin_max_delay_rise max_delay_rise_value
4.  -float_pin_min_delay_fall min_delay_fall_value
5.  -float_pin_min_delay_rise min_delay_rise_value

For example, to define an active-low float pin ( so that clock tree synthesis considers only the falling edge), enter
icc_shell> set_clock_tree_exceptions -float_pins [get_pins U1/CLK] \
      -float_pin_max_delay_fall 0.10 -float_pin_min_delay_fall 0.08

The clock tree exceptions report (report_clock_tree -exceptions) shows the float pin values only for the active edge. For example, the report for the float pin defined above shows

Explicit sync pins:  1
      (F) U1/CLK    ( -0.100 -0.080 )

The float pin delay values can be either positive or negative, depending on your timing requirements. To increase the path delay to a pin, specify a negative float pin delay. To decrease the path delay to a pin, specify a positive float pin delay.

## Specifying a negative float pin
icc_shell> set_clock_tree_exceptions -float_pins U1/CLK \
   -float_pin_max_delay_rise -0.5 -float_pin_max_delay_fall -0.5

## Specifying a positive float pin
icc_shell> set_clock_tree_exceptions -float_pins U4/CLK \
   -float_pin_max_delay_rise 0.5 -float_pin_max_delay_fall 0.5

Figure: Float Pin Timing

To remove the float pin definition from a pin, use the remove_clock_tree_exceptions -float_pins command.

Specifying Don't Touch Subtrees
In some cases you will want to preserve a portion of an existing clock tree. You need to do this, for example, when two clock networks share part of some clock logic behind a multiplexer. The portion of the clock tree that is preserved is called a don't touch subtree.

To specify a don't touch subtree, specify the root pin of the don't touch subtree by using the set_clock_tree_exceptions -dont_touch_subtrees command.

Although ICC does not make any modifications to the don't touch subtree during clock tree synthesis, it does propagate the clock tree attributes and the nondefault routing rules beyond the don't touch subtree. To prevent propagation of the clock attributes and the nondefault routing rules, set the cts_traverse_dont_touch_subtrees variable to false.

ICC considers the sinks in the don't touch subtree when balancing clock delays and computing the clock skew.

To remove the don't touch subtree exception from a pin, use the remove_clock_tree_exceptions -dont_touch_subtrees command.
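
A minimal sketch of defining and later removing a don't touch subtree follows. The root pin U_MUX/Z is a hypothetical name for the output of the shared clock logic, and the script assumes that set_app_var is available for setting tool variables.

```tcl
# Sketch only: U_MUX/Z is a hypothetical root pin of the shared clock logic.
# Assumption: set_app_var is available for tool variables.

# Preserve the existing subtree that starts at this pin.
set_clock_tree_exceptions -dont_touch_subtrees [get_pins U_MUX/Z]

# Optional: stop the clock tree attributes and nondefault routing rules
# from propagating beyond the don't touch subtree.
set_app_var cts_traverse_dont_touch_subtrees false

# Later, to drop the exception again:
remove_clock_tree_exceptions -dont_touch_subtrees [get_pins U_MUX/Z]
```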

Specifying Don't Buffer Nets
In some cases you might be able to improve the results by preventing ICC from buffering certain nets (ICC still performs global prerouting on these nets).

Note:
During clock tree synthesis, the don't buffer nets exception has priority over the clock tree design rule constraints. However, the clock tree specification in the clock tree configuration file has priority over the don't buffer nets exception.

To specify nets that should not be buffered, use the set_clock_tree_exceptions -dont_buffer_nets command.

For example, to specify net n1 as a don't buffer net, enter
icc_shell> set_clock_tree_exceptions -dont_buffer_nets [get_nets n1]

To remove the don't buffer net exceptions, use the remove_clock_tree_exceptions -dont_buffer_nets command.

For example, to remove the don't buffer net exception from net n1, enter
icc_shell> remove_clock_tree_exceptions -dont_buffer_nets [get_nets n1]

Specifying Don't Size Cells
During clock tree synthesis and optimization, ICC ignores the dont_touch attribute on cells and nets.
To prevent sizing of cells on the clock path during clock tree synthesis and optimization, you must identify the cells as don't size cells.

To specify cells that should not be sized, use the set_clock_tree_exceptions -dont_size_cells command.

For example, to specify cell U1/U3 as a don't size cell, enter
icc_shell> set_clock_tree_exceptions -dont_size_cells [get_cells U1/U3]
To remove the don't size cell exception, use the remove_clock_tree_exceptions -dont_size_cells command.

For example, to remove the don't size cell exception from the cell U1/U3, enter
icc_shell> remove_clock_tree_exceptions -dont_size_cells [get_cells U1/U3]

Specifying Size-Only Cells
During clock tree synthesis and optimization, size-only cells can only be sized, not moved or split.
If a size-only cell overlaps with an adjacent cell after sizing, the size-only cell might be moved during the legalization step. To specify size-only cells, use the set_clock_tree_exceptions -size_only_cells command.

For example, to specify cell U1/U3 as a size-only cell, enter

icc_shell> set_clock_tree_exceptions -size_only_cells [get_cells U1/U3]

To remove the size-only cell exception, use the remove_clock_tree_exceptions -size_only_cells command.

For example, to remove the size-only cell exception from cell U1/U3, enter
icc_shell> remove_clock_tree_exceptions \
 -size_only_cells [get_cells U1/U3]

Preserving the Clock Pins of Existing Hierarchies
In some cases, clock tree synthesis and optimization might cluster clock sinks from different hierarchies to create new clock pins. You can prevent clock sinks in the desired hierarchies from being clustered with clock sinks in other logical hierarchies by using the set_clock_tree_exceptions -preserve_hierarchy command, so that the clock pins of the desired logical hierarchies are preserved.

For example, suppose your design has two hierarchical block instances, I1 and I2, in the top level, and each block has clock pins CK1, CK2, and CKO.

Figure: Hierarchical Clock Pins

To preserve pin CK1 in cell instance I1, enter
icc_shell> set_clock_tree_exceptions -preserve_hierarchy [get_pins I1/CK1]

To preserve pins CK1, CK2, and CKO in cell instance I2, enter
icc_shell> set_clock_tree_exceptions -preserve_hierarchy [get_cells I2]

To preserve pins CK1, CK2, and CKO in cell instances I1 and I2, enter
icc_shell> set_clock_tree_exceptions -preserve_hierarchy [get_references -hierarchical REG_A]

To remove the preserve-hierarchy exception, use the remove_clock_tree_exceptions -preserve_hierarchy command. If the specified hierarchical pins or cells contain any leaf pins or cells that are not in the clock network, the tool issues a warning message.