Clock gating saves power by shutting off the sequential elements and part of the clock network during an idle state. The design of the clock distribution network also determines the clock skew.
Clock skew affects chip performance almost one-for-one, because it must be counted as a cycle-time penalty. The clock tree therefore needs to be adjusted incrementally, with minimal changes, to keep the clock skew acceptable.
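The one-for-one relationship can be shown with a small calculation. This sketch assumes the usual setup-time budget, where clock-to-Q, logic delay, setup time and an unfavorable skew must all fit within one clock period. All delay values are hypothetical.

```python
def min_clock_period_ps(clk_to_q_ps, max_logic_ps, setup_ps, skew_ps):
    """Smallest period at which the worst reg-to-reg path still meets setup.

    An unfavorable skew (capture clock arriving early relative to launch)
    adds directly to the path delay, so every ps of skew costs a ps of period.
    """
    return clk_to_q_ps + max_logic_ps + setup_ps + skew_ps


def max_freq_mhz(period_ps):
    """Convert a period in picoseconds to a frequency in MHz."""
    return 1e6 / period_ps


# Hypothetical path: 100 ps clock-to-Q, 800 ps logic, 50 ps setup.
for skew in (0, 50, 100):
    period = min_clock_period_ps(100, 800, 50, skew)
    print(f"skew={skew:3d} ps -> period={period} ps, fmax={max_freq_mhz(period):.1f} MHz")
```

Each extra 50 ps of skew lengthens the minimum period by exactly 50 ps.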
Buffer insertion is commonly used to address the clock skew minimization problem [reference 7]. Other work uses buffer insertion to minimize both power consumption and clock skew [reference 4].
To achieve this objective, the clock tree synthesis method of the present invention can be embedded in an existing clock tree synthesis design flow. This ensures that both the specified database constraints and the clock skew constraints are satisfied.
For a given clock tree netlist, the buffer locations, the wire parameters, and the buffers' timing and power library are all provided. The buffer delays and wire delays of the clock tree are calculated first.
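The text does not say which delay models are used. The following is a minimal sketch under two stated assumptions:

- Buffer delay follows a linear model (intrinsic delay plus drive resistance times load capacitance).
- Wire delay is the Elmore delay of a single RC segment.

The per-unit-length parasitics `R_PER_UM` and `C_PER_UM`, the `Node` class and all numeric values are hypothetical. Units are chosen so that kΩ × fF = ps.

```python
from dataclasses import dataclass, field

# Hypothetical wire parasitics (assumption, not from the source).
R_PER_UM = 0.001  # kOhm per um
C_PER_UM = 0.2    # fF per um


@dataclass
class Node:
    name: str
    kind: str                  # "buf" or "dff"
    wire_len: float = 0.0      # wire length from the parent driver, um
    in_cap: float = 0.0        # input (clock) pin capacitance, fF
    intrinsic: float = 0.0     # buffer intrinsic delay, ps (buffers only)
    drive_res: float = 0.0     # buffer output resistance, kOhm (buffers only)
    children: list = field(default_factory=list)


def load_cap(buf):
    """Capacitance seen by a buffer: every fanout wire plus the pin it drives."""
    return sum(C_PER_UM * ch.wire_len + ch.in_cap for ch in buf.children)


def buffer_delay(buf):
    """Assumed linear model: intrinsic delay + R_drive * C_load."""
    return buf.intrinsic + buf.drive_res * load_cap(buf)


def wire_delay(child):
    """Elmore delay of one RC segment ending at the child's input pin."""
    r = R_PER_UM * child.wire_len
    c = C_PER_UM * child.wire_len
    return r * (c / 2 + child.in_cap)


def clock_latencies(node, arrival=0.0, out=None):
    """Root-to-leaf latency of every DFF: buffer delays plus wire delays on the path."""
    if out is None:
        out = {}
    if node.kind == "dff":
        out[node.name] = arrival
        return out
    t_out = arrival + buffer_delay(node)
    for ch in node.children:
        clock_latencies(ch, t_out + wire_delay(ch), out)
    return out


root = Node("root_buf", "buf", intrinsic=20.0, drive_res=1.0, children=[
    Node("ff1", "dff", wire_len=100.0, in_cap=2.0),
    Node("ff2", "dff", wire_len=300.0, in_cap=2.0),
])
print(clock_latencies(root))
```

The longer wire to `ff2` both loads the root buffer and adds its own RC delay, so `ff2` receives the clock later than `ff1`.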
Then, if the input netlist is not feasible under the given constraints, a feasible solution is found first. A specified Liberty library that includes clock buffers and D flip-flops (DFFs) is also given. The goal is a clock network that dissipates minimal power and satisfies the clock skew constraint at all receivers (DFFs).
The clock skew should remain small even under process variations. The software must apply the allowed techniques (buffer insertion, buffer resizing, and buffer removal) to reduce dynamic power subject to the maximum clock skew constraint.
Resynthesizing a better clock tree is allowed, except that a root i… C0 is the capacitance per unit length and is set to 0. The clock network design determines the buffer sizes, buffer locations, and buffer interconnect topology.
It therefore affects both the static power dissipation summation in the first term of the power equation and the wire length in the second term.
In the power equation, both rise power and fall power must be accounted for within each clock cycle. For CMOS VLSI, the static power consumed by the buffers is negligible. The problem therefore reduces to minimizing the total capacitance, which comes from both the wiring and the buffers.
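The equation referred to above is not reproduced here. The following is a minimal sketch of a common clock-power model, not necessarily the authors' equation. It assumes:

- Switching power is C_total · Vdd² · f with activity 1, because a clock net charges and discharges once per cycle.
- C_total is C0 times the total wire length plus all clock pin capacitances.
- Each buffer contributes one rise energy and one fall energy per cycle, in the style of Liberty internal power.

The function name, argument names and numbers are hypothetical.

```python
def clock_power_uw(total_wire_um, c0_ff_per_um, pin_caps_ff,
                   buffer_energy_fj, vdd, freq_ghz):
    """Hypothetical clock-network power model; returns microwatts."""
    # Switched capacitance: wire (C0 * length) plus every buffer/DFF clock pin.
    c_total_ff = c0_ff_per_um * total_wire_um + sum(pin_caps_ff)
    # Clock activity is 1: full charge/discharge each cycle.
    # Units: fF * V^2 * GHz = 1e-15 * 1e9 W = uW.
    p_switching = c_total_ff * vdd ** 2 * freq_ghz
    # Internal energy: one rising and one falling transition per buffer per cycle.
    # Units: fJ * GHz = uW.
    p_internal = sum(e_rise + e_fall for e_rise, e_fall in buffer_energy_fj) * freq_ghz
    return p_switching + p_internal


print(clock_power_uw(1000, 0.2, [2.0, 2.0, 3.0], [(1.5, 1.0), (2.0, 1.5)],
                     vdd=1.0, freq_ghz=1.0))
```

When buffer internal energy is small, the first term dominates. Minimizing power then comes down to minimizing total capacitance, which is the reduction described above.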
For multichip modules, dynamic and static power consumption may be equally important. For the clock skew constraint, the clock latency of the ith cell is denoted t_cd,i. The clock distribution network design also considers design rule check (DRC) constraints, including the input signal transition time and the output loading.
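The skew and DRC checks can be sketched as follows. This assumes global skew, meaning the latest minus the earliest t_cd,i over all DFFs, since the text does not distinguish global from local skew. It also assumes DRC means per-pin limits on transition time and load capacitance. The function names and limits are hypothetical.

```python
def clock_skew(latency_ps):
    """Global skew: latest minus earliest clock arrival t_cd,i over all DFFs."""
    values = list(latency_ps.values())
    return max(values) - min(values)


def meets_skew(latency_ps, max_skew_ps):
    """True if the clock tree satisfies the maximum skew constraint."""
    return clock_skew(latency_ps) <= max_skew_ps


def drc_violations(pins, max_transition_ps, max_load_ff):
    """pins maps a buffer output to (transition_ps, load_ff); returns violations."""
    violations = []
    for name, (transition, load) in pins.items():
        if transition > max_transition_ps:
            violations.append((name, "max_transition"))
        if load > max_load_ff:
            violations.append((name, "max_capacitance"))
    return violations


lat = {"ff1": 105.2, "ff2": 113.6, "ff3": 109.0}
print(clock_skew(lat), meets_skew(lat, 10.0))
print(drc_violations({"buf1": (80, 50), "buf2": (120, 30), "buf3": (90, 70)}, 100, 60))
```

A candidate clock tree is acceptable only when the skew check passes and the DRC list is empty.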
The rise time of classically designed clock nets imposes a limit on the frequency of operation, even if logic delays are small.
This section proposes a clock tree synthesis tool that targets both low power consumption and low clock skew, using buffer insertion, removal and resizing operations. The proposed method adjusts the constraints according to the technology library in use. A pseudocode of the design flow is shown in FIG. First, the proposed method loads three input files: (a) Design, the original clock tree design; (b) Library, the technology-dependent buffer and DFF library; and (c) Constraint, the constraints of the optimization target.
Second, a program checks whether the original clock tree design meets the constraints.
Here we can see the importance of building a balanced clock tree. The following sections discuss timing improvements and methods to reduce variation in the clock tree. They describe the steps followed to build a customized clock tree and to bring that variation down.
Addressing the design challenge of registers placed far apart
This section describes the problem encountered, and the fixes applied, while building the clock tree when registers are far apart. As shown in Figure 1, the clock port is positioned at the middle of the bottom edge of the chip.
The encircled part at the bottom of the chip is the digital glue logic that communicates with the digital logic beside the analog block at the top of the chip. Large setup violations were observed on these paths. Because this is a full-chip design, the output delays were tightly constrained, which led to large timing violations at the output pads. The following methods were tried to meet setup timing by building a customized clock tree.
Automatic Clock Tree Synthesis Technique
With automatic clock tree synthesis, the CTS engine inserts many unwanted buffers across the chip.
Registers near the clock port see large insertion delays. This effect comes from the clock-balancing behavior of the automated CTS engine. The clock tree takes an H-tree structure similar to the figure. Because the chip is large, clock balancing puts a huge number of buffers on the clock tree.
This made the experiment ineffective.
Macro Modeling Technique
With the macro modeling method, the goal is to add insertion delay at the clock pins of specific registers so that reg2reg timing paths are met. For example, consider a path between a launch register in the bottom digital logic and a capture register in the top digital logic, as shown in Figure 1.
Because the path is long, setup failed with a slack of -3 ns at a 10 ns clock period. The target was to insert 3 ns of skew on the capture clock path of the register. However, paths launched from the capture register were then also penalized by the same 3 ns of extra insertion delay.
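The cascading effect follows from simple setup-slack arithmetic. This sketch assumes ideal clocks apart from the stated latencies, with clock-to-Q folded into the data delay. Only the -3 ns slack, the 10 ns period and the 3 ns insertion come from the text; the individual data delays and the downstream path are hypothetical. All values are in picoseconds.

```python
def setup_slack_ps(period, launch_latency, capture_latency, data_delay, t_setup):
    """Setup slack of a reg-to-reg path (clock-to-Q included in data_delay)."""
    return (period + capture_latency) - (launch_latency + data_delay + t_setup)


PERIOD = 10_000  # 10 ns clock, in ps

# Failing path: bottom-logic launch register -> top-logic capture register.
print(setup_slack_ps(PERIOD, 0, 0, 12_800, 200))      # -3000: fails by 3 ns
# Macro-modeling fix: add 3 ns of latency at the capture register's clock pin.
print(setup_slack_ps(PERIOD, 0, 3_000, 12_800, 200))  # 0: now met

# A path launched FROM that register inherits the 3 ns on its launch clock,
# so a previously passing path now fails.
print(setup_slack_ps(PERIOD, 0, 0, 8_800, 200))       # 1000 before
print(setup_slack_ps(PERIOD, 3_000, 0, 8_800, 200))   # -2000 after
```

The 3 ns borrowed by the first path is taken directly from the next stage.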
This experiment degraded the timing further because of the cascading effect.
Cloning Technique
In Figure 1, a register in the bottom digital logic communicates with registers in the 16 digital logic blocks of the top digital logic. The idea was to clone this bottom-logic register onto three sides of the chip (top, left, and right) to improve timing on the affected paths.
The RTL designers were asked to change the logic to use four registers instead of one. However, no logic cells could be placed in the soft blockage region or between the analog blocks, so this method was not effective.
Building Customized Clock Tree Technique
This technique built a separate clock tree for the registers located far from the digital logic at the bottom. This avoided extra insertion delay for the registers near the clock port.
This reduced the buffer count, and with it the extra pessimism. The paths between the bottom and top digital logic were pipelined, since their endpoints received clocks at different times from different clock trees. No timing violations were found within the top digital logic, because that logic received the same clock. The registers driving the output pads also had a separate clock tree. This allowed the desired latency to be specified on their launch clock path, which relaxed the setup window for the Reg2Out timing paths.
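The Reg2Out benefit can be illustrated the same way. This sketch assumes the output pad is constrained with an output delay relative to an ideal capture edge at the end of the period. Under that assumption, every picosecond of launch clock latency is lost slack. All numbers are hypothetical: a long latency from a chip-wide balanced tree is compared with a short dedicated branch.

```python
def reg2out_setup_slack_ps(period, launch_latency, clk_to_q, data_to_pad, output_delay):
    """Setup slack from a register to an output port with an output delay.

    Assumes the external capture edge is ideal (at `period`), so the launch
    clock latency is pure cost on this path.
    """
    return period - output_delay - (launch_latency + clk_to_q + data_to_pad)


PERIOD = 10_000  # ps

print(reg2out_setup_slack_ps(PERIOD, 1_200, 150, 3_500, 6_000))  # balanced tree: -850
print(reg2out_setup_slack_ps(PERIOD, 300, 150, 3_500, 6_000))    # dedicated branch: 50
```

Cutting launch latency from 1.2 ns to 0.3 ns recovers 900 ps of slack on this hypothetical path, without touching the data path.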
Steps followed
Separate clock tree branches were created from the clock port toward the desired register groups by connecting the clock port to inverters. A DRC violation means that the input transition time or the output loading constraint is not met. Finally, a low-power clock tree netlist that meets the timing specifications is generated using the proposed method.
Clock tree construction strongly affects the timing and power of the design, so the clock tree must be built with great care.
Third, a fast buffer-resizing operation is executed to reduce total power consumption, at the risk of violating the design constraints. The timing constraint depends on the propagation delay from the root buffer to each DFF leaf of the clock tree. That delay is the sum of the buffer internal delays and interconnect delays along the entire path, such as t.
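The text does not spell out how the resizing step manages the risk of violating constraints. One simple way to bound that risk is a greedy downsizing loop that reverts any step which breaks the skew limit. The sketch below is an illustration, not the authors' algorithm. It assumes:

- A hypothetical three-size buffer library with fixed per-size delay and power, so load dependence is ignored.
- Latency is the sum of buffer delays along each root-to-leaf path.
- Only the skew limit is checked; DRC checks are omitted for brevity.

```python
# Hypothetical buffer library: size -> (delay_ps, power_uw).
LIB = {"X4": (30, 8.0), "X2": (40, 5.0), "X1": (55, 3.0)}
SIZES_LARGE_TO_SMALL = ["X4", "X2", "X1"]


def latencies(paths, sizes):
    """paths maps each DFF to the buffers on its root-to-leaf path."""
    return {ff: sum(LIB[sizes[b]][0] for b in bufs) for ff, bufs in paths.items()}


def skew(paths, sizes):
    lat = latencies(paths, sizes).values()
    return max(lat) - min(lat)


def total_power(sizes):
    return sum(LIB[s][1] for s in sizes.values())


def greedy_downsize(paths, sizes, max_skew):
    """Shrink each buffer as far as the skew limit allows; undo a violating step."""
    sizes = dict(sizes)
    for buf in sorted(sizes):  # fixed visiting order keeps the result deterministic
        start = SIZES_LARGE_TO_SMALL.index(sizes[buf])
        for smaller in SIZES_LARGE_TO_SMALL[start + 1:]:
            previous = sizes[buf]
            sizes[buf] = smaller
            if skew(paths, sizes) > max_skew:
                sizes[buf] = previous  # this step broke the constraint: revert it
                break
    return sizes


paths = {"ff1": ["b0", "b1"], "ff2": ["b0", "b2"], "ff3": ["b0", "b2"]}
initial = {"b0": "X4", "b1": "X4", "b2": "X4"}
result = greedy_downsize(paths, initial, max_skew=20)
print(result, total_power(initial), total_power(result), skew(paths, result))
```

The shared root buffer `b0` can shrink freely, because it delays every sink equally. The branch buffers can only shrink until their paths drift more than 20 ps apart. The result depends on the visiting order, so a real flow might rank buffers by power saving first.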