Thursday 12 July 2007

Taking a bite out of power: techniques for low-power-ASIC design

Hernan Alcerreca

Until recently, low-power-digital-IC design has been an area for specialist or guru IC designers. However, most IC-design engineers will have to learn a variety of low-power-design techniques as ASICs and SOCs (systems on chips) increasingly target processes of 130 nm and below.

At 130-nm processes, foundries started to employ new techniques and materials, such as low-k dielectrics and copper, in silicon processes to increase design performance. Smaller geometries, scaled thresholds, and unscaled voltages yielded smaller, speedier ICs but also brought a nasty side effect: leakage, or static power.

By the 90-nm node, power management started to become a huge concern, and, at the 65-nm node, low-power-design techniques are a must.

“As we scale technology nodes, clearly we have to lower VDD [supply voltage], because there is a quadratic relationship: The power dissipation is proportional to VDD²,” said Mike Keating, a fellow at Synopsys. “If we just scaled the devices and did not scale VDD, we’d be doubling the power density every generation. We can’t do that, so we’ve been lowering VDD.”

When the semiconductor industry lowered supply voltage over the last few nodes, each reduction also lowered the transistor threshold voltage, which keeps drain-to-source current at a level that allows ICs to charge their output capacitors and thus increase the performance of ICs in those nodes.

However, as the industry further decreased threshold voltage at each node, it forced the subthreshold leakage to also increase at each node. “As we’ve been shrinking processes, the gate-oxide thickness is so skinny now, gate leakage is increasing exponentially,” said Keating. “Somewhere around 65 and 45 nm, you end up with dynamic power equal to subthreshold current and equal to the gate-leakage current. We have a train wreck; only, in this case, we have three trains—dynamic power, subthreshold leakage, and gate leakage—headed to exactly the same spot.”

In the past, overall power density has essentially stayed the same for every process reduction. But, in 2005, the ITRS (International Technology Roadmap for Semiconductors) released a study that indicated that, at the 65-nm node, dynamic-power density and leakage-power density would increase by 1.43 and 2.5 times, respectively.

At the 45-nm node, the ITRS predicts, dynamic-power and leakage-power density will increase to two and 6.5 times, respectively. In reality, designs in high-speed 65-nm processes lose as much as half their power to leakage.

Many in the industry believe that, by the 45-nm node, ICs will lose as much as 60% of their power to leakage (Figure 1). “Until recently, we’ve been dealing with power by simply making different trade-offs in silicon,” said Keating. “That option is sort of disappearing. Using these design techniques is no longer an option; it is a requirement.”

To deal with power management, the electronics community is employing new low-power techniques and materials on several fronts (Figure 2). Fabs have introduced multithreshold, multivoltage transistors; SOI (silicon-on-insulator) and low-k materials; body, or “back,” biasing; and copper-metal and SiGe (silicon-germanium) substrates.

Meanwhile, chip architects and software designers deal with low power by performing smart-hardware-versus-software trade-offs; by implementing power-savvy operating systems; by introducing more hibernation modes into system design; and by more selectively granting memory access. IC designers are also employing several techniques to lower the power of their designs. The most popular techniques for low-power design include multithreshold design, multivoltage design, clock gating, power-aware memories, and power gating.

Jerry Frenkil, chief technology officer, vice president, and general manager of Sequence Design’s Silicon Business Unit, notes that low-power design is all about reducing one or several parts of the power equation: Dynamic power plus leakage power equals the device’s overall power consumption. Dynamic power is the power a device consumes when a user is employing it for its intended purpose, and leakage power is the power that leaking transistors waste (Figure 3).
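The power equation Frenkil describes can be put into a few lines of Python. This is a minimal sketch only: the article states that overall power is dynamic power plus leakage power and that power dissipation is proportional to VDD², but the specific switching-power formula (activity × capacitance × VDD² × frequency), the function names, and all component values below are assumptions chosen for illustration.

```python
def dynamic_power(activity, capacitance_f, vdd, freq_hz):
    """First-order switching power in watts: activity * C * VDD^2 * f (assumed model)."""
    return activity * capacitance_f * vdd ** 2 * freq_hz


def leakage_power(vdd, leakage_current_a):
    """Static power in watts: supply voltage times total leakage current."""
    return vdd * leakage_current_a


def total_power(activity, capacitance_f, vdd, freq_hz, leakage_current_a):
    """Overall power = dynamic power + leakage power."""
    return (dynamic_power(activity, capacitance_f, vdd, freq_hz)
            + leakage_power(vdd, leakage_current_a))


# Hypothetical block: 1 nF switched capacitance, 20% activity, 500 MHz, 50 mA leakage.
p_at_1v2 = total_power(0.2, 1e-9, 1.2, 500e6, 0.05)
p_at_0v9 = total_power(0.2, 1e-9, 0.9, 500e6, 0.05)
leakage_share = leakage_power(1.2, 0.05) / p_at_1v2

print(f"total at 1.2 V: {p_at_1v2:.3f} W, total at 0.9 V: {p_at_0v9:.3f} W")
print(f"leakage share at 1.2 V: {leakage_share:.0%}")
```

Because the dynamic term depends on the square of the supply voltage while the leakage term here depends on it only linearly, lowering VDD in this toy model cuts dynamic power much faster than it cuts leakage, which is why the two parts of the equation call for different techniques.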

Custom and circuit designers over the years have employed several techniques to lower the power of their designs, according to Kurt Keutzer, a professor at the University of California, Berkeley, who is a co-author and editor of Closing the Power Gap Between ASIC & Custom: Tools and Techniques for Low Power Design.

However, he said, the power consumption of today’s typical ASICs may be three to seven times that of custom ICs fabricated in process technology of the same generation. He and one of the book’s co-authors, David Chinnery, estimate that, by employing low-power-design techniques, users can improve energy efficiency of their ASIC designs by a factor of two to three. “The main finding is that ASIC designers are leaving a lot of power savings on the table,” said Keutzer.

But there’s no silver bullet in low-power design. “There are a lot of techniques… Different methods attack different portions of the power equation. They usually have some overhead of some sort,” said Frenkil, also a contributor to the book. “Some may have no overhead, others may affect you in area, and others may affect you in speed. One of the critical things about low-power design is understanding the impact of what you are facing and how you are going to deal with it.” Indeed, users will have to mix and match many of these techniques to come up with a low-power methodology that works for them.

Multithreshold design

About five years ago, when excessive power consumption became a problem, foundries started to offer libraries for low-power and high-speed design. For example, TSMC (Taiwan Semiconductor Manufacturing Co) offers a standard, or nominal, library; a high-speed library; and a low-power library, each having several types of cells.

For instance, each of TSMC’s libraries includes low-threshold-voltage, high-threshold-voltage, and threshold-voltage-with-MTCMOS (multithreshold-CMOS) cells.

Multiple-cell libraries help designers deal with both leakage and dynamic power. To deal with leakage power using multiple types of cells, designers today employ multithreshold design. “Because we’ve played so many games with VDD and VTH [threshold voltage], we can’t create one library that is going to work for an entire design, because you have designs that are speed-critical, and, for the areas that are not speed-critical, you want to reduce the leakage,” said Keating.

A multicell library typically comprises at least two sets of identical cells that have different threshold voltages. Those with higher threshold voltage are slower but have less leakage; conversely, the cells with lower threshold voltage are faster but leak.

“It is a nonlinear relationship,” said Keating. “Conceding a little bit of speed, you get a very dramatic reduction in leakage.” Frenkil said that a high-threshold-voltage cell typically has 50% less leakage than a low-threshold-voltage cell with no bad side effects, such as area gain.

For most applications, designers typically use a low-threshold-voltage library for a first pass through synthesis to get maximum performance and meet timing goals. They then determine the critical paths in their design—that is, the path or paths in the design that require the highest performance.

They then try to locate areas that don’t require low-threshold-voltage cells and swap out low-threshold-voltage cells for high-threshold-voltage cells to reduce the overall power and leakage of the design. Frenkil notes that this approach represents the most common use of the multithreshold-design technique because most applications have timing as a first requirement, low-threshold-voltage libraries run faster through synthesis, and synthesis tools ultimately produce smaller design areas from these libraries.

Synthesis tools tend to run longer and produce larger design areas when running heavy doses of high-threshold-voltage cells.

However, in some wireless-system applications, power is the main goal, and area increases are less of an issue. In those cases, some designers first run synthesis with high-threshold-voltage cells, find the critical path, and then swap out the high-threshold-voltage cells for low-threshold-voltage cells until they reach their performance goal.
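The timing-first flow, in which a design starts with low-threshold-voltage cells and swaps in high-threshold-voltage cells wherever timing allows, can be sketched as a simple greedy loop. The code below is an illustration under stated assumptions, not how any commercial synthesis tool works: the `Cell` class, the path list, and every delay and leakage number are hypothetical, with the high-threshold cells given the 50% leakage reduction Frenkil cites.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    name: str
    low_vt_delay_ps: float
    high_vt_delay_ps: float
    low_vt_leak_nw: float
    high_vt_leak_nw: float
    use_high_vt: bool = False  # first synthesis pass uses low-Vt cells everywhere

    @property
    def delay_ps(self):
        return self.high_vt_delay_ps if self.use_high_vt else self.low_vt_delay_ps

    @property
    def leak_nw(self):
        return self.high_vt_leak_nw if self.use_high_vt else self.low_vt_leak_nw


def path_delay(path, cells):
    """Sum the delays of the named cells along one timing path."""
    return sum(cells[name].delay_ps for name in path)


def swap_to_high_vt(cells, paths, clock_period_ps):
    """Greedily move cells to high-Vt as long as every path still meets timing."""
    # Try the cells that save the most leakage first.
    order = sorted(cells.values(),
                   key=lambda c: c.low_vt_leak_nw - c.high_vt_leak_nw,
                   reverse=True)
    for cell in order:
        cell.use_high_vt = True
        if any(path_delay(p, cells) > clock_period_ps for p in paths):
            cell.use_high_vt = False  # the swap broke timing, so revert it
    return sum(c.leak_nw for c in cells.values())


# Hypothetical design: path a->b is critical, path c->d has plenty of slack.
cells = {
    "a": Cell("a", 100, 130, 10, 5),
    "b": Cell("b", 100, 130, 10, 5),
    "c": Cell("c", 60, 80, 10, 5),
    "d": Cell("d", 60, 80, 10, 5),
}
paths = [["a", "b"], ["c", "d"]]
total_leak_nw = swap_to_high_vt(cells, paths, clock_period_ps=220)
print(total_leak_nw, {n: c.use_high_vt for n, c in cells.items()})
```

In this toy case the cells on the critical path stay at low threshold voltage while the two cells with slack move to the high-threshold library. The power-first flow described for wireless designs would run the same loop in the opposite direction, starting from high-threshold cells.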

Multivoltage design

Although multithreshold design helps engineers minimise leakage of their designs through the use of multiple libraries, another technique, multivoltage design, helps designers control dynamic power.

Similar to multithreshold design, multivoltage design enables designers to give the critical paths and blocks in their designs access to maximum voltage for the process and specification, but the designers then reduce the voltage for less power-hungry blocks.

For example, Keating said, a processor block may require a clock speed of 500 MHz, but a USB core may require only 30 MHz to comply with the USB protocol and thus require less voltage to run. So, if designers give the USB core only the power it needs, they can drastically reduce the overall power the design consumes.

To implement the method, designers traditionally put level shifters between blocks that are running at different voltages. “If you have a 0.9V region on your IC design that is sending a signal to a 1.2V region, you have to put a level shifter between the two regions so you can boost it to the swing in voltage and control timing,” Keating said.
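Working out where level shifters belong amounts to finding every signal that crosses between regions at different voltages. The following sketch shows the idea; the block names, domain names, and connection list are hypothetical, and the choice to flag crossings in both directions is an assumption (Keating's example covers only the 0.9V-to-1.2V case).

```python
def find_level_shifters(domain_voltage, block_domain, connections):
    """Return (driver, receiver, from_v, to_v) for each signal that crosses voltages."""
    crossings = []
    for driver, receiver in connections:
        v_from = domain_voltage[block_domain[driver]]
        v_to = domain_voltage[block_domain[receiver]]
        if v_from != v_to:
            crossings.append((driver, receiver, v_from, v_to))
    return crossings


# Hypothetical chip: a fast processor region and a slower, lower-voltage USB region.
domain_voltage = {"PD_CPU": 1.2, "PD_USB": 0.9}
block_domain = {"cpu": "PD_CPU", "dma": "PD_CPU", "usb": "PD_USB"}
connections = [("usb", "cpu"), ("cpu", "dma"), ("cpu", "usb")]

for driver, receiver, v_from, v_to in find_level_shifters(
        domain_voltage, block_domain, connections):
    print(f"level shifter needed: {driver} ({v_from} V) -> {receiver} ({v_to} V)")
```

The cpu-to-dma connection stays inside one voltage region, so it needs no shifter; the two connections between the processor and USB regions are the ones flagged.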

Although the concept is fairly simple, its implementation is more complex. First, designers must get used to dealing with multiple voltages on a die.

“We are really trained as engineers that a chip has just one power supply, and now you have to deal with some complications,” said Keating. There are also some fairly significant challenges on the tools front. Most commercial synthesis and physical-design tools can insert level shifters and can handle multivoltage designs, but creating RTL is a problem.

“HDLs don’t yet have a mechanism for describing power connectivity,” said Keating. This lack is one area that EDA vendors are addressing by trying to implement a low-power standard.

Another emerging method that started in custom design but is making its way into ASIC design is the use of parallelism along with voltage scaling. In their book, Chinnery and Keutzer describe this technique. Keutzer said that people at first dismissed it as impractical but that it is now getting serious attention.

“You parallelise to get the performance up and then scale voltage down to reduce the power and energy,” said Keutzer. “If you look at dynamic power, voltage is clearly where the biggest gains will be. So, how do you get the voltage down? Given a timing constraint—2 nsec, for example—you first overachieve your timing objective. In particular, you add parallelism to get the critical path down to 1.2 nsec.

“Then, you can scale down the voltage to relax back to the 2-nsec cycle time you need to achieve. The decrease in voltage more than compensates for the increase in area.”
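Keutzer's numbers can be run through a rough back-of-the-envelope model. The sketch below is illustrative only and rests on assumptions the article does not state: gate delay is taken to follow the alpha-power law, delay ∝ VDD / (VDD − VTH)^alpha, with a 0.35V threshold, an exponent of 1.3, and a 1.2V nominal supply, and the parallel hardware is assumed to add 50% to the switched capacitance. Only the 2-nsec budget and the 1.2-nsec parallelised critical path come from the quote.

```python
def relative_delay(vdd, vth=0.35, alpha=1.3):
    """Gate delay in arbitrary units under the assumed alpha-power-law model."""
    return vdd / (vdd - vth) ** alpha


def scaled_voltage(v_nominal, slowdown, vth=0.35, alpha=1.3, tol=1e-9):
    """Find by bisection the supply voltage at which delay grows by `slowdown`."""
    target = slowdown * relative_delay(v_nominal, vth, alpha)
    low, high = vth + 1e-6, v_nominal
    while high - low > tol:
        mid = (low + high) / 2
        if relative_delay(mid, vth, alpha) > target:
            low = mid    # still too slow at this voltage, so search higher
        else:
            high = mid   # fast enough, so try a lower voltage
    return (low + high) / 2


V_NOMINAL = 1.2        # assumed nominal supply, volts
slowdown = 2.0 / 1.2   # relax the 1.2-nsec path back to the 2-nsec budget
CAP_OVERHEAD = 1.5     # hypothetical extra switched capacitance from parallel hardware

v_low = scaled_voltage(V_NOMINAL, slowdown)
energy_ratio = CAP_OVERHEAD * (v_low / V_NOMINAL) ** 2  # dynamic energy scales as C * VDD^2
print(f"scaled VDD ~ {v_low:.2f} V, dynamic energy ratio ~ {energy_ratio:.2f}")
```

Under these assumptions the quadratic dependence on voltage outweighs the added capacitance, which is the effect Keutzer describes. Different threshold, exponent, or overhead values would change how large the saving is.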

Clock gating

Probably the oldest and most tried-and-true technique for reducing power is clock gating. One-third to one-half of an IC design’s dynamic power is in the chip’s clock-distribution system. “It’s a pretty simple concept: If you don’t need a clock running, shut it down,” said Keating.

Today, the two popular methods of clock gating are local and global (Figure 4). In local clock gating, if a multiplexer feeds a flip-flop’s old output back into its input, the flip-flop simply holds the same value and need not be clocked at all. Therefore, you can replace each feedback multiplexer with a clock-gating cell and use the enable signal that controlled the multiplexer to control the gating cell, which shuts off the clock whenever the flip-flop would only reload its old data. In the old days of digital design, designers had to perform this task manually, but any commercial synthesis tool worth its salt can now do it automatically.

“The tools are all set up for that now, so they will go in, automatically look for multiplexers, and, if there is a feedback multiplexer, they’ll replace it with a clock-gating cell,” said Keating.

“When you start talking about 32-bit registers, you can get significant savings using this technique.”

The other popular approach to clock gating, global clock gating, is to simply turn off the clock to the whole block, typically from a central-clock-generator module. This method functionally shuts down the block, unlike local clock gating, but even further reduces dynamic power because it shuts down the entire clock tree.
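A small behavioural model shows why swapping a feedback multiplexer for a clock-gating cell saves power without changing what the register holds. This is a sketch, not RTL: the function names are hypothetical, and counting clock edges delivered to flip-flops is used here as a crude stand-in for clock power.

```python
def run_feedback_mux(enables, data, width=32):
    """Register with a feedback multiplexer: every flip-flop is clocked every cycle."""
    q, trace, flop_clock_edges = 0, [], 0
    for en, d in zip(enables, data):
        q = d if en else q          # multiplexer recirculates the old value
        flop_clock_edges += width   # the clock reaches all flip-flops regardless
        trace.append(q)
    return trace, flop_clock_edges


def run_clock_gated(enables, data, width=32):
    """Same register with a clock-gating cell driven by the former mux enable."""
    q, trace, flop_clock_edges = 0, [], 0
    for en, d in zip(enables, data):
        if en:                      # gating cell passes the clock only when enabled
            q = d
            flop_clock_edges += width
        trace.append(q)
    return trace, flop_clock_edges


# Hypothetical 32-bit register that loads new data on only two of eight cycles.
enables = [1, 0, 0, 0, 1, 0, 0, 0]
data = [5, 6, 7, 8, 9, 10, 11, 12]

mux_trace, mux_edges = run_feedback_mux(enables, data)
gated_trace, gated_edges = run_clock_gated(enables, data)
print(mux_trace == gated_trace, mux_edges, gated_edges)
```

Both versions produce the same register contents on every cycle, but the gated version delivers clock edges only on the cycles that load new data, which is where the savings on wide registers come from.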

Power-savvy memory

Another popular technique for lowering both dynamic power and leakage is to use power-aware memories. In its simplest form, the technique involves shutting down segments of a memory array when they are not in use.

Another technique in this category is body-biasing memories. In this method, designers reverse-bias a memory when it is not in use, which essentially raises the threshold voltage and in turn reduces leakage.

One more technique gaining popularity is to use multimode power for memories. In this technique, designers employ memory with several power modes. Many designs employ dual-function memories so that, when the CPU accesses a memory to read or write data to run a main application, the memory receives full access to power to perform the operation.

However, when the memory is not required to read or write, designers can program the memory to power down to a level at which the memory gets only enough power to retain its memory content.
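The benefit of a retention mode depends on how much of the time the memory is actually being read or written. A simple time-weighted average, sketched below, makes the trade-off concrete; the function name and the power figures are hypothetical and do not describe any particular memory.

```python
def average_memory_power(active_mw, retention_mw, active_fraction):
    """Time-weighted average power for a memory with an active and a retention mode."""
    if not 0.0 <= active_fraction <= 1.0:
        raise ValueError("active_fraction must be between 0 and 1")
    return active_fraction * active_mw + (1.0 - active_fraction) * retention_mw


# Hypothetical memory: 10 mW while being accessed, 1 mW while only retaining data,
# and in use for a quarter of the time.
avg_mw = average_memory_power(active_mw=10.0, retention_mw=1.0, active_fraction=0.25)
print(f"average power: {avg_mw} mW versus 10.0 mW if left fully powered")
```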

Power gating/MTCMOS

Perhaps the hottest new methods for low-power design are power gating and MTCMOS (Figure 5). Like voltage gating, power gating involves temporarily shutting down blocks in a design when the blocks are not in use.

And, like voltage gating, the technique is complex. “The neat thing about the other techniques is that they are pretty much all transparent to the design engineer,” said Keating.

“When I’m writing my RTL, I don’t have to think about multithreshold, multivoltage, clock gating, or power-aware memories because someone else downstream has to worry about it. But with power gating, I have to worry about it at the RTL. I have to design a power controller that is going to control what blocks I need to shut down and when, and I have to think about what voltage I’m going to [need to] run different blocks.”

The two traditional methods of power gating are fine-grained and coarse-grained. In fine-grained power gating, designers place a switch transistor between ground and each gate. This approach allows designers to shut off the connection to ground whenever a series of functions is not in use.

“You do that [technique] with every cell in the library,” said Keating. “At first, people really liked fine-grained power gating because it is fairly easy to do power characterisation of each cell, but the problem is the area hit is very significant: two to four times larger.”

Designers can also mix and match cells, having some power-gated and others not. Cells with high threshold voltage need not use power gating. For the most part, the area penalty is just too large, and many design groups are instead using coarse-grained power gating, in which designers create a power-switch network: essentially, a group of switch transistors that in parallel turn entire blocks on and off. The technique does not have the area hit of the fine-grained technique but is harder to characterise on a cell-by-cell basis.

Sequence Design’s Frenkil said that a compromise—medium-grained power gating—is also starting to emerge in the design community. In this method, he said, “Power-gating cells will power small blocks individually. … If you look at a high-performance, 65-nm process, the leakage can easily be 40 to 50% of your total power design.

“If you are designing a high-performance chip, you have to deal with an enormous amount of leakage, so people have several separate power domains controlled individually. I’ve seen one modestly sized chip that has 20 power domains; if you scale that up to a leading-edge chip, it will have over 100 power domains.”

That number would be too hard to control with either a true fine-grained or a true coarse-grained technique. Of all the techniques, power gating has the most promise, said Frenkil. “It reduces leakage more, and it will scale well into the future, where things like back-biasing will not,” he said.

EDA vendors are feverishly attempting to automate the power-gating technique. The warring low-power standards, UPF (Unified Power Format) and CPF (Common Power Format), both aim to help design teams more effectively implement power-gating methods.

Keating notes, for example, that, in UPF design, engineers still must design the power controller in RTL, but several tools help with the insertion of the power mesh, isolation cells, and retention registers into a design.

“Instead of doing it in RTL, you can do it in a UPF command language and specify a certain number of blocks to be isolated,” said Keating.

“In one line, you can do what it would take many lines of RTL to do. The tools are smart enough to take those commands and insert them at the appropriate levels. Some get inserted during synthesis; others get inserted during place and route.”

The method requires either manual or tool-automated insertion of isolation cells. “When you shut down a block, and its outputs go to a block that is still powered up, you have to worry about those power-down nodes floating, and they can float to the threshold voltage and create unwanted currents downstream,” said Keating.

“You have to put isolation cells on those outputs and clamp that output to a one or a zero, so nothing gets hit by a floating current downstream.”

The method also requires the use of retention flip-flops. Keating notes that one of the problems with shutting down a block is that the block needs to restore or maintain all its states. To achieve this goal, designers can use retention flip-flops, in which the main part of the flip-flop has a low threshold voltage (that is, it is fast but leaky) and sits beside a balloon register of high-threshold-voltage, low-leakage cells.

“Just before you shut down a block, you put the output of the flip-flop into a balloon register,” said Keating. “Then, everything but the balloon register gets powered down to maintain the states. When the block powers back on, the balloon register dumps everything back on the main flip-flop, which helps quickly power the block up.”
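The interplay of isolation cells and retention flip-flops can be modelled as a short power-down and power-up sequence. The sketch below is a behavioural illustration, not a power controller: the class and method names are hypothetical, and the exact ordering of steps (isolate, save, switch off; then switch on, restore, release isolation) is an assumption consistent with Keating's description rather than something the article spells out.

```python
class PowerGatedBlock:
    """Toy model of a block with isolation cells and retention flip-flops."""

    def __init__(self, state, clamp_value=0):
        self.names = list(state)
        self.main_flops = dict(state)  # fast, leaky low-threshold flip-flops
        self.balloon = {}              # low-leakage balloon registers, never switched off
        self.powered = True
        self.isolated = False
        self.clamp_value = clamp_value

    def power_down(self):
        self.isolated = True                  # 1. clamp outputs so nothing floats downstream
        self.balloon = dict(self.main_flops)  # 2. copy state into the balloon registers
        self.main_flops = {}                  # 3. switch off; main flip-flops lose their state
        self.powered = False

    def power_up(self):
        self.powered = True                   # 1. switch the block back on
        self.main_flops = dict(self.balloon)  # 2. balloon registers restore the state
        self.isolated = False                 # 3. release the output clamps

    def outputs(self):
        """What a still-powered neighbouring block sees."""
        if self.isolated:
            return {name: self.clamp_value for name in self.names}
        return dict(self.main_flops)


block = PowerGatedBlock({"q0": 1, "q1": 0})
block.power_down()
print("while off:", block.outputs())
block.power_up()
print("after wake-up:", block.outputs())
```

While the block is off, its neighbours see only the clamped value, and after wake-up the original state reappears without having to be recomputed.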

EDA to the rescue?

Frenkil notes that, although EDA vendors offer a wide range of tools to help designers implement low-power-design techniques, the EDA industry also offers power-integrity tools to help designers consider the effects of design decisions on power. Power-integrity tools perform voltage-drop analysis, voltage-derated timing analysis, noise-margin analysis, and power-bus sizing.

Many vendors offer low-power tools to attack the problem from every angle. According to Keutzer, the EDA industry has yet to adequately address some problems. For example, the industry could provide tools that ease ASIC designers’ ability to implement microarchitecture techniques, such as pipelining; to more efficiently lay out clock networks; and to more effectively use transparent latches.

However, he notes, no EDA tool can solve everyone’s power problem. “It’s not about home runs; rather, it’s about a lot of singles,” said Keutzer.

Designers must become familiar with a mix of low-power-design techniques and should also investigate which tools will help them achieve their power goals. The EDA industry is trying to market a healthy field of tools to help designers control power.

Eventually, vendors hope to provide design flows to allow designers to make trade-offs among timing, power, signal integrity, and, eventually, even thermal analysis.

Top semiconductor companies, design houses, and EDA players are trying to establish a common power format. Even with the current field of EDA tools and the rough beginnings of integrated low-power flows, however, the EDA industry still has much work to do before it can solve the power problem.

References

Chinnery, David, and Kurt Keutzer, Closing the Power Gap Between ASIC & Custom: Tools and Techniques for Low Power Design, ISBN: 978-0-387-25763-1, Springer, June 2007.

Pokhrel, Khem C, “Physical and Silicon Measures of Low Power Clock Gating Success: An Apple to Apple Case Study,” Proceedings of Synopsys Users Group, San Jose, CA, 2007.

Santarini, Michael, “Thermal integrity: a must for low-power-IC digital design,” EDN, Sept 15, 2005, pg 37.
