    Showing posts with label Innovation. Show all posts

    Sunday, December 07, 2008

    A Bigger Pie : Value Addition In the ASIC Ecosystem

    There were a couple of press releases this week in which companies attempted to add value to their primary products in an interesting way.

    #1 DFT + IP

    Virage Logic's bid for LogicVision might seem puzzling at first. Why would an IP company want to take over a DFT EDA vendor? My take is that Virage Logic intends to provide customers with both the IP and the means to test it. The only problem bigger than IP integration is IP test! Poor documentation, absence of proper test-friendly structures and long verification cycles are commonplace when IP vendors provide just their IP and a test document of questionable quality. Using LogicVision's test and diagnosis technology, it's not hard to envision Virage's IP coming with fully integrated self-test structures that require just a little glue logic from the IP integrator for complete testability.

    #2 Open Source + ASIC Products

    How can an ASIC/System vendor increase the demand for its products? Realizing that the software built upon the hardware is the real driver for demand, XMOS is using open source to nurture an ecosystem around its hardware platform (programmable event-driven processor arrays). It doesn't hurt that it also allows XMOS to demonstrate the flexibility of its platform when it comes to applications. Xlinkers is a user-driven open-source community for software utilizing the XMOS platform. An open source community hosting a variety of (constantly maturing) applications is a huge vote of confidence for users considering XMOS products. Not only do customers get to appreciate the flexibility of the platform but they also get a head start by reusing the developed open-source components as building blocks for their own products.




    Sunday, November 23, 2008

    Desktop Supercomputers : The CUDA-enabled EDA (Near) Future

    In my first ever post, Need for Speed (way back in Oct 2007), I proposed that specialized EDA platforms could be built using Nvidia's CUDA technology. Since then, CUDA has picked up a steadily growing list of EDA adopters.

    • February 2008: Gauda uses Nvidia's CUDA platform (and distributed processing) to accelerate OPC by 200x.
    • April 2008: Nascentric announced the availability of a GPU-accelerated SPICE simulator (OmegaSim GX) using none other than Nvidia's CUDA technology.
    • August 2008: Agilent announced the use of CUDA technology for the acceleration of signal integrity simulations in their ADS (Advanced Design System) Transient Convolution Simulator.
    Is this a flash in the pan or is CUDA-enabled EDA here to stay? Why use CUDA?
    • C-Based SDK allows for easy porting of code to the CUDA platform
    • Ecosystem of tools and applications built on the CUDA platform. Nvidia's doing its part by hosting CUDA-based code design contests. Right now, you can read and download academic papers on the utilization of the CUDA platform for statistical timing analysis and graph algorithms.
    • Cost-effective computation is perhaps the biggest thing going for the CUDA platform. Where else can you get a 100x improvement in runtime for a mere $600?
    Despite these advantages, there are some missing pieces that would hold CUDA back from really taking the EDA world by storm. My wishlist for CUDA is the following:
    • Backward compatibility is key to a low-risk path to adoption. Without it, who's going to risk porting their code base to CUDA hoping that their customers will move to CUDA-enabled platforms? With backward compatibility comes a great hook: "Use our tools on your current platform but you can get a 100x improvement in runtime just by buying a PCI card". Right now, that's not the case. Nascentric, for example, offers OmegaSim and OmegaSim-GX as two separate tools. Wouldn't it be great if the same code could run on both platforms but one runs a whole lot faster because of CUDA?
    • Native support for distributed processing could bump performance up even higher. Graphics processors are built to solve "embarrassingly" parallel problems. It's not really much of a leap to distribute the workload amongst multiple graphics processors. The CUDA pitch (do more with less money) becomes that much sweeter.
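The backward-compatibility wish above can be sketched as a single code path that picks its backend at run time. This is only an illustration of the idea, not how any of the tools mentioned work: the use of the CuPy library as the GPU backend and the function names `pick_backend` and `sum_of_squares` are my own assumptions.

```python
import importlib.util


def pick_backend():
    """Return 'gpu' if the CuPy package is importable, otherwise 'cpu'.

    Assumption: CuPy stands in for "a CUDA-enabled platform". Being importable
    does not guarantee that a usable GPU is present at run time.
    """
    return "gpu" if importlib.util.find_spec("cupy") is not None else "cpu"


def sum_of_squares(values):
    """Same call for every customer; only the speed depends on the platform."""
    if pick_backend() == "gpu":
        try:
            import cupy as cp

            arr = cp.asarray(values, dtype=cp.float64)
            return float(cp.sum(arr * arr))
        except Exception:
            pass  # no usable GPU after all, fall through to the CPU path
    # Plain CPU fallback: identical result, just slower on large inputs
    return float(sum(v * v for v in values))


print(pick_backend(), sum_of_squares([1.0, 2.0, 3.0]))
```

The point of the design is that the caller never chooses a "GX" edition: the same entry point works everywhere, and the accelerator is an upgrade rather than a port.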


    Saturday, September 27, 2008

    Silicon Biometrics : The OCV Authentication Solution

    I never thought I'd ever say this: there's finally a reason to appreciate OCV (on-chip variation), and we can all thank Verayo for it. In one of my earlier posts, I wrote about the growing problem of counterfeit ASICs (Will The Real ASIC Please Stand Up?). The variation of on-die device and metal parameters that is the bane of designers all over the globe has turned out to be useful in a unique way. OCV, it seems, is the silicon equivalent of a fingerprint. Let us review the salient points about OCV:

    • On-die variations are always present in all devices
    • Although on-die variations obey statistics, individual variations are essentially random
    • The OCV solution space is huge. Assume that a single transistor can be any one of a types and a single net can be any one of b types. If you have x transistors and n nets, a single ASIC can be any one of a^x * b^n possibilities.
    • The probability of two ASICs having the exact same characteristics is so small, it is essentially zero.
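To get a feel for how large that solution space is, here is a small sketch. Under the assumptions in the list above (a transistor types, b net types, x transistors, n nets), the count works out to a^x * b^n, which is far too big to print, so the code reports its base-10 logarithm instead. The specific numbers are hypothetical and chosen only for illustration.

```python
import math


def log10_solution_space(a, x, b, n):
    """Return log10(a**x * b**n), the order of magnitude of the OCV space."""
    return x * math.log10(a) + n * math.log10(b)


# Hypothetical device: 1 million transistors and 1 million nets,
# each of which can land in just one of two "types".
digits = log10_solution_space(a=2, x=1_000_000, b=2, n=1_000_000)
print(f"roughly 10^{digits:.0f} possible devices")
```

Even with only two types per element, the exponent runs into the hundreds of thousands, which is why two ASICs matching exactly is treated as a zero-probability event.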
    The existence of OCV is only half the solution. It's sort of like saying I can uniquely identify a grain of sand by the fact that each of its trillion-odd molecules is positioned differently from those of any other grain of sand. For the solution to be complete, there has to be a feasible way of measuring OCV, or its effects, for a given device. Hmm, why do we worry about OCV in the first place? What's the worst that can happen? I'm guessing the phrase "timing violations" is flashing in six-foot neon letters in your head right about now.

    By creating a circuit with lots of reconvergent logic and setup and hold paths with very low (or zero) margin, you can observe the effects of OCV. The values captured at the output will change not only from device to device but also with the input stimulus. Depending on the stimulus, the path taken through the reconvergent cone may or may not suffer a timing violation. By observing the outputs for a set of random stimuli, each device can be uniquely identified.
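The idea can be mimicked with a toy model. The sketch below is my own simplification and not Verayo's actual circuit: each "chip" gets a fixed set of random stage delays (standing in for OCV), the stimulus decides which delay each of two racing paths picks up, and the captured bit records which path won. The class name `ToyPuf`, the 5% delay spread and the seeds are all assumptions, and measurement noise is ignored.

```python
import random


class ToyPuf:
    """Toy delay-race model; the per-device seed plays the role of OCV."""

    def __init__(self, device_seed, stages=64):
        rng = random.Random(device_seed)  # fixed at "manufacture"
        # Two competing delays per stage: nominal 1.0 with a small random spread
        self.delays = [(rng.gauss(1.0, 0.05), rng.gauss(1.0, 0.05))
                       for _ in range(stages)]

    def response(self, challenge):
        top = bottom = 0.0
        for bit, (d0, d1) in zip(challenge, self.delays):
            # The stimulus bit selects which delay each racing path sees
            if bit:
                top, bottom = top + d1, bottom + d0
            else:
                top, bottom = top + d0, bottom + d1
        return 1 if top < bottom else 0  # which path won the race


def signature(puf, challenges):
    """Response bits of one device for a list of stimuli."""
    return [puf.response(c) for c in challenges]


stimulus_rng = random.Random(0)
challenges = [[stimulus_rng.randint(0, 1) for _ in range(64)] for _ in range(128)]
chip_a, chip_b = ToyPuf(device_seed=1), ToyPuf(device_seed=2)
sig_a, sig_b = signature(chip_a, challenges), signature(chip_b, challenges)
mismatches = sum(x != y for x, y in zip(sig_a, sig_b))
print(f"{mismatches} of {len(challenges)} response bits differ between the two chips")
```

The same chip always returns the same signature for the same stimuli, while a different chip returns a different one, which is the property that makes the outputs usable as a fingerprint.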

    If you want to know more about this technology, its official name is "Physically Unclonable Functions" (PUF). For the academically oriented, the home page of Professor Srini Devadas (MIT/Verayo co-founder) has download links for all his published PUF papers.


    Thursday, June 19, 2008

    (Even More) Useful Skew : Teklatech Adapts Useful Skew Concepts To Close Dynamic IR

    Zero clock skew is not a necessity. As long as timing can be closed, clock skew is immaterial. This kind of thinking is what gave birth to the concept of useful skew. The use of useful skew in contemporary design is mostly restricted to timing closure. In this method, the arrival times of the clock edges at the launch and sink registers of a critical path are shifted to increase the effective clock period available to that path. Since different registers now have different clock latencies, one side effect of this optimization is that your registers don't all transition at once.
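A quick worked example of that arithmetic, using the standard setup check (slack = period + sink latency - launch latency - path delay - setup time). The three-register pipeline and all the numbers are hypothetical, and hold checks are left out to keep the sketch short.

```python
def setup_slack_ps(period, launch_latency, sink_latency, path_delay, setup_time):
    """Setup slack in picoseconds; a negative value means a violation."""
    return period + (sink_latency - launch_latency) - path_delay - setup_time


PERIOD, SETUP = 1000, 50                # 1 GHz clock, 50 ps setup time
CRITICAL_DELAY, NEXT_DELAY = 1100, 600  # path A->B is too slow, B->C has room


def slacks(mid_latency):
    """Slacks of A->B and B->C when register B's clock is delayed by mid_latency."""
    a_to_b = setup_slack_ps(PERIOD, 0, mid_latency, CRITICAL_DELAY, SETUP)
    b_to_c = setup_slack_ps(PERIOD, mid_latency, 0, NEXT_DELAY, SETUP)
    return a_to_b, b_to_c


print("zero skew  :", slacks(0))    # (-150, 350): A->B fails
print("useful skew:", slacks(200))  # (50, 150): both paths pass
```

Delaying B's clock by 200 ps lends time to the slow path and takes it from the fast one, so the skew is "useful" only as long as the downstream path has slack to give.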

    But is there a benefit to having your registers transition at different times? Enter Dynamic IR (stage left). When a large number of transitions occur in a very short period of time (right after the active clock edge, for example), the power network cannot supply such a large amount of current in such a short space of time. Result: large, localized, instantaneous voltage drops.

    Teklatech's FloorDirector uses useful skew concepts to spread out the register transitions by playing around with the clock edges and thus reduces the dynamic IR drop problem without affecting timing. Elegant, huh?
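To see why spreading the clock edges helps, here is a deliberately crude sketch. It is not Teklatech's algorithm: it simply counts how many registers switch within the same time window and treats that count as a proxy for peak current demand. The 50 ps window and the four-way stagger are assumptions.

```python
from collections import Counter


def peak_simultaneous_switching(clock_arrivals_ps, window_ps=50):
    """Largest number of registers whose clock edge lands in the same window."""
    windows = Counter(t // window_ps for t in clock_arrivals_ps)
    return max(windows.values())


# 1000 registers, all clocked at the same instant (zero skew)
zero_skew = [0] * 1000
# The same registers split into four groups, staggered 50 ps apart
staggered = [(i % 4) * 50 for i in range(1000)]

print(peak_simultaneous_switching(zero_skew))   # 1000
print(peak_simultaneous_switching(staggered))   # 250
```

The total switching activity is unchanged; only the peak drops, and the peak is what the power network has trouble supplying.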


    Wednesday, February 27, 2008

    Distributed EDA Meets Accelerated Hardware: Gauda's OPC Points The Way

    In Divide And Conquer, I made some arguments for the inevitable dominance of distributed algorithms in the EDA industry. In Need For Speed, I argued for, among other things, using graphics card-based acceleration for EDA (Nvidia's CUDA technology).

    What happens when these two get together? Gauda's new optical proximity correction (OPC) tool. The tool not only uses existing graphics cards (from the likes of Nvidia and the erstwhile ATI) but also uses sophisticated distributed algorithms to accelerate OPC by up to 200x (really? 200x??). Rather than a flash in the pan, I'd say Gauda is the pioneer in a direction that is soon to be well-traveled by the EDA biggies.


    Thursday, February 07, 2008

    I Coulda Been A Contender : Some VLSI Ideas I Wish I Had Had (First)

    Drawing up our plan for the new year got me thinking about issues in the ASIC flow and what could (or should) be developed to make the flow better. It could be a new technology, a new tool or even a new flow. The hardest part of this is to break out of the box and approach issues from a new angle. Naturally, at such times, I look back at some ideas I heard about and think "I wish I'd thought of that!" It's not that these ideas made their inventors rich, but they did make me sit up and take notice because of their refreshingly different viewpoint. Here's hoping they make you think, too.

    • Wire Tapering : Minimize the delay of a wire by controlling its shape. Rather than the usual rectangular shape, the optimal shape (in terms of delay) for an interconnect is exponentially tapered. The wire is widest near the driver and narrowest near the sink. In a regular net, the capacitance of the end section of the wire is the same as that of the initial section. Since the end capacitance has to be driven through a large resistance (basically the resistance of the whole net), the driver "sees" this end capacitance as a large load. By tapering the interconnect, the capacitive load decreases with distance from the driver. Thus, the overall load seen by the driver is smaller for a tapered net than for a rectangular one.
    • Configurable Processors : Adapt a processor's instructions to match the application. Most processors are jacks of all trades but masters of none. Though processor cores are not particularly good at anything, using them for implementing features has its advantages. There's a low risk of functional bugs within the core itself. Fixing bugs and adding features is as easy as updating the firmware. This is the low-performance/low-effort solution. Custom RTL, on the other hand, is the high-performance/high-effort solution. Implemented features have high performance, but verification requires a lot of effort, and bug fixes and additional features will require at least a respin. Configurable processors allow you to get the best of both worlds. By implementing an instruction set tailored to the end application, the performance of the core is increased while verification, feature updates and bug fixes are still relatively easy.
    • Channel-less Floorplan: Why not just route top-level signals through the block? Look at the channels in your hierarchical floorplan (the gaps between the blocks). You need these channels to be able to route top-level signals from block to block. Usually, these signals don't undergo logical transformations at the top level, so it's pretty much a matter of getting the signal from point A to point B using repeaters. In a channel-less floorplan, the repeaters don't go round or over the block; they go through the block. The idea is to create feedthroughs in blocks such that a signal can use these feedthroughs to get to the other side of the block faster. The benefit is that you save on the die area that was previously dedicated to channels.
    • Asynchronous ASIC Design: What if there were no clocks? Clocks are nothing but synchronization signals. At each clock edge, you are guaranteed that the input data is stable and valid. The problem with having a clock is that your design can run only as fast as the worst path in the design. Even if 99.99999% of paths run at 1ns, the one remaining path running at 2ns requires you to run the entire design at 500MHz. Wouldn't it be great if the performance of the entire device did not depend on the worst path in the design? That's where asynchronous designs come in. Asynchronous ASIC designs do not require a clock. Instead of clock-based synchronization, these designs use handshaking, semaphores and other methods to exchange data. Each part of the device runs as fast as it can and the effective frequency of operation is no longer determined by the worst path in the design. Other benefits of asynchronous design include lower power dissipation (no clock trees) and fewer dynamic IR drop issues (without clocks, transitions in the design are randomized).
    • Analog Computing: Forget binary and use (semiconductor) physics. There's something unnatural about digital design. In a world where everything sentient is analog, should computing be any different? The idea behind analog computing is to use physical laws for computation. Suppose you want to build an adder: currents proportional to the numbers to be added are passed through a resistance, and the voltage drop across the resistance is proportional to their sum. The square-law characteristic of the MOS transistor in saturation is used to create computation devices such as multipliers. By piggy-backing onto the mathematics of physical effects, it is possible to use analog for computation at a fraction of the delay, area and power of digital circuits.
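The wire tapering argument in the first bullet above can be checked numerically with a small Elmore-delay sketch. The model is my own simplification: the wire is cut into equal-length segments, each segment's resistance is inversely proportional to its width and its capacitance is proportional to its width, and driver resistance, sink load and fringing capacitance are all ignored. The unit constants and the taper factor are hypothetical, so the numbers only illustrate the trend.

```python
import math


def elmore_delay(widths, r_unit=1.0, c_unit=1.0):
    """Elmore delay of a segmented wire; widths[0] is the segment at the driver.

    Each segment is an L-section: R = r_unit / w followed by C = c_unit * w.
    Every resistance has to charge all the capacitance downstream of it.
    """
    delay = 0.0
    downstream_cap = 0.0
    # Walk from the sink back to the driver, accumulating downstream capacitance
    for w in reversed(widths):
        downstream_cap += c_unit * w
        delay += (r_unit / w) * downstream_cap
    return delay


SEGMENTS = 100
uniform = [1.0] * SEGMENTS
# Exponential taper: widest at the driver, narrowing towards the sink
tapered = [math.exp(-2.0 * i / SEGMENTS) for i in range(SEGMENTS)]

d_uniform = elmore_delay(uniform)
d_tapered = elmore_delay(tapered)
print(f"uniform: {d_uniform:.0f}, tapered: {d_tapered:.0f} (arbitrary units)")
```

In this model, the far-end capacitance that has to be charged through the whole wire resistance is exactly what the taper shrinks, so the tapered wire comes out faster.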


    Wednesday, October 24, 2007

    The Last Bastion II : Some Q&A with DeFacTo President and CTO Dr. Chouki Aktouf

    In an earlier post, The Last Bastion Of DFT Tradition Falls : DeFacTo Technologies Introduces RTL Scan Insertion, I had listed some questions about HiDFT-Scan's capabilities. Dr. Chouki Aktouf (CTO, DeFacTo) was kind enough to answer these questions. Here's what Dr. Aktouf had to say about HiDFT-Scan.

    Q: The change of methodology proposed is quite drastic. What's the easiest way to adopt this flow in a phased manner?

    The change in the overall test process is not that drastic; mainly you add DRC at RTL in conjunction with new checks (that now become possible) and displace gate-level scan insertion to RTL. HiDFT-Scan can be adopted progressively, IP-based for instance. The tool usage is also easy and accessible to RTL designers.

    Q: Is HiDFT-Scan physically-aware when it comes to scan insertion?

    Not physically aware, since we implement functional scan; but our scan is data-path sensitive. This ensures a good optimization during synthesis and P&R.


    Q: Other EDA vendors are very much ahead in the DFT game and have very compelling DFT technologies. Synopsys has adaptive scan. Mentor has TestKompress. Is there a way to use HiDFT-Scan with tools from these vendors?

    Absolutely. You can use HiDFT-Scan in conjunction with these tools. A typical example is TestKompress. Through HiDFT-Scan you can automatically access to TK and generate your final RTL including scan+test compression logic.

    Q: Will we get a chance to see HiDFT in action?

    If you are present at ITC, we will be pleased to talk to you at our booth #837.


    Tuesday, October 23, 2007

    The Last Bastion Of DFT Tradition Falls : DeFacTo Technologies Introduces RTL Scan Insertion

    It had to happen. Someone would eventually come up with a way to insert scan at the RTL level. Turns out that someone is DeFacTo Technologies of France. The tool is called HiDFT-Scan.

    In some ways, this is a great idea:

    • Scan insertion would be fast (as it works with RTL)
    • Synthesis step between RTL ECOs and scan insertion is avoided
    • Scan insertion is process/technology independent and can be ported easily to new nodes
    But there are some questions that need to be answered:
    • The change of methodology proposed is quite drastic. What's the easiest way to adopt this flow in a phased manner?
    • Is HiDFT-Scan physically-aware when it comes to scan insertion?
    • Other EDA vendors are very much ahead in the DFT game and have very compelling DFT technologies. Synopsys has adaptive scan. Mentor has TestKompress. Will we have to abandon these vendors or will DeFacTo play well with others?