    Tuesday, October 30, 2007

    It's a Bird, It's a Plane...It's a Proprietary Processor Architecture! : If ARM's Microsoft, Who is Red Hat?

    How are third-party ARM-compatible processor cores like dodos? In both cases there are none left in existence, and those that were alive were hunted down and killed (here lies PicoTurbo...). Take a look around: there are no alternative vendors for your favorite processor core. That's because most proprietary cores are so well protected that it's almost impossible to meet the ISA spec without violating a patent. This presents a problem. Processor cores are to ASICs what operating systems are to computers: a platform. Since those who control the platform make the rules and the money (just ask Bill), there's a pitched battle between the IP companies to dominate the market. While market shares of individual companies such as ARM and MIPS go up and down, proprietary processor cores as a whole dominate the ASIC landscape. At this point, you may ask yourself: what's the problem with that? Here's what proprietary dominance costs you:

    • Freedom : Freedom to modify, share and improve processor cores and generally enrich mankind with your knowledge.
    • Safety : Safety in the knowledge that your chosen architecture is non-proprietary and, thus, is not controlled by any one entity. Further, its non-proprietary nature allows you to benefit from competition among vendors.
    • Money : The business model for processor core companies is pretty standard: the developers of the core get a percentage of the sales revenue of devices built on top of it. If you adopt a free (as in beer and freedom) architecture, you keep those dollars for yourself.
    Significant obstacles have to be overcome before any open architecture can match up to the proprietary ones.
    • Ecosystem : A solid ecosystem is essential to the success of any platform. Proprietary processors enjoy unrivaled ecosystems that are not even close to being matched by open architectures such as SPARC.
    • Support : Processor core companies know their cores inside out. They spend their lives developing and advancing processors and the associated tool chains. It's hard to find that kind of laser-beam focus and support in the case of open architectures.
    • Marketing : This is an important issue. There are no evangelists with deep pockets broadcasting the stability and maturity of open architectures. Linux has Red Hat, IBM and Sun to toot its horn. What of hardware?
    • Image : The image of open source software took some time to go from hacker's science experiments to rock-solid enterprise applications. The image of open source hardware is yet to undergo a similar transition.
    The shortcomings of open architectures can be overcome with the backing of solid, profit-driven companies that align their business with open source processor architectures. Like I said in the title, what we really need is a Red Hat of processor cores...




    Monday, October 29, 2007

    The DFT Arms Race: Technological Convergence of Test Solutions From Magma and Synopsys

    Magma's new Talus ATPG puts the company very close (in technological terms) to matching Synopsys's test solutions. The first thing that struck me about the Talus press coverage was the high level of similarity between the tool features touted by Magma and Synopsys.

    #1 : Combinational On-Chip Compression. Synopsys projected Adaptive Scan's low-overhead, fully combinational architecture as revolutionary in comparison to sequential compression technologies such as Mentor's TestKompress. Now, Magma claims its ATPG-X solution consists of a "stateless broadcaster" and "combinational compactor" duo.
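
    To make the "stateless broadcaster / combinational compactor" idea concrete, here's a toy Python sketch of purely combinational scan compression: a few tester channels fan out to many internal chains through fixed wiring, and chain outputs are XOR-folded back down to a few output channels, with no sequential state anywhere. This is only an illustration of the general concept; the fan-out map and XOR tree below are made up and do not reflect the actual ATPG-X or Adaptive Scan architectures.

        # Toy illustration of combinational (stateless) scan compression.
        # NOT the actual ATPG-X / Adaptive Scan architecture -- just the concept:
        # every shift cycle, tester inputs map to chains and chain outputs map
        # back to tester channels through pure combinational logic.

        from functools import reduce

        TESTER_INPUTS = 2      # external scan-in channels
        INTERNAL_CHAINS = 8    # internal scan chains fed by the broadcaster
        TESTER_OUTPUTS = 2     # external scan-out channels after compaction

        # Fixed "wiring": each internal chain taps one tester input (broadcast).
        BROADCAST_MAP = [i % TESTER_INPUTS for i in range(INTERNAL_CHAINS)]

        # Fixed XOR trees: each tester output XORs a subset of chain outputs.
        COMPACT_MAP = [
            [0, 2, 4, 6],      # chains feeding tester output 0
            [1, 3, 5, 7],      # chains feeding tester output 1
        ]

        def broadcast(tester_bits):
            """Expand TESTER_INPUTS bits into one bit per internal chain."""
            return [tester_bits[src] for src in BROADCAST_MAP]

        def compact(chain_bits):
            """Fold INTERNAL_CHAINS bits down to TESTER_OUTPUTS bits via XOR."""
            return [reduce(lambda a, b: a ^ b, (chain_bits[c] for c in taps))
                    for taps in COMPACT_MAP]

        if __name__ == "__main__":
            stimulus = [1, 0]                      # one shift cycle of tester data
            chain_in = broadcast(stimulus)         # what the 8 chains actually see
            chain_out = [b ^ 1 for b in chain_in]  # pretend the chains inverted the data
            print(chain_in, "->", compact(chain_out))

    The appeal of keeping the mapping stateless is low overhead: no decompressor state machine and no extra shift cycles, which is precisely the point both vendors are selling.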

    #2 : Slack-Aware Delay Testing. Propagating faults along the longest path in transition patterns, rather than the easiest path, is something Synopsys has been working on for some time. The patent for this technology was filed in 1999 and granted in 2002! Yet both Magma and Synopsys issued press releases for this feature at around the same time.
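
    As a rough illustration of why path choice matters, consider a hypothetical fault site reachable through several paths of different delay. Launching the transition down the longest (least-slack) path leaves the smallest timing margin, so smaller delay defects become detectable. A minimal sketch with invented path delays (this is the general idea, not either vendor's actual implementation):

        # Toy illustration of slack-aware transition-fault path selection.
        # Among the paths that propagate a transition through the fault site,
        # prefer the one with the least slack, so the smallest extra delay caused
        # by a defect still violates the capture clock and gets detected.

        CLOCK_PERIOD_NS = 10.0

        # Hypothetical paths through one fault site: (name, nominal delay in ns)
        paths = [
            ("easy_path",    3.2),   # easiest to sensitize, lots of slack
            ("medium_path",  6.8),
            ("longest_path", 9.1),   # least slack -> best for small-delay defects
        ]

        def slack(delay_ns):
            return CLOCK_PERIOD_NS - delay_ns

        # "Easiest path" ATPG: take whatever sensitizes first (here, the first).
        easy_choice = paths[0]

        # Slack-aware ATPG: pick the sensitizable path with minimum slack.
        slack_aware_choice = min(paths, key=lambda p: slack(p[1]))

        for label, (name, delay) in (("easiest", easy_choice),
                                     ("slack-aware", slack_aware_choice)):
            print(f"{label:12s}: {name}, slack = {slack(delay):.1f} ns "
                  f"(detects extra delay > {slack(delay):.1f} ns)")

    With a 10 ns clock, a pattern down the easy path only catches defects that add more than 6.8 ns of delay, while the longest path catches anything over 0.9 ns.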

    #3 : Concurrent Fault Detection. This one is just plain spooky. First, at SNUG 2007 Bangalore, we have a paper (Concurrent Fault Detection: The New Paradigm in Compacted Patterns) on detecting multiple fault models with a single pattern set using TetraMax. Now, it is claimed that "in Talus ATPG, when one fault model is targeted, other fault models are automatically simulated".
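
    The bookkeeping behind the claim is simple enough to sketch: patterns are generated for a primary fault model, but each pattern is also fault-simulated against the other models, and whatever it happens to detect there is crossed off those lists too. The fault lists, patterns and detection oracle below are purely illustrative, not output from any real tool.

        # Toy sketch of "concurrent fault detection": patterns are generated for a
        # primary fault model (stuck-at here), but every pattern is also
        # fault-simulated against the secondary models, and whatever it happens
        # to detect there is dropped from those lists as well.

        import random
        random.seed(0)

        fault_lists = {
            "stuck_at":   {f"sa_{i}" for i in range(20)},
            "transition": {f"tr_{i}" for i in range(20)},
            "bridging":   {f"br_{i}" for i in range(10)},
        }

        def detects(pattern, fault):
            """Stand-in for real fault simulation: random 30% detection chance."""
            return random.random() < 0.3

        patterns = [f"pat_{i}" for i in range(15)]   # generated targeting stuck_at

        for pat in patterns:
            for model, faults in fault_lists.items():
                # fault-simulate this pattern against every model, not just stuck_at
                faults -= {f for f in faults if detects(pat, f)}

        for model, remaining in fault_lists.items():
            print(f"{model:10s}: {len(remaining)} faults still undetected")

    The payoff is a smaller combined pattern set, since detections in the secondary models come along for free.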

    #4 : Just About Everything Else. Imitation, it is said, is the sincerest form of flattery. That said, imitation is also the first step toward having the original eat your dust. From support for the industry-standard STIL format to diagnosis of compressed patterns, it appears that Magma has designed Talus ATPG to capitalize on Synopsys's success with TetraMax and DFT-Max.

    The big difference, the one Magma is counting on, is the level of integration between test and place-and-route. It remains to be seen if this factor will be enough to overcome the dominance of DFT-Max and TetraMax.


    Friday, October 26, 2007

    Divide and Conquer : The Case for a Distributed EDA Future

    In two of my previous posts, I dealt with machines for EDA tools: Need for Speed was about tuning machines to accelerate EDA tool performance, while It's Not What You've Got had some ideas on optimizing machine utilization in the ASIC enterprise. Now it's EDA's turn. My argument is that, given the trends in designs and machines, distributed EDA algorithms and tools will start to dominate non-distributed ones (multi-threaded or not). In a nutshell: speed is no longer the bottleneck for design execution; memory is.

    Consider the following:

    • A : Designs are constantly growing (increasing gate counts and placeable instances).
    • B : The data structure for a cell, net or design object is also growing (as tools start tracking and optimizing designs across multiple degrees of freedom).
    • A + B : The memory requirements for designs are growing.
    • C : Commercial servers and motherboards don't have infinite slots for RAM. I think the current figure is around 16GB per CPU.
    • A + B + C : There might be a design that will not fit on a server in one shot!
    • D : A 64GB (4-CPU) Opteron is not 8x more expensive than an 8GB (1-CPU) Opteron; it's 19x more expensive!
    • A + B + C + D : Even if a design could fit on one server, it's going to be a very expensive design to execute!
    What will you do when a sign-off timing, simulation or extraction run on your mammoth design cannot fit on one server?

    Distributed EDA tools will surely be slower than their non-distributed counterparts, but can one not build an EDA tool that runs on 19 single-CPU Opterons (with a total of 19 CPUs and 152GB of RAM among them) as efficiently as on a 4-CPU Opteron with 64GB of RAM? Against a single-threaded tool on the big box, we have 19x the CPU resources and more than 2x the RAM to absorb the overhead of distribution!
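
    To put rough numbers on that tradeoff, here's a quick back-of-the-envelope script using the same 2007 list prices quoted in the It's Not What You've Got post below (illustrative figures, not current pricing):

        # Back-of-the-envelope comparison: a cluster of cheap single-CPU boxes vs.
        # one big-memory server at roughly the same total cost. Prices are the
        # rough 2007 figures quoted in the text.

        small = {"price": 900,    "cpus": 1, "ram_gb": 8}   # 8GB single-CPU Opteron
        big   = {"price": 17_400, "cpus": 4, "ram_gb": 64}  # 64GB quad-CPU Opteron

        n_small = big["price"] // small["price"]            # small boxes per big box

        cluster_cpus = n_small * small["cpus"]
        cluster_ram  = n_small * small["ram_gb"]
        cluster_cost = n_small * small["price"]

        print(f"One big box : {big['cpus']} CPUs, {big['ram_gb']}GB RAM, ${big['price']:,}")
        print(f"Cluster     : {cluster_cpus} CPUs, {cluster_ram}GB RAM, ${cluster_cost:,}")
        print(f"RAM margin  : {cluster_ram / big['ram_gb']:.1f}x")

    For essentially the same money, the cluster carries roughly 2.4x the RAM, which is exactly the headroom a distributed tool needs for a design that won't fit in one box.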


    Wednesday, October 24, 2007

    The Last Bastion II : Some Q&A with DeFacTo President and CTO Dr. Chouki Aktouf

    In an earlier post, The Last Bastion Of DFT Tradition Falls : DeFacTo Technologies Introduces RTL Scan Insertion, I listed some questions about HiDFT-Scan's capabilities. Dr. Chouki Aktouf (CTO, DeFacTo) was kind enough to answer them. Here's what Dr. Aktouf had to say about HiDFT-Scan.

    Q : The change of methodology proposed is quite drastic. What's the easiest way to adopt this flow in a phased manner?

    The change in the overall test process is not that drastic: mainly, you add DRC at the RTL in conjunction with new checks (which now become possible) and move scan insertion from the gate level to RTL. HiDFT-Scan can be adopted progressively, for instance IP by IP. The tool is also easy to use and accessible to RTL designers.

    Q: Is HiDFT-Scan physically-aware when it comes to scan insertion?

    Not physically aware, since we implement functional scan; but our scan is data-path sensitive. This ensures a good optimization during synthesis and P&R.


    Q: Other EDA vendors are very much ahead in the DFT game and have very compelling DFT technologies. Synopsys has adaptive scan. Mentor has TestKompress. Is there a way to use HiDFT-Scan with tools from these vendors?

    Absolutely. You can use HiDFT-Scan in conjunction with these tools. A typical example is TestKompress: through HiDFT-Scan you can automatically access TK and generate your final RTL, including the scan and test-compression logic.

    Q: Will we get a chance to see HiDFT in action?

    If you are present at ITC, we will be pleased to talk to you at our booth #837.


    Tuesday, October 23, 2007

    Triple Digits in Four Years : Open-Silicon Books 100th Design Win

    I'm happy to say that Open-Silicon has now won a total of 100 designs in just four years. These design wins span the spectrum in terms of both applications (from low-power wireless to high-performance cluster nodes) and processes (0.25u to 45nm). Here's what Pierre Lamond (of Sequoia Capital) had to say about the company and its achievement:

    "Open-Silicon has been an agent for change in the ASIC market from the moment they launched their innovative business model. Their ability to book 100 design wins in four years is a testament to the market's need for highly predictable and reliable custom silicon. Open-Silicon's skill at delivering on their customer's critical time-to-market requirements is what makes them one of the fastest growing companies in the market."


    The Last Bastion Of DFT Tradition Falls : DeFacTo Technologies Introduces RTL Scan Insertion

    It had to happen. Someone was eventually going to come up with a way to insert scan directly into RTL. It turns out that someone is DeFacTo Technologies of France. The tool is called HiDFT-Scan.

    In some ways, this is a great idea:

    • Scan insertion would be fast (since it works on RTL rather than a gate-level netlist)
    • The synthesis step between RTL ECOs and scan insertion is avoided
    • Scan insertion becomes process/technology independent and can be ported easily to new nodes
    But there are some questions that need to be answered:
    • The change of methodology proposed is quite drastic. What's the easiest way to adopt this flow in a phased manner?
    • Is HiDFT-Scan physically-aware when it comes to scan insertion?
    • Other EDA vendors are very much ahead in the DFT game and have very compelling DFT technologies. Synopsys has adaptive scan. Mentor has TestKompress. Will we have to abandon these vendors or will DeFacTo play well with others?

    Monday, October 22, 2007

    It's Not What You've Got, It's How You Use It : Ideas For Cost-effective Server Farms


    The optimal utilization of resources is instrumental in improving the profitability of any enterprise. As a fabless ASIC company scales in size, automated and intelligent resource management becomes a bottleneck: you're too big to make snap decisions in response to immediate demand, so there has to be a formal process of measuring resource requirements and then acquiring those resources. One resource that requires intelligent estimation and acquisition is the compute power available to your engineers, by which I mean the sum total of machines in the enterprise capable of executing EDA tools. This includes both the dedicated high-end compute servers and the often-overlooked workstations that sit in every engineer's cubicle. It is true that machines probably come third in the expense list (after people and EDA licenses), but precisely because they are third in line, there is a lot of scope for using them more efficiently. Some ideas to make the most of your servers:

    1. Get a queue. When resources are accessed through a load-sharing mechanism such as LSF or FlowTracer, it is a whole lot easier to analyze trends and increase utilization across all machines. Low-power desktops that do not usually perform anything more intensive than a screen refresh will, on the queue, be put to work on jobs that fit within their memory limits, saving your high-end servers for the jobs that actually need them. An enterprise with a queue also has a scalable system: your engineers see a single interface whether you have ten machines or a thousand.
    2. Choose Your Machines Carefully. With data on jobs and their memory usage, one can create a machine pool that reflects actual requirements. Why is this important? Have a look at machine prices on Epinions. An 8GB single-CPU Opteron will cost you $900 apiece, while a 64GB machine with 4 CPUs will cost you $17,400. If most jobs take less than 8GB of memory, you'd be better off choosing low-end servers instead of high-end ones. Why get a 64GB machine when you can get nineteen 8GB Opterons instead? (A toy dispatcher along these lines is sketched after this list.)
    3. Spread It Out. Ensure that there's a mechanism to even out demand throughout the working day or week. One sign that you need such a policy: there never seem to be enough machines during the day, yet most of your servers sit idle at night! Without such measures, you end up acquiring extra machines just to meet the peak load. By instituting a policy that evens out utilization across the day, you need fewer machines while using the ones you have to the maximum.
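
    To make idea #2 concrete, here's a toy dispatcher that routes each job to the cheapest machine class whose memory limit it fits under. The machine classes, prices and job mix are invented for illustration; a real setup would derive them from queue accounting data rather than hard-coding them.

        # Toy dispatcher in the spirit of ideas 1 and 2: route each job to the
        # cheapest machine class whose memory limit it fits under, instead of
        # letting everything pile onto the big-memory servers. All numbers are
        # illustrative, not a real LSF/FlowTracer configuration.

        machine_classes = [                        # sorted cheapest-first
            {"name": "desktop",      "ram_gb": 4,  "count": 40},
            {"name": "small_server", "ram_gb": 8,  "count": 19},
            {"name": "big_server",   "ram_gb": 64, "count": 1},
        ]

        jobs = [                                   # (job name, peak memory in GB)
            ("rtl_lint",          2),
            ("gate_sim",          6),
            ("block_sta",         7),
            ("fullchip_extract", 48),
        ]

        def dispatch(mem_gb):
            """Return the cheapest machine class that can hold this job."""
            for mc in machine_classes:
                if mem_gb <= mc["ram_gb"]:
                    return mc["name"]
            return None  # fits nowhere -- a candidate for a distributed tool

        for name, mem in jobs:
            print(f"{name:16s} ({mem:2d}GB) -> {dispatch(mem)}")

    The point is not the few lines of Python but the policy: the big-memory server stays reserved for the one job that genuinely cannot run anywhere else.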

    Sunday, October 21, 2007

    IP For Nothing and Chips for Free : Think Silicon offers free soft IP using a web interface

    One of the issues with OpenCores is the large amount of GPL'ed IP it hosts. Using GPL'ed IP in your design may, under some interpretations of the license, require you to disclose your whole design. Can a company benefit from freely available, high-quality IP and still keep its internally developed IP to itself? Just when I thought such IP was impossible to find, Think Silicon comes along with a web-accessible free IP generator. To be fair, the modules offered are small, but the impact is huge. See it here.


    Tuesday, October 16, 2007

    Need For Speed : Optimized Servers and Platforms For EDA Tools

    You can never have too many Opterons. No amount of memory is ever enough. 64-bit is no longer an option but a necessity. Sound familiar? EDA tools push compute servers to their limits: memory requirements are measured in GB and runtimes in days. Interesting, then, that the platforms running these high-performance tools are the same machines, configurations and operating systems used by John Q. Server down the street for everything from email to file storage. If there's a Blackbird for gaming, should ASIC design engineers settle for anything less?

    Imagine, if you will, a platform whose components and operating system are tweaked to speed up EDA execution. A wishlist for an EDA-optimized platform:

    1. Blazing floating-point instruction sets with vector support (for those iterative crosstalk-aware timing runs, or a router's calculations to simultaneously satisfy setup, hold, noise, crosstalk, DRC, antenna, signal EM... you get the idea)
    2. 100s of GBs of memory (and a motherboard that supports that)
    3. Excellent graphics support (so that your place-and-route display doesn't hang so often)
    4. Optimized shared libraries that support graph partitioning algorithms and sparse matrix computations (that just about every tool will use)
    Ideally, we want a platform that accelerates EDA tool performance but does not stop the same tools from running on any other server. In other words, no proprietary libraries that only the accelerated server can use; try locking people in and you just might be locking them out. Every function available on the accelerated server should also be available on normal servers (just slower, of course). One way I see this working: EDA tools link against a generic version of the shared libraries on John Q. Server, while on the EDA-optimized platform the same interfaces are backed by much faster, hardware-assisted implementations.
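
    A minimal sketch of that dispatch pattern, assuming a hypothetical vendor-tuned module called eda_accel (neither the module nor its sparse_matvec function actually exists; they stand in for whatever the accelerated platform would ship):

        # Runtime dispatch between a generic implementation and a platform-tuned one.
        # "eda_accel" is a hypothetical accelerated library that would exist only on
        # the EDA-optimized platform; everywhere else the pure-Python fallback runs.

        def _generic_sparse_matvec(rows, x):
            """Portable sparse matrix-vector product: each row is a list of (col, value)."""
            return [sum(val * x[col] for col, val in row) for row in rows]

        try:
            import eda_accel                      # hypothetical accelerated library
            sparse_matvec = eda_accel.sparse_matvec
            BACKEND = "accelerated"
        except ImportError:
            sparse_matvec = _generic_sparse_matvec
            BACKEND = "generic"

        if __name__ == "__main__":
            rows = [[(0, 2.0), (2, 1.0)],         # a tiny 3x3 sparse matrix
                    [(1, 3.0)],
                    [(0, 1.0), (2, 4.0)]]
            x = [1.0, 1.0, 1.0]
            print(BACKEND, sparse_matvec(rows, x))

    The tool behaves identically everywhere; only the speed changes, which is what keeps the accelerated platform an upgrade rather than a lock-in.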

    Some options I see for making this platform, and the business behind it, work:
    1. Create your own custom EDA compute server. A Tharas Hammer for general-purpose EDA, so to speak.
    2. Create a card for accelerating EDA tools, release an SDK for it, and have EDA vendors use it
    3. Creating your own card is still expensive and possibly risky. Can we use off-the-shelf cards to build this accelerator instead? Nvidia's CUDA technology, for example?
    4. Work only on the OS and hardware. Tweak, tweak and then tweak some more. Use only off-the-shelf components and an open-source OS base to build a killer compute server. Sell the OS+Machine to all the ASIC companies at a slight premium over the standard package. Can you see all those deep red blade servers wall-to-wall in server rooms?
