Book Review: The Pentium Chronicles

The Pentium Chronicles
The People, Passion, and Politics behind Intel's Landmark Chips
by Robert P. Colwell
John Wiley & Sons, 2006.

Bob Colwell was the lead architect of the Intel P6 project, which was eventually released as the Pentium Pro processor. The marketing name suggests only a small evolutionary improvement over the original Pentium, but the engineering reality is far different. The original Pentium (P5) was designed at Intel’s original facilities in California, by many of the same engineers who had worked on the 80486 and 80386. The P6, by contrast, was designed by a brand-new team in Oregon, charged with the weighty task of securing Intel’s dominance of the microprocessor world by bringing the full range of RISC techniques to the x86 platform. The first engineer to join that project was Bob Colwell.

The P6 was ultimately released in 1995 as the Pentium Pro, which then served as the basis for the Pentium II and Pentium III processors. Intel then moved to the new NetBurst architecture for the Pentium 4, but discovered that it had run into a thermal dead end in chip design. Notebook computers were rapidly overtaking desktops, and Intel again turned to the power-efficient P6 architecture for the Pentium M mobile processor. Not until 2006, when the Core microarchitecture was released, would the P6 truly be supplanted on Intel’s flagship chips. Eleven years is an eternity in the fast-moving computer industry. P6 thus carried Intel through an entire era, ultimately earning tens of billions of dollars.

By any measure, then, P6 was a significant milestone in Intel’s history, and in the history of computing. It was a large-scale engineering project, truly breathtaking in scope. With hundreds of engineers, how do you allocate your resources so that everyone is pulling their weight and contributing to the project? How do you schedule such a large project, given the uncertain nature of creative work? How do you test it adequately so that you do not end up with an expensive recall? How do you resolve the design issues that inevitably come up? How do you keep the project moving smoothly from phase to phase?

Colwell tackles these issues in The Pentium Chronicles. About half the book is spent generally discussing the management of a large engineering project. Although there are some interesting anecdotes in this first part, most of it will seem like common sense to anyone who’s worked in a tech company on a large project. The really interesting details are found in the second half of the book, in which Colwell discusses specific engineering problems, the typical workday at Intel, and interactions between coworkers. This is what made the book worthwhile for me.

You do not need to be a computer architect to appreciate the book, but you will get more out of it with a basic understanding of computer architecture. I found Colwell’s technical discussions to be written at just about the right level for a software developer who has taken a single introductory course in computer architecture. Topics are covered in enough detail to avoid vague handwaving, but not in so much that you need to be a chip designer to follow along.

Technological context

When Colwell arrived at Intel in 1990, the company had a dominant position in the desktop processor market. That dominance had come about fairly recently, for IBM had a policy of dual-sourcing its components, and Intel had been forced to license the 8088 and 80286 to its future archrival AMD. In 1985, Intel finally took a stand with the 80386, refusing to license out the chip’s design. The proliferation of PC clones gave Intel the leverage it needed against IBM: Compaq became the lead customer for the 386, and IBM had no choice but to join in. Having seized control of its destiny, Intel then aggressively marketed the 386, flooding the computer trade press with ads featuring the number “286” crossed out.

Yet at the time, it seemed that the future of computing did not lie with x86. Computer engineers at top research universities had developed a new concept for processors: the Reduced Instruction Set Computer (RISC). By discarding the complex instruction sets of legacy processors, RISC chips could use exotic architectures to execute a sequential list of instructions in parallel, speculatively, and out-of-order — producing results much faster than executing the same list of instructions sequentially and in-order.

RISC chips were thus an ideal fit for high-performance engineering workstations from Sun and Silicon Graphics, which were used for engineering design, computer graphics, and other CPU-intensive tasks. RISC chips had good floating-point performance, crucial for numerical calculations, and it was also possible to use them in a multiprocessor system for even better parallel performance. Computer companies fully expected RISC to move downmarket from workstations to desktop PCs. In 1993, Microsoft released Windows NT, its first operating system that was not tied to Intel’s x86 processors. In 1994, Apple ditched its Motorola heritage and adopted IBM PowerPC RISC chips across its Macintosh lineup.

Many RISC advocates believed that the Intel x86 instruction set was simply too complex to take advantage of advanced CPU architectures, hence the retronym CISC. The x86 instruction set had grown essentially by accretion, with a lineage stretching back to 1971, collecting a number of strange and complicated instructions that appeared to be irreconcilable with the RISC concept. Although the Pentium (P5) dipped its toes into the superscalar world with its dual 5-stage pipelines, this did not begin to approach what was already being done on RISC chips.

The P6 proved all the skeptics wrong. It turned out that many CISC instructions were so strange that real-world software used them very infrequently. Thus, it was possible to optimize a small subset of the CISC instructions — and then to handle the uncommon ones with a slower design. Because of the faster clock speed, the legacy CISC instructions would still execute faster than on previous chips, if only marginally. Software developers and compiler writers would have an incentive to use the fastest instructions, and the slower CISC instructions would gradually fall into disuse. Armed with this key insight and a great deal of engineering work, Intel managed to beat the RISC vendors at their own game: P6 could match the performance of RISC while remaining fully compatible with consumer Windows as well as Windows NT.
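The arithmetic behind that insight is simple. A back-of-the-envelope sketch (with made-up numbers, not Intel's actual figures) shows why a slow fallback path for rare instructions barely dents average performance:

```python
# Hypothetical model: common instructions take the fast decode path, while
# rare legacy CISC instructions fall back to a much slower microcoded
# sequence. The numbers below are invented for illustration.

def avg_cycles_per_instruction(mix):
    """mix: list of (fraction_of_executed_instructions, cycles_each)."""
    return sum(frac * cycles for frac, cycles in mix)

# Suppose 95% of executed instructions take 1 cycle on the fast path,
# and the remaining 5% take 20 cycles in microcode.
mix = [(0.95, 1.0), (0.05, 20.0)]
print(avg_cycles_per_instruction(mix))  # 1.95 cycles per instruction
```

Even with a 20x penalty on the slow path, the average stays under two cycles per instruction, which is why optimizing only the common subset was a winning bet.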

The RISC-based engineering workstations fell before the P6 onslaught. Linux let you run a UNIX environment on commodity x86 hardware. Commodity graphics cards made x86 competitive with Silicon Graphics (SGI), eventually driving the company into bankruptcy. In 1997, AutoCAD gave up on UNIX/RISC and went Windows/x86-only with Release 14. Companies that had been buying $20,000 engineering workstations were now buying $2,000 commodity PCs from Dell and HP. For a while, Sun managed to stave off the x86 tide by supplying high-priced servers to dot-coms, but its fat margins disappeared when the dot-com bubble burst. Sun’s financial position deteriorated over a prolonged period, and this pioneer of network computing was finally acquired by Oracle in 2010.

Managing the P6 project

Colwell was the first employee on the P6 project, and thus he essentially built the organization chart from scratch. The P6 project ultimately involved more than 450 engineers, twice the size of the P5 project and much larger than Intel’s previous chip efforts. He could not use the Pentium project as a template, for it had started in 1989 and was proceeding concurrently with P6. Since half of his engineers had come fresh out of college, the hiring process was designed to test candidates on widely differing areas of knowledge and expose their weak spots. And since P6 used out-of-order execution, there were often tight couplings between components that required careful attention to how the organization was structured.

Validation (testing) was placed in the architecture group rather than the design group. Colwell saw the advantages of close contact between designers and validators, but felt that embedding validation in the design group would create irresistible pressure to use validators for design work during crunch periods, a self-defeating strategy that just leads to more trouble down the road. To improve validation, unit owners (feature designers) were required to write regression tests for their own units. In this way, validation engineers could not only perform black-box testing on finished units, but also use those tests to gain insight into a unit’s internal workings, so that they could design targeted tests to probe for weak spots.
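The process Colwell describes maps neatly onto modern software practice: the unit owner ships regression tests with the unit, and validators both rerun them and mine them for ideas about where to probe. A minimal sketch in Python (the toy unit and all names here are hypothetical, not from the book):

```python
# Hypothetical sketch: a "unit" (here, a toy 8-bit saturating adder) ships
# with its owner's regression tests; validators rerun them and add targeted
# tests aimed at the corner cases the owner's tests reveal.

def saturating_add8(a, b):
    """Toy unit: 8-bit saturating addition."""
    return min(a + b, 255)

def owner_regression_tests():
    """Written by the unit's designer; exercises the known corner cases."""
    assert saturating_add8(1, 2) == 3          # ordinary case
    assert saturating_add8(255, 1) == 255      # saturation boundary
    assert saturating_add8(0, 0) == 0          # identity

def validator_targeted_tests():
    """Added by validation after reading the owner's tests for weak spots."""
    assert saturating_add8(128, 128) == 255    # overflow exactly at the limit
    assert saturating_add8(254, 1) == 255      # one below the boundary

owner_regression_tests()
validator_targeted_tests()
```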

Indeed, a microprocessor project has many process similarities to software development. The design work is done on computers, and a substantial amount of coding is involved. Beyond just coding up the circuit design and the tests to exercise that design, there is also a substantial tools component. Design work can be made immeasurably easier by good in-house tools. Previous Intel design projects had not done any behavioral modeling, but P6 was too ambitious to get away with a seat-of-the-pants approach. This had the potential to be quite painful, as the corporate-mandated iHDL tool was very buggy in behavioral mode.

In contrast, the P6 team wrote its own performance tool, the Data Flow Analyzer (DFA), early in the project. The goal of P6 was to turn out a chip with twice the performance of P5 on the same process technology (feature size). P5 could get away with modeling performance in a spreadsheet, but P6 featured advanced techniques that had previously been used only on RISC, and there was little published literature on what to expect when applying them to CISC. Thus, the team wanted assurance, early on, that RISC techniques could be successfully applied to P6. The DFA tool ended up being adapted for behavioral modeling as well, and Colwell ignored all corporate pressure to standardize on another tool. In his view, his job was to turn out a chip, not to throw out what worked in order to satisfy a separate tools team.
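The book does not give DFA's internals, but the core computation of any dataflow performance analyzer is easy to sketch: walk an instruction trace, track when each register's value becomes available, and measure the critical path through the data dependences, since that path bounds how much out-of-order execution can help. A hypothetical sketch:

```python
# Hypothetical sketch of a dataflow analysis, not Intel's actual DFA tool.
# Given a trace of instructions (dest_reg, [source_regs]), the critical path
# through the data dependences bounds the speedup available to an ideal
# out-of-order machine, assuming every operation takes one cycle.

def critical_path_length(trace):
    """trace: list of (dest_reg, [source_regs]); each op takes 1 cycle."""
    ready = {}   # register -> cycle at which its value is available
    longest = 0
    for dest, sources in trace:
        # An op starts once all its source registers are ready.
        start = max((ready.get(s, 0) for s in sources), default=0)
        ready[dest] = start + 1
        longest = max(longest, ready[dest])
    return longest

# Four instructions, but the deepest dependence chain is only two ops long:
trace = [("r1", []), ("r2", []), ("r3", ["r1"]), ("r4", ["r2"])]
print(critical_path_length(trace))  # 2
```

Here four instructions have a dependence depth of only two, so an ideal out-of-order machine could finish them in half the cycles of strictly sequential execution.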

Motivating the troops

Colwell appears to be a perfectionist, and is visibly embarrassed when he himself makes a mistake. Yet he is also very much aware that engineers are people, and must be treated as such. He distinguishes between being tired and being burned out — you can counter fatigue with rest, but it is much harder to recover from burnout. He stocked drawers with goodies that engineers could award to colleagues who had helped them with something. He also sent people to engineers’ houses to mow the lawn, and invited their families to the office for dinner during crunch periods, when the engineers were working late. These are great tools that cost practically nothing but have outsized morale benefits, because people are motivated more by knowing that you care about their well-being than by monetary awards. I thought the family dinners were particularly creative: they helped the engineers keep their marriages intact, while also giving the kids an exciting trip to an unfamiliar destination.

Many pages are spent discussing the silly directives that came down from Intel corporate. How can you ban listening to music while you work? Andy Grove may have thought that bicycle racks and showers were unnecessary luxuries, but Colwell points out that other engineering companies have them and use them as recruitment tools. He is furious with Intel’s bag-search policy, and appears to have made it a personal crusade, sending out numerous emails to higher-ups. Even though he was senior enough to be offered an exemption, he refused to be bribed and would not let up until his engineers were treated with the respect he felt they deserved. He thought the “Back to Intel Basics” late policy was completely insane: his engineers were highly paid professionals already working late into the night, yet anyone arriving late in the morning was required to write a memo explaining why. Colwell told them to write the required memo and seal it in an envelope, so that he could toss it into the trash, unopened.

Colwell says he doesn’t think Intel is a “sweatshop” (well, maybe “a little bit”) and seems to think nothing of a crunch phase that lasts six months. Clearly, he’s a very driven taskmaster himself. But he also recognizes when Intel adopts self-defeating personnel policies – which may be part of the reason why he left the company. It didn’t help that Intel management had an “infatuation” with the 7 Habits of Highly Effective People, and the HR lingo of “commitments.” To him, this seemed to be putting process before results, and he even calls it a “cargo cult.” Just because planes land on an airstrip doesn’t mean that building an airstrip causes planes to land. Similarly, just because highly effective people have habits (seven of them, apparently) doesn’t mean that having those habits makes you effective at your job. Intel wouldn’t be the only company to go down the path of management fads ...

Technical anecdotes

There are all sorts of neat anecdotes in the book, anecdotes that will delight the computer geek. For example, the difference between circuit simulation and physical hardware is so great that the first chip produced after tapeout will run more cycles in its first few seconds of operation than all the simulations run on the beefiest simulation computers during design. Because microcode is so jealously guarded, Colwell suggested that Intel’s VTune be set up to reveal only the first four microcode instructions, as a compromise between debuggability and protection of Intel intellectual property. Speaking of microcode, it was so versatile for patching design problems that a microcode czar was tasked with guarding the available microcode storage space. Otherwise, it would quickly fill up, and you’d have to start evicting earlier patches to make room for more critical ones discovered later. It used to be easy to hack around speed-path issues by throwing more voltage at the chip, but as chips became more powerful and ran hotter, this trick became unusable. So what does Intel do now? A “host of techniques, nearly all proprietary.” This isn’t the circuit design you learned in school.

Colwell claims to be able to “fill this book with accounts of customer visits,” but gives only a few for space reasons. I wish he’d actually carried out his threat, because these types of personal anecdotes are very informative, and also very revealing of the state of the computer industry in the early 1990s.

He was “amazed ... how uniformly short the planning horizons of the companies we visited seemed to be” — with the single exception of Microsoft. There he found the only software vendor that, like Intel, had a long-term vision for where the industry was headed. But he also found himself in the crossfire between the rival Windows 95 and NT teams. At one meeting, the Intel people somehow managed to provoke the two teams into a shouting match. Not wanting to get on the wrong side of anyone at a major partner, they ended up slinking out of the room. The story is very reminiscent of accounts of the development of the Apple Macintosh: nobody likes being told that their work will eventually be made obsolete by someone else’s project.

At Novell, in contrast, people “yawned” and told him that CPUs were already fast enough. This shocked Colwell, who proceeded to explain Moore’s Law to them. But if you think about it, file sharing is not a CPU-bound operation; even today, you will generally saturate your network before you run out of CPU. In general, Colwell was struck by the short cycles of the software vendors, who “behaved at all times like the world would end 18 months from today and there wasn’t much point in pretending time existed beyond that.” Interestingly, John Carmack comes in for special praise as a software developer who was actually helpful and particularly insightful. Games are more processor-intensive than most productivity applications, so it is only fitting that a founder of modern 3D videogaming would have opinions on topics of interest to Intel.

The Compaq viewpoint was the most interesting one. Compaq was not happy that Intel was going to put the L2 cache on the chip (initially with two dies in one package, which greatly increased the cost of the first Pentium Pro chips, as it required assembly by hand during manufacturing) – or that Intel was going to put in glueless multiprocessing. As Compaq saw it, it had more engineering expertise than its competitors, so it could design systems with performance advantages in caching and multiprocessing. Once Intel put these features on the chip, Compaq would find itself on a level playing field with its less-talented competitors. This was an early premonition of the ruthless commoditization that would soon sweep the PC industry and drive many of the players out of business. Further integration was still to come — first moving onto the motherboard many components that used to be expansion cards, and now putting whole systems on a single chip (SoC).

Multiprocessing was indeed a tremendous achievement, and part of the reason that the P6 design lasted so long. According to Colwell, every multiprocessor project previously attempted by the CPU industry had either been late, or had design errata (bugs) that made initial steppings usable only in single-processor mode. By being aware of the pitfalls and paying close attention to past mistakes, Colwell was able to avoid this problem on the P6 project. Another anecdote shows just how tremendous an achievement this was. When he presented the P6 at ISSCC ’95 and explained that you could get multiprocessing simply by tying the pins of multiple chips together, he heard audible gasps in the room. He got another gasp when he pointed out that he was making the presentation on a P6 — which meant that the chip was already far enough along to run Microsoft Windows and PowerPoint.

Coda

At the end of his book, Colwell addresses some Intel-related controversies. The furor over the processor ID and the patent lawsuits from Digital and Cornell were reported largely in the tech press and seem to have faded away over time, but the Pentium FDIV bug made it into popular culture and deserves special mention. Interestingly, Colwell points out that the division bug was caused by a late change to the FPU that had a paper “proof” of correctness and thus was not adequately tested. The kicker was that the change was made in response to an initiative to reduce the surface area of the chip — but since the FPU sat in the interior of the chip, the smaller FPU did not actually shrink the die.

He also relishes several “I told you so” moments over Intel’s recent stumbles. For example, he was called a “Chicken Little” for pointing out that Intel owed a great deal of its success to missteps by competitors. They shouldn’t pat themselves on the back too much, for at some point, continued speed would no longer be enough to keep Intel on top. He saw the end of the megahertz race in 1998, about five years before Intel slammed into the thermal wall. He also saw Itanium as a hopelessly complicated combination of unprecedented architectural changes, and thought that it should instead have been a proof-of-concept research project.

Rare look at engineering projects

In 2007, three “tech” companies – Microsoft, IBM, and Intel – collectively spent $19.3 billion on research and development. [Source: CIO Zone, Top 50 Technology R&D Spenders] That is larger than NASA’s budget that year. Throw in the next five biggest R&D spenders – Cisco, HP, Oracle, SAP, and Google – and you get $34.6 billion, more than the annual NASA budget at the peak of the Space Race, corrected for inflation. And that doesn’t even include aerospace, biotech, or other big R&D spenders.

In other words, corporate R&D is a big deal in modern society. These are products that touch billions of lives — yet books about the tech industry tend to be either fluffy business books or deep-dive technical manuals. Sales and marketing would be out of a job if the engineers couldn’t produce something for them to sell. Gigantic engineering projects can be every bit as exciting as tales of clashing business titans, but there are precious few popular accounts of engineering.

In The Pentium Chronicles, we get the information straight from the source — Intel’s lead architect for the P6 microarchitecture. Colwell is fairly chatty and witty in person, but for some reason that voice doesn’t come across in his prose. To use a cliché, he writes like an engineer. Fortunately, the density of information compensates for the brusqueness. Now if only we could get a two-author book: sort of a cross between The Soul of a New Machine and The Pentium Chronicles ...