There are currently a significant (though arguably falling; see section 13.6) number of different RISC architectures available, targeted at a range of markets, including servers, high-performance desktop systems, and embedded systems. They are very similar, and we will restrict our attention to just three: the Digital (now HP/Compaq) Alpha; the Sun (actually more properly the SPARC International consortium) SPARC; and the IBM/Motorola PowerPC. Because they are so similar, we will first give an overview of their common characteristics, and the (limited) differences. Then we will look at the history of each one and some specifics.
The Alpha and PowerPC architectures both date from the early 1990s (1992 and 1993 respectively), and SPARC is a little older (1987). SPARC was originally heavily influenced by the Berkeley RISC project (as met in chapter 3), and uses the same register window scheme. PowerPC and SPARC were initially 32-bit architectures, though they have since been extended to 64 bits. Alpha has always been a 64-bit architecture (and it was the first). Compared with the earlier RISC processors, all three seem to have some quite non-RISC instructions: for example, they all have hardware square root instructions.
The addressing modes available are all variations on indexed addressing. The Alpha only has one: base register + offset. SPARC has two: base register + offset and base register + index register. PowerPC has either two or four, depending on your point of view. It supports the same modes as SPARC, but in addition there are modes which simultaneously update the index register (i.e. increment or decrement it). This can be useful for processing arrays in loops, and has been popular in the past, though most modern architectures have not included it.
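The effective-address calculations described above can be sketched in a few lines of Python. This is an illustrative model only: the function names and the dictionary used as a register file are assumptions made for the example, not notation from any of the three architectures. In the update form, the register that holds the running address is overwritten with the newly computed address.

```python
def ea_base_offset(regs, base, offset):
    """Base register + constant offset (the only mode Alpha provides)."""
    return regs[base] + offset


def ea_base_index(regs, base, index):
    """Base register + index register (available on SPARC and PowerPC)."""
    return regs[base] + regs[index]


def ea_base_offset_update(regs, addr_reg, offset):
    """Update form: compute the address, then write it back to the register."""
    addr = regs[addr_reg] + offset
    regs[addr_reg] = addr  # the register now points at the element just accessed
    return addr


# Walking an array of 4-byte words starting at 0x1000 with the update form:
regs = {"r1": 0x1000 - 4}
visited = [ea_base_offset_update(regs, "r1", 4) for _ in range(3)]
# visited is [0x1000, 0x1004, 0x1008]; no separate add instruction is needed
```

The point of the update form is visible in the loop: the address arithmetic and the pointer increment are a single operation, which is why it suits array traversal.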
All machines have 32 general purpose registers: initially in SPARC and PowerPC these were 32 bit but are now 64 bit (Alpha always has been). In all but the PowerPC, register zero is hardwired to zero (actually, it sometimes is in the PowerPC too). They all have 32 floating point registers, and support IEEE standard floating point. They all have four different instruction formats: register-register (i.e. arithmetic/logical); register-immediate (i.e. data transfer: loads and stores); conditional branch; and unconditional branch and call.
SPARC has the delayed and annulling branches we saw in chapter 5, with one branch delay slot to be filled. Neither Alpha nor PowerPC has these. In this respect we have picked a somewhat unrepresentative subset, because quite a few current RISC architectures still do have delayed branches. However, they generally only have one branch delay slot regardless of the pipeline length. The advantage of this is that a bit of time pressure is taken off handling branches, reducing the time penalty. The disadvantage is that to take real advantage of this, you must find an instruction to fill the branch delay slot. It is likely that if the original SPARC designers had known what was going to happen to microprocessor implementations, they would have left this out.
SPARC and PowerPC both manage conditional branches with (traditional) condition codes. SPARC has very traditional negative, zero, overflow and carry bits which are set (optionally) by arithmetic/logical instructions. There is no explicit compare operation: if you wish to compare two values, you subtract them and store the result in register zero (which is hardwired to zero, so the result is discarded). In PowerPC, the condition bits are less than, greater than, equal and overflow. There is an explicit compare instruction, and they can also be set by arithmetic/logical operations. Instead of having just one set of these codes, there are eight: so PowerPC can use the multiple condition codes mechanism for handling branches we saw in chapter 5. This is in addition to branch history mechanisms supplied by the actual implementations. Alpha uses a different approach: instead of separate condition bits, it has a set of special compare instructions. These compare two registers, and set a third to either 1 or 0, depending on whether the condition was true or false. Since you can use as many registers as you wish (within reason) to store conditions, Alphas can also use the multiple condition codes technique.
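To make the three approaches concrete, here is a minimal Python sketch of each. All function names, the 32-bit width, and the dictionaries standing in for the condition register and register file are assumptions made for illustration; the PowerPC overflow bit is omitted for brevity.

```python
MASK32 = 0xFFFFFFFF


def sparc_subcc(a, b):
    """SPARC style: subtract, and set N, Z, V, C as a side effect (32-bit)."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    flags = {
        "N": result >> 31,                      # sign bit of the result
        "Z": int(result == 0),
        "V": ((a ^ b) & (a ^ result)) >> 31,    # signed overflow
        "C": int(a < b),                        # borrow out of the subtraction
    }
    return result, flags


def powerpc_cmp(cr, field, a, b):
    """PowerPC style: explicit signed compare into one of eight CR fields."""
    cr[field] = {"LT": int(a < b), "GT": int(a > b), "EQ": int(a == b)}


def alpha_cmplt(regs, dest, a, b):
    """Alpha style: the chosen destination register receives 1 or 0."""
    regs[dest] = int(a < b)


# SPARC: a "compare" is a subtract whose result is thrown away (sent to r0).
_, flags = sparc_subcc(3, 5)
signed_less = flags["N"] ^ flags["V"]   # 1, because 3 < 5

# PowerPC: two independent comparisons kept in different CR fields.
cr = {}
powerpc_cmp(cr, 0, 3, 5)
powerpc_cmp(cr, 1, 9, 9)

# Alpha: results live in ordinary registers, so any number can be kept.
regs = {}
alpha_cmplt(regs, "r10", 3, 5)
alpha_cmplt(regs, "r11", 5, 3)
```

Note how the PowerPC and Alpha versions can hold several comparison results at once, which is what allows a compiler to separate a compare from the branch that consumes it.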
The Digital Equipment Corporation Alpha architecture was a significant departure from their previous VAX architecture. They moved from the archetypal CISC architecture to one that is clearly in the mainstream of RISCs. The architecture was intended to have a 25 year design life, though it will not last that long, because Compaq announced that future Alpha development had ceased shortly before their takeover by HP (or merger, depending on your viewpoint). The VAX had a similar planned lifespan, and survived for about 15 years. It is unusual for a processor manufacturer to entirely abandon an architecture, as it can severely impact your customer base. (This is in contrast to processor customers, e.g. Apple, who do it reasonably often.) However, the VAX was becoming untenable in performance terms: the Byzantine architecture was very difficult to implement using advanced pipelined or superscalar techniques. Before approving the decision, DEC management required assurance that VAX users had an upgrade path. (In fact there were two: one based on recompilation; the other on a mix of machine code translation and software emulation.)
Although serious, such a radical architecture change did not have the kind of implications it would have for, say, Intel. The issue isn't really the size of the installed base (though there are obviously many more PCs than VAXs), but the sophistication of the users — VAXs were not sitting in people's living rooms (actually, some are but these are not `normal' people). Consequently, the idea of changing radically was not so appallingly unthinkable — in fact, from the point of view of most users, very little changed (the operating systems available were essentially identical, for example). A radical departure did free DEC's designers from some serious legacy constraints — for example, the need to maintain the illusion of sequential execution in all cases.
Although a radical departure from the VAX, Alpha was not really DEC's first RISC. When attempting to produce a single-chip VAX, the microVAX, DEC became aware that it wasn't all going to fit (with the technology of the time: the mid 80s). After doing some studies, they identified 175 out of 304 basic instructions, and 6 out of 14 datatypes, as being common. A single-chip implementation was produced which `left out' the others. Any calls to these instructions, or attempts to use the missing datatypes, caused traps to software implementations. In most cases, there was no observable impact on the speed of software — a significant indicator of the validity of the RISC concept, as vast amounts of resources had been devoted to implementing these instructions and data types on earlier VAXs. Some of DEC's senior engineers were articulate defenders of CISC as a concept: however, this result was probably the effective end of the idea that CISC was superior to RISC.
The first Alpha — the 21064 — shipped in 1992, and was the most powerful microprocessor available. It had clock rates of 100MHz – 200MHz, 4 functional units, and was 2-way superscalar — the first commercial superscalar processor. The integer (2 units) and load/store pipelines took 7 cycles, and the FP unit 10 cycles. It had on-chip 8KB instruction and data caches. It also had supporting hardware for an additional off-chip level 2 cache.
Instructions are all 32 bits, and addresses are 64 bits. Actually in the early processors, 64-bit virtual addresses are mapped to 43-bit actual addresses, of which only the bottom 34 bits were implemented. However, a physical address space of 16Gbytes was certainly adequate in 1992, and 43 bits gives 8 terabytes, which is going to be OK for a while. However, bearing in mind the projected lifespan of the architecture, it was felt that ultimately much larger (64-bit) address spaces should be available: 25 years ago 20-bit addresses were considered adequate, and this turned out to be shortsighted.
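The address-space figures quoted above follow directly from powers of two. A short Python check, using binary units (the helper name is just for this example):

```python
def addressable_bytes(address_bits):
    """Number of distinct byte addresses reachable with the given width."""
    return 1 << address_bits


MIB = 1 << 20   # megabyte (binary)
GIB = 1 << 30   # gigabyte (binary)
TIB = 1 << 40   # terabyte (binary)
EIB = 1 << 60   # exabyte (binary)

mb_20_bit = addressable_bytes(20) // MIB   # 1 Mbyte
gb_34_bit = addressable_bytes(34) // GIB   # 16 Gbytes
tb_43_bit = addressable_bytes(43) // TIB   # 8 Tbytes
eb_64_bit = addressable_bytes(64) // EIB   # 16 Ebytes
```

Each extra address bit doubles the space, so going from 34 to 43 bits multiplies it by 512, and the full 64 bits is a further factor of about two million beyond that.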
The 21064 was superseded by the improved 21064a in 1994. This had bigger caches (16KB each) and higher clock rates (225MHz–300MHz). DEC's strategy was to release major upgrades every few years, with improved (and cheaper) implementations of the predecessors a short while before. This is more or less in line with their strategies for earlier machines: the VAX and PDP-11.
The next major upgrade was the 21164, launched in 1994. This had clock rates from 266MHz–333MHz, and was now 4-way superscalar (though it still had four functional units). The data/instruction caches were back to 8KB each, but there was an on-chip 96KB Level 2 cache. Also, one cycle had been shaved off the FP pipeline. The transistor count was 9.3 million, which looks very small compared with 221 million for Itanium 2, but at the time it was a lot (most of it in the Level 2 cache, of course). There were two variants of the 21164. The 21164a was just a speeded-up version, with clock rates of 400MHz–600MHz. The 21164PC was aimed at desktop systems. It lost the on-board Level 2 cache, and the instruction cache went back to 16KB. Clock rates were as for the 21164a (and the transistor count dropped to 3.5 million because of the missing Level 2 cache).
The 21264 came out in 1998, and was followed by `a' and `b' variants. It had clock rates of 575MHz–1.25GHz, 6 functional units, and no on-board Level 2 cache. However, the Level 1 caches became 64KB each, and more complex. The transistor count is 15.2 million. A 1GHz 21264b was used for the comparative tests with Itanium in chapter 12. The sale of Digital to Compaq at around this time, and the departure of many senior engineers (mainly to AMD), did not help the development of this chip. The 21364 will be the last Alpha, with clock rates up to 1.7GHz and on-chip support for multiprocessor systems (maximum of 64 processors) with little extra hardware. It also has an advanced Level 2 cache of 1.75MB. Initially, a 21464 processor was planned for 2005, but Compaq have cancelled it.
The PowerPC is a development of a series of IBM architectures from the 1980s and earlier, with influences from Motorola's 88000 RISC processors (now abandoned). The first PowerPC processor was the 601 in 1993. It had clock rates of 60–120MHz, a combined 32KB Level 1 cache (which was unusual) and 2.8 million transistors. The 601 was the first commercial microprocessor that speculatively executed instructions. The 601 was essentially an `intermediate' release, though it was used quite successfully by Apple in the first-generation Power Macintoshes. The 601 was followed by the 604, then the 604e (the last and fastest of which were called the `Mach 5' processors) with clock rates of 100–350MHz. The 604 had split instruction and data caches (16KB each, 32KB in the 604e). There was also a 603 and a 603e, intended for portable systems.
The best-known PowerPCs are the G3 (Apple's name: `Generation 3' – real name 740 and 750), the G4 (`Generation 4' — real name 74xx) and the G5 (`Generation 5' — real name 970). In 2003, these had clock rates of 200MHz–1GHz for the G3, 350MHz–1.5GHz for the G4 and 1.6–2.5GHz for the G5. Various developments have taken place during the development of these processors. Initially, they only had on-chip Level 1 cache (the same size as the 604e), but newer G4s have also had on-chip Level 2 caches (256KB).
Later G3s were significant for being the first commercial microprocessors to use copper for the on-chip wiring layers, in place of the aluminium used earlier. Copper has significantly lower resistance than aluminium, which in practice means signals can cross the chip more quickly. Current G4s have about 33 million transistors. G4s also introduced AltiVec, a set of vector instructions and registers intended to support multimedia-type operations. Such sets of instructions are becoming common, the best known being Intel's MMX and SSE. However, SPARC has support and so does Alpha, though it is limited. We will be coming back to this later in the module. PowerPC chips are currently made only by IBM; Motorola have lost interest completely. Allegedly, they did not do much to increase the G4 clock rate, and only Motorola make the G4.
Motorola were supposed to develop the G5 (and then a G6 and a G7), but instead left the chip manufacture business in 2004. IBM on the other hand have attempted to push development ahead. They produced several quicker versions of the G3, and have essentially integrated the PowerPC into their own, separate, POWER architecture. Supposedly, they did not join G4 development and production because they disagreed with the inclusion of AltiVec, regarding it as an unnecessary corruption of the RISC architecture. If true, this is in retrospect unfortunate. However (presumably at Apple's request) they later came round. The Power4 was the first chip that had two processor cores, each with its own separate Level 1 and 2 cache, and a shared off-chip 32Mbyte Level 3 cache. The clock rate was 1GHz. After that, they announced the Power4 Desktop, or GigaProcessor UltraLite (GPUL), or 970, or G5. This is 8-way superscalar, and clocked initially at up to 2GHz (now 2.5GHz). It has 52 million transistors. There are planned versions with multiple on-chip processor cores (between 2 and 4). Note that it includes AltiVec.
The PowerPC architecture does not depart from the standard RISC model in many ways. Somewhat unusually, it does not hardwire register zero to the value zero. This means that an extra register is available, at the cost of all those convenient uses for a hardwired zero (like not needing a special compare instruction). There is an exception to this: if you use register zero as the base register in memory address computations, then it is zero (though not if you use it as an index register). Presumably the designers of the PowerPC thought that the use of a hardwired zero in memory address computations was sufficiently common and useful to justify it, but not otherwise.
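The special treatment of register zero in address computations can be sketched as follows. This is a simplified Python model; the function names and the list used as a register file are assumptions for the example.

```python
def ppc_ea_offset(regs, base, offset):
    """Base + offset: register number 0 as the base means the value zero."""
    base_value = 0 if base == 0 else regs[base]
    return base_value + offset


def ppc_ea_indexed(regs, base, index):
    """Base + index: only the base position gets the 'reads as zero' rule."""
    base_value = 0 if base == 0 else regs[base]
    return base_value + regs[index]   # index 0 reads register 0's real contents


# Register 0 holds 7, register 3 holds 0x100, the rest are zero.
regs = [0] * 32
regs[0] = 7
regs[3] = 0x100

absolute = ppc_ea_offset(regs, 0, 16)     # 16: register 0 treated as zero
relative = ppc_ea_offset(regs, 3, 16)     # 0x110
as_index = ppc_ea_indexed(regs, 3, 0)     # 0x107: register 0's contents used
as_base = ppc_ea_indexed(regs, 0, 3)      # 0x100: register 0 treated as zero
```

Using register number 0 as the base therefore gives absolute addressing without tying up a register that permanently holds zero.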
A significant difference is the way return addresses for procedure/function calls are handled. Rather than simply putting the return address in an ordinary register and letting the called procedure/function save it (if necessary), there is a special link register. The reason for this is that lots of procedures/functions do not call anything else, and in such cases this register need not be saved somewhere else. Because it is not a normal register, accessing it can be made quicker, speeding up procedure/function returns. There is also another special register, the count register, which is used as the loop index in `for' loops. The reason for this is that it is possible to speed up conditional branches that use it. This is because the value of the count register is known at the start of the loop body, so you will know quite early on if the branch at the end of the loop body will be taken or not. Also, testing the value of the count register will automatically decrement it.
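A rough Python model of the two special registers may help. The dictionary holding `pc`, `lr` and `ctr`, and the helper names (loosely modelled on the PowerPC branch mnemonics), are assumptions made for this sketch; it shows the behaviour described above rather than any real implementation.

```python
def bl(state, target):
    """Branch and link: the return address goes to the link register."""
    state["lr"] = state["pc"] + 4   # instructions are 4 bytes long
    state["pc"] = target


def blr(state):
    """Return: branch to the address held in the link register."""
    state["pc"] = state["lr"]


def bdnz(state):
    """Decrement the count register; the branch is taken if it is non-zero."""
    state["ctr"] -= 1
    return state["ctr"] != 0


def sum_with_count_register(values):
    """A counted loop whose closing branch is driven by the count register."""
    if not values:
        return 0
    state = {"ctr": len(values)}    # loaded once, before the loop starts
    total = 0
    i = 0
    while True:
        total += values[i]
        i += 1
        if not bdnz(state):         # test and decrement in one step
            break
    return total


# A leaf procedure call and return never touch memory for the return address.
state = {"pc": 0x100, "lr": 0, "ctr": 0}
bl(state, 0x200)
blr(state)                          # state["pc"] is now 0x104
```

Because the loop body never writes the count register itself, the outcome of the closing branch depends only on a value fixed before the body runs, which is what makes it easy to resolve early.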
The SPARC (Scalable Processor ARChitecture) is the oldest of the three that we have considered, and owes most to the early RISC processors. Originally developed by Sun as a replacement for the Motorola 680x0 processors they were using in workstations, it has now become an open IEEE-standard architecture, supported by the SPARC International consortium. Its members are various interested groups, including four processor manufacturers, among them BridgePoint, who mainly deal in legacy support and upgrades; T.Sqware, who are mainly concerned with networking; and Fujitsu. However, the best known is Sun themselves, who currently make the UltraSPARC-series processors.
The current state of the art is represented by the UltraSPARC IV+, with clock rates up to 1.8GHz, 64KB level 1 instruction cache and 64KB level 1 data cache, and the UltraSPARC T1, which is an interesting development. The T1 is Sun's first multi-core SPARC processor, with 8 cores per die. Furthermore, each core is 4-way multi-threaded, so each die can execute up to 32 threads simultaneously. This is something of a departure for Sun, who are recognising that there is a large (and growing) market for network-facing high-demand servers which have, for example, many web server threads running and not much else. In the future (not before 2008), Sun will replace UltraSPARC with a new processor codenamed `Rock', though not much other concrete information is available.
Because it is the oldest, and because it was directly inspired by the Berkeley RISC project, SPARC has most in common with early RISC architectures. It uses the register window technique first seen in the RISC project. The actual number of registers to include is not specified by the architecture: most implementations have chosen 128, though there are some that have only 40 and others that have 520. More registers means more procedure calls before it is necessary to write the register contents to memory. In practice, the smaller register sets mean that this saving (and subsequent restoring) has to go on quite often, while the very big sets are often largely unused; hence the general consensus of about 128. Register windows went into decline, though IA-64 has resurrected them. The other hangover from earlier RISCs is that SPARC still has delayed branches (as well as the annulling variety), though it does also have a non-delayed unconditional branch instruction. Like PowerPC, SPARC also has multimedia-support instructions, called VIS.
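The trade-off between register file size and save/restore traffic can be illustrated with a deliberately simplified Python model. It assumes 8 global registers plus 16 new registers per overlapping window, which reproduces the 40 and 520 figures above for 2 and 32 windows (under that assumption the figure of about 128 corresponds to 7 or 8 windows). Real implementations differ in detail, and all names here are hypothetical.

```python
def total_registers(nwindows):
    """8 globals plus 16 registers contributed by each overlapping window."""
    return 8 + 16 * nwindows


def count_spills_and_fills(call_trace, nwindows):
    """Count window saves and restores for a trace of +1 (call) / -1 (return)."""
    resident = 1    # windows currently held in the register file
    in_memory = 0   # windows that have been written out to memory
    spills = fills = 0
    for step in call_trace:
        if step == +1:
            if resident == nwindows:
                spills += 1         # oldest window is written out and reused
                in_memory += 1
            else:
                resident += 1
        else:
            if resident == 1:
                if in_memory == 0:
                    raise ValueError("return without a matching call")
                fills += 1          # caller's window must be read back in
                in_memory -= 1
            else:
                resident -= 1
    return spills, fills


# Ten nested calls followed by ten returns, for three register file sizes.
trace = [+1] * 10 + [-1] * 10
small = count_spills_and_fills(trace, 2)     # (9, 9)
medium = count_spills_and_fills(trace, 8)    # (3, 3)
large = count_spills_and_fills(trace, 32)    # (0, 0)
```

For this trace the smallest file saves and restores on almost every call, while the largest never does but leaves most of its windows idle, which is the pattern the paragraph above describes.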
The number of RISC alternatives to IA-32 (and maybe IA-64) seems to be shrinking. ARM and MIPS now only deal with IP Cores, and do not actually make processors themselves. A number of manufacturers (including ARM and to some extent MIPS, as well as others) seem to be taking refuge in the embedded processor market. (This is admittedly big, but it is not `high profile' in the way desktop systems are: we are not going to see Hitachi and Mitsubishi making TV ads suggesting the decision on which washing machine you buy next should be influenced by who makes the microprocessor(s) in it.) Alpha has now been essentially killed off (though Intel have reportedly bought some of the technology, and are employing some of the remaining engineers). HP's PA-RISC architecture is dead (though incorporated into IA-64). Sun and IBM are restricted to specialised markets. Only PowerPC has any serious consumer user-base, but in 2005 Apple announced a transition from PowerPC to future versions of Intel processors (though note that this does not mean that future Apple software would run on `ordinary' PCs). The main reason given was disappointment with the progress of PowerPC's development. However, IBM continue to push the architecture, and Sony, Microsoft, and Nintendo are all releasing game consoles based on PowerPC, so it is by no means dead.
Intel and, to a lesser extent, AMD are dominating the market simply because of the volume of sales. They can afford to devote more resources to processor development than anyone else. It is not beyond the bounds of possibility that they will force all other high-performance architectures out of existence. This would not be a Good Thing. Note that this is by no means guaranteed to happen. Neither of them has yet made any serious inroads into the high-value (i.e. server) 64-bit market, but they are seriously trying. What's more, they both (but especially Intel) have enough resources to keep trying for a long time.