Sunday, September 20, 2009

 

Itanium



Itanium 2 processor

Produced: From mid 2001 to present
Common manufacturer(s): Intel
Max. CPU clock: 733 MHz to 1.66 GHz
FSB speeds: 300 MHz to 667 MHz
Instruction set: Itanium
Cores: 1 or 2
Socket(s): PAC611; PAC418 (original Itanium)
Core name(s): McKinley, Madison, Hondo, Deerfield, Montecito

 

Itanium is a family of 64-bit Intel microprocessors that implement the Intel Itanium architecture (formerly called IA-64). The processors are marketed for use in enterprise servers and high-performance computing systems. The architecture originated at Hewlett-Packard (HP), and was later jointly developed by HP and Intel.

Itanium's architecture differs dramatically from the x86-64 architectures used in other Intel processors. The Itanium architecture is based on explicit instruction-level parallelism, in which the compiler makes the decisions about which instructions to execute in parallel. By contrast, other superscalar architectures depend on elaborate processor circuitry to keep track of instruction dependencies during runtime. This alternative approach helps current Itanium processors execute up to six instructions per clock cycle.
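To make the idea of compiler-decided parallelism concrete, here is a minimal sketch of static scheduling. It is a toy model, not how any real Itanium compiler works: the `schedule` function, the instruction names, and the dependency table are all hypothetical, and the only figure taken from the text is the issue width of six.

```python
from typing import Dict, List, Set

ISSUE_WIDTH = 6  # "up to six instructions per clock cycle", per the text


def schedule(deps: Dict[str, Set[str]], width: int = ISSUE_WIDTH) -> List[List[str]]:
    """Greedily pack instructions into issue groups ahead of time.

    deps maps each (hypothetical) instruction name to the names it
    depends on. An instruction may be placed in a group only after
    all of its dependencies were placed in an earlier group.
    """
    done: Set[str] = set()
    pending = list(deps)  # dict order stands in for program order
    groups: List[List[str]] = []
    while pending:
        # Everything whose inputs are ready, capped at the issue width.
        group = [ins for ins in pending if deps[ins] <= done][:width]
        if not group:
            raise ValueError("cyclic dependency between instructions")
        groups.append(group)
        done.update(group)
        pending = [ins for ins in pending if ins not in done]
    return groups


# Hypothetical six-instruction program: (a + b) * c, then store.
program = {
    "load_a": set(),
    "load_b": set(),
    "add": {"load_a", "load_b"},
    "load_c": set(),
    "mul": {"add", "load_c"},
    "store": {"mul"},
}
groups = schedule(program)
for cycle, group in enumerate(groups):
    print(cycle, group)
```

The three loads have no dependencies and land in the first group together; the add, multiply, and store each wait for the previous result. In this model the grouping is computed once, before execution, which is the division of labour the paragraph above attributes to the Itanium compiler.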

After a protracted development process marked by many delays, the first Itanium processor, codenamed Merced, was released in 2001, and more powerful Itanium processors have been released periodically. HP produces most Itanium-based systems, but several other manufacturers also offer systems based on Itanium. As of 2008, Itanium is the fourth-most deployed microprocessor architecture for enterprise-class systems, behind x86-64, IBM POWER, and SPARC.[1] Intel released the current Itanium version, codenamed Montvale, in November 2007.[2] The follow-on, a quad-core processor codenamed Tukwila, was originally planned for release in 2007 but is now scheduled to ship to OEMs in the first quarter of 2010.[3]

Itanium has proven to be one of the biggest technological flops in the history of computing. Its sales were so disastrously beneath expectations that the appellation of "Itanic" has been applied to the franchise, invoking the ill-fated ocean liner RMS Titanic. Journalist John C. Dvorak, commenting in 2009 on the history of the Itanium processor, said "This continues to be one of the great fiascos of the last 50 years."[4] Tech columnist Ashlee Vance commented that the delays and underperformance "turned the product into a joke in the chip industry."[5]

 

Itanium (Merced): 2001

Itanium processor

Produced: From June 2001 to June 2002
Common manufacturer(s): Intel
Max. CPU clock: 733 MHz to 800 MHz
FSB speeds: 266 MT/s
Instruction set: Itanium
Socket(s): PAC418
Core name(s): Merced

By the time Itanium was released in June 2001, its performance was not superior to competing RISC and CISC processors.[23] Itanium competed at the low end (primarily 4-CPU and smaller systems) with servers based on x86 processors, and at the high end with IBM's POWER architecture and Sun Microsystems' SPARC architecture. Intel repositioned Itanium to focus on high-end business computing and HPC, attempting to duplicate x86's successful "horizontal" market (i.e., single architecture, multiple systems vendors). The success of this initial processor version was limited to replacing PA-RISC in HP systems, Alpha in Compaq systems and MIPS in SGI systems, though IBM also delivered a supercomputer based on this processor.[24] POWER and SPARC remained strong, while the 32-bit x86 architecture continued to grow into the enterprise space. With economies of scale fueled by its enormous installed base, x86 has remained the preeminent "horizontal" architecture in enterprise computing.

Only a few thousand systems using the original Merced Itanium processor were sold, due to relatively poor performance, high cost and limited software availability.[25] Recognizing that the lack of software could be a serious problem for the future, Intel made thousands of these early systems available to independent software vendors (ISVs) to stimulate development. HP and Intel brought the next-generation Itanium 2 processor to market a year later.

Itanium processor family logos: original logo, version 2 logo, 2006 logo, 2008 logo, and 2009 logo.

 

Itanium 2: 2002–present

Itanium 2 in 2003

The Itanium 2 processor was released in 2002, and was marketed for enterprise servers rather than for the whole gamut of high-end computing. The first Itanium 2, code-named McKinley, was jointly developed by HP and Intel. It relieved many of the performance problems of the original Itanium processor, which were mostly caused by an inefficient memory subsystem. McKinley contained 221 million transistors, of which 25 million were for logic, measured 19.5 mm by 21.6 mm (421 mm²) and was fabricated in a 180 nm, bulk CMOS process with six layers of aluminium metallization.[26]

In 2003, AMD released the Opteron, which implemented its 64-bit architecture (x86-64). Opteron gained rapid acceptance in the enterprise server space because it provided an easy upgrade from x86. Intel responded by implementing x86-64 in its Xeon microprocessors in 2004.[11] Intel released a new Itanium 2 family member, codenamed Madison, in 2003. Madison used a 130 nm process and was the basis of all new Itanium processors until Montecito was released in June 2006.

In March 2005, Intel announced that it was working on a new Itanium processor, codenamed Tukwila, to be released in 2007. Tukwila would have four processor cores and would replace the Itanium bus with a new Common System Interface, which would also be used by a new Xeon processor.[27] Later that year, Intel revised Tukwila's delivery date to late 2008.[28]

In November 2005, the major Itanium server manufacturers joined with Intel and a number of software vendors to form the Itanium Solutions Alliance to promote the architecture and accelerate software porting.[29] The Alliance announced that its members would invest $10 billion in Itanium solutions by the end of the decade.[30]

In 2006, Intel delivered Montecito, a dual-core processor that roughly doubled performance and decreased energy consumption by about 20 percent.[31]

Intel released the current Itanium version, codenamed Montvale, in November 2007.[2] In May 2009 the schedule for Tukwila, its follow-on, was revised again, with release to OEMs planned for the first quarter of 2010.[3]

In comparison with its Xeon family of server processors, Itanium is not a high-volume product for Intel. Intel does not release production numbers, but one industry analyst estimated that the production rate was 200,000 processors per year in 2007.[32] According to Gartner Inc., the total number of Itanium servers sold by all vendors in 2007 was about 55,000. This compares with 417,000 RISC servers (spread across all RISC vendors) and 8.4 million x86 servers. From 2001 through 2007, IDC reports that a total of 184,000 Itanium-based systems have been sold. For the combined POWER/SPARC/Itanium systems market, IDC reports that POWER captured 42% and SPARC captured 32%, while Itanium-based system revenue reached 26% in the second quarter of 2008.[33] According to an IDC analyst, in 2007 HP accounted for perhaps 80% of Itanium systems revenue.[34] According to Gartner, in 2008 HP accounted for 95% of Itanium sales.[5]

Architecture

Intel Itanium Architecture

Designer: HP and Intel
Bits: 64-bit
Introduced: 2001
Design: EPIC
Type: Register-Register
Endianness: Selectable
Registers:

  • 128 64-bit general purpose registers
  • 128 82-bit floating-point registers
  • 64 1-bit predicate registers

 

 

Intel has extensively documented the Itanium instruction set and microarchitecture,[35] and the technical press has provided overviews.[9][16] The architecture has been renamed several times during its history. HP originally called it PA-WideWord. Intel later called it IA-64, then Itanium Processor Architecture (IPA),[36] before settling on Intel Itanium Architecture, but it is still widely referred to as IA-64. It is a 64-bit register-rich explicitly-parallel architecture. The base data word is 64 bits, byte-addressable. The logical address space is 2^64 bytes. The architecture implements predication, speculation, and branch prediction. It uses a hardware register renaming mechanism rather than simple register windowing for parameter passing. The same mechanism is also used to permit parallel execution of loops. Speculation, prediction, predication, and renaming are under control of the compiler: each instruction word includes extra bits for this. This approach is the distinguishing characteristic of the architecture.

The architecture implements 128 integer registers, 128 floating point registers, 64 one-bit predicates, and eight branch registers. The floating point registers are 82 bits long to preserve precision for intermediate results.

Instruction execution

Each 128-bit instruction word contains three instructions, and the fetch mechanism can read up to two instruction words per clock from the L1 cache into the pipeline. When the compiler can take maximum advantage of this, the processor can execute six instructions per clock cycle. The processor has thirty functional execution units in eleven groups. Each unit can execute a particular subset of the instruction set, and each unit executes at a rate of one instruction per cycle unless execution stalls waiting for data. While not all units in a group execute identical subsets of the instruction set, common instructions can be executed in multiple units.
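As an illustration of how three instructions fit in one 128-bit word, the sketch below splits such a word into its fields. The field layout used here (a 5-bit template in the low bits followed by three 41-bit instruction slots) is not stated in this article; it is the layout described in Intel's architecture manuals, and the function names `pack_bundle` and `split_bundle` are made up for this example.

```python
from typing import Tuple

# Assumed layout, from Intel's manuals rather than from this article:
# bits 0-4 template, bits 5-45 slot 0, bits 46-86 slot 1, bits 87-127 slot 2.
TEMPLATE_BITS = 5
SLOT_BITS = 41
SLOT_MASK = (1 << SLOT_BITS) - 1
TEMPLATE_MASK = (1 << TEMPLATE_BITS) - 1


def pack_bundle(template: int, slots: Tuple[int, int, int]) -> int:
    """Combine a template and three slot values into one 128-bit integer."""
    if not 0 <= template <= TEMPLATE_MASK:
        raise ValueError("template must fit in 5 bits")
    bundle = template
    for index, slot in enumerate(slots):
        if not 0 <= slot <= SLOT_MASK:
            raise ValueError("each slot must fit in 41 bits")
        bundle |= slot << (TEMPLATE_BITS + index * SLOT_BITS)
    return bundle


def split_bundle(bundle: int) -> Tuple[int, Tuple[int, ...]]:
    """Return (template, (slot0, slot1, slot2)) for a 128-bit integer."""
    if not 0 <= bundle < (1 << 128):
        raise ValueError("bundle must fit in 128 bits")
    template = bundle & TEMPLATE_MASK
    slots = tuple(
        (bundle >> (TEMPLATE_BITS + index * SLOT_BITS)) & SLOT_MASK
        for index in range(3)
    )
    return template, slots


# Round trip with arbitrary placeholder values, not real opcodes.
template, slots = split_bundle(pack_bundle(0x10, (1, 2, 3)))
print(template, slots)
```

The widths add up to 5 + 3 × 41 = 128 bits. Fetching two such words per clock is what gives the six-instruction ceiling mentioned above.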

The execution unit groups include:

The compiler can often group instructions into sets of six that can execute at the same time. Since the floating-point units implement a multiply-accumulate operation, a single floating point instruction can perform the work of two instructions when the application requires a multiply followed by an add: this is very common in scientific processing. When it occurs, the processor can execute four FLOPs per cycle. For example, the 800 MHz Itanium had a theoretical rating of 3.2 GFLOPS and the fastest Itanium 2, at 1.67 GHz, was rated at 6.67 GFLOPS.
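The peak ratings quoted above follow from simple arithmetic: clock rate times four FLOPs per cycle. A small sketch, where the function name `peak_gflops` is only illustrative and the four-per-cycle figure is taken from the text:

```python
FLOPS_PER_CYCLE = 4  # peak figure stated in the text


def peak_gflops(clock_mhz: float, flops_per_cycle: int = FLOPS_PER_CYCLE) -> float:
    """Theoretical peak in GFLOPS for a clock given in MHz."""
    return clock_mhz * flops_per_cycle / 1000.0


print(peak_gflops(800))   # original Itanium at 800 MHz
print(peak_gflops(1667))  # fastest Itanium 2, taking the clock as about 1667 MHz
```

The 800 MHz case reproduces the 3.2 GFLOPS rating. The 6.67 GFLOPS rating corresponds to a clock of roughly 1.667 GHz, which the text rounds to 1.67 GHz. These are upper bounds that assume every cycle issues the maximum number of multiply-accumulate operations.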

Memory architecture

From 2002 to 2006, Itanium 2 processors shared a common cache hierarchy. They had 16 KB[38] of Level 1 instruction cache and 16 KB of Level 1 data cache. The L2 cache was unified (both instruction and data) and was 256 KB. The Level 3 cache was also unified and varied in size from 1.5 MB[38] to 24 MB. The 256 KB L2 cache contained sufficient logic to handle semaphore operations without disturbing the main arithmetic logic unit (ALU).

Main memory is accessed through a bus to an off-chip chipset. The Itanium 2 bus was initially called the McKinley bus, but is now usually referred to as the Itanium bus. The speed of the bus has increased steadily with new processor releases. The bus transfers 2x128 bits per clock cycle, so the 200 MHz McKinley bus transferred 6.4 GB/s[39] and the 533 MHz Montecito bus transfers 17.056 GB/s[39].[40]
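The McKinley figure can be checked with a short calculation. This sketch takes the text's description at face value (two 128-bit transfers per bus clock); the function name `bus_bandwidth_gb_s` and its parameter names are illustrative only.

```python
def bus_bandwidth_gb_s(
    clock_mhz: float,
    bits_per_transfer: int = 128,
    transfers_per_clock: int = 2,
) -> float:
    """Peak bus bandwidth in GB/s (10**9 bytes per second)."""
    bytes_per_clock = transfers_per_clock * bits_per_transfer // 8
    return clock_mhz * 1e6 * bytes_per_clock / 1e9


print(bus_bandwidth_gb_s(200))  # McKinley bus at a 200 MHz clock
```

At 32 bytes per clock, a 200 MHz clock gives 6.4 GB/s, matching the figure above. The same formula applies to later buses once their base clock is known; the result is a peak rate, not sustained throughput.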

 

Architectural changes

Itanium processors released prior to 2006 had hardware support for the IA-32 architecture to permit support for legacy server applications, but performance for IA-32 code was much worse than for native code and also worse than the performance of contemporaneous x86 processors. In 2005, Intel developed the IA-32 Execution Layer (IA-32 EL), a software emulator that provides better performance. With Montecito, Intel therefore eliminated hardware support for IA-32 code.

In 2006, with the release of Montecito, Intel made a number of enhancements to the basic processor architecture including:[41]

Hardware support

Systems

Server manufacturers' Itanium products

Company | From | To | Latest product | CPUs
Compaq | 2001 | 2001 | Proliant 590 | 1-4
IBM | 2001 | 2005 | x455 | 1-16
Dell | 2001 | 2005 | PowerEdge 7250 | 1-4
Unisys | 2002 | 2009 | ES7000/one | 1-32
HP | 2001 | now | Integrity | 1-128
SGI | 2001 | now | Altix 4000 | 1-2048
Hitachi | 2001 | now | BladeSymphony 1000 | 1-8
Bull | 2002 | now | NovaScale | 1-32
NEC | 2002 | now | Express5800/1000 | 1-32
Fujitsu | 2005 | now | PRIMEQUEST | 1-32

As of 2009, several manufacturers offer Itanium systems, including HP, SGI, NEC, Fujitsu, Hitachi, and Groupe Bull. In addition, Intel offers a chassis[42] that can be used by system integrators to build Itanium systems. HP, the only one of the industry's top four server manufacturers to offer Itanium-based systems today, manufactures at least 80% of all Itanium systems. HP sold 7,200 systems in the first quarter of 2006.[43] The bulk of systems sold are enterprise servers and machines for large-scale technical computing, with an average selling price per system in excess of US$200,000. A typical system uses eight or more Itanium processors.

Chipsets

The Itanium bus interfaces to the rest of the system via a chipset. Enterprise server manufacturers differentiate their systems by designing and developing chipsets that interface the processor to memory, interconnections, and peripheral controllers. The chipset is the heart of the system-level architecture for each system design. Development of a chipset costs tens of millions of dollars and represents a major commitment to the use of the Itanium. IBM created a chipset in 2003, and Intel in 2002, but neither of them has developed chipsets to support newer technologies such as DDR2 or PCI Express.[44] Currently, modern chipsets for Itanium supporting such technologies are manufactured by HP, Fujitsu, SGI, NEC, and Hitachi.

The upcoming Itanium processor (Tukwila) has been designed to share a common chipset with the Intel Xeon processor EX (Intel's Xeon processor designed for four-processor and larger servers). The goal is to streamline system development and reduce costs for server OEMs, many of whom develop both Itanium- and Xeon-based servers.

Processors

Released processors

The Itanium processors show a progression in capability. Merced was a proof of concept. McKinley dramatically improved the memory hierarchy and allowed Itanium to become reasonably competitive. Madison, with the shift to a 130 nm process, allowed for enough cache space to overcome the major performance bottlenecks. Montecito, with a 90 nm process, allowed for a dual-core implementation and a major improvement in performance per watt. Montvale added three new features: core-level lockstep, demand-based switching and front-side bus frequency of up to 667 MHz.

 

Codename | Process | Released | Clock | L2 cache/core[38] | L3 cache/core[38] | Bus | Dies/device | Cores/die | Watts/device | Comments

Itanium
Merced | 180 nm | 2001-06 | 733 MHz | 96 KB | none | 266 MHz | 1 | 1 | 116 | 2 MB off-die L3 cache
 |  |  | 800 MHz |  | none |  |  |  | 130 | 4 MB off-die L3 cache

Itanium 2
McKinley | 180 nm | 2002-07-08 | 900 MHz | 256 KB | 1.5 MB | 400 MHz | 1 | 1 | 130 | HW branchlong
 |  |  | 1 GHz |  | 3 MB |  |  |  | 130 |
Madison | 130 nm | 2003-06-30 | 1.3 GHz |  | 3 MB |  |  |  | 130 |
 |  |  | 1.4 GHz |  | 4 MB |  |  |  | 130 |
 |  |  | 1.5 GHz |  | 6 MB |  |  |  | 130 |
 |  | 2003-09-08 | 1.4 GHz |  | 1.5 MB |  |  |  | 130 |
 |  | 2004-04 | 1.4 GHz |  | 3 MB |  |  |  | 130 |
 |  |  | 1.6 GHz |  | 3 MB |  |  |  | 130 |
Deerfield | 130 nm | 2003-09-08 | 1.0 GHz |  | 1.5 MB |  |  |  | 62 | Low voltage
Hondo | 130 nm | 2004-Q1 | 1.1 GHz |  | 4 MB | 400 MHz | 2 | 1 | 260 | 32 MB L4
Fanwood | 130 nm | 2004-11-08 | 1.6 GHz |  | 3 MB | 533 MHz | 1 | 1 | 130 |
 |  |  | 1.3 GHz |  | 3 MB | 400 MHz |  |  | 62? | Low voltage
Madison 9M | 130 nm | 2004-11-08 | 1.6 GHz |  | 9 MB | 400 MHz |  |  | 130 |
 |  | 2005-07-05 | 1.67 GHz |  | 6 MB | 667 MHz |  |  | 130 |
 |  | 2005-07-18 | 1.67 GHz |  | 9 MB | 667 MHz |  |  | 130 |
Montecito | 90 nm | 2006-07-18 | 1.4 GHz | 256 KB + 1 MB | 12 MB | 400 MHz | 1 | 2 | 104 | Virtualization, multithread, no HW IA-32
 |  |  | 1.6 GHz |  | 12 MB | 533 MHz | 1 | 2 | 104 |
Montvale | 90 nm | 2007-10-31 | 1.66 GHz |  | 4-18 MB | 400-667 MHz | 1 | 1-2 | 75-104 | Core-level lockstep, demand-based switching

Future processors

As of February 2009, some information is known for the following:
