genius IT ians™: Hyper Threading

Sunday, September 20, 2009

Hyper Threading

Hyper-threading is Intel's term for its implementation of simultaneous multithreading (SMT) in the Pentium 4, Atom, and Core i7 CPUs. Hyper-threading (officially Hyper-Threading Technology, or HTT) is an Intel-proprietary technology that improves the parallelization of computations (doing multiple tasks at once) on PC microprocessors. The operating system treats a processor with hyper-threading enabled as two processors instead of one: only one processor is physically present, but the operating system sees two virtual processors and shares the workload between them. Hyper-threading requires only that the operating system support multiple processors, but Intel recommends disabling it when using operating systems that have not been optimized for the technology.^[1]

Performance

Figure: Hyper-threading scheme

The advantages claimed for hyper-threading are improved support for multi-threaded code, the ability to run multiple threads simultaneously, and improved reaction and response time.

According to Intel, the first implementation used only 5% more die area than the comparable non-hyper-threaded processor, yet performed 15–30% better.
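To put those two figures side by side, the short calculation below divides the claimed performance gain by the claimed die-area overhead. The function name `perf_per_area_gain` is made up for this illustration, and the result is only as good as Intel's quoted numbers.

```python
def perf_per_area_gain(perf_gain: float, area_overhead: float) -> float:
    """Relative performance per unit of die area versus the baseline chip.

    Both arguments are fractions, e.g. 0.15 for 15%.
    """
    return (1.0 + perf_gain) / (1.0 + area_overhead)


# Figures quoted in the text: 5% more die area, 15% to 30% more performance.
low = perf_per_area_gain(0.15, 0.05)
high = perf_per_area_gain(0.30, 0.05)
print(f"performance per die area: {low:.2f}x to {high:.2f}x the baseline")
```

On those figures the hyper-threaded part delivers roughly 1.10 to 1.24 times the performance per unit of die area, which is the sense in which the feature was considered cheap.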

Intel claims up to a 30% speed improvement compared with an otherwise identical Pentium 4 without simultaneous multithreading. Intel also claims significant performance improvements with a hyper-threading-enabled Pentium 4 processor in some artificial intelligence algorithms. The improvement is very application-dependent, however, and some programs actually slow down slightly when Hyper-Threading Technology is turned on. This is due to the replay system of the Pentium 4 tying up valuable execution resources, thereby starving the other thread. (The Pentium 4 Prescott core gained a replay queue, which reduces the execution time the replay system needs, but this is not enough to overcome the performance hit completely.) Any such performance degradation is specific to the Pentium 4 (due to various architectural nuances) and is not characteristic of simultaneous multithreading in general.

Details

Figure: Intel Pentium 4 at 3.80 GHz with Hyper-Threading Technology.

Hyper-threading works by duplicating certain sections of the processor (those that store the architectural state) but not the main execution resources. This allows a hyper-threading processor to appear as two "logical" processors to the host operating system, so the operating system can schedule two threads or processes simultaneously. When the current task would leave execution resources unused in a processor without hyper-threading, and especially when the processor is stalled, a hyper-threading-equipped processor can use those resources to execute another scheduled task. (The processor may stall due to a cache miss, a branch misprediction, or a data dependency.)
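The idea of filling stalled cycles with work from another thread can be sketched with a toy cycle counter. This is a deliberately simplified model, not a description of the Pentium 4 pipeline: it assumes a single execution unit, and each thread is a made-up string of `X` steps (needs the unit for one cycle) and `S` steps (stalled for one cycle, for example on a cache miss).

```python
def run_sequential(threads):
    """Cycles needed when the threads run one after another on one core."""
    return sum(len(t) for t in threads)


def run_smt(threads):
    """Cycles needed when the threads share one execution unit, SMT-style."""
    n = len(threads)
    pos = [0] * n          # index of the next step of each thread
    first = 0              # rotating priority so no thread is starved
    cycles = 0
    while any(p < len(t) for p, t in zip(pos, threads)):
        unit_free = True
        for k in range(n):
            i = (first + k) % n
            if pos[i] >= len(threads[i]):
                continue                 # this thread has finished
            if threads[i][pos[i]] == "S":
                pos[i] += 1              # a stall elapses without the unit
            elif unit_free:
                pos[i] += 1              # this thread gets the unit this cycle
                unit_free = False
            # otherwise the thread waits for the unit until a later cycle
        first = (first + 1) % n
        cycles += 1
    return cycles


a = "XSSXSSX"   # hypothetical thread that stalls often
b = "XXSXXSX"   # hypothetical thread that stalls less
print(run_sequential([a, b]), run_smt([a, b]))
```

In this model the stalls of one thread overlap with the useful work of the other, so the shared run finishes in fewer cycles than running the threads back to back. Two threads with no stalls at all gain nothing, which matches the point that the benefit depends heavily on the workload.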

This technology is transparent to operating systems and programs. All that is required to take advantage of hyper-threading is symmetric multiprocessing (SMP) support in the operating system, as the logical processors appear as standard separate processors.
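As an illustration of logical processors showing up as ordinary processors, the sketch below counts logical processors and distinct physical cores in text laid out like Linux's `/proc/cpuinfo` on x86. The sample text is invented and heavily abbreviated, the function name `count_processors` is hypothetical, and other operating systems expose this information differently.

```python
# Invented, abbreviated sample in the style of /proc/cpuinfo on x86 Linux:
# one physical core presenting two logical processors.
SAMPLE = """\
processor   : 0
physical id : 0
core id     : 0

processor   : 1
physical id : 0
core id     : 0
"""


def count_processors(cpuinfo_text):
    """Return (logical processors, distinct physical cores)."""
    logical = 0
    cores = set()
    for block in cpuinfo_text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "processor" in fields:
            logical += 1
            # Logical processors sharing both ids are siblings on one core.
            cores.add((fields.get("physical id"), fields.get("core id")))
    return logical, len(cores)


print(count_processors(SAMPLE))
```

A program that only counts `processor` entries sees two CPUs here; only the `physical id` and `core id` fields reveal that they share one core.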

It is possible to optimize operating system behavior on multiprocessor systems that support hyper-threading. For example, consider an SMP system with two physical processors that are both hyper-threaded (for a total of four logical processors). If the operating system's process scheduler is unaware of hyper-threading, it will treat all four processors as being the same. If only two processes are eligible to run, it might schedule them on the two logical processors that happen to belong to the same physical processor; that processor would become extremely busy while the other sat idle, leading to poorer performance than better scheduling would achieve. This problem can be avoided by improving the scheduler to treat logical processors differently from physical processors; in a sense, this is a limited form of the scheduler changes that are required for NUMA systems.
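The two-processor example can be sketched as a placement function. This is a minimal illustration rather than how any real scheduler is written: `place_tasks` and the `topology` mapping (logical CPU id to physical processor id) are hypothetical names introduced here.

```python
def place_tasks(topology, n_tasks, ht_aware):
    """Return the logical CPUs chosen for n_tasks runnable tasks.

    topology maps logical CPU id -> physical processor id.
    """
    chosen = []
    load = {phys: 0 for phys in topology.values()}  # busy siblings per processor
    for _ in range(min(n_tasks, len(topology))):
        idle = [cpu for cpu in sorted(topology) if cpu not in chosen]
        if ht_aware:
            # Prefer an idle logical CPU on the least busy physical processor.
            cpu = min(idle, key=lambda c: (load[topology[c]], c))
        else:
            # Treat all logical CPUs as identical: take the lowest-numbered one.
            cpu = idle[0]
        chosen.append(cpu)
        load[topology[cpu]] += 1
    return chosen


# Two physical processors, each exposing two logical processors.
topology = {0: 0, 1: 0, 2: 1, 3: 1}
print(place_tasks(topology, 2, ht_aware=False))  # both tasks on processor 0
print(place_tasks(topology, 2, ht_aware=True))   # one task per processor
```

The unaware variant may pack both tasks onto sibling logical processors, while the aware variant spreads them across physical processors first and only doubles up once every physical processor is busy.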

Security

In May 2005, Colin Percival presented a paper, Cache Missing for Fun and Profit, demonstrating that a malicious thread operating with limited privileges can monitor the execution of another thread through that thread's influence on a shared data cache, allowing the theft of cryptographic keys. Although the attack described in the paper was demonstrated on an Intel Pentium 4 with Hyper-Threading, the same techniques could theoretically apply to any system where caches are shared between two or more execution threads that do not trust each other; see also side channel attack.
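Why a shared cache leaks information can be shown with a toy model. The sketch below is not the paper's method and touches no real hardware: it assumes an invented direct-mapped cache in which each set holds one line, and it reports hits and misses directly, where a real observer would have to infer them from access latency. All class and function names are hypothetical.

```python
class SharedCache:
    """Toy direct-mapped cache: each set holds one line, tagged by its owner."""

    def __init__(self, n_sets):
        self.owner = [None] * n_sets

    def access(self, who, set_index):
        """Touch a set; return True on a hit, False on a miss."""
        hit = self.owner[set_index] == who
        self.owner[set_index] = who      # a miss evicts the previous owner
        return hit


def victim_step(cache, secret_index):
    # Hypothetical victim: a table lookup whose cache set depends on secret data.
    cache.access("victim", secret_index)


def spy_observe(cache, run_victim):
    """Return the sets in which the spy's lines were evicted by the victim."""
    n = len(cache.owner)
    for s in range(n):                   # fill every set with the spy's lines
        cache.access("spy", s)
    run_victim()                         # the victim runs on the sibling thread
    # Re-touch every set: a miss marks a set the victim used in between.
    return [s for s in range(n) if not cache.access("spy", s)]


cache = SharedCache(16)
print(spy_observe(cache, lambda: victim_step(cache, 11)))
```

The spy never reads the victim's data; it learns which cache set the victim touched purely from which of its own lines went missing, which is why the risk applies wherever mutually untrusted threads share a cache.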

Past

Older NetBurst-based Pentium 4 CPUs use hyper-threading, but Intel's processors based on the Core microarchitecture do not. However, Intel uses the feature again in the newer Atom and Core i7 processors.

Inefficiencies

More recently, hyper-threading has been criticised as energy-inefficient. For example, ARM, a company specialising in low-power CPU design, has stated that SMT can use up to 46% more power than dual-CPU designs. ARM further claims that SMT increases cache thrashing by 42%, whereas a dual-core design results in a 37% decrease.^[2] These considerations are claimed to be the reason Intel dropped SMT from the Core microarchitecture, which followed NetBurst.

Present & Future

Intel released Nehalem (Core i7) in November 2008, in which hyper-threading makes a return. Nehalem contains four cores and, with hyper-threading, effectively scales to eight threads.^[3]

The Intel Atom is an in-order, single-core processor with hyper-threading, intended for low-power mobile PCs and low-price desktop PCs.
