Introduction
Intel's upcoming Penryn
and Nehalem processors
unmasked: integrated
memory controllers and
CPU-mounted graphics.
Integrated memory
controllers, new system
architectures, and boosted
Core microarchitecture
performance were just some
of the topics covered. Read
on to find out how Intel is
planning to execute its CPU
roadmap for the next 18
months, and why some of it
might come as a shock.
Penryn: Core 2 evolved
First up on the agenda was a
discussion on the
improvements that Intel's
upcoming Penryn core will
provide over the present
Core 2 microarchitecture,
currently represented by
Merom, Conroe, and
Clovertown/Kentsfield cores.
Penryn - a bit of
background on why
45nm is lovely
HEXUS reported on Intel's
progression to 45nm
technology at the back end
of February 2007.
45nm, why the need?
Serving as a quick recap, Intel
announced that it was basing
its evolution to the Core 2
microarchitecture,
codenamed Penryn, on a
45nm manufacturing process
that took advantage of a
breakthrough in transistor
design.
In a nutshell, the smaller
process uses a high-k metal
gate transistor design that
replaces the traditional
silicon dioxide insulating
layer, between gate and
channel, with a hafnium-
based high-k gate oxide
which allows for a thicker
(and better-insulating) layer
to be used. This, in turn, leads
to lower electrical leakage, a
crucial requirement with
ultra-small manufacturing
processes. The new metal
gate replaces the traditional
polysilicon version and
provides for a stronger
gate field effect,
helping switching times.
The upshot? Considerably
less current leakage and a
faster switching time, which
translate to a more energy-
efficient design that will
have an innate propensity to
clock higher.
Penryn additions to Intel
Core microarchitecture
A smaller manufacturing
process isn't all that's new to
Penryn, though.
Let's trot out the old PDF foil
and go through what's being
bolted on to an already
decent architecture.
First off, it's important to
note that Penryn is a
complete family of
processors, encompassing
workstation/server, desktop,
and mobile parts, so what's
applicable to one sub-family
is generally applicable across
the board.
Penryn refers to the next
evolution of dual- and quad-
core CPUs that run on the
present LGA775 form factor.
Images courtesy of Intel.
Now, the Core
microarchitecture's key
performance-defining
benefits are shown on the
left-hand side. We've
covered them in some detail
previously.
Penryn's additions are shown
on the right, so let's go
through them and attempt to
delineate their usefulness in
improving performance.
Fast Radix-16 Divider
Penryn incorporates a new
radix-16 divider, which
works out 4 bits of the
quotient per step,
compared to 2 bits for the
radix-4 divider in the
incumbent Conroe/
Kentsfield. The divide
instruction is pervasive
across applications, used in
both floating point and
integer calculations, so a
roughly twice-as-fast divider
adds some more juice to the
CPU's computational speed.
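To see why a higher radix
halves the work, here's a
minimal sketch of a digit-by-
digit divider. It's a simple
restoring scheme written for
illustration, not Intel's
actual circuit; the function
name divide and the 32-bit
width are our own
assumptions.

```python
def divide(dividend, divisor, bits_per_step, width=32):
    """Restoring digit-recurrence division on unsigned integers.

    Retires bits_per_step quotient bits per iteration and returns
    (quotient, remainder, iterations). Illustrative only.
    """
    if divisor <= 0 or not 0 <= dividend < (1 << width):
        raise ValueError("need divisor > 0 and 0 <= dividend < 2**width")
    if width % bits_per_step:
        raise ValueError("width must be a multiple of bits_per_step")
    mask = (1 << bits_per_step) - 1
    quotient = remainder = iterations = 0
    for shift in range(width - bits_per_step, -1, -bits_per_step):
        # Bring down the next group of dividend bits.
        remainder = (remainder << bits_per_step) | ((dividend >> shift) & mask)
        # Pick the quotient digit (0 .. radix-1). Real hardware would use
        # lookup tables or comparators; repeated subtraction stands in here.
        digit = 0
        while remainder >= divisor:
            remainder -= divisor
            digit += 1
        quotient = (quotient << bits_per_step) | digit
        iterations += 1
    return quotient, remainder, iterations


q4, r4, steps4 = divide(1000003, 97, bits_per_step=2)     # radix-4
q16, r16, steps16 = divide(1000003, 97, bits_per_step=4)  # radix-16
print(steps4, steps16)  # 16 iterations versus 8 for a 32-bit operand
```

Both calls return the same
quotient and remainder; the
radix-16 version simply gets
there in half the iterations.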
Enhanced Intel
Virtualisation Technology
More prevalent in the
workstation/server
community, Intel VT
- where multiple,
hardware-isolated partitions
can run on the same machine
- is boosted with a reduction
in the time taken to
transition between virtual
machines on a purely
hardware level. Intel quotes
a boost of up to 75 per cent.
Larger cache sizes
Large amounts of on-chip
cache are a good thing. The
ability to load and locally
store an application's
working set is an effective, if
transistor-costly, method of
increasing performance, as
on-chip cache speeds are an
order of magnitude faster
than accessing external
memory on a regular basis.
In particular, dual-core
Penryns will pack up to 6MiB
of L2 cache and quad-core
models up to 12MiB. In
transistor terms that's around
840m for a quad-core part; it's just as
well Intel is packing them
into a space-saving 45nm
process, then.
Split load cache
enhancement
Cache is cache, right?
However, the effectiveness
of cache is directly related to
just how well data lines up
inside it. Should a load not
align with the cache line
(straddling the boundary
between two 64-byte lines,
perhaps), it has to be split
into two accesses, and
transfers to the execution
core can be an inefficient
process. Penryn's split-load
cache enhancement, as the
name suggests, is aimed at
handling such line-crossing
loads with less of a penalty.
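A 'split load' is usually
taken to mean a read whose
bytes straddle two cache
lines. As a rough sketch,
assuming the 64-byte line
size of Core 2 and helper
names of our own invention,
spotting one is simple
address arithmetic:

```python
CACHE_LINE_BYTES = 64  # assumed line size, as on Core 2


def lines_touched(address, size, line_bytes=CACHE_LINE_BYTES):
    """Number of cache lines covered by a load of `size` bytes at `address`."""
    if size <= 0:
        raise ValueError("size must be positive")
    first_line = address // line_bytes
    last_line = (address + size - 1) // line_bytes
    return last_line - first_line + 1


def is_split_load(address, size, line_bytes=CACHE_LINE_BYTES):
    """True when the load crosses a cache-line boundary."""
    return lines_touched(address, size, line_bytes) > 1


# A 16-byte (128-bit SSE) load at three different offsets within a line.
for addr in (0x1000, 0x1030, 0x1038):
    print(hex(addr), is_split_load(addr, 16))
```

Only the last address, 56
bytes into its line, spills
over into the next one.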
Higher bus and core speeds,
heat
Just as adding more cache is
an established method of
increasing overall
performance, fattening the
FSB pipe, which delivers data
from the system's memory to
the processor, increases
memory bandwidth and,
ceteris paribus, performance.
Intel's raising the FSB speed
to an effective 1600MHz,
although that's only
applicable to selected Xeon-
based SKUs which already
run at 1333MHz FSB. Selected
desktop processors should
see a hike to 1333MHz FSB,
too, and will be officially
supported on Intel's Bearlake
motherboards. We'd expect
Penryn-based processors to
work in most present
performance-oriented
motherboards, including
NVIDIA's nForce 680i SLI
with, presumably, a simple
BIOS update.
Thinking about it in a
performance sense, Intel has
to increase the FSB, solely
because of the memory
contention that a 'four-core'
CPU with a shared FSB places
on the system. Intel has been
able to mask contention
shortcomings with an
intelligent architecture, but
it's, frankly, an inelegant
interface for a multi-core
processor.
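Some back-of-the-envelope
arithmetic shows what the
FSB bumps are worth. This
sketch assumes the 64-bit
(8-byte) data bus of the
current FSB and, purely for
illustration, an even split
between four cores; real
contention is messier than
that.

```python
FSB_WIDTH_BYTES = 8  # 64-bit data bus


def fsb_peak_gbs(effective_mhz):
    """Theoretical peak FSB bandwidth in GB/s (1 GB = 10**9 bytes)."""
    return effective_mhz * 1e6 * FSB_WIDTH_BYTES / 1e9


for speed in (1066, 1333, 1600):
    peak = fsb_peak_gbs(speed)
    # Even four-way split: a simplifying assumption, not a measurement.
    print(f"{speed}MHz FSB: {peak:.1f} GB/s peak, "
          f"{peak / 4:.1f} GB/s per core with four cores")
```

Even at 1600MHz, four cores
sharing the bus each get only
a slice of the 12.8GB/s peak.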
Intel is being coy about the
initial range of clockspeeds
for Penryn, but we believe
that they will debut - in the
server/workstation and
desktop markets - with a
3.46GHz maximum clock.
Later revisions, of course,
will see that pushed up
towards 4GHz.
With respect to desktop
SKUs, dual-core models are
slated to consume 65W TDP,
matching the incumbent
line-up. Quad-core, sporting
up to 12MiB of cache, will
consume either 95W or
130W. Server parts will
continue to ship at
50W/80W/120W, with the
increased transistor count
counteracted by the energy-
efficient manufacturing
process. Mobile parts, too,
have been designed to fit
into current thermal
envelopes, and all Penryn
SKUs will be a drop-in
upgrade from present
models.
SSE4 and Super Shuffle
Engine
SSE4 was originally designed
to debut with the Nehalem
core (juicy information
morsels for this on the
following page), but the first
batch of instructions arrives
with Penryn. SSE4 adds a
bunch of multimedia-related
optimisations that will be
manifested in a desktop
environment by better
media-encoding
performance.
Super Shuffle Engine sounds
like a Japanese-esque
nomenclature for an
advancement that adds a
128-bit-wide shuffle unit. In
plain English it's useful for a
number of imaging and
video programs that use
what are termed shuffle-like
operations such as pack, shift
and unpack. It'll be
interesting to put this to the
test.
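For anyone unsure what
shuffle-like operations look
like, here's a lane-level
sketch in plain Python. The
function names are ours and
the code models the general
idea only, not the exact
semantics of any particular
SSE instruction.

```python
def shuffle(lanes, control):
    """Reorder lanes by index: result[i] = lanes[control[i]]."""
    return [lanes[i] for i in control]


def unpack_low(a, b):
    """Interleave the low halves of two equal-length vectors."""
    half = len(a) // 2
    out = []
    for x, y in zip(a[:half], b[:half]):
        out += [x, y]
    return out


def pack_saturate(values, lo=0, hi=255):
    """Narrow wider values to unsigned bytes, clamping out-of-range ones."""
    return [min(max(v, lo), hi) for v in values]


print(shuffle([10, 20, 30, 40], [3, 2, 1, 0]))   # reverse the lanes
print(unpack_low([1, 2, 3, 4], [5, 6, 7, 8]))    # interleave low halves
print(pack_saturate([-5, 100, 300]))             # clamp to 0..255
```

Image and video code leans
on this sort of lane juggling
constantly, which is why a
wider shuffle unit should pay
off there.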
Deep Power Down
Technology and Enhanced
Intel Dynamic Acceleration
The Intel Core 2
microarchitecture introduced
enhanced power-saving
states that gated the CPU
down during idle periods.
Deep Power Down Technology
(DPDT) is an extension
that further pushes down
energy requirements during,
you guessed it, idle periods.
Enhanced Intel Dynamic
Acceleration (EIDA) is an
interesting inclusion. Should
the current application be
single-threaded, whereby
there's no intrinsic
advantage in having multiple
cores working concurrently,
EIDA
pushes up the single-core
frequency to above
specifications. That could
mean a 2.93GHz part auto-
overclocking to, say, 3.2GHz.
Sounds like a good bet for
isolated gaming, where the
majority of titles are still
single-threaded.
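The decision EIDA makes can
be sketched as a toy policy.
This is our own simplified
model, with hypothetical
function names and the
example clocks from above;
Intel hasn't detailed the real
logic, which will also have to
respect thermal limits.

```python
def target_clock_ghz(rated_ghz, boost_ghz, busy_cores, total_cores):
    """Toy policy: boost only when a single core of a multi-core part is busy."""
    if not 1 <= busy_cores <= total_cores:
        raise ValueError("busy_cores must be between 1 and total_cores")
    if total_cores > 1 and busy_cores == 1:
        return boost_ghz
    return rated_ghz


def uplift_percent(rated_ghz, boost_ghz):
    """Clock-speed gain of the boosted state over the rated clock."""
    return (boost_ghz / rated_ghz - 1) * 100


print(target_clock_ghz(2.93, 3.2, busy_cores=1, total_cores=2))
print(target_clock_ghz(2.93, 3.2, busy_cores=2, total_cores=2))
print(f"{uplift_percent(2.93, 3.2):.1f} per cent")
```

On those example figures the
single-threaded boost is
worth a little over nine per
cent in clock speed.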
Summary
We've trotted out a number
of enhancements that
Penryn possesses over and
above current dual- and
quad-core Core 2-based CPUs,
but, really, they're
architectural bolt-ons that,
on a clock-for-clock basis,
will provide somewhere in
the region of 20 to 30 per
cent extra performance.
There's nothing radically
new here, just as we
suspected, and Penryn
constitutes a natural
progression for Core 2.
Widespread availability is
scheduled for 2H 2007, so
expect Penryn-powered
boxes for Thanksgiving and
Christmas.
Does it have enough oomph
to battle AMD's Barcelona?
We'll find out.
Nehalem is up next and it
packs in some shocks. Read
on.