Tuesday, March 9, 2010

Intel's upcoming Penryn and Nehalem processors unmasked. Integrated memory controllers and CPU-mounted graphics

Introduction
Integrated memory
controllers, new system
architectures, boosted Core
microarchitecture
performance were just some
of the topics covered. Read
on to find out how Intel is
planning to execute its CPU
roadmap for the next 18
months, and why some of it
might come as a shock.
Penryn: Core 2 evolved
First up on the agenda was a
discussion on the
improvements that Intel's
upcoming Penryn core will
provide over the present
Core 2 microarchitecture,
currently represented by
Merom, Conroe, and
Clovertown/Kentsfield cores.
Penryn - a bit of
background on why
45nm is lovely
HEXUS reported on Intel's
progression to 45nm
technology at the back end
of February 2007
45nm, why the need?
Serving as a quick recap, Intel
announced that it was basing
its evolution to the Core 2
microarchitecture,
codenamed Penryn, on a
45nm manufacturing process
that took advantage of a
breakthrough in transistor
design.
In a nutshell, the smaller
process uses a high-k metal
gate design that replaces
the traditional silicon
dioxide insulating layer,
between the gate electrode
and the channel, with a
hafnium-based high-k gate
oxide, which allows for a
thicker (and better-
insulating) layer to be
used. This, in turn, leads
to lower electrical leakage; a
crucial requirement with
ultra-small manufacturing
processes. The new metal
gate replaces the traditional
polysilicon version and
provides for a better
electromagnetic field,
helping switching times.
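To make the "thicker but better-insulating" point concrete, here is a rough sketch based on the parallel-plate capacitor relation. The relative permittivity of 3.9 for silicon dioxide is the usual textbook value; the value of 20 for the hafnium-based oxide and the 1.2nm target are illustrative assumptions, not figures supplied by Intel.

```python
# Equivalent-oxide-thickness sketch.
# K_HIGH_K and the 1.2 nm target below are illustrative assumptions,
# not figures published by Intel.
K_SIO2 = 3.9      # relative permittivity of silicon dioxide
K_HIGH_K = 20.0   # assumed value for a hafnium-based gate oxide


def physical_thickness_nm(equivalent_oxide_nm, k_high_k=K_HIGH_K):
    """Thickness of a high-k layer that gives the same gate capacitance
    per unit area as a silicon dioxide layer of equivalent_oxide_nm."""
    # C/A = k * eps0 / t, so equal capacitance means t_hk / k_hk = t_ox / k_ox.
    return equivalent_oxide_nm * k_high_k / K_SIO2


if __name__ == "__main__":
    t_ox = 1.2  # nm, assumed silicon-dioxide-equivalent thickness
    t_hk = physical_thickness_nm(t_ox)
    print(f"{t_ox} nm of SiO2 behaves like {t_hk:.2f} nm of high-k oxide")
```

Under these assumptions the high-k layer can be roughly five times thicker for the same gate capacitance, which is why leakage through the insulator falls.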
The upshot? Considerably
less current leakage and a
faster switching time, which
translate to a more energy-
efficient design that will
have an innate propensity to
clock higher.
Penryn additions to Intel
Core microarchitecture
A smaller manufacturing
process isn't all that's new to
Penryn, though.
Let's trot out the old PDF foil
and go through what's being
bolted on to an already
decent architecture.
First off, it's important to
note that the Penryn is a
complete family of
processors, encompassing
workstation/server, desktop,
and mobile parts, so what's
applicable to one sub-family
is generally applicable across
the board.
Penryn refers to the next
evolution of dual- and quad-
core CPUs that are run off the
present LGA775 form factor.
Images courtesy of Intel.
Now, the Core
microarchitecture's key
performance-defining
benefits are shown on the
left-hand side. We've
covered them in some detail
previously.
Penryn's additions are shown
on the right, so let's go
through them and attempt to
delineate their usefulness in
improving performance.
Fast Radix-16 Divider
Penryn incorporates a new
divider based on a Radix-16
algorithm, which works out 4
bits of the result per
iteration, compared to 2 bits
for the incumbent Conroe/
Kentsfield. The divide
instruction is pervasive
across applications, used in
both floating point and
integer calculations, so a
double-fast algorithm adds
some more juice to the CPU's
computational speed.
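The effect of the radix on iteration count can be sketched with a simple digit-by-digit divider. This is an illustrative restoring scheme, not Intel's circuit: the function name and the 32-bit width are assumptions, and real hardware picks each quotient digit with comparators or lookup tables rather than repeated subtraction.

```python
def divide_radix(dividend, divisor, bits_per_step, width=32):
    """Unsigned digit-by-digit division that retires bits_per_step
    quotient bits per iteration.

    Returns (quotient, remainder, iterations).
    Illustrative sketch only; not a model of Penryn's actual divider.
    """
    if divisor <= 0:
        raise ZeroDivisionError("divisor must be positive")
    if width % bits_per_step:
        raise ValueError("width must be a multiple of bits_per_step")
    if not 0 <= dividend < (1 << width):
        raise ValueError("dividend does not fit in width bits")

    digit_mask = (1 << bits_per_step) - 1
    iterations = width // bits_per_step
    quotient = 0
    remainder = 0
    for step in range(iterations):
        shift = width - bits_per_step * (step + 1)
        # Bring down the next group of dividend bits.
        remainder = (remainder << bits_per_step) | ((dividend >> shift) & digit_mask)
        # Select the next quotient digit (0 .. radix-1).
        digit = 0
        while remainder >= divisor:
            remainder -= divisor
            digit += 1
        quotient = (quotient << bits_per_step) | digit
    return quotient, remainder, iterations


if __name__ == "__main__":
    for bits in (2, 4):  # radix-4 versus radix-16
        q, r, n = divide_radix(1_000_003, 97, bits)
        print(f"radix-{1 << bits}: q={q} r={r} iterations={n}")
```

Both settings return the same quotient and remainder; the radix-16 run simply needs half as many iterations for the same operand width.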
Enhanced Intel
Virtualisation Technology
More prevalent in the
workstation/server
community, Intel VT
technology - where multiple,
hardware-isolated partitions
can run on the same machine
- is boosted with a reduction
in the time taken to
transition between virtual
machines on a purely
hardware level. Intel quotes
a boost of up to 75 per cent.
Larger cache sizes
Large amounts of on-chip
cache are a good thing. The
ability to load and locally
store an application's
working set is an effective, if
transistor-costly, method of
increasing performance, as
on-chip cache speeds are an
order of magnitude faster
than accessing external
memory on a regular basis.
In particular, dual-core
Penryns will pack up to 6MiB
of L2 cache and quad-core
models up to 12MiB. In
transistor terms that's around
840m for a QC part; it's just as
well Intel is packing them
into a space-saving 45nm
process, then.
Split load cache
enhancement
Cache is cache, right?
However, the effectiveness
of cache is directly related to
just how well data can be
crammed into it. Should tags
not correctly align with the
cache line (too big, perhaps),
which contains an index of
what's in the cache, transfers
to the execution core can be
an inefficient process.
Penryn has a split-load cache,
which, as the name suggests,
is able to split the data and
associated tags up to better
fit into the cache's lines.
Higher bus and core speeds,
heat
Just as adding more cache is
an established method of
increasing overall
performance, fattening the
FSB pipe, which delivers data
from the system's memory to
the processor, increases
memory bandwidth and,
ceteris paribus, performance.
Intel's raising the FSB speed
to an effective 1600MHz,
although that's only
applicable to selected Xeon-
based SKUs which already
run at 1333MHz FSB. Selected
desktop processors should
see a hike to 1333MHz FSB,
too, and will be officially
supported on Intel's Bearlake
motherboards. We'd expect
Penryn-based processors to
work in most present
performance-oriented
motherboards, including
NVIDIA's nForce 680i SLI
with, presumably, a simple
BIOS update.
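As a rough guide to what the FSB hikes mean, peak bandwidth is simply the effective transfer rate multiplied by the bus width. The sketch below assumes the 64-bit (8-byte) front-side bus data path used by Core 2 platforms; the function name is ours, and sustained throughput in practice will be lower than these peak figures.

```python
FSB_WIDTH_BYTES = 8  # 64-bit front-side bus data path


def fsb_peak_gb_per_s(effective_mhz):
    """Peak FSB bandwidth in GB/s for an 'effective' bus speed in MHz
    (millions of transfers per second)."""
    return effective_mhz * 1e6 * FSB_WIDTH_BYTES / 1e9


if __name__ == "__main__":
    for speed in (1333, 1600):
        print(f"{speed}MHz FSB: {fsb_peak_gb_per_s(speed):.1f} GB/s peak")
```

That works out to roughly 10.7GB/s at 1333MHz and 12.8GB/s at 1600MHz, shared between every core hanging off the bus.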
Thinking about it in a
performance sense, Intel has
to increase the FSB, solely
because of the memory
contention that a 'four-core'
CPU with a shared FSB places
on the system. Intel has been
able to mask contention
shortcomings with an
intelligent architecture, but
it's, frankly, an inelegant
interface for a multi-core
processor.
Intel is being coy about the
initial range of clockspeeds
for Penryn, but we believe
that they will debut - in a
server/workstation and
desktop market - with a
3.46GHz maximum clock.
Later revisions, of course,
will see that pushed up
towards 4GHz.
With respect to desktop
SKUs, dual-core models are
slated to consume 65W TDP,
matching the incumbent
line-up. Quad-core, sporting
up to 12MiB of cache, will
consume either 95W or
130W. Server parts will
continue to ship at
50W/80W/120W, with the
increased transistor count
counteracted by the energy-
efficient manufacturing
process. Mobile parts, too,
have been designed to fit
into current thermal
envelopes, and all Penryn
SKUs will be a drop-in
upgrade from present
models.
SSE4 and Super Shuffle
Engine
SSE4 was designed to be
debuted with the Nehalem
core (juicy information
morsels for this on the
following page). SSE4 adds a
bunch of multimedia-related
optimisations that will be
manifested in a desktop
environment by better
media-encoding
performance.
Super Shuffle Engine sounds
like a Japanese-esque
nomenclature for an
advancement that adds a
128-bit-wide shuffle unit. In
plain English it's useful for a
number of imaging and
video programs that use
what are termed shuffle-like
operations such as pack, shift
and unpack. It'll be
interesting to put this to the
test.
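For readers unfamiliar with shuffle-like operations, here is a small software model of a 128-bit byte shuffle. It follows the semantics of the existing x86 PSHUFB instruction, chosen here purely as a representative example; which instructions benefit most from Penryn's wider shuffle unit is not something this sketch tries to establish.

```python
def byte_shuffle(source, control):
    """Software model of a 16-byte shuffle (PSHUFB-style semantics).

    For each output position i:
      - if bit 7 of control[i] is set, the output byte is 0;
      - otherwise the output byte is source[control[i] & 0x0F].
    """
    if len(source) != 16 or len(control) != 16:
        raise ValueError("both operands must be exactly 16 bytes")
    return bytes(0 if c & 0x80 else source[c & 0x0F] for c in control)


if __name__ == "__main__":
    pixels = bytes(range(16))

    # Reverse the byte order of the whole 128-bit register.
    reverse = bytes(range(15, -1, -1))
    print(byte_shuffle(pixels, reverse).hex())

    # Unpack-style operation: widen the low 8 bytes to 16-bit values
    # by interleaving them with zero bytes.
    widen = bytes(x for i in range(8) for x in (i, 0x80))
    print(byte_shuffle(pixels, widen).hex())
```

Hardware performs the whole rearrangement on a 128-bit register in one operation, which is where imaging and video code stands to gain.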
Deep Power Down
Technology and Enhanced
Intel Dynamic Acceleration
The Intel Core 2
microarchitecture introduced
enhanced power-saving
states that gated the CPU
down during idle periods.
The DPDT is an extension
that further pushes down
energy requirements during,
you guessed it, idle periods.
EIDA is an interesting
inclusion. Should the current
application be single-
threaded, whereby there's
no intrinsic advantage of
having multiple cores
working concurrently, EIDA
pushes up the single-core
frequency to above
specifications. That could
mean a 2.93GHz part auto-
overclocking to, say, 3.2GHz.
Sounds like a good bet for
isolated gaming, where the
majority of titles are still
single-threaded.
Summary
We've trotted out a number
of enhancements that
Penryn possesses over and
above current dual- and
quad-core Core 2-based CPUs,
but, really, they're
architectural bolt-ons that,
on a clock-for-clock basis,
will provide somewhere in
the region of 20 to 30 per
cent extra performance.
There's nothing radically
new here, just as we
suspected, and Penryn
constitutes a natural
progression for Core 2.
Widespread availability is
scheduled for 2H 2007, so
expect Penryn-powered
boxes for Thanksgiving and
Christmas.
Does it have enough oomph
to battle AMD's Barcelona?
We'll find out.
Nehalem is up next and it
packs in some shocks. Read
on.

32nm microprocessors

Revolutionizing How
We Use Technology —
Today and Beyond
In another world's first, Intel
has demonstrated its 32nm
logic process with a
functional SRAM packing
more than 1.9 billion second
generation high-k metal gate
transistors. It's a monumental
step towards delivering
32nm microprocessors in
2009 —and a great leap
towards developing
significant density,
performance, and power
improvements beyond
today's 45nm technology.
And, what are these 1.9
billion transistors? They're
the tiny switches that
process the ones and zeroes
that make up our digital
world. They enable Intel to
continue to deliver record-
breaking PC, laptop and
server processor speeds. And
they're all packed onto a
single memory cell nearly
half the size of the 45nm cell
— which means, for example,
that Intel will be able to
deliver more cores on the
same die and more cache for
even greater performance in
the future.
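As a back-of-the-envelope check on the "nearly half the size" claim, the sketch below assumes ideal scaling, where both dimensions of a cell shrink in proportion to the node name. Real cell sizes depend on layout and design rules, so treat the result as an approximation rather than Intel's measured figure.

```python
def ideal_area_ratio(new_node_nm, old_node_nm):
    """Area of a feature at the new node relative to the old node,
    assuming both dimensions scale linearly with the node name."""
    return (new_node_nm / old_node_nm) ** 2


if __name__ == "__main__":
    ratio = ideal_area_ratio(32, 45)
    print(f"32nm cell area is about {ratio:.0%} of the 45nm cell")
```

Under that assumption a 32nm cell occupies roughly half the area of its 45nm counterpart, in line with the claim above.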
WE'RE DEVELOPING BEYOND
THE SPEED OF MOORE'S LAW
Moore's Law states that the
number of transistors on a
chip doubles about every
two years. And, Intel has kept
up with that pace. In fact, this
SRAM milestone is several
months ahead of schedule.
Intel's unique position allows
use of "Design for
Manufacturability" (DFM)
techniques to co-optimize
product design and
manufacturing disciplines.
Intel's DFM was key to the
early ramp of 45nm logic
technology —and one of the
catalysts for bringing
revolutionary innovations to
market faster than ever
before.
THE FUTURE OF 32NM
MICROPROCESSORS —YOUR
FUTURE
The digital age is
transforming the way we
live, work, and communicate.
And with this breakthrough
in 32nm logic technology,
you can expect more in the
future—a lot more. Like
faster processor speeds,
greater computing
capability, improved
functionality, and more
sophisticated applications.
While others aren't
scheduled to deliver on this
technology until much later,
you'll be seeing it just
around the corner from Intel.
Our 32nm microprocessors
are right on track to make a
breakthrough debut in 2009.
INDUSTRY PAPERS AND
PRESENTATIONS
Intel disclosed numerous
technical details about its
32nm process at the
International Electron
Devices Meeting (IEDM)
conference in December
2008.

The Next Generation CUDA Architecture

THE NEXT GENERATION
CUDA ARCHITECTURE,
CODE NAMED FERMI
THE SOUL OF A
SUPERCOMPUTER IN THE
BODY OF A GPU
The next generation CUDA
architecture, code named
“Fermi”, is the most
advanced GPU computing
architecture ever built. With
over three billion transistors
and featuring up to 512 CUDA
cores, Fermi delivers
supercomputing features and
performance at 1/10th the
cost and 1/20th the power of
traditional CPU-only servers.
Fermi makes GPU and CPU co-
processing pervasive by
addressing the full-spectrum
of computing applications.
Designed for C++ and
available with a Visual Studio
development environment,
it makes parallel
programming easier and
accelerates performance on
a wider array of applications
than ever before – including
dramatic performance
acceleration in ray tracing,
physics, finite element
analysis, high-precision
scientific computing, sparse
linear algebra, sorting, and
search algorithms.
Fermi features several major
innovations:
• 512 CUDA cores
• NVIDIA Parallel DataCache
technology
• NVIDIA GigaThread™
engine
• ECC support
See What the Experts are
Saying
"Fermi surpasses anything
announced by NVIDIA's
leading GPU competitor
(AMD)"
Tom Halfhill
Senior Analyst and Senior
Editor, Microprocessor Report
Looking Beyond Graphics
"I believe history will record
Fermi as a significant
milestone"
Dave Patterson
Director, Parallel Computing
Research Laboratory,
U.C.Berkeley
Co-author of Computer
Architecture: A Quantitative
Approach
The Top 10 Innovations in
the New NVIDIA Fermi
Architecture, and the Top 3
Next Challenges
"Fermi is the world's first
complete GPU computing
architecture."
Peter Glaskowsky
Technology Analyst,
Envisioneering Group
NVIDIA’s Fermi: The First
Complete GPU Computing
Architecture
"The convergence of new,
fast GPUs optimized for
computation as well as 3-D
graphics acceleration and
industry-standard software
development tools marks the
real beginning of the GPU
computing era. Gentlemen,
start your GPU computing
engines."
Nathan Brookwood
Principal Analyst & Co-
Founder, Insight64
NVIDIA Solves the GPU

Tuesday, March 2, 2010

WiMAX: Connect in more places, more often

Built for the future, Intel® WiMAX technology will allow you to connect in more places, more often, without being restricted to hotspots. When built into notebooks and mobile devices, you'll be able to extend your connected experience beyond Wi-Fi.

Connecting notebooks of the future with WiMAX

With the Intel® WiMAX/WiFi Link 5050 Series module solution, available in notebooks with Intel® Centrino® 2 processor technology, Intel is providing advancements in wireless mobile technology for the future of notebooks and a wide range of consumer devices.

Intel® Core™2

Intel® Core™2 Duo processor

Investing in new PCs with Intel® Core™2 processor family can mean big savings for your business. Delivering faster performance, greater energy efficiency, and more responsive multitasking, desktop PCs with Intel® Core™2 processor family can help your whole company be more productive.

By combining breakthrough processing speeds with advanced power saving features, desktop PCs with Intel® Core™2 processor family let you get more done in less time than ever before while reducing energy costs by an average of 50 percent.¹ Processors built with Intel's unique 45nm technology offer excellent performance as well as unique energy-saving features that help PCs meet ENERGY STAR² requirements. That means reduced power consumption for desktop PCs and lower energy costs for your company.

Product information

Features and benefits

Get the best overall performance with the Intel® Core™2 Duo processor. You'll get an arsenal of performance-rich technologies, including up to 6MB of shared L2 cache and up to 1333 MHz Front Side Bus.

Featured white paper

Discover next-generation Intel® Core™ microarchitecture built on 45nm high-k metal gate silicon technology.

Enjoy 3X faster multitasking performance with multi-core processing, which combines two independent processor cores in one physical package.¹ The cores run at the same frequency and share up to 6MB of L2 cache and up to 1333 MHz Front Side Bus for truly parallel computing.

Improve execution time and energy efficiency with more instructions per clock cycle enabled by Intel® Wide Dynamic Execution.

Get smarter, more energy-efficient performance enabled by Intel® Intelligent Power Capability.

Improve system performance enabled by Intel® Smart Memory Access, optimizing the use of the available data bandwidth.

Get higher-performance, more efficient cache subsystem enabled by Intel® Advanced Smart Cache, optimized for multi-core and dual-core processors.

Accelerate a broad range of applications, including video, speech and image, photo processing, encryption, financial, engineering and scientific applications, enabled by Intel® Advanced Digital Media Boost.

ATI Radeon HD 5870: DirectX 11, Eyefinity, And Serious Speed

Originally, I titled this piece ATI Radeon HD 5870: Learning From Nvidia's Mistakes. That was an unfair way to kick things off, I decided. But I still want to explain my justification for that idea. When Nvidia launched the GeForce GTX 260 and GTX 280 boards more than a year ago, the company knew it had the fastest board on the market and wasn’t afraid to charge a premium for it; $650, to be exact.

How utterly devastating, then, when the Radeon HD 4870 launched a couple of weeks later, besting the $400 GeForce GTX 260 with a $300 price point. It’s not that ATI had snatched away the performance crown—Nvidia still had the fastest card around. But enthusiasts (especially those who actually bought one of the GeForce GTX 200-series boards) were certainly left feeling gouged when the cards immediately fell to more competitive prices. Good way to earn extra margin on a big GPU. Bad way to encourage brand loyalty.

Without spoiling too much of today’s story, ATI seems to have learned a thing or two from the green faux pas. It’s launching a flagship just under $400 (Ed.: as of November 30th, Radeon HD 5870s, when in stock, sell for $410) and a second-in-command board based on the same design at $259 (Ed.: as of November 30th, the least-expensive Radeon HD 5850s sell for $310). That’s still a lot of money, but the two cards are being positioned as GeForce GTX 295 and GeForce GTX 285 killers. Could these boards really knock down Nvidia’s fastest pair at even lower prices?

Experiment: Does Intel’s Turbo Boost Trump Overclocking?

I still remember the PC I owned back in 1998. It was based on a Pentium II 233 with Intel’s Deschutes core, dropped into an Asus P2B motherboard. That system was fast, but I was a bored engineering student and wanted to do more with it. I started with aftermarket air cooling. And although I don’t remember how much overclocking headroom I was able to realize, I do remember that it wasn’t enough. At one point, I pried the plastic cartridge away from the slot-mounted processor and started experimenting with Peltier coolers for better cooling performance. When the proverbial smoke cleared, I was running at a stable 400 MHz—as fast as the most expensive model available at the time, for significantly less money.

Of course, the overclocks today are a lot more significant than 166 MHz. But the principle remains the same: take a processor running at its default settings and squeeze additional value out of it by trying to match the performance of higher-end and more expensive models. With a little effort, it’s actually quite easy to get a sub-$300 Core i7-920 beyond the performance levels of a $1,000 Core i7-975 Extreme without obliterating its reliability.

What About “Auto-Overclocking?”

Overclocking, in general, has always been a bit of a sore subject with AMD and Intel, which officially have to discourage the practice with threats of voided warranties should your CPU show signs of manipulation. Publicly, however, both vendors try to appear enthusiast-friendly by giving away overclocking software, facilitating aggressive BIOSes, and selling CPUs with unlocked clock multipliers. Despite those off-the-record endorsements, though, power users simply accept that there’s no such thing as a free lunch, and killing a CPU with too much voltage is sometimes just part of the game.

But with the introduction of Turbo Boost technology in Intel’s LGA 1366-based Core i7 and the subsequent debut of an even more aggressive implementation in the LGA 1156-based Core i5 and Core i7 processors, Intel took it upon itself to implement a form of intelligent overclocking based on a handful of different factors: voltage, amperage, temperature, and operating system P-state requests directly related to CPU utilization.

In monitoring each of those parameters, Intel’s onboard power control unit is able to augment performance by increasing clock rate in situations where the processor’s maximum TDP isn’t being reached. By essentially shutting down unused cores, thereby dropping power consumption, more headroom is freed up in single-threaded workloads, a little less when two threads are active, still less with three cores utilized, and so on. Thus, Intel’s “automatic overclocking” exists as an elegant, more granular way to increase performance without taking power consumption over the maximum TDP rating of any given CPU (130W in the case of Intel’s Bloomfields and 95W in the case of the Lynnfields).
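The stepping behaviour can be sketched in a few lines. Turbo Boost raises the multiplier in whole "bins" of one base clock each, and the Bloomfield and Lynnfield parts discussed here use a 133 MHz base clock. The bin table below is a hypothetical example: only its one-core entry is chosen to match the 667 MHz single-threaded gain quoted in this article for the Core i7-860, and the other entries are placeholders.

```python
BCLK_MHZ = 400 / 3  # 133.33 MHz base clock (Bloomfield and Lynnfield)

# Hypothetical bin table: extra multiplier steps by number of active cores.
# Only the one-core entry is tied to a figure quoted in the article.
EXAMPLE_BINS = {1: 5, 2: 4, 3: 1, 4: 1}


def turbo_frequency_mhz(base_multiplier, active_cores, bins_by_active_cores):
    """Clock rate reached when the power control unit grants the full
    number of turbo bins available for the given active core count."""
    bins = bins_by_active_cores[active_cores]
    return (base_multiplier + bins) * BCLK_MHZ


if __name__ == "__main__":
    base = 21  # 21 x 133 MHz = 2.8 GHz, the Core i7-860's stock speed
    for cores in sorted(EXAMPLE_BINS):
        mhz = turbo_frequency_mhz(base, cores, EXAMPLE_BINS)
        print(f"{cores} active core(s): {mhz:.0f} MHz")
```

The real unit grants bins only while voltage, current, and temperature stay within limits, so these are ceilings rather than guaranteed speeds.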

Can You Do Better?

The question we asked ourselves—especially after seeing that the Core i7-860 and -870 would accelerate a fantastic 667 MHz in single-threaded apps—was whether it was still worth it for the power user to go all-out with processor overclocking and risk nuking a perfectly good CPU, or simply let Intel’s version of the technology handle business? I hoped that I wasn’t just getting lazy in my old(er) age, and that there’d still be palpable gains to taking the enthusiast’s path to better performance. But I also wasn’t ready to dismiss the efforts Intel’s engineers made in optimizing Nehalem for balanced performance in single- and multi-threaded software.

Intel Tera-scale Computing Research Program

The Intel® Tera-scale Computing Research Program is a worldwide effort to advance computing technology for the next decade.

The Single-chip Cloud Computer
Intel Labs has created an experimental “Single-chip Cloud Computer,” a research microprocessor containing the most Intel Architecture cores ever integrated on a silicon CPU chip – 48 cores. It incorporates technologies intended to scale multi-core processors to 100 cores and beyond, such as an on-chip network, advanced power management technologies and support for “message-passing.” Architecturally, the chip resembles a cloud of computers integrated into silicon.

Tera-scale Computing Research Vision
By scaling multi-core architectures to 10s to 100s of cores and embracing a shift to parallel programming, we aim to improve performance, increase energy-efficiency, and make future applications more compelling and immersive.
"Tera" means 1 trillion, or 1,000,000,000,000. Our vision is to create platforms capable of performing trillions of calculations per second (teraflops) on trillions of bytes of data (terabytes).

ATI Radeon™ HD 5750 Graphics

Expand. Accelerate. Dominate Your Games.

Now, more people than ever can experience real innovation in DirectX® 11 gaming with ATI Radeon™ HD 5700 Series graphics processors. Loaded with advanced technology, these GPUs have the power and premium features you need for fully immersive gameplay. Expand your visual real estate across up to three displays and get lost in the action with revolutionary ATI Eyefinity Technology.1 Using ATI Stream technology, accelerate even the most demanding applications and do more than ever with your PC.2 The first GPUs in this class to offer full support for DirectX 11, these GPUs enable rich, realistic visuals and explosive HD gaming performance so you can dominate the competition.3
Features & Benefits
  • Get unrivalled visual quality and intense gaming performance for today and tomorrow with support for Microsoft® DirectX® 11
  • With ATI Eyefinity technology, get the ultimate immersive gaming experience with innovative 'wrap around' multi-display capabilities1,4
  • Tap into the massive parallel processing power of your GPU with ATI Stream technology and tackle demanding tasks like video transcoding with incredible speed2,5
  • Experience the speed, responsiveness and performance of ultra-high bandwidth GDDR5 memory2,5
  • ATI CrossFireX™ technology in dual-mode offers advanced scalability6