Exercise 5.8
Derive the cost of your processor time by dividing its
price over its expected productive lifetime.
Find, calculate, or measure the time required to access a byte
at each level of your system's memory hierarchy, and calculate
the monetary cost of each access.
Also calculate the cost of each byte in the memory hierarchy
(you can approximate the cost of a processor's internal caches
by assuming a fixed price for each processor's transistor, and
working from the published transistor counts for the processor.)
List your results in a table, and discuss whether the prices you
found are fair, or whether a given memory technology is over or
under-valued.
Deviating somewhat from the letter of the exercise, we can still
get a feeling for how different memory types compare in practice.
I've calculated some numbers for a fairly typical configuration,
based on some currently best-selling middle-range components:
an AMD Athlon XP 3000+ processor,
a 256 MB PC2700 DDR memory module, and
a 250 GB 7200 RPM Maxtor hard drive.
The results appear in the table below.
I obtained the component prices from TigerDirect.com on
January 19, 2006.
I calculated the cost of the cache memory by multiplying the processor's price
by the die area occupied by the corresponding cache divided by the
total size of the processor die (I measured the sizes on a die photograph).
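The die-area apportionment described above can be sketched as follows. The function name and the numbers in the usage line are my own placeholders for illustration; they are not the prices or areas measured for the table.

```python
def cache_cost(processor_price, cache_area, die_area):
    """Attribute a share of the processor's price to a cache,
    in proportion to the die area the cache occupies."""
    if die_area <= 0 or not 0 <= cache_area <= die_area:
        raise ValueError("need die_area > 0 and 0 <= cache_area <= die_area")
    return processor_price * cache_area / die_area


# Placeholder inputs only (hypothetical, NOT measured values):
# a $100 processor whose cache covers 10 of 100 area units.
print(cache_cost(100.0, 10.0, 100.0))  # 10.0
```

The areas can be in any unit (even pixels on the die photograph), as long as both use the same one, because only their ratio matters.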
The worst case latency column lists the time it would
take to fetch a byte under the worst possible scenario:
for the DDR RAM, a single byte
read from the same bank immediately after a write;
for the hard drive, a read that incurs the maximum seek time,
rotational latency, and controller overhead.
On the other hand, the sustained throughput column lists
numbers where the devices operate close to ideal conditions
for pumping out bytes as fast as possible:
8 bytes delivered at double the bus speed for the DDR RAM;
the maximum sustained outer diameter data rate for the hard drive.
In all cases,
the ratio of the sustained bandwidth to the bandwidth implied by the
worst case latency is at least one order of magnitude,
and it is this difference that allows our machines to deliver the
performance we expect.
In particular, the ratio is 27 for the level 1 cache, 56 for the level 2 cache,
76 for the DDR RAM, and 1.8 million for the hard drive.
Note that as we move away from the processor there are more tricks
we can play to increase the bandwidth,
and we can get away with more factors that increase the latency.
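As a rough check, the ratios quoted above can be recomputed from the latency and throughput columns of the table. This sketch assumes that a worst case access delivers a single byte and that the table's MB/s means 2^20 bytes per second; both are my reading of the figures, and the names used are just for illustration.

```python
MIB = 2 ** 20  # assumption: "MB/s" in the table means 2**20 bytes per second

# name: (worst case latency in seconds, sustained throughput in MB/s),
# copied from the table
DEVICES = {
    "L1 D cache": (1.4e-9, 19022),
    "L2 cache": (9.7e-9, 5519),
    "DDR RAM": (28.5e-9, 2541),
    "Hard drive": (25.6e-3, 67),
}


def bandwidth_ratio(latency_s, throughput_mb_s):
    """Sustained bandwidth divided by the bandwidth implied by the latency."""
    latency_bandwidth = 1.0 / latency_s  # bytes/s at one byte per access
    return throughput_mb_s * MIB / latency_bandwidth


for name, (latency, throughput) in DEVICES.items():
    print(f"{name:12s} {bandwidth_ratio(latency, throughput):14.1f}")
```

The results come out at roughly 28, 56, 76, and 1.8 million, in line with the figures in the text; the small differences are what I would expect from the rounding of the table entries.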
The byte cost varies by two to three orders of magnitude from one kind of memory to the next:
with one dollar we can buy KBs of cache memory,
MBs of DDR RAM, and GBs of disk space.
However,
as one would expect, cheaper memory has a higher latency and a lower throughput.
Things get more interesting when we examine the productivity of various
memory types.
Productivity is typically measured as output per unit of input;
in our case I calculated it as the number of bytes read per second,
divided by the $ cost of one byte.
As you can see, if we look at the best case scenarios (the device operating at its maximum bandwidth),
the hard drive's bytes are the most productive.
In the worst case (latency-based) scenarios the productivity performance of the disk
is abysmal, and this is why disks are nowadays furnished with abundant amounts of cache memory
(8 MB in our case).
The most productive device in the worst case latency-based measurements is the DDR RAM.
These results are what we would expect from an engineering point of view:
the hard disk, which is the workhorse used for storing large amounts of data at minimum cost,
should offer the best overall productivity under ideal (best case) conditions,
while the DDR RAM, which is used for satisfying a system's general purpose storage requirements,
should offer the best overall productivity even under worst case conditions.
Also note the low productivity of the level 1 and level 2 caches.
This factor easily explains why processor caches are relatively small:
they work admirably well, but they are expensive for the work they do.
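The productivity figures can be reproduced, to within rounding, from the other columns of the table. The sketch below assumes binary units (KB = 2^10, MB = 2^20, GB = 2^30 bytes) and one byte per worst case access; those assumptions and the function names are mine.

```python
KIB, MIB, GIB = 2 ** 10, 2 ** 20, 2 ** 30  # assumption: binary units

# name: (worst case latency [s], sustained throughput [bytes/s],
#        bytes bought by $1), copied from the table
MEMORY = {
    "L1 D cache": (1.4e-9, 19022 * MIB, 10.7 * KIB),
    "L2 cache": (9.7e-9, 5519 * MIB, 12.8 * KIB),
    "DDR RAM": (28.5e-9, 2541 * MIB, 9.48 * MIB),
    "Hard drive": (25.6e-3, 67 * MIB, 2.91 * GIB),
}


def productivity(bytes_per_second, bytes_per_dollar):
    """Bytes read per second, per dollar spent on one byte."""
    # Dividing by the cost of one byte is the same as
    # multiplying by the number of bytes that $1 buys.
    return bytes_per_second * bytes_per_dollar


def worst_case(latency_s, _throughput, bytes_per_dollar):
    # One byte per access, each access taking the worst case latency.
    return productivity(1.0 / latency_s, bytes_per_dollar)


def best_case(_latency_s, throughput, bytes_per_dollar):
    # The device streams at its sustained throughput.
    return productivity(throughput, bytes_per_dollar)


for name, row in MEMORY.items():
    print(f"{name:12s} {worst_case(*row):10.2e} {best_case(*row):10.2e}")
```

Ranking the rows by either column gives the same ordering discussed above: the DDR RAM comes first in the worst case and the hard drive first in the best case.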
Component | Nominal size | Worst case latency | Sustained throughput (MB/s) | $1 buys | Worst case productivity (bytes read / s / $) | Best case productivity (bytes read / s / $) |
L1 D cache | 64 KB | 1.4 ns | 19022 | 10.7 KB | 7.91·10^12 | 2.19·10^14 |
L2 cache | 512 KB | 9.7 ns | 5519 | 12.8 KB | 1.35·10^12 | 7.61·10^13 |
DDR RAM | 256 MB | 28.5 ns | 2541 | 9.48 MB | 3.48·10^14 | 2.65·10^16 |
Hard drive | 250 GB | 25.6 ms | 67 | 2.91 GB | 1.22·10^11 | 2.17·10^17 |
Performance and cost of various memory types.