Computer Organization: Cache
School of Computer Science, University of Aeronautics and Astronautics
2014/11/13

Outline (material drawn mainly from CS61C, Lectures 11 and 12):
- Memory hierarchy overview
- Direct-mapped caches
- Direct-mapped cache examples
- Cache reads and writes
- Cache performance
- Multilevel caches
- Set-associative caches
- Improving cache performance
- Multilevel cache performance in practice
- A contemporary cache example

Great Idea #3: Principle of Locality / Memory Hierarchy (memory hierarchy/層次)

Storage in a Computer
- Processor: holds data in register files (~100 bytes); registers are accessed on a sub-nanosecond timescale.
- Memory ("main memory"): more capacity than registers (~GBytes), but an access time of ~50-100 ns — hundreds of clock cycles per memory access?!
[Figure: Processor-Memory Performance Gap. Log-scale performance vs. year, per "Moore's Law": processor performance grows ~55%/year (2X/1.5yr) while DRAM performance grows ~7%/year (2X/10yrs), so the processor-memory gap grows ~50%/year. 1989: first CPU with cache on chip; 1998: the Pentium III has two cache levels on chip.]
Principle of Locality (1/3) (locality/局部性)
- Principle of Locality: programs access only a small portion of the full address space at any instant of time.
- Recall: the address space holds both code and data.
- Loops and sequential instruction execution mean generally localized code access.
- The stack and heap try to keep your data together.
- Arrays and structs naturally group data you would access together.
Principle of Locality (2/3)
- Temporal locality (locality in time; temporal locality/時間局部性): you go back to the same book on your desk multiple times. If a memory location is referenced, it will tend to be referenced again soon.
- Spatial locality (locality in space; spatial locality/空間局部性): when you go to the book shelf, you grab many books by J.D. Salinger, since the library stores related books together. If a memory location is referenced, locations with nearby addresses will tend to be referenced soon.
Principle of Locality (3/3)
- We exploit the principle of locality in hardware via a memory hierarchy, where:
  – Levels closer to the processor are faster (and more expensive per bit, so smaller)
  – Levels farther from the processor are larger (and less expensive per bit, so slower)
- Goal: create the illusion of memory that is almost as fast as the fastest memory and almost as large as the biggest memory in the hierarchy.
[Figure: Memory Hierarchy Schematic. The processor sits above Level 1, Level 2, Level 3, ..., Level n. Levels toward the processor are smaller, faster, and more expensive; levels toward the bottom are bigger, slower, and cheaper.]
Cache Concept
- Introduce an intermediate hierarchy level: a memory cache, which holds a copy of a subset of main memory.
  – As a pun, we often use $ ("cash") to abbreviate cache (e.g. D$ = Data Cache, L1$ = Level 1 Cache).
- Modern processors have separate caches for instructions and data, as well as several levels of caches implemented in different sizes.
- Caches are implemented with the same IC processing technology as the CPU and integrated on-chip — faster but more expensive than main memory.
Memory Hierarchy Technologies
- Caches use static RAM (SRAM):
  + Fast (typical access times of 0.5 to 2.5 ns)
  – Low density (6-transistor cells), higher power, expensive ($2000 to $4000 per GB in 2011)
  – Static: content will last as long as power is on.
- Main memory uses dynamic RAM (DRAM):
  + High density (1-transistor cells), lower power, cheaper ($20 to $40 per GB in 2011)
  – Slower (typical access times of 50 to 70 ns)
  – Dynamic: needs to be "refreshed" regularly (~ every 8 ms).
Memory Transfer in the Hierarchy
- Processor ? L1$ ? L2$ ? Main Memory ? Secondary Memory
- Inclusive: data in L1$ ? data in L2$ ? data in MM ? data in SM (each level holds a subset of the level below it).
- Block: the unit of transfer between memory and cache.
Managing the Hierarchy
- cache ? main memory
  – By the cache controller hardware
- registers ? memory
  – By the compiler (or assembly-level programmer)
- main memory ? disks (secondary storage)
  – By the OS (virtual memory, which is a later topic); the virtual-to-physical address map is assisted by the hardware (TLB)
  – By the programmer (files)
We are here.

[Figure: Typical Memory Hierarchy. On-chip components: control, datapath, register file, and separate instruction and data caches; then a second-level cache (SRAM), main memory (DRAM), and secondary memory (disk or flash). Speed in cycles: ?'s → 1's → 10's → 100's → 1,000,000's; size in bytes: 100's → 10K's → M's → G's → T's; cost per bit: highest near the processor, lowest at secondary memory.]
Cache Management
- Library analogy: organization is necessary!
- What is the overall organization of blocks we impose on our cache?
  – Where do we put a block of data from memory?
  – How do we know if a block is already in cache?
  – How do we quickly find a block when we need it?
  – When do we replace something in the cache?
General Notes on Caches
- Recall: memory is byte-addressed.
- We haven't specified the size of our "blocks," but it will be a multiple of the word size (32 bits).
- How do we access individual words or bytes within a block? → OFFSET
- The cache is smaller than memory: we can't fit all blocks at once, so multiple blocks in memory map to the same cache slot (row). → INDEX
- We need some way of identifying which memory block is currently in the row. → TAG
Direct-Mapped Caches (1/3)
- Each memory block is mapped to exactly one row in the cache (direct-mapped).
  – Use a simple hash function.
- Effect of block size:
  – Spatial locality dictates that our blocks consist of adjacent bytes, which differ in address by 1.
  – Offset field: the lowest bits of the memory address can be used to index to specific bytes within a block.
  – Block size needs to be a power of two (in bytes).
Direct-Mapped Caches (2/3)
- Effect of cache size (total stored data):
  – Determines the number of blocks the cache holds; if it could hold all of memory, it would use all remaining bits (minus the offset bits) to select the appropriate row of the cache.
- Index field: apply a hash function to the remaining bits to determine which row the block goes in:
  (block address) modulo (# of blocks in the cache)
- Tag field: the leftover upper bits of the memory address determine which portion of memory the block came from (an identifier).
TIO Address Breakdown
- Memory address fields, from bit 31 down to bit 0: Tag (T bits) | Index (I bits) | Offset (O bits)
- Meaning of the field sizes:
  – O bits ? 2^O bytes/block = 2^(O-2) words/block
  – I bits ? 2^I rows in cache = cache size / block size
  – T bits = A – I – O, where A = # of address bits (A = 32 here)
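A quick sketch of the T/I/O arithmetic above, in Python (the function name and structure are mine, not from the slides):

    import math

    def tio_breakdown(addr_bits, cache_bytes, block_bytes):
        # Split an address into Tag/Index/Offset field widths for a
        # direct-mapped cache; sizes must be powers of two.
        o = int(math.log2(block_bytes))                 # byte within block
        i = int(math.log2(cache_bytes // block_bytes))  # cache row
        t = addr_bits - i - o                           # remaining tag bits
        return t, i, o

    # The 64 B address space / 4 B blocks / 16 B cache example coming up:
    print(tio_breakdown(6, 16, 4))   # -> (2, 2, 2)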
Direct-Mapped Caches (3/3)
- What's actually in the cache?
  – Each row contains the actual data block to store: B bits = 8 × 2^O bits.
  – In addition, we must save the Tag field of the address as an identifier (T bits).
  – Valid bit: indicates whether the block in that row is valid or not.
- Total bits in cache = # rows × (B + T + 1) = 2^I × (8 × 2^O + T + 1) bits.
Cache Example (1/2)
- Cache parameters:
  – Address space of 64 B, block size of 1 word, cache size of 4 words.
- TIO breakdown:
  – 1 word = 4 bytes, so O = log2(4) = 2.
  – Cache size / block size = 4, so I = log2(4) = 2.
  – A = log2(64) = 6 bits, so T = 6 – 2 – 2 = 2.
- Bits in cache = 2^2 × (8×2^2 + 2 + 1) = 140 bits.
- Memory addresses in the accompanying figure are shown as block addresses.
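To double-check the 140-bit arithmetic, a small self-contained sketch mirroring the slide's formula (function name is mine):

    import math

    def cache_bits(addr_bits, cache_bytes, block_bytes):
        # Total direct-mapped cache storage (data + tag + valid bits),
        # per the formula 2^I * (8*2^O + T + 1).
        o = int(math.log2(block_bytes))
        i = int(math.log2(cache_bytes // block_bytes))
        t = addr_bits - i - o
        return (2 ** i) * (8 * 2 ** o + t + 1)

    print(cache_bits(6, 16, 4))  # -> 140, matching the slide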
Cache Example (2/2)
- Main memory is shown in blocks, so the offset bits are not shown (x's). Which blocks map to each row of the cache? Every block whose Index bits match that row (the original figure shows this with colors).
- The cache has four rows (Index 00, 01, 10, 11), each holding a Valid bit, a Tag, and a Data block.
- On a memory request (let's say the address 001011):
  1) Take the Index field (10); cache rows exactly match the Index field.
  2) Check if the Valid bit is true in that row of the cache.
  3) If valid, then check if the Tag matches.

Direct-Mapped Cache Internals
[Figure: a direct-mapped cache with 4 words/block and a cache size of 1 Ki words. The 32-bit address splits into a 20-bit Tag (bits 31-12), an 8-bit Index (bits 11-4) selecting one of 256 rows, a 2-bit block offset (bits 3-2) selecting one of four 32-bit data words, and a 2-bit byte offset (bits 1-0). The stored Tag is compared with the address Tag and ANDed with the Valid bit to produce the Hit signal alongside the 32-bit Data output.]
Caching Terminology (1/2) (terminology/術(shù)語, miss/缺失)
- When reading memory, 3 things can happen:
  – Cache hit: the cache block is valid and contains the proper address, so read the desired word.
  – Cache miss: nothing in that row of the cache (not valid), so fetch from memory.
  – Cache miss with block replacement: the wrong block is in the row, so discard it and fetch the desired data from memory.
Caching Terminology (2/2) (penalty/代價)
- How effective is your cache? You want to maximize cache hits and minimize cache misses.
  – Hit rate (HR): the percentage of memory accesses in a program or set of instructions that result in a cache hit.
  – Miss rate (MR): like hit rate, but for cache misses; MR = 1 – HR.
- How fast is your cache?
  – Hit time (HT): time to access the cache (including the Tag comparison).
  – Miss penalty (MP): time to replace a block in the cache from a lower level in the memory hierarchy.
Sources of Cache Misses: The 3Cs (compulsory/強制)
- Compulsory (cold start or process migration, 1st reference): the first access to a block is impossible to avoid; the effect is small for long-running programs.
- Capacity: the cache cannot contain all the blocks accessed by the program.
- Conflict (collision): multiple memory locations are mapped to the same cache location.
Direct-Mapped Cache Example (modified by GXP)
- Consider the sequence of memory address accesses 0 1 2 3 4 3 4 15. Start with an empty cache — all blocks initially marked as not valid.
- Address space: 16 B, block size: 1 B, cache size: 4 B, so TIO = 2-2-0.
  – Addresses in binary: 0000, 0001, 0010, 0011, 0100, 0011, 0100, 1111.
  – 0 miss, 1 miss, 2 miss, 3 miss (compulsory misses fill rows 00-11); 4 miss (replaces Mem(0) in row 00); 3 hit; 4 hit; 15 miss (replaces Mem(3) in row 11).
  – 8 requests, 6 misses (HR = 0.25, MR = 0.75).

Taking Advantage of Spatial Locality
- Let a cache block hold more than one byte. Same access sequence, again starting with an empty cache.
- Address space: 16 B, block size: 2 B, cache size: 4 B, so TIO = 2-1-1.
  – 0 miss (brings in Mem(1)/Mem(0)); 1 hit; 2 miss (brings in Mem(3)/Mem(2)); 3 hit; 4 miss (replaces the Mem(1)/Mem(0) block with Mem(5)/Mem(4)); 3 hit; 4 hit; 15 miss (replaces Mem(3)/Mem(2) with Mem(15)/Mem(14)).
  – 8 requests, 4 misses (HR = 0.5, MR = 0.5).
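Both traces can be reproduced with a tiny direct-mapped simulator — a sketch (names are mine, not from the slides):

    def simulate_direct_mapped(accesses, block_bytes, cache_bytes):
        # Count hits/misses for a sequence of byte addresses in a
        # direct-mapped cache with the given block and cache sizes.
        n_rows = cache_bytes // block_bytes
        rows = [None] * n_rows            # block address held by each row
        hits = 0
        for addr in accesses:
            block = addr // block_bytes
            row = block % n_rows
            if rows[row] == block:
                hits += 1
            else:
                rows[row] = block         # miss: fetch block, maybe replacing
        return hits, len(accesses) - hits

    seq = [0, 1, 2, 3, 4, 3, 4, 15]
    print(simulate_direct_mapped(seq, 1, 4))  # -> (2, 6): HR = 0.25
    print(simulate_direct_mapped(seq, 2, 4))  # -> (4, 4): HR = 0.5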
Effect of Block and Cache Sizes on Miss Rate
[Figure: miss rate (%, 0-10) vs. block size (16 to 256 bytes) for cache sizes of 4 KB, 16 KB, 64 KB, and 256 KB.]
- Miss rate goes up if the block size becomes a significant fraction of the cache size, because the number of blocks that can be held in the same size cache is smaller (increasing capacity misses).
The Cache Consistency Problem

Time | Event            | Cache contents | Main memory at location X
 0   |                  |                | 1
 1   | A reads X        | 1              | 1
 2   | A writes 0 to X  | 0              | 1

Cache Reads and Writes
- We want to handle reads and writes quickly while maintaining consistency (consistency/一致性) between cache and memory (i.e. both know about all updates).
  – Policies for cache hits and misses are independent.
- Here we assume the use of separate instruction and data caches (I$ and D$):
  – Read from both.
  – Write only to D$ (assume no self-modifying code).
Handling Cache Hits (Write-Through)
- Read hits (I$ and D$):
  – The fastest possible scenario, so we want more of these.
- Write hits (D$):
  – Write-Through Policy: always write data to the cache and to memory (through the cache).
  – Forces cache and memory to always be consistent.
  – Slow! (every memory access is long)
  – Include a Write Buffer that updates memory in parallel with the processor (assumed present in all schemes when writing to memory).
Handling Cache Hits (Write-Back)
- Read hits (I$ and D$):
  – The fastest possible scenario, so we want more of these.
- Write hits (D$):
  – Write-Back Policy: write data only to the cache, then update memory when the block is removed.
  – Allows cache and memory to be inconsistent: multiple writes are collected in the cache, with a single write to memory per block.
  – Dirty bit: an extra bit per cache row that is set if the block was written to (is "dirty") and needs to be written back.
  – (A small write-policy sketch follows the miss-handling slides below.)
Handling Cache Misses (Write Allocate)
- The miss penalty grows as the block size does.
- Read misses (I$ and D$):
  – Stall execution, fetch the block from memory, put it in the cache, send the requested word to the processor, resume.
- Write misses (D$):
  – Write allocate: fetch the block from memory, put it in the cache, then execute a write hit.
  – Works with either write-through or write-back.
  – Ensures the cache is up-to-date after a write miss.
Handling Cache Misses (No-Write Allocate)
- The miss penalty grows as the block size does.
- Read misses (I$ and D$):
  – Stall execution, fetch the block from memory, put it in the cache, send the requested word to the processor, resume.
- Write misses (D$):
  – No-write allocate: skip the cache altogether and write directly to memory.
  – The cache is never up-to-date after a write miss.
  – Ensures memory is always up-to-date.
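A runnable toy model of the four write policies above (write-through vs. write-back on hits, write allocate vs. no-write allocate on misses). This is an illustrative sketch with one word per block; the class and method names are mine, not slide or hardware code:

    class ToyCache:
        def __init__(self, n_rows, write_back=True, write_allocate=True):
            self.rows = [{"block": None, "data": 0, "dirty": False}
                         for _ in range(n_rows)]
            self.write_back = write_back
            self.write_allocate = write_allocate

        def _fetch(self, block, memory):
            row = self.rows[block % len(self.rows)]
            if row["block"] != block:                    # miss in this row
                if self.write_back and row["dirty"]:
                    memory[row["block"]] = row["data"]   # write back old block
                row.update(block=block, data=memory.get(block, 0), dirty=False)
            return row

        def write(self, block, value, memory):
            row = self.rows[block % len(self.rows)]
            if row["block"] != block and not self.write_allocate:
                memory[block] = value                    # no-write allocate: bypass cache
                return
            row = self._fetch(block, memory)             # write allocate: fetch, then hit
            row["data"] = value
            if self.write_back:
                row["dirty"] = True                      # defer update until eviction
            else:
                memory[block] = value                    # write-through: update memory now

    memory = {}
    c = ToyCache(4)
    c.write(0, 42, memory)
    print(memory.get(0))   # None: write-back has not reached memory yet
    c._fetch(4, memory)    # block 4 maps to the same row, evicting dirty block 0
    print(memory.get(0))   # 42: the dirty block was written back on eviction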
Summary
- The memory hierarchy exploits the principle of locality to deliver lots of memory at fast speeds.
- Direct-Mapped Cache: each block in memory maps to exactly one row in the cache.
  – Index to determine which row.
  – Offset to determine which byte within the block.
  – Tag to identify if it's the block you want.
- Cache read and write policies:
  – Write-back and write-through for hits.
  – Write allocate and no-write allocate for misses.
Great Idea #3: Principle of Locality / Memory Hierarchy

Cache Performance
- Two things hurt the performance of a cache:
  – Miss rate and miss penalty.
- Average Memory Access Time (AMAT): the average time to access memory, considering both hits and misses:
  AMAT = Hit time + Miss rate × Miss penalty
  (abbreviated AMAT = HT + MR × MP)
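As a one-line sketch in Python (units are whatever you pass in — cycles here):

    def amat(ht, mr, mp):
        # AMAT = hit time + miss rate * miss penalty
        return ht + mr * mp

    print(amat(1, 0.02, 50))  # -> 2.0 cycles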
AMAT Example Usage
- Processor specs: 200 ps clock, MP of 50 clock cycles, MR of 0.02 misses/instruction, and HT of 1 clock cycle.
  AMAT = 1 + 0.02 × 50 = 2 clock cycles = 400 ps
- Which improvement would be best?
  – 190 ps clock: 380 ps
  – MP of 40 clock cycles: 360 ps
  – MR of 0.015 misses/instruction: 350 ps
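Checking the three candidates numerically (a sketch; the function name is mine):

    def amat_ps(clock_ps, ht_cycles, mr, mp_cycles):
        # AMAT in picoseconds for a given clock period
        return clock_ps * (ht_cycles + mr * mp_cycles)

    print(amat_ps(200, 1, 0.02, 50))   # baseline     -> 400.0 ps
    print(amat_ps(190, 1, 0.02, 50))   # faster clock -> 380.0 ps
    print(amat_ps(200, 1, 0.02, 40))   # lower MP     -> 360.0 ps
    print(amat_ps(200, 1, 0.015, 50))  # lower MR     -> 350.0 ps (best)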
Cache Parameter Example
- What is the potential impact of a much larger cache on AMAT? (same block size)
  – Increases HR.
  – Longer HT: smaller is faster.
  – At some point, the increase in hit time for a larger cache may overcome the improvement in hit rate, yielding a decrease in performance.
- Effect on TIO? Bits in cache? Cost?
Effect of Cache Performance on CPI
- Recall CPU performance:
  CPU Time = Instructions (IC) × CPI × Clock Cycle Time (CC)
- Include memory accesses in CPI:
  CPIstall = CPIbase + Average Memory-stall Cycles
  CPU Time = IC × CPIstall × CC
- Simplified model for memory-stall cycles:
  Memory-stall cycles = accesses/instruction × miss rate × miss penalty
- We will discuss more complicated models soon.
CPI Example (setup)
- Processor specs: CPIbase of 1, a 100-cycle MP, 36% load/store instructions, and 2% I$ and 4% D$ MRs.
  – How many times per instruction do we access the I$? The D$?
  – MP is assumed the same for both I$ and D$.
  – Memory-stall cycles will be the sum of the stall cycles for both I$ and D$.
CPI Example (solution)
- Processor specs: CPIbase of 1, a 100-cycle MP, 36% load/store instructions, and 2% I$ and 4% D$ MRs.
  Memory-stall cycles = (100% × 2% + 36% × 4%) × 100 = 3.44
                          (I$)       (D$)
  CPIstall = 1 + 3.44 = 4.44 (more than 3x CPIbase!)
- What if the CPIbase is reduced to 1?
- What if the D$ miss rate went up by 1%?
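The same arithmetic as a sketch, assuming (as the slide does) that every instruction accesses the I$ and only loads/stores access the D$:

    def cpi_stall(cpi_base, i_mr, d_mr, ls_frac, mp):
        # CPIstall = CPIbase + memory-stall cycles
        return cpi_base + (1.0 * i_mr + ls_frac * d_mr) * mp

    print(cpi_stall(1, 0.02, 0.04, 0.36, 100))  # -> 4.44
    print(cpi_stall(1, 0.02, 0.05, 0.36, 100))  # D$ MR up 1% -> 4.8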
The 3Cs Revisited: Design Solutions
- Compulsory: increase block size (increases MP; too-large blocks could increase MR).
- Capacity: increase cache size (may increase HT).
- Conflict: increase cache size; increase associativity (may increase HT).
Multiple Cache Levels
- With advancing technology, we have more room on the die for bigger L1 caches and for L2 (and in some cases even L3) caches.
  – Normally the lower-level caches are unified (i.e. each holds both instructions and data).
- Multilevel caching is a way to reduce the miss penalty.
- So what does this look like?
Multilevel Cache Diagram
[Figure: CPU → L1$ → L2$ → ... → Main Memory. A memory access that hits in L1$ returns data immediately; on a miss, the request goes to L2$, and so on down to main memory. On the way back, each level stores the block as the data returns to the CPU. Legend: request for data, return of data, store, path of data back to CPU.]
Multilevel Cache AMAT
- AMAT = L1 HT + L1 MR × L1 MP
  – Now the L1 MP depends on the other cache levels:
    L1 MP = L2 HT + L2 MR × L2 MP
  – If there are more levels, continue this chain (i.e. MP_i = HT_(i+1) + MR_(i+1) × MP_(i+1)); the final MP is the main memory access time.
- For two levels:
  AMAT = L1 HT + L1 MR × (L2 HT + L2 MR × L2 MP)
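The chain generalizes naturally; a sketch for any number of levels (function name mine):

    def chained_amat(hts, mrs, mem_cycles):
        # Fold MP_i = HT_(i+1) + MR_(i+1) * MP_(i+1), starting from the
        # main memory access time as the innermost miss penalty.
        mp = mem_cycles
        for ht, mr in zip(reversed(hts[1:]), reversed(mrs[1:])):
            mp = ht + mr * mp
        return hts[0] + mrs[0] * mp

    print(chained_amat([1, 5], [0.02, 0.05], 100))  # -> 1.2 (next slide's example)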
Multilevel Cache AMAT Example
- Processor specs: 1-cycle L1 HT, 2% L1 MR, 5-cycle L2 HT, 5% L2 MR, 100-cycle main memory HT.
  – Here assuming a unified L1$.
- Without L2$: AMAT1 = 1 + 0.02 × 100 = 3
- With L2$:    AMAT2 = 1 + 0.02 × (5 + 0.05 × 100) = 1.2
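Verifying both numbers directly:

    l1_ht, l1_mr = 1, 0.02
    l2_ht, l2_mr, mem = 5, 0.05, 100
    print(l1_ht + l1_mr * mem)                    # without L2$ -> 3.0
    print(l1_ht + l1_mr * (l2_ht + l2_mr * mem))  # with L2$    -> 1.2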
Local vs. Global Miss Rates
- Local miss rate: the fraction of references to one level of a cache that miss.
  – e.g. L2$ local MR = L2$ misses / L1$ misses.
  – Specific to a level of caching (as used in AMAT).
- Global miss rate: the fraction of all references that miss in all levels of a multilevel cache.
  – A property of the overall memory hierarchy.
  – The global MR is the product of all the local MRs: start with Global MR = Ln misses / L1 accesses and expand.
  – So by definition, global MR ≤ any local MR.
Memory Hierarchy with Two Cache Levels
- CPU → L1$ (1 cycle) → L2$ (10 cycles) → MM (100 cycles); per 1000 memory references, 40 reach L2$ and 20 reach MM.
- For every 1000 CPU-to-memory references:
  – 40 will miss in L1$; what is the local MR? 0.04
  – 20 will miss in L2$; what is the local MR? 0.5
  – Global miss rate? 0.02
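The same numbers in a couple of lines:

    refs, l1_misses, l2_misses = 1000, 40, 20
    print(l1_misses / refs)        # L1 local MR -> 0.04
    print(l2_misses / l1_misses)   # L2 local MR -> 0.5
    print(l2_misses / refs)        # global MR   -> 0.02 (= 0.04 * 0.5)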
Design Considerations
- L1$ focuses on low hit time (fast access):
  – Minimize HT to achieve a shorter clock cycle.
  – L1 MP is significantly reduced by the presence of an L2$, so L1 can be smaller/faster even with a higher MR (e.g. a smaller $ with fewer rows).
- L2$, L3$ focus on low miss rate:
  – Avoid reaching main memory (heavy penalty) as much as possible (e.g. a larger $ with larger block sizes, same # of rows).
Reducing Cache Misses
- Allow more flexible block placement in the cache:
  – Direct-mapped: a memory block maps to exactly one cache block.
  – Fully associative: a memory block can go in any slot.
  – N-way set-associative: divide the $ into sets, each of which consists of n slots in which to place a memory block. A memory block maps to a set determined by the Index field and is placed in any of the n slots of that set.
    Hash function: (block address) modulo (# sets in the cache).
Block Placement Schemes
- Place memory block 12 in a cache that holds 8 blocks:
  – Direct-mapped: can only go in row (12 mod 8) = 4.
  – Fully associative: can go in any of the slots (1 set/row).
  – 2-way set associative: can go in either slot of set (12 mod 4) = 0.
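A sketch that enumerates the candidate slots for each placement scheme (function name mine, not from the slides):

    def candidate_slots(block_addr, n_blocks, n_ways):
        # n_ways=1 is direct-mapped; n_ways=n_blocks is fully associative
        n_sets = n_blocks // n_ways
        s = block_addr % n_sets                 # the set index
        return [s * n_ways + way for way in range(n_ways)]

    print(candidate_slots(12, 8, 1))  # direct-mapped     -> [4]
    print(candidate_slots(12, 8, 8))  # fully associative -> [0, 1, ..., 7]
    print(candidate_slots(12, 8, 2))  # 2-way, set 0      -> [0, 1]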
Effect of Associativity on TIO (1/2)
- Here we assume a cache of fixed size (C).
- Offset: # of bytes in a block (same as before).
- Index: instead of pointing to a row, it now points to a set, so the number of sets is C/B/associativity and 2^I = C/B/associativity.
  – Fully associative (1 set): 0 Index bits!
  – Direct-mapped (associativity of 1): max Index bits.
  – Set associative: somewhere in between.
- Tag: the remaining identifier bits (T = A – I – O).
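The Index computation with associativity folded in — a sketch (name mine):

    import math

    def set_associative_tio(addr_bits, cache_bytes, block_bytes, n_ways):
        # 2^I = C / B / associativity: I now indexes a set, not a row
        o = int(math.log2(block_bytes))
        i = int(math.log2(cache_bytes // block_bytes // n_ways))
        t = addr_bits - i - o
        return t, i, o

    print(set_associative_tio(6, 16, 4, 1))  # direct-mapped     -> (2, 2, 2)
    print(set_associative_tio(6, 16, 4, 2))  # 2-way             -> (3, 1, 2)
    print(set_associative_tio(6, 16, 4, 4))  # fully associative -> (4, 0, 2)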
Effect of Associativity on TIO (2/2)
[Figure: the associativity spectrum, from direct mapped (only one way) through set associative to fully associative (only one set); associativity increases to the right.]
- For a fixed-size cache, each increase by a factor of two in associativity doubles the number of blocks per set (i.e. the number of slots) and halves the number of sets — decreasing the size of the Index by 1 bit and increasing the size of the Tag by 1 bit.
- Address fields: Tag (used for tag comparison) | Index (selects the set) | Block offset (selects the word in the block) | Byte offset.
Example
(1/2)Cache
parameters:6-bit
addresses,
block
size
of
1
word,cache
size
of
4
words,
2-way
set
associativeHow
many
sets?C/B/associativ
溫馨提示
- 1. 本站所有資源如無特殊說明,都需要本地電腦安裝OFFICE2007和PDF閱讀器。圖紙軟件為CAD,CAXA,PROE,UG,SolidWorks等.壓縮文件請下載最新的WinRAR軟件解壓。
- 2. 本站的文檔不包含任何第三方提供的附件圖紙等,如果需要附件,請聯(lián)系上傳者。文件的所有權(quán)益歸上傳用戶所有。
- 3. 本站RAR壓縮包中若帶圖紙,網(wǎng)頁內(nèi)容里面會有圖紙預(yù)覽,若沒有圖紙預(yù)覽就沒有圖紙。
- 4. 未經(jīng)權(quán)益所有人同意不得將文件中的內(nèi)容挪作商業(yè)或盈利用途。
- 5. 人人文庫網(wǎng)僅提供信息存儲空間,僅對用戶上傳內(nèi)容的表現(xiàn)方式做保護處理,對用戶上傳分享的文檔內(nèi)容本身不做任何修改或編輯,并不能對任何下載內(nèi)容負(fù)責(zé)。
- 6. 下載文件中如有侵權(quán)或不適當(dāng)內(nèi)容,請與我們聯(lián)系,我們立即糾正。
- 7. 本站不保證下載資源的準(zhǔn)確性、安全性和完整性, 同時也不承擔(dān)用戶因使用這些下載資源對自己和他人造成任何形式的傷害或損失。
最新文檔
- 2024年甲乙雙方關(guān)于新一代智能電氣安裝工程全面合作合同
- 2024招投標(biāo)管理部門風(fēng)險防控及合同履行責(zé)任書3篇
- 浙江工商大學(xué)《地貌學(xué)》2023-2024學(xué)年第一學(xué)期期末試卷
- 2024蘇州二手房買賣與家居智能化改造服務(wù)合同3篇
- 貨代公司知識培訓(xùn)課件
- 商品基礎(chǔ)知識培訓(xùn)課件
- 稅務(wù)工作總結(jié)稅收違法違章行為查處整改
- 2024智能供應(yīng)鏈管理系統(tǒng)建設(shè)與運營合同
- 房屋租賃行業(yè)市場營銷策略總結(jié)
- 西南財經(jīng)大學(xué)《商務(wù)實踐活動一》2023-2024學(xué)年第一學(xué)期期末試卷
- 檢驗科lis系統(tǒng)需求
- 疏散樓梯安全要求全解析
- 汽車擾流板產(chǎn)品原材料供應(yīng)與需求分析
- 中東及非洲空氣制水機行業(yè)現(xiàn)狀及發(fā)展機遇分析2024-2030
- DL∕T 1631-2016 并網(wǎng)風(fēng)電場繼電保護配置及整定技術(shù)規(guī)范
- PLC控制系統(tǒng)合同(2024版)
- 煤礦立井井筒及硐室設(shè)計規(guī)范
- 房地產(chǎn)項目開發(fā)合作協(xié)議書
- JJG(交通) 171-2021 超聲式成孔質(zhì)量檢測儀檢定規(guī)程
- QCT457-2023救護車技術(shù)規(guī)范
- 《中國大熊貓》課件大綱
評論
0/150
提交評論