Computer Organization: Caches
Core course (專業(yè)基礎(chǔ)課) of the School of Computer Science, University of Aeronautics and Astronautics (2014/11/13)

Outline (material drawn mainly from CS61C, Lectures 11 and 12):
– Memory-hierarchy overview
– Direct-mapped caches
– Direct-mapped cache examples
– Cache reads and writes
– Cache performance
– Multilevel caches
– Set-associative caches
– Improving cache performance
– Multilevel cache performance
– Contemporary cache examples in practice

Great Idea #3: Principle of Locality / Memory Hierarchy
(memory hierarchy/層次)

Storage in a Computer
Processor
– Holds data in register files (~100 bytes)
– Registers accessed on sub-nanosecond timescale
Memory ("main memory")
– More capacity than registers (~GBytes)
– Access time ~50-100 ns
– Hundreds of clock cycles per memory access?!

Processor-Memory Performance Gap
[Figure: performance versus year, log scale. "Moore's Law": processor performance grows ~55%/year (2X/1.5yr) while DRAM performance grows ~7%/year (2X/10yrs), so the processor-memory performance gap grows ~50%/year. Milestones: 1989, first CPU with cache on chip; 1998, Pentium III has two cache levels on chip.]

Principle of Locality (1/3)
Principle of Locality: Programs access only a small portion of the full address space at any instant of time
– Recall: Address space holds both code and data
– Loops and sequential instruction execution mean generally localized code access
– Stack and Heap try to keep your data together
– Arrays and structs naturally group data you would access together

(locality/局部性)

Principle of Locality (2/3)
Temporal Locality (locality in time)
– Go back to the same book on the desk multiple times
– If a memory location is referenced, then it will tend to be referenced again soon
Spatial Locality (locality in space)
– When you go to the book shelf, grab many books on J.D. Salinger, since the library stores related books together
– If a memory location is referenced, the locations with nearby addresses will tend to be referenced soon
(temporal locality/時間局部性; spatial locality/空間局部性)

Principle of Locality (3/3)
We exploit the principle of locality in hardware via a memory hierarchy where:
– Levels closer to the processor are faster (and more expensive per bit, so smaller)
– Levels farther from the processor are larger (and less expensive per bit, so slower)
Goal: Create the illusion of memory being almost as fast as the fastest memory and almost as large as the biggest memory of the hierarchy

Memory Hierarchy Schematic
[Figure: pyramid of levels from the Processor at the top through Level 1, Level 2, Level 3, ..., Level n. Levels near the processor are smaller, faster, and more expensive; levels farther away are bigger, slower, and cheaper.]

Cache Concept
Introduce an intermediate hierarchy level: the memory cache, which holds a copy of a subset of main memory
– As a pun, often use $ ("cash") to abbreviate cache (e.g. D$ = Data Cache, L1$ = Level 1 Cache)
Modern processors have separate caches for instructions and data, as well as several levels of caches implemented in different sizes
Implemented with the same IC processing technology as the CPU and integrated on-chip: faster but more expensive than main memory

Memory Hierarchy Technologies
Caches use static RAM (SRAM)
+ Fast (typical access times of 0.5 to 2.5 ns)
– Low density (6-transistor cells), higher power, expensive ($2000 to $4000 per GB in 2011)
Static: content will last as long as power is on
Main memory uses dynamic RAM (DRAM)
+ High density (1-transistor cells), lower power, cheaper ($20 to $40 per GB in 2011)
– Slower (typical access times of 50 to 70 ns)
Dynamic: needs to be "refreshed" regularly (~ every 8 ms)

Memory Transfer in the Hierarchy
[Figure: Processor, L1$, L2$, Main Memory, Secondary Memory, connected in a chain.]
Inclusive: data in L1$ ⊆ data in L2$ ⊆ data in MM ⊆ data in SM
Block: Unit of transfer between memory and cache

Managing the Hierarchy
registers ↔ memory
– By the compiler (or assembly-level programmer)
cache ↔ main memory
– By the cache controller hardware
main memory ↔ disks (secondary storage)
– By the OS (virtual memory, which is a later topic)
– Virtual to physical address mapping assisted by the hardware (TLB)
– By the programmer (files)

Typical Memory Hierarchy
[Figure: on-chip components (Control, Datapath, RegFile, Instr Cache, Data Cache; "We are here"), then Second Level Cache (SRAM), Main Memory (DRAM), and Secondary Memory (Disk or Flash).
Speed (cycles): ½'s, 1's, 10's, 100's, 1,000,000's
Size (bytes): 100's, 10K's, M's, G's, T's
Cost/bit: highest at the register file, lowest at secondary memory]


Cache Management
Library analogy: organization is necessary!
What is the overall organization of blocks we impose on our cache?
– Where do we put a block of data from memory?
– How do we know if a block is already in cache?
– How do we quickly find a block when we need it?
– When do we replace something in the cache?

General Notes on Caches
Recall: Memory is byte-addressed
We haven't specified the size of our "blocks," but it will be a multiple of the word size (32 bits)
– How do we access individual words or bytes within a block? → OFFSET
Cache is smaller than memory
– Can't fit all blocks at once, so multiple blocks in memory map to the same cache slot (row) → INDEX
– Need some way of identifying which memory block is currently in the row → TAG

Direct-Mapped Caches (1/3)
Each memory block is mapped to exactly one row in the cache (direct-mapped)
– Use a simple hash function
Effect of block size:
– Spatial locality dictates our blocks consist of adjacent bytes, which differ in address by 1
– Offset field: Lowest bits of the memory address can be used to index to specific bytes within a block
– Block size needs to be a power of two (in bytes)

Direct-Mapped Caches (2/3)
Effect of cache size (total stored data):
– Determines the number of blocks the cache holds
– If it could hold all of memory, would use the remaining bits (minus offset bits) to select the appropriate row of the cache
Index field: Apply the hash function to the remaining bits to determine which row the block goes in
– (block address) modulo (# of blocks in the cache)
Tag field: Leftover upper bits of the memory address determine which portion of memory the block came from (identifier)

TIO Address Breakdown
Memory address fields (bit 31 down to bit 0): Tag | Index | Offset, with widths T bits, I bits, O bits
Meaning of the field sizes:
– O bits ⇒ 2^O bytes/block = 2^(O-2) words/block
– I bits ⇒ 2^I rows in cache = cache size / block size
– T bits = A − I − O, where A = # of address bits (A = 32 here)
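As a quick check of these relations, here is a small Python sketch (my own illustration; the function name tio_breakdown and the 16 KiB example configuration are assumptions, not from the slides):

from math import log2

def tio_breakdown(addr_bits, cache_bytes, block_bytes):
    # Field widths for a direct-mapped cache: O from the block size,
    # I from the number of rows, and T as whatever bits remain.
    o = int(log2(block_bytes))                 # 2^O bytes per block
    i = int(log2(cache_bytes // block_bytes))  # 2^I rows = cache size / block size
    t = addr_bits - i - o                      # T = A - I - O
    return t, i, o

# Illustrative configuration: 32-bit addresses, 16 KiB cache, 16-byte blocks
print(tio_breakdown(32, 16 * 1024, 16))        # -> (18, 10, 4)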

Direct-Mapped Caches (3/3)
What's actually in the cache?
– Each row contains the actual data block to store (B bits = 8 × 2^O bits)
– In addition, must save the Tag field of the address as identifier (T bits)
– Valid bit: Indicates whether the block in that row is valid or not
Total bits in cache = # rows × (B + T + 1) = 2^I × (8×2^O + T + 1) bits

Cache Example (1/2)
Cache parameters:
– Address space of 64B, block size of 1 word, cache size of 4 words
TIO Breakdown:
– 1 word = 4 bytes, so O = log2(4) = 2
– Cache size / block size = 4, so I = log2(4) = 2
– A = log2(64) = 6 bits, so T = 6 − 2 − 2 = 2
Bits in cache = 2^2 × (8×2^2 + 2 + 1) = 140 bits
[Figure: the 6-bit memory addresses split into block address (Tag + Index) and offset]
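The 140-bit total is just the capacity formula from the previous slide applied to these parameters; a two-line check (variable names mine):

O, I, T = 2, 2, 2                        # field widths derived above
print(2**I * (8 * 2**O + T + 1))         # rows x (data + tag + valid) = 140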

Cache Example (2/2)
[Figure: Main Memory shown in blocks, so offset bits not shown (x's); 16 block addresses 0000 through 1111. Which blocks map to each row of the cache? (see colors) The low two bits of the block address are the Index, so the blocks cycle through cache rows 00, 01, 10, 11. The cache is drawn as a table with columns Index (00, 01, 10, 11), Valid, Tag, Data.]
On a memory request (let's say address 001011, i.e. Tag 00, Index 10, Offset 11):
1) Take the Index field (10)
2) Check if the Valid bit is true in that row of the cache
3) If valid, then check if the Tag matches (00)
Cache rows exactly match the Index field

Direct-Mapped Cache Internals
4 words/block, cache size = 1 Ki words
[Figure: the 32-bit address is split as bits 31..12 = 20-bit Tag, bits 11..4 = 8-bit Index, bits 3..2 = Block offset, bits 1..0 = Byte offset. The Index selects one of 256 rows (0-255); each row holds a Valid bit, a 20-bit Tag, and a block of four 32-bit data words. A 20-bit comparator checks the stored Tag against the address Tag and, ANDed with Valid, produces Hit; the Block offset selects which 32-bit word is returned as Data.]

Caching Terminology (1/2)
When reading memory, 3 things can happen:
– Cache hit: Cache block is valid and contains the proper address, so read the desired word
– Cache miss: Nothing in that row of the cache (not valid), so fetch from memory
– Cache miss with block replacement: Wrong block is in the row, so discard it and fetch the desired data from memory
(terminology/術(shù)語; miss/缺失)

Caching Terminology (2/2)
How effective is your cache?
– Want to max cache hits and min cache misses
– Hit rate (HR): Percentage of memory accesses in a program or set of instructions that result in a cache hit
– Miss rate (MR): Like hit rate, but for cache misses; MR = 1 − HR
How fast is your cache?
– Hit time (HT): Time to access the cache (including Tag comparison)
– Miss penalty (MP): Time to replace a block in the cache from a lower level in the memory hierarchy

(penalty/代價)

Sources of Cache Misses: The 3Cs
Compulsory (cold start or process migration, 1st reference):
– First access to a block is impossible to avoid
– Effect is small for long-running programs
Capacity:
– Cache cannot contain all blocks accessed by the program
Conflict (collision):
– Multiple memory locations mapped to the same cache location
(compulsory/強制)

Direct-Mapped Cache Example (modified by GXP)
Consider the sequence of memory address accesses: 0 1 2 3 4 3 4 15
(in binary: 0000 0001 0010 0011 0100 0011 0100 1111)
Address space: 16B, block size: 1B, cache size: 4B, so TIO = 2-2-0
Start with an empty cache; all blocks initially marked as not valid
– 0 miss, 1 miss, 2 miss, 3 miss: Mem(0)..Mem(3) fill cache rows 00..11
– 4 miss: index 00 again, so Mem(4) (tag 01) replaces Mem(0)
– 3 hit, 4 hit
– 15 miss: index 11, so Mem(15) replaces Mem(3)
8 requests, 6 misses (HR = 0.25, MR = 0.75)

Taking Advantage of Spatial Locality
Let the cache block hold more than one byte
Address space: 16B, block size: 2B, cache size: 4B, so TIO = 2-1-1
Same access sequence: 0 1 2 3 4 3 4 15 (0000 0001 0010 0011 0100 0011 0100 1111)
Start with an empty cache; all blocks initially marked as not valid
– 0 miss: loads the block Mem(1)/Mem(0); 1 hit
– 2 miss: loads Mem(3)/Mem(2); 3 hit
– 4 miss: Mem(5)/Mem(4) (tag 01) replaces Mem(1)/Mem(0); 3 hit, 4 hit
– 15 miss: Mem(15)/Mem(14) (tag 11) replaces Mem(3)/Mem(2)
8 requests, 4 misses (HR = 0.5, MR = 0.5)
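Both hit counts above can be reproduced mechanically. The following minimal direct-mapped cache simulator is an illustrative sketch added here (run_trace and its interface are my naming, not from the lecture):

def run_trace(addrs, block_bytes, cache_bytes):
    # Direct-mapped cache: one slot per row, row = block address mod #rows
    rows = cache_bytes // block_bytes
    cache = [None] * rows              # each row holds a block address (or None)
    misses = 0
    for a in addrs:
        block = a // block_bytes       # strip the offset bits
        row = block % rows
        if cache[row] != block:        # invalid row or tag mismatch: a miss
            misses += 1
            cache[row] = block
    return misses

trace = [0, 1, 2, 3, 4, 3, 4, 15]
print(run_trace(trace, 1, 4))          # 1-byte blocks: 6 misses (MR = 0.75)
print(run_trace(trace, 2, 4))          # 2-byte blocks: 4 misses (MR = 0.5)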

Effect of Block and Cache Sizes on Miss Rate
[Figure: miss rate (0 to 10%) versus block size (16 to 256 bytes) for cache sizes of 4 KB, 16 KB, 64 KB, and 256 KB.]
Miss rate goes up if the block size becomes a significant fraction of the cache size, because the number of blocks that can be held in the same size cache is smaller (increasing capacity misses)


The cache consistency problem:
Time | Event              | Cache contents for X | Main-memory contents at X
  0  |                    |                      | 1
  1  | A reads X          | 1                    | 1
  2  | A writes 0 into X  | 0                    | 1

Cache Reads and Writes
Want to handle reads and writes quickly while maintaining consistency between cache and memory (i.e. both know about all updates)
– Policies for cache hits and misses are independent
Here we assume the use of separate instruction and data caches (I$ and D$)
– Read from both
– Write only to D$ (assume no self-modifying code)
(consistency/一致性)

Handling Cache Hits
Read hits (I$ and D$)
– Fastest possible scenario, so want more of these
Write hits (D$)
– Write-Through Policy: Always write data to cache and to memory (through cache)
– Forces cache and memory to always be consistent
– Slow! (every memory access is long)
– Include a Write Buffer that updates memory in parallel with the processor
(Assume present in all schemes when writing to memory)

Handling Cache Hits
Read hits (I$ and D$)
– Fastest possible scenario, so want more of these
Write hits (D$)
– Write-Back Policy: Write data only to cache, then update memory when the block is removed
– Allows cache and memory to be inconsistent
– Multiple writes collected in cache; single write to memory per block
– Dirty bit: Extra bit per cache row that is set if the block was written to (is "dirty") and needs to be written back

Handling Cache Misses
Miss penalty grows as block size does
Read misses (I$ and D$)
– Stall execution, fetch block from memory, put in cache, send requested word to processor, resume
Write misses (D$)
– Write allocate: Fetch block from memory, put in cache, execute a write hit
– Works with either write-through or write-back
– Ensures cache is up-to-date after a write miss

Handling Cache Misses
Miss penalty grows as block size does
Read misses (I$ and D$)
– Stall execution, fetch block from memory, put in cache, send requested word to processor, resume
Write misses (D$)
– No-write allocate: Skip the cache altogether and write directly to memory
– Cache is never up-to-date after a write miss
– Ensures memory is always up-to-date
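To contrast the hit and miss policies just described, here is a deliberately tiny model (my own sketch, not from the slides) that counts memory-write traffic for a one-block cache under write-through vs. write-back, with or without write allocate:

def write_traffic(blocks, write_back=False, write_allocate=True):
    # Count writes that reach memory for a stream of block-granular CPU
    # writes, using a single-slot cache (enough to contrast the policies).
    cached, dirty, mem_writes = None, False, 0
    for b in blocks:
        if b == cached:                      # write hit
            if write_back:
                dirty = True                 # defer the update until eviction
            else:
                mem_writes += 1              # write-through: update memory now
        elif write_allocate:                 # write miss, allocate in cache
            if write_back and dirty:
                mem_writes += 1              # write back the evicted dirty block
            cached, dirty = b, write_back
            if not write_back:
                mem_writes += 1              # write-through also updates memory
        else:                                # no-write allocate: bypass cache
            mem_writes += 1
    return mem_writes                        # a final dirty block may remain cached

stream = [0, 0, 0, 1]                        # three writes to block 0, then block 1
print(write_traffic(stream))                 # write-through: 4 memory writes
print(write_traffic(stream, write_back=True))  # write-back: 1 (evicting dirty block 0)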

Summary
Memory hierarchy exploits the principle of locality to deliver lots of memory at fast speeds
Direct-Mapped Cache: Each block in memory maps to exactly one row in the cache
– Index to determine which row
– Offset to determine which byte within the block
– Tag to identify if it's the block you want
Cache read and write policies:
– Write-back and write-through for hits
– Write allocate and no-write allocate for misses


Great Idea #3: Principle of Locality / Memory Hierarchy

Cache Performance
Two things hurt the performance of a cache:
– Miss rate and miss penalty
Average Memory Access Time (AMAT): average time to access memory considering both hits and misses
AMAT = Hit time + Miss rate × Miss penalty
(abbreviated AMAT = HT + MR × MP)

AMAT Example Usage
Processor specs: 200 ps clock, MP of 50 clock cycles, MR of 0.02 misses/instruction, and HT of 1 clock cycle
AMAT = 1 + 0.02 × 50 = 2 clock cycles = 400 ps
Which improvement would be best?
– 190 ps clock → 380 ps
– MP of 40 clock cycles → 360 ps
– MR of 0.015 misses/instruction → 350 ps
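The three candidate improvements are easy to compare mechanically; a short sketch (amat_ps is my naming):

def amat_ps(clock_ps, ht_cycles, mr, mp_cycles):
    # AMAT in picoseconds: (HT + MR x MP) cycles, scaled by the clock period
    return (ht_cycles + mr * mp_cycles) * clock_ps

print(amat_ps(200, 1, 0.02, 50))     # baseline: 400.0 ps
print(amat_ps(190, 1, 0.02, 50))     # faster clock: 380.0 ps
print(amat_ps(200, 1, 0.02, 40))     # lower MP: 360.0 ps
print(amat_ps(200, 1, 0.015, 50))    # lower MR: 350.0 ps (best)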

Cache Parameter Example
What is the potential impact of a much larger cache on AMAT? (same block size)
– Increase HR
– Longer HT: smaller is faster
– At some point, the increase in hit time for a larger cache may outweigh the improvement in hit rate, yielding a decrease in performance
Effect on TIO? Bits in cache? Cost?

Effect of Cache Performance on CPI
Recall: CPU Performance
– CPU Time = Instructions (IC) × CPI × Clock Cycle Time (CC)
Include memory accesses in CPI:
– CPIstall = CPIbase + Average Memory-stall Cycles
– CPU Time = IC × CPIstall × CC
Simplified model for memory-stall cycles:
– Memory-stall cycles = accesses/instruction × miss rate × miss penalty
– We will discuss more complicated models soon

CPI Example
Processor specs: CPIbase of 1, a 100-cycle MP, 36% load/store instructions, and 2% I$ and 4% D$ MRs
– How many times per instruction do we access the I$? The D$?
– MP is assumed the same for both I$ and D$
– Memory-stall cycles will be the sum of the stall cycles for both I$ and D$

CPI Example
Processor specs: CPIbase of 1, a 100-cycle MP, 36% load/store instructions, and 2% I$ and 4% D$ MRs
Memory-stall cycles = (100% × 2% + 36% × 4%) × 100 = 3.44
(first term: I$; second term: D$)
CPIstall = 1 + 3.44 = 4.44 (more than 3x CPIbase!)
What if the CPIbase is reduced to 1?
What if the D$ miss rate went up by 1%?
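The stall arithmetic, including the D$ what-if, in a short sketch (cpi_stall and its parameter names are mine):

def cpi_stall(cpi_base, mp, ls_frac, i_mr, d_mr):
    # One I$ access per instruction plus ls_frac D$ accesses per instruction
    return cpi_base + (1.0 * i_mr + ls_frac * d_mr) * mp

print(cpi_stall(1, 100, 0.36, 0.02, 0.04))   # 1 + 3.44 = 4.44
print(cpi_stall(1, 100, 0.36, 0.02, 0.05))   # D$ MR up by 1%: 1 + 3.80 = 4.80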

The 3Cs Revisited: Design Solutions
Compulsory:
– Increase block size (increases MP; too-large blocks could increase MR)
Capacity:
– Increase cache size (may increase HT)
Conflict:
– Increase cache size
– Increase associativity (may increase HT)

Multiple Cache Levels
With advancing technology, have more room on die for bigger L1 caches and for L2 (and in some cases even L3) cache
– Normally lower-level caches are unified (i.e. hold both instructions and data)
Multilevel caching is a way to reduce miss penalty
So what does this look like?

Multilevel Cache Diagram
[Figure: the CPU issues a memory access to L1$; a hit returns data, a miss goes on to L2$, and so on out to Main Memory. Legend: request for data, return of data, store. Misses propagate outward, and the data path returns back to the CPU, storing the block in each cache it passes.]

Multilevel Cache AMAT
AMAT = L1 HT + L1 MR × L1 MP
– Now L1 MP depends on the other cache levels
– L1 MP = L2 HT + L2 MR × L2 MP
– If more levels, then continue this chain (i.e. MP_i = HT_{i+1} + MR_{i+1} × MP_{i+1})
– Final MP is the main memory access time
For two levels:
AMAT = L1 HT + L1 MR × (L2 HT + L2 MR × L2 MP)

Multilevel Cache AMAT Example
Processor specs: 1 cycle L1 HT, 2% L1 MR, 5 cycle L2 HT, 5% L2 MR, 100 cycle main memory HT
– Here assuming unified L1$
Without L2$: AMAT1 = 1 + 0.02 × 100 = 3
With L2$: AMAT2 = 1 + 0.02 × (5 + 0.05 × 100) = 1.2
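The two-level chain evaluates naturally as a nested call; a minimal sketch (the function name amat is mine):

def amat(ht, mr, mp):
    # One level of the chain: AMAT = HT + MR x MP
    return ht + mr * mp

main_memory = 100                                 # final MP: main memory access time
print(amat(1, 0.02, main_memory))                 # L1 only: 3.0 cycles
print(amat(1, 0.02, amat(5, 0.05, main_memory)))  # L1 + L2: 1.2 cycles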

Local vs. Global Miss Rates
Local miss rate: Fraction of references to one level of a cache that miss
– e.g. L2$ local MR = L2$ misses / L1$ misses
– Specific to a level of caching (as used in AMAT)
Global miss rate: Fraction of all references that miss in all levels of a multilevel cache
– Property of the overall memory hierarchy
– Global MR is the product of all local MRs
– Start at Global MR = Ln misses / L1 accesses and expand
– So by definition, global MR ≤ any local MR

Memory Hierarchy with Two Cache Levels
[Figure: CPU, L1$ (1 cycle), L2$ (10 cycles), MM (100 cycles); 1000 memory references become 40 references that reach L2$ and 20 that reach main memory.]
For every 1000 CPU-to-memory references:
– 40 will miss in L1$; what is the local MR? 0.04
– 20 will miss in L2$; what is the local MR? 0.5
– Global miss rate? 0.02
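A three-line check of these rates (variable names mine):

refs, l1_misses, l2_misses = 1000, 40, 20
print(l1_misses / refs)         # L1 local MR = 0.04
print(l2_misses / l1_misses)    # L2 local MR = 0.5
print(l2_misses / refs)         # global MR   = 0.02 (= 0.04 x 0.5)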

Design Considerations
L1$ focuses on low hit time (fast access)
– Minimize HT to achieve a shorter clock cycle
– L1 MP significantly reduced by the presence of L2$, so it can be smaller/faster even with a higher MR
– e.g. smaller $ (fewer rows)
L2$, L3$ focus on low miss rate
– As much as possible, avoid reaching to main memory (heavy penalty)
– e.g. larger $ with larger block sizes (same # rows)


Reducing Cache Misses
Allow more flexible block placement in cache:
– Direct-mapped: Memory block maps to exactly one cache block
– Fully associative: Memory block can go in any slot
– N-way set-associative: Divide $ into sets, each of which consists of n slots to place a memory block
– Memory block maps to a set determined by the Index field and is placed in any of the n slots of that set
– Hash function: (block address) modulo (# sets in the cache)

Block Placement Schemes
Place memory block 12 in a cache that holds 8 blocks:
– Direct-mapped: Can only go in row (12 mod 8) = 4
– Fully associative: Can go in any of the slots (1 set/row)
– 2-way set associative: Can go in either slot of set (12 mod 4) = 0
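The three placements follow from one modulo computation; a small sketch (placement is my naming, and numbering the slots set-major is an assumption):

def placement(block, cache_blocks, ways):
    # Set index and the candidate slots for an N-way set-associative cache
    sets = cache_blocks // ways
    s = block % sets                           # (block address) mod (# sets)
    return s, [s * ways + w for w in range(ways)]

print(placement(12, 8, 1))    # direct-mapped: set 4, slots [4]
print(placement(12, 8, 8))    # fully associative: set 0, all 8 slots
print(placement(12, 8, 2))    # 2-way set associative: set 0, slots [0, 1]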

Effect of Associativity on TIO (1/2)
Here we assume a cache of fixed size (C)
– Offset: # of bytes in a block (same as before)
– Index: Instead of pointing to a row, now points to a set, so 2^I = # sets = C/B/associativity
  – Fully associative (1 set): 0 Index bits!
  – Direct-mapped (associativity of 1): max Index bits
  – Set associative: somewhere in-between
– Tag: Remaining identifier bits (T = A − I − O)

[Figure: associativity spectrum for a fixed-size cache, from direct mapped (only one way) at one end to fully associative (only one set) at the other; associativity increases toward the fully associative end and decreases toward the direct-mapped end.]

Effect of Associativity on TIO (2/2)
For a fixed-size cache, each increase by a factor of two in associativity doubles the number of blocks per set (i.e. the number of slots) and halves the number of sets, decreasing the size of the Index by 1 bit and increasing the size of the Tag by 1 bit
[Figure: address fields Tag | Index | Block offset | Byte offset. The Tag is used for tag comparison, the Index selects the set, and the block offset selects the word in the block; increasing associativity shifts bits from Index to Tag.]

Set Associative Example (1/2)
Cache parameters:
– 6-bit addresses, block size of 1 word, cache size of 4 words, 2-way set associative
How many sets? C/B/associativity
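Applying the # sets = C/B/associativity formula from the previous slide to these parameters, as a sketch (variable names mine):

C, B, ways = 4, 1, 2        # cache size and block size in words, 2-way
sets = C // B // ways       # sets = C / B / associativity
print(sets)                 # -> 2 sets, i.e. 1 Index bit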
