[컴퓨터 구조] Lecture 18: Memory Hierarchy

Notice

Recent Posts

Recent Comments

Link

« 2025/08 »
일	월	화	수	목	금	토
					1	2
3	4	5	6	7	8	9
10	11	12	13	14	15	16
17	18	19	20	21	22	23
24	25	26	27	28	29	30
31

Tags more

Archives

Today

Total

관리 메뉴

SYDev

[컴퓨터 구조] Lecture 18: Memory Hierarchy - Part2 본문

3학년 1학기 전공/컴퓨터 구조

[컴퓨터 구조] Lecture 18: Memory Hierarchy - Part2

시데브 2024. 6. 7. 21:44

경희대학교 김정욱 교수님의 컴퓨터 구조 강의 내용을 기반으로 한 정리글

Misses in Direct-Mapped Cache

Example - 0, 8, 0, 6, 8

cache memory의 block size = 4 -> modulo 4 연산으로 cache memory에 main memory의 block 매핑

0 modulo 4 = 0
6 modulo 4 = 2
8 modulo 4 = 0

0, 8, 0, 6, 8
0 -> miss -> cache memory의 index 0 위치에 memory[0] 데이터 저장
8 -> miss -> cache memory의 index 0 위치(switch)에 memory[8] 데이터 저장
0 -> miss -> cache memory의 index 0 위치(switch)에 memory[0] 데이터 저장
6 -> miss -> cache memory의 index 2 위치에 memory[6] 데이터 저장
8 -> miss -> cache memory의 index 0 위치(switch)에 memory[8] 데이터 저장

-> index 1, 3은 계속 empty -> 비효율적

Associative Cache Memory

Fully associative cache

main memory의 block은 cache의 어느 location이든 위치할 수 있다. -> cache에 대한 전수탐색 필요
cache의 모든 위치에 대해 comparator가 병렬적으로 연산을 수행 -> hardware cost가 매우 늘어남

Set associative cache

고정된 숫자의 장소에 main memory block 위치시키는 방법

N-way set-associative cache

N개의 block을 하나의 set으로 묶는 방법
set number는 (Block number) modulo (Number of sets in the cache)로 결정
교체는 least recently used(LRU) block within the set을 기반으로 결정

-> 2-way set associative: 1set 당 2block

-> 4-way set associative: 1set 당 4block

-> 8-way set associative: 1set 당 8block

associativity(자유도)가 커지면

이점: miss rate를 감소시킨다.
단점: hit time을 증가시킨다.

Misses in associative cache - 0, 8, 0, 6, 8

Fully associative cache

순서대로 cache에 저장

0, 8, 0, 6, 8
0 -> miss -> 0번 block에 Memory[0] 데이터 저장
8 -> miss -> 1번 block에 Memory[8] 데이터 저장
0 -> hit
6 -> miss -> 2번 block에 Memory[6] 데이터 저장
8 -> hit

2-way set associative cache

main memory block 개수 3개, 하나의 set 당 2개의 block 매핑 -> 총 set 개수는 2개

0 modulo 2 = 0
6 modulo 2 = 0
8 modulo 2 = 0
-> 모두 0번 set에 위치

0, 8, 0, 6, 8
0 -> miss -> 0번 set에 Memory[0] 데이터 저장
8 -> miss -> 0번 set에 Memory[8] 데이터 저장 -> 0번 set에는 2개의 block 정원이 가득 찬 상태
0 -> hit
6 -> miss -> 0번 set에 Memory[6] 데이터 저장(switch with Memory[8]) -> LRU에 의해 Memory[8]은 삭제
8 -> miss -> 0번 set에 Memory[8] 데이터 저장(switch with Memory[0]) -> LRU에 의해 Memory[0]은 삭제

4-way set associative cache

자유도(associativity)가 증가할수록 miss rate는 낮아진다.

Locating a Block in the Cache

4-way set associate cache - 4개의 comparator와 4-to-1 mux 필요

Choosing Which block to Replace

Fully associative cache

모든 block이 교체 후보

Set-associative cache

선택된 set에 포함된 blocks 중에서선택
Least recently used(LRU) - 가장 오래 사용되지 않은 block이 교체된다.

Direct-mapped cache

요청된 block이 저장될 위치의 Block이 교체된다.

Discussions(Size of Tags vs. Tag associativity)

1. Direct-mapped cache

total number of sets = 4096
cache index = 4096 -> cache index bit = 12
total number of tag bits for caches = (32 - cache index bit - block offset bit - offset bit) * total number of sets = (32 - 12 - 2 - 2) * 4096 = 16 * 4096 = 2^4 * 2^12 = 2^16 = 66K tag bits

2. 2-way set associative

total number of sets = 4096 / 2 = 2^11
total number of tag bits for caches = 개별 tag bit * N(= 2) * total number of sets = (32 - 11 - 2 - 2) * N * total number of sets = 17 * 2 * 2048 = 70K tag bits

3. 4-way set associative

total number of sets = 4096 / 4 = 1024
total number of tag bits for caches = 개별 tag bit * N(set 당 data 개수) * total number of sets = (28 - 10) * 4 * 1024 = 74K tag bits

4. Fully associative

total number of sets = 4096 / 4096 = 1
total number of tag bits for caches = (28 - 0) * 4096 * 1 = 115K tag bits

Multilevel Caches

대부분의 CPU는 추가적 단계의 caching 사용
primary cache -> second level cache 접근 시간 <<< main memory 접근 시간
primary & secondary cache에 모두에 원하는 데이터가 없을 시 -> 더 큰 miss penalty 발생

Performance of Multilevel cache (Example)

Proccesor with a base CPI(Clock per Instruction): 1.0 (primary cache로의 참조가 모두 hit일 경우)
Clock rate:4GHz (0.25 ns/clock cycle)
Miss rate per instruction: 2%
Main memory access time: 100 ns (all miss handling 포함)
A secondary cache access time: 5 ns

secondary cache에서 main memory로의 miss rate = 0.5%
Effective cpi? cpi와 유사하나 좀 더 실제적인 성능 평가가 반영됨

-> secondary cache를 사용하는 cpu가 9.0 / 3.4 = 2.6배 만큼 빠르다.

Virtual Memory

메인 메모리를 Hard disk의 cache로서 사용하는 기법
CPU hardware, OS에 의해서 관리됨
메인 메모리의 용량을 증가시키는 목적
CPU에서 main memory에 모든 정보가 있다고 생각하게 만드는 기술(but, 추가적인 정보는 hard disk에) -> not memory (실제로 존재 X)

Cache memory vs. Virtual Memory

Virtual Memory	Cache Memory
main memory의 용량을 증가	CPU 접근 속도를 향상
technique(not a memory unit)	memory unit
size: Virtual Memory > Cache Memory
main memory의 크기보다 큰 프로그램이 실행됨	cache에는 최근에 사용된 데이터가 복사됨

프로그램들은 main memory를 공유 가능

Virtual address: virtual memory의 주소
Physical address: main memory의 주소
virtual address는 physical address로 매핑해야 한다. -> CPU와 OS에서 수행

Address translation(=Address mapping)

Cache의 block - Virtual memory의 page
Cache의 miss - Virtual memory의 page fault
하나의 physical address는 두 개의 virtual addresses에 의해 공유될 수 있음 -> 2개의 프로그램이 같은 데이터를 공유

Virtual Memory: 32bits(2^32 = 4GB), Physical Memory : 30bits(2^30 = 1 GB)
Page size: 2^12 = 4096 = 4KB
20bits virtual pages -> 18bits physical pages로 매핑

Translation using Page table

Page Table

lookup table - 주어진 연산에 대해 미리 계산된 결과들의 집합(배열)
main memory에 위치함
프로그램마다 각자의 page table이 존재 -> 프로그램의 virtual address space를 메인 메모리로 매핑
Page table register (CPU에 존재) -> physical memory의 page table을 가리킴

page가 메모리에 존재할 시에

page table: physival page number + oter status bits(valid, dirty bits, ..) 저장

page가 메모리에 존재하지 않을 시(Page fault)

OS: page를 가져오고(fetching), page table을 갱신(updating)한다.
page table은 swap space(on disk)의 위치를 인용 가능하다.
swap space: virtual memory를 위해서 disk에서 제공하는 공간

page table은 virtual page number로 인덱싱 -> 이에 상응하는 physical address를 얻어야 함

Page Faults

vliad bit is on(1) -> page table에 virtual page number와 일치하는 physical page number 정보 존재
valid bit is off(0) -> page 정보가 disk에만 존재 -> OS가 제어권을 가져야 한다.
모든 페이지가 사용중 -> OS는 LRU에 따라 page 정보를 교체

Reference bit(use bit)

Reference bit 1: 현재 접근되고 있다는 의미 -> OS에 의해 주기적으로 0으로 초기화됨
Reference bit 0: 최근에 사용되지 않음

Disk에 쓰는 작업은 수백만 사이클이 걸린다.

Write through(바로 쓰기)는 실용적이지 않음 -> Write back 사용
Dirty bit -> 페이지가 쓰여질 때 사용됨
wtire through: 업데이트되자마자 메모리에서 Disk로 변경 사항 적용
write back: dirty bit 적용해놨다가 교체될 때 한 번에

Making Address Translation Fast Using TLB

page table은 main memory에 저장됨

프로그램에 의한 모든 메모리 접근은 최소 2배의 시간이 걸릴 수 있음
physical address를 얻기 위한 접근
실제 data를 얻기 위한 접근

Translation-lookaside buffer(TLB)

Page table의 Cache 버전
page table에 대한 접근을 피하기 위해 -> recently used address mappings를 유지하는 cache
Valid bit: 값의 존재 여부
Dirty bit: 값의 교체 여부
Ref bit: 값의 참조 여부 -> LRU와 연관

Handling TLB Miss and Page Faults

TLB miss - main memory에 page가 존재

메모리에 있는 값으로 TLB, Cache 업데이트

Page Fault - page가 main memory에 존재하지 않을 때 -> os가 page table을 update

Disk에 있는 값으로 Page Table, TLB, Main Memory, Cache 업데이트

virtual address를 이용 -> page table에서 디스크를 참조하는 page의 location을 탐색 - page table에 있긴 하다. valid가 0일뿐
교체할 physical page를 선택 -> main memory가 가득찼을 것이기 때문에 교체해줘야 함, dirty bit인 경우에 값이 수정된 것이기 때문에 disk에 수정된 값을 우선 저장한다.
선택된 physical page로 디스크로부터 참조된 page를 가져온다.
Making instructions restartable -> 가져온 것을 CPU로 전달.

저장용량: TLB < Cache < Page Table(main memory로 생각) < Disk

TLB	Page Table	Cache	Possible
Hit	Hit	Miss	가능: TLB, Page Table이 hit -> memory에는 있다는 뜻, 참조한 지 오래되어 cache에서는 사라졌을 수 있음.
Miss	Hit	Hit	가능: 최근에 부르기는 했으나, TLB에 담길 만큼 최근에 부르지는 않았을 경우. TLB보다 Cache의 크기가 더 크다.
Miss	Hit	Miss	가능: 최근에 불린 적은 없으나, main memory에는 존재
Miss	Miss	Miss	가능: tlb에도 없고, cache에도 없고, page table에도 없으면 -> disk에 있다.(page fault)
Hit	Miss	Miss	불가능: TLB가 hit -> 적어도 cache나 main memory에는 있다는 뜻
Hit	Miss	Hit	불가능: TLB가 hit -> 적어도 cache나 main memory에는 있다는
Miss	Miss	Hit	불가능: TLB가 hit -> 적어도 cache나 main memory에는 있다는

728x90

'3학년 1학기 전공 > 컴퓨터 구조' 카테고리의 다른 글

[컴퓨터 구조] Lecture 19: Memory Hierarchy - Part3 (0)	2024.06.08
[컴퓨터 구조] Lecture 17: Memory Hierarchy - Part1 (7)	2024.06.03
[컴퓨터 구조] Lecture 16: The Processor - Part5 (0)	2024.05.31
[컴퓨터 구조] Lecture 15: The Processor - Part4 (1)	2024.05.30
[컴퓨터 구조] Lecture 14: The Processor - Part3 (0)	2024.05.20

'3학년 1학기 전공/컴퓨터 구조' Related Articles

SYDev

[컴퓨터 구조] Lecture 18: Memory Hierarchy - Part2 본문

[컴퓨터 구조] Lecture 18: Memory Hierarchy - Part2

Misses in Direct-Mapped Cache

Example - 0, 8, 0, 6, 8

Associative Cache Memory

Fully associative cache

Set associative cache

N-way set-associative cache

associativity(자유도)가 커지면

Misses in associative cache - 0, 8, 0, 6, 8

Fully associative cache

2-way set associative cache

4-way set associative cache

자유도(associativity)가 증가할수록 miss rate는 낮아진다.

Locating a Block in the Cache

Choosing Which block to Replace

Fully associative cache

Set-associative cache

Direct-mapped cache

Discussions(Size of Tags vs. Tag associativity)

1. Direct-mapped cache

2. 2-way set associative

3. 4-way set associative

4. Fully associative

Multilevel Caches

Performance of Multilevel cache (Example)

Virtual Memory

Cache memory vs. Virtual Memory

프로그램들은 main memory를 공유 가능

Address translation(=Address mapping)

Translation using Page table

Page Table

page가 메모리에 존재할 시에

page가 메모리에 존재하지 않을 시(Page fault)

page table은 virtual page number로 인덱싱 -> 이에 상응하는 physical address를 얻어야 함

Page Faults

Reference bit(use bit)

Disk에 쓰는 작업은 수백만 사이클이 걸린다.

Making Address Translation Fast Using TLB

page table은 main memory에 저장됨

Translation-lookaside buffer(TLB)

Handling TLB Miss and Page Faults

TLB miss - main memory에 page가 존재

Page Fault - page가 main memory에 존재하지 않을 때 -> os가 page table을 update

저장용량: TLB < Cache < Page Table(main memory로 생각) < Disk

'3학년 1학기 전공 > 컴퓨터 구조' 카테고리의 다른 글

티스토리툴바