Programming Assignment Case Study - ECE 6913 | 学霸联盟


Date: 2022-05-11

Spring 2022 ECE 6913 INET, Quiz 2 Solutions

Problem 1. Your company describes its latest processor with the following features:
• 95% of all memory accesses are found in the cache.
• Each cache block is two words, and the whole block is read on any miss.
• The processor sends references to its cache at the rate of 10^9 words per second.
• 25% of those references are writes.
• Assume that the memory system can support 10^9 words per second, reads or writes.
• The bus reads or writes a single word at a time (the memory system cannot read or write two
words at once).
• Assume at any one time, 30% of the blocks in the cache have been modified.
• The cache uses write allocate on a write miss.

You are considering adding a peripheral to the system, and you want to know how much of the
memory system bandwidth is already used.

1.1 Calculate the percentage of memory system bandwidth used assuming the cache is Write Back.
1.2 Calculate the percentage of memory system bandwidth used assuming the cache is Write
Through.

Please be sure to state your assumptions and show all work.

We know:
* Miss rate = 0.05
* Block size = 2 words (8 bytes)
* Frequency of memory operations from processor = 10^9 per second
* Frequency of writes from processor = 0.25 ∗ 10^9 per second
* Bus can only transfer one word at a time to/from processor/memory
* On average 30% of blocks in the cache have been modified (must be written back in the case of the
write back cache)
* Cache is write allocate
So:
Fraction of read hits = 0.75 ∗ 0.95 = 0.7125
Fraction of read misses = 0.75 ∗ 0.05 = 0.0375
Fraction of write hits = 0.25 ∗ 0.95 = 0.2375
Fraction of write misses = 0.25 ∗ 0.05 = 0.0125
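The four fractions are the read/write split multiplied by the hit/miss split. A minimal Python sketch of that arithmetic, assuming (as the solution does) that reads and writes see the same 5% miss rate; the helper name `access_mix` is hypothetical:

```python
def access_mix(write_fraction: float, miss_rate: float) -> dict:
    """Split processor references into read/write hits and misses.

    Assumes reads and writes share a single miss rate.
    """
    read_fraction = 1.0 - write_fraction
    hit_rate = 1.0 - miss_rate
    return {
        "read_hit": read_fraction * hit_rate,
        "read_miss": read_fraction * miss_rate,
        "write_hit": write_fraction * hit_rate,
        "write_miss": write_fraction * miss_rate,
    }


mix = access_mix(write_fraction=0.25, miss_rate=0.05)
for kind, fraction in mix.items():
    print(f"{kind:<10} {fraction:.4f}")
```

The four values sum to 1, which is a quick sanity check on the fractions listed above.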
Write through cache
• On a read hit there is no memory access
• On a read miss memory must send two words to the cache
• On a write hit the cache must send a word to memory
• On a write miss memory must send two words to the cache, and then the cache must send a word
to memory
Thus:
Average words transferred = 0.7125 ∗ 0 + 0.0375 ∗ 2 + 0.2375 ∗ 1 + 0.0125 ∗ 3 = 0.35
Average bandwidth used = 0.35 ∗ 10^9 words per second
Fraction of bandwidth used = (0.35 ∗ 10^9) / 10^9 = 0.35, i.e. 35% (1)
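The write-through tally can be written as a weighted sum over the four reference types. A sketch under the problem's assumptions (two-word blocks, write allocate, one word per bus transfer, processor and memory both at 10^9 words per second); the constant and function names are hypothetical:

```python
BLOCK_WORDS = 2          # words per cache block
PROCESSOR_RATE = 1e9     # processor references per second
MEMORY_RATE = 1e9        # words per second the memory system can move


def write_through_words(read_hit, read_miss, write_hit, write_miss):
    """Average words moved between cache and memory per processor reference."""
    return (
        read_hit * 0                       # served by the cache
        + read_miss * BLOCK_WORDS          # fetch the whole block
        + write_hit * 1                    # write the word through to memory
        + write_miss * (BLOCK_WORDS + 1)   # allocate the block, then write the word through
    )


words = write_through_words(0.7125, 0.0375, 0.2375, 0.0125)
fraction = words * PROCESSOR_RATE / MEMORY_RATE
print(f"{words:.2f} words/reference -> {fraction:.0%} of memory bandwidth")
```

Because the processor and memory rates are equal, the fraction of bandwidth used equals the average number of words per reference.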
Write back cache
• On a read hit there is no memory access
• On a read miss:
  1. If the replaced line is modified, the cache must send two words to memory, and then
  memory must send two words to the cache
  2. If the replaced line is clean, memory must send two words to the cache
• On a write hit there is no memory access
• On a write miss:
  1. If the replaced line is modified, the cache must send two words to memory, and then memory must send
  two words to the cache
  2. If the replaced line is clean, memory must send two words to the cache
Thus:
Average words transferred = 0.7125 ∗ 0 + 0.0375 ∗ (0.7 ∗ 2 + 0.3 ∗ 4) + 0.2375 ∗ 0 + 0.0125 ∗
(0.7 ∗ 2 + 0.3 ∗ 4) = 0.13
Average bandwidth used = 0.13 ∗ 10^9 words per second
Fraction of bandwidth used = (0.13 ∗ 10^9) / 10^9 = 0.13, i.e. 13% (2)
Comparing (1) and (2), the write-through cache uses about 2.7 times (0.35 / 0.13) the cache-memory
bandwidth of the write-back cache.
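A matching sketch for the write-back case. It assumes, as the solution does, that the 30% modified figure is the probability that the block being replaced is dirty; the helper name is hypothetical:

```python
BLOCK_WORDS = 2
DIRTY_FRACTION = 0.3   # assumed probability that the replaced block is modified


def write_back_words(read_miss, write_miss):
    """Average words moved per reference for a write-back, write-allocate cache."""
    # Hits (reads or writes) never touch memory.
    # Every miss fetches a block; a dirty victim is written back first.
    words_per_miss = (1 - DIRTY_FRACTION) * BLOCK_WORDS + DIRTY_FRACTION * 2 * BLOCK_WORDS
    return (read_miss + write_miss) * words_per_miss


write_back = write_back_words(read_miss=0.0375, write_miss=0.0125)
write_through = 0.35   # result (1)
print(f"write back: {write_back:.2f} words/reference")
print(f"write-through / write-back = {write_through / write_back:.1f}")
```

Since reads and writes that miss cost the same in a write-back, write-allocate cache, only the total miss fraction (0.05) and the 2.6 words per miss matter.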

Problem 2. One difference between a write-through cache and a write-back cache can be in the
time it takes to write. During the first cycle, we detect whether a hit will occur, and during the
second (assuming a hit) we actually write the data.

Let’s assume that 50% of the blocks are dirty for a write-back cache. For this question, assume
that the write buffer for the write-through cache will never stall the CPU (no penalty). Assume a cache
read hit takes 1 clock cycle, the cache miss penalty is 50 clock cycles, and a block write from the
cache to main memory takes 50 clock cycles. Finally, assume the instruction cache miss rate is
0.5% and the data cache miss rate is 1%. Assume that on average 26% and 9% of instructions in
the workload are loads and stores, respectively.

2.1 Estimate the performance of a write-through cache with a two-cycle write versus a write-back
cache with a two-cycle write.
CPU performance equation: CPUTime = IC ∗ CPI ∗ ClockTime
CPI = CPIexecution + StallCyclesPerInstruction
We know:
Instruction miss penalty is 50 cycles
Data read hit takes 1 cycle
Data write hit takes 2 cycles
Data miss penalty is 50 cycles for write through cache
Data miss penalty is 50 cycles (clean victim) or 100 cycles (dirty victim written back first) for write back cache
Miss rate is 1% for data cache (MRD) and 0.5% for instruction cache (MRI)
50% of cache blocks are dirty in the write back cache
26% of all instructions are loads
9% of all instructions are stores
Then: CPIexecution = 0.26 ∗ 1 + 0.09 ∗ 2 + 0.65 ∗ 1 = 1.09
Write through
StallCyclesPerInstruction = MRI ∗ 50 + MRD ∗ (0.26 ∗ 50 + 0.09 ∗ 50) = 0.425
so: CPI = 1.09 + 0.425 = 1.515 (1)
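The write-through CPI can be checked directly from the CPI equation. A sketch using the problem's parameters; it assumes, as the solution does, that stores take two cycles, everything else one, and that the write buffer never stalls:

```python
LOADS, STORES = 0.26, 0.09
OTHER = 1 - LOADS - STORES      # remaining instructions (0.65)
MRI, MRD = 0.005, 0.01          # instruction / data cache miss rates
MISS_PENALTY = 50               # cycles to fetch a block

# Loads and other instructions take 1 cycle; stores take 2 (hit check, then write).
cpi_execution = LOADS * 1 + STORES * 2 + OTHER * 1
# The write buffer never stalls, so only block fetches on misses add cycles.
stalls_wt = MRI * MISS_PENALTY + MRD * (LOADS + STORES) * MISS_PENALTY
cpi_wt = cpi_execution + stalls_wt
print(f"CPI_execution = {cpi_execution:.3f}, stalls = {stalls_wt:.4f}, CPI = {cpi_wt:.4f}")
```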
Write back
StallCyclesPerInstruction = MRI ∗ 50 + MRD ∗ (0.26 ∗ (0.5 ∗ 50 + 0.5 ∗ 100) +
0.09 ∗ (0.5 ∗ 50 + 0.5 ∗ 100)) = 0.5125
so: CPI = 1.09 + 0.5125 = 1.6025 (2)
Comparing (1) and (2), the system with the write-back cache is about 6% slower (1.6025 / 1.515 ≈ 1.058;
instruction count and clock time are the same, so CPU time scales with CPI).
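The write-back figure differs only in the data miss cost, which averages the clean (50-cycle) and dirty (100-cycle) cases. A sketch assuming, as the solution does, that instruction misses always cost 50 cycles because instruction-cache blocks are never dirty:

```python
LOADS, STORES = 0.26, 0.09
MRI, MRD = 0.005, 0.01
MISS_PENALTY = 50          # cycles per block transfer
DIRTY = 0.5                # fraction of replaced data blocks that are dirty
CPI_EXECUTION = 1.09       # same two-cycle write as the write-through case
CPI_WRITE_THROUGH = 1.515  # result (1)

# Clean victim: one block fetch (50 cycles). Dirty victim: write back + fetch (100 cycles).
data_miss_cost = (1 - DIRTY) * MISS_PENALTY + DIRTY * 2 * MISS_PENALTY
# Instruction-cache blocks are never dirty, so an I-miss always costs 50 cycles.
stalls_wb = MRI * MISS_PENALTY + MRD * (LOADS + STORES) * data_miss_cost
cpi_wb = CPI_EXECUTION + stalls_wb
slowdown = cpi_wb / CPI_WRITE_THROUGH - 1   # same IC and clock, so CPU time scales with CPI
print(f"CPI = {cpi_wb:.4f}, slowdown = {slowdown:.1%}")
```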

Problem 3.
Consider the following RISC-V instruction sequence executing in a 5-stage pipeline:
or x13, x12, x11
ld x10, 0(x13)
ld x11, 8(x13)
add x12, x10, x11
subi x13, x12, 16
3.1 Identify all of the data hazards and their resolution with NOPs, assuming no forwarding or
hazard detection hardware is used.
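Hazards identified ("EX to 1st" means the needed value is produced in the EX stage of the instruction
immediately before; "2nd" means two instructions before; "MEM" means the value is produced in the MEM
stage, i.e. by a load):
or x13, x12, x11
ld x10, 0(x13) EX to 1st RAW Hazard
ld x11, 8(x13) EX to 2nd RAW Hazard
add x12, x10, x11 MEM to 1st RAW [load-use-data] & MEM to 2nd Hazards
subi x13, x12, 16 EX to 1st RAW Hazard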
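The hazard list can be reproduced mechanically by looking back at most two instructions for a writer of each source register. A small Python sketch; the tuple encoding of instructions and the function name are hypothetical, and the two-instruction window assumes the usual split-cycle register file (write in the first half, read in the second):

```python
# (text, destination register, source registers); a hypothetical encoding for this sketch.
PROGRAM = [
    ("or x13, x12, x11", "x13", ["x12", "x11"]),
    ("ld x10, 0(x13)", "x10", ["x13"]),
    ("ld x11, 8(x13)", "x11", ["x13"]),
    ("add x12, x10, x11", "x12", ["x10", "x11"]),
    ("subi x13, x12, 16", "x13", ["x12"]),
]


def raw_hazards(prog, window=2):
    """List (consumer, producer, register, distance) for every RAW dependence
    whose producer is at most `window` instructions earlier."""
    hazards = []
    for i, (text, _dest, sources) in enumerate(prog):
        for reg in sources:
            # Search backwards for the nearest earlier writer of `reg`.
            for dist in range(1, min(window, i) + 1):
                producer_text, producer_dest, _ = prog[i - dist]
                if producer_dest == reg:
                    hazards.append((text, producer_text, reg, dist))
                    break
    return hazards


for consumer, producer, reg, dist in raw_hazards(PROGRAM):
    print(f"{consumer:<18} needs {reg} from '{producer}' ({dist} instruction(s) earlier)")
```

The five pairs it reports match the five hazards listed: one for each ld, two for add, and one for subi.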
NOPs introduced to resolve the hazards (2 NOPs between adjacent dependent instructions suffice because
the register file is written in the first half of a clock cycle and read in the second half):
or x13, x12, x11
NOP
NOP
ld x10, 0(x13) EX to 1st RAW Hazard resolved with 2 NOPs
ld x11, 8(x13) EX to 2nd RAW Hazard also resolved by the same 2 NOPs
NOP
NOP
add x12, x10, x11 MEM to 1st RAW [load-use-data] & MEM to 2nd Hazards
resolved with 2 NOPs
NOP
NOP
subi x13, x12, 16 EX to 1st RAW Hazard (its only hazard) resolved with 2 NOPs
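The NOP placement follows a simple greedy rule: delay each instruction until every source register's producer is at least three issue slots earlier. A sketch of that rule, with the same hypothetical instruction encoding and the split-cycle register-file assumption:

```python
# (text, destination register, source registers); a hypothetical encoding for this sketch.
PROGRAM = [
    ("or x13, x12, x11", "x13", ["x12", "x11"]),
    ("ld x10, 0(x13)", "x10", ["x13"]),
    ("ld x11, 8(x13)", "x11", ["x13"]),
    ("add x12, x10, x11", "x12", ["x10", "x11"]),
    ("subi x13, x12, 16", "x13", ["x12"]),
]


def insert_nops(prog, min_distance=3):
    """Pad with NOPs so every consumer issues >= min_distance slots after its producer.

    min_distance=3 means two NOPs between adjacent dependent instructions, which is
    enough when the register file writes in the first half-cycle and reads in the second.
    """
    schedule = []       # instruction texts in issue order
    last_write = {}     # register -> issue slot of its most recent producer
    for text, dest, sources in prog:
        slot = len(schedule)
        ready = max((last_write[r] + min_distance for r in sources if r in last_write),
                    default=slot)
        schedule.extend(["NOP"] * max(0, ready - slot))
        last_write[dest] = len(schedule)
        schedule.append(text)
    return schedule


padded = insert_nops(PROGRAM)
print("\n".join(padded))
print("NOPs inserted:", padded.count("NOP"))
```

The second ld needs no extra NOPs because the two NOPs already placed after or put it three slots behind its producer.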

3.2 If there is forwarding, for the first seven cycles during the execution of this code, specify
which signals are asserted in each cycle by the hazard detection and forwarding units in the figure
below.

Clock cycle  1    2    3    4    5    6    7    8    9    10
1 or         IF   ID   EX   MEM  WB
2 ld              IF   ID   EX   MEM  WB
3 ld                   IF   ID   EX   MEM  WB
4 NOP                       IF   ID   EX   MEM  WB
5 add                            IF   ID   EX   MEM  WB
6 subi                                IF   ID   EX   MEM  WB
Row 4 is a mandatory NOP: the load-use (load-data-use) hazard between the 2nd ld and add has no
forwarding solution.

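A and B are the ForwardA and ForwardB mux selects: 0 = register file, 2 = EX/MEM, 1 = MEM/WB, x = don't care.
(1) A=x B=x (no instruction in EX stage yet)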
(2) A=x B=x (no instruction in EX stage yet)
(3) A=0 B=0 (both operands of the or instruction: x11,x12 come from Reg File)
(4) A=2 B=0 (base (RS1) in first ld (x13)taken from EX/MEM of previous instruction)
(5) A=1 B=0 (base (RS1) in 2nd ld (x13)taken from MEM/WB of a previous instruction)
(6) A=x B=x (no useful instruction in the EX stage, because of the NOP introduced to resolve the MEM to 1st load-use hazard)
(7) A=0 B=1 (RS2 in the add instruction is x11, which is forwarded from MEM/WB of the 2nd
ld; the result of the 1st ld (x10) has already been written into the register file in CC 6,
so no forwarding is necessary for the first operand)
(8) A=2 B=0 (RS1 of the subi instruction is forwarded from EX/MEM of the add instruction)
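The forwarding selects follow the standard EX-hazard and MEM-hazard conditions: forward from EX/MEM when the instruction there writes the needed register (and that register is not x0), otherwise from MEM/WB under the same test. A Python sketch using the solution's encoding (0 = register file, 2 = EX/MEM, 1 = MEM/WB, None for "x"); the function name and the way pipeline-register contents are passed in are hypothetical simplifications, and only cycles 3, 4, 5 and 7 are checked:

```python
from typing import Optional


def forward_select(src: Optional[str],
                   ex_mem_rd: Optional[str], ex_mem_regwrite: bool,
                   mem_wb_rd: Optional[str], mem_wb_regwrite: bool) -> Optional[int]:
    """Mux select for one ALU operand: 0 = register file, 2 = EX/MEM, 1 = MEM/WB.

    Returns None ("x", don't care) when there is no source register to forward.
    """
    if src is None:
        return None
    # EX/MEM is checked first because it holds the most recent value of the register.
    if ex_mem_regwrite and ex_mem_rd not in (None, "x0") and ex_mem_rd == src:
        return 2
    if mem_wb_regwrite and mem_wb_rd not in (None, "x0") and mem_wb_rd == src:
        return 1
    return 0


# Cycle 3: "or" in EX, nothing ahead of it in the pipeline.
a3 = forward_select("x12", None, False, None, False)
b3 = forward_select("x11", None, False, None, False)
# Cycle 4: 1st ld in EX, "or" (rd = x13) in MEM.
a4 = forward_select("x13", "x13", True, None, False)
# Cycle 5: 2nd ld in EX, 1st ld (rd = x10) in MEM, "or" (rd = x13) in WB.
a5 = forward_select("x13", "x10", True, "x13", True)
# Cycle 7: add in EX, NOP in MEM (no RegWrite), 2nd ld (rd = x11) in WB.
a7 = forward_select("x10", None, False, "x11", True)
b7 = forward_select("x11", None, False, "x11", True)
print(f"cycle 3: A={a3} B={b3}; cycle 4: A={a4}; cycle 5: A={a5}; cycle 7: A={a7} B={b7}")
```

Reading 2 and 1 as the two-bit codes 10 and 01 gives the ForwardA/ForwardB encoding used in Patterson and Hennessy's RISC-V pipeline.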

Problem 4. Consider the following program and cache behaviors.

Data reads per 1K instructions:    300
Data writes per 1K instructions:   150
Instruction cache miss rate:       0.5%
Data cache miss rate:              5%
Block size (bytes):                128

Suppose a CPU with a write-through, write allocate cache achieves a CPI of 2.

4.1 What are the read and write bandwidths (measured in bytes per cycle) between RAM and the
cache? (Assume each miss generates a request for one block.) For a write-allocate policy, a write
miss also makes a read request to RAM; please be sure to consider its impact on read bandwidth.

Instruction Bandwidth:
When the CPI is 2, there are, on average, 0.5 instruction accesses per cycle, i.e.
0.5 instructions read from instruction memory per cycle.
0.5% of these instruction accesses cause a cache read miss (and a subsequent memory request):
[0.5 instr/cycle] x [0.005 misses/instruction] = 0.0025 instruction misses/cycle
Assuming each miss requests one block and each block is 128 bytes [16 words with 8 bytes (64 bits) per
word], instruction accesses generate an average of
[0.5 instr/cycle] x [0.005 misses/instruction] x [128 bytes/miss]
= 0.32 bytes/cycle of read traffic
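The instruction-side traffic is a product of three rates. A minimal sketch with the problem's numbers; the variable names are illustrative:

```python
CPI = 2.0
I_MISS_RATE = 0.005      # instruction cache miss rate
BLOCK_BYTES = 128        # bytes fetched per miss

instr_per_cycle = 1 / CPI                            # 0.5 instruction fetches per cycle
i_misses_per_cycle = instr_per_cycle * I_MISS_RATE   # 0.0025
i_read_bw = i_misses_per_cycle * BLOCK_BYTES         # bytes/cycle of read traffic
print(f"{i_misses_per_cycle:.4f} misses/cycle -> {i_read_bw:.2f} bytes/cycle")
```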
Read Data bandwidth:
30% of instructions generate a read request from data memory.
[0.5 instr/cycle] x [0.3 Read Data Accesses/instruction] = [0.15 Read Data Accesses / cycle]
5% of these generate a cache miss;
[0.15 Read Data Accesses / cycle] x [0.05 misses / Read Data Access] = 0.0075 Read Misses/cycle
Assuming each miss requests one block and each block is 128 bytes [16 words with 8 bytes (64 bits) per
word]:
[0.0075 Read Misses/cycle] x [128 Bytes/block] x [1 block/miss] = 0.0075 x 128 Bytes/cycle
= 0.96 Bytes/cycle
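The data-read traffic uses the same chain of rates, starting from 300 reads per 1K instructions. A sketch with illustrative names:

```python
INSTR_PER_CYCLE = 0.5            # from CPI = 2
READS_PER_INSTR = 300 / 1000     # data reads per instruction
D_MISS_RATE = 0.05
BLOCK_BYTES = 128

reads_per_cycle = INSTR_PER_CYCLE * READS_PER_INSTR      # 0.15
read_misses_per_cycle = reads_per_cycle * D_MISS_RATE    # 0.0075
d_read_bw = read_misses_per_cycle * BLOCK_BYTES          # bytes/cycle
print(f"{read_misses_per_cycle:.4f} read misses/cycle -> {d_read_bw:.2f} bytes/cycle")
```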
Write Data bandwidth:
15% of instructions generate a write request into data memory.
[0.5 instr/cycle] x [0.15 Write Data Accesses/instruction] = [0.075 Write Data Accesses / cycle]
All of the words written to the cache must be written into Memory:
[0.075 Write Data Accesses / cycle] x [8 bytes/word] x [1 word/write-through] = 0.6 Bytes/cycle

For a Write-allocate policy, a Write miss also makes a read request to RAM
[0.5 inst/cycle] x [0.15 Write Data Accesses/instruction] x [0.05 misses/Write Data Access] x [128 Bytes/miss]
= 0.48 Bytes/cycle
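Both store-related quantities come from the same 0.075 stores per cycle: write-through sends one 8-byte word per store, and write allocate adds a 128-byte block fill per store miss. A sketch with illustrative names, assuming 8-byte words as above:

```python
INSTR_PER_CYCLE = 0.5
WRITES_PER_INSTR = 150 / 1000
D_MISS_RATE = 0.05
BLOCK_BYTES = 128
WORD_BYTES = 8

writes_per_cycle = INSTR_PER_CYCLE * WRITES_PER_INSTR     # 0.075 stores per cycle
# Write-through: every store, hit or miss, sends one word to memory.
write_bw = writes_per_cycle * WORD_BYTES                  # 0.6 bytes/cycle
# Write allocate: each store miss also reads one block from memory.
write_misses_per_cycle = writes_per_cycle * D_MISS_RATE   # 0.00375
allocate_read_bw = write_misses_per_cycle * BLOCK_BYTES   # 0.48 bytes/cycle
print(f"write traffic {write_bw:.2f} B/cycle, allocate fills {allocate_read_bw:.2f} B/cycle")
```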
Write misses need no separate write term: in a write-through cache, the word written on a store miss is
sent to memory once, and that transfer is already included in the 0.6 Bytes/cycle above, which covers all
0.075 writes/cycle (hits and misses). Adding [0.00375 Write Misses/cycle] x [8 Bytes/word] = 0.03 Bytes/cycle
on top of it would count those words twice.

Total Read Bandwidth:
0.32 (instruction memory) + 0.96 (data memory) + 0.48 (write misses in a write-through cache with write
allocate) Bytes/cycle = 1.76 Bytes/cycle
Total Write Bandwidth:
0.6 Bytes/cycle (every store written through to memory, including stores that miss)
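The total read bandwidth can be wrapped in one function of the table's parameters, which makes it easy to see how, for example, block size scales every term. A sketch for a write-through, write-allocate cache; the function and parameter names are hypothetical:

```python
def read_bandwidth(cpi, reads_per_k, writes_per_k, i_miss, d_miss, block_bytes):
    """Bytes/cycle read from RAM by a write-through, write-allocate cache."""
    ipc = 1 / cpi
    instr_fills = ipc * i_miss * block_bytes
    read_miss_fills = ipc * (reads_per_k / 1000) * d_miss * block_bytes
    write_alloc_fills = ipc * (writes_per_k / 1000) * d_miss * block_bytes
    return instr_fills + read_miss_fills + write_alloc_fills


total_read = read_bandwidth(cpi=2.0, reads_per_k=300, writes_per_k=150,
                            i_miss=0.005, d_miss=0.05, block_bytes=128)
print(f"total read bandwidth: {total_read:.2f} bytes/cycle")
```

Every term is linear in `block_bytes`, so doubling the block size doubles the read bandwidth if the miss rates stay fixed.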